Tuesday, 27 November 2007

Aggregation : What Happens with Fan-Trap ?

I have received recently a claim from a customer regarding the aggregation behaviour of his workbook. In particular he was afraid to see an item on the Group By section clause of the Discoverer Query. At first look I immediately recognized that the architecture selected for his workbook was resulting in a fan-trap schema and so could generate ambiguous data, but due to his insistence I reviewed inside his workbook the reason of this. So, let's first define what is a fan-trap and its rules.

What are fan traps, and how does Discoverer handle them?

A fan trap is a group of joined database tables that might return unexpected results. The most common manifestation of a fan trap occurs when a master table is joined to two or more detail tables independently. Figure shows a simple fan trap schema





Although this construction is relationally correct, you are likely to return incorrect results if you use a straightforward SQL statement to aggregate data points. However, if you use Discoverer to aggregate the data points, Discoverer will never return incorrect results, provided fan trap detection is enabled. Every query that Discoverer generates is interrogated for potential fan traps. If a fan trap is detected, Discoverer can usually rewrite the query using inline views to ensure the aggregation is done at the correct level. Discoverer creates an inline view for each master-detail aggregation, and then combines the results of the outer query.


In some circumstances, Discoverer will detect a query that involves an unresolvable fan trap schema, as follows:

  • if the detail folders use different keys from the master for the join
  • if there is a direct join relationship between the detail folders (thereby creating an ambiguous circular relationship)
  • if non-aggregated values are chosen from more than one of the detail folders if more than one detail folder has a separate join relationship to a different master folder
  • if more than one detail folder has a separate join relationship to a different master folder

How the aggregation works in fan-traps ?

Discoverer interrogates the query, detects a fan trap, and rewrites the query to ensure the aggregation is done at the correct level. Discoverer rewrites the query using inline views, one for each master-detail aggregation, and then combines the results of the outer query.

Now, back to the customer issue, I verified the following schema in the Workbook (in green the folders which items are used in the Workbook) :


Each one of this folder, joins the other folders (or Master Folders) on a different keys, generating a fan-trap schema. So, if he creates a crosstab workbook with the following layout :







The aggregation will be made on
  • Department.Item (Because it is Page Item and rollup at page item level)

  • Customers.Item (Because it is an axis item and rollup at row level)

  • Product Details.Item (Because it is an axis item and rollup at row level)


but will add to the Group-By expression also the Sales SUM.Item, because by looking at customer schema, this item is from Sales SUM folder, which is detail of Stores folder which is the master of Products folder, which is iself master of Product Details, for which items are in the workbook. So, basically Sales SUM Item is interpreted in this case as Master of involved items, and so added to the Group By clause, as the aggregation is made for each master-detail relationship.

The only solution here is to create a custom folder just for this workbook, otherwise one should re-architecture the joins so that workbook will not be built on a fan-trap schema.

No comments: