Physical design considerations

After you have verified the dimensional model, design the physical database. Develop strategies to handle aggregation, aggregate navigation, indexing, and partitioning of the data in your dimensional model.
When you design the physical database, you collect the following metadata:
When you create the physical design of a dimensional model, perform the following steps:
  1. Design the aggregations for each of the fact tables.
  2. Create indexes to improve performance.
  3. Partition the tables in the model.

Design aggregations

In simple terms, aggregation is the process of calculating summary data from detail base level fact table records. Aggregates are a powerful tool to increase the query processing speed in dimensional data marts. The aggregation is primarily performed by using attributes of a dimension which are a part of a hierarchy.

Each attribute that belongs to a hierarchy associates as a parent or child with other attributes of the hierarchy. This parent-child relationship provides different levels of summarization. The various levels of summarization provide the business user the ability to drill up or drill down in the report. Highly aggregated data is faster to retrieve than detailed, atomic-level data. The fact table typically occupies a large volume of space when compared to the aggregated data.

The lowest level of aggregation (or the highest level of detail) is referred as the grain of the fact table. The granularity of the dimension affects the design of data storage and how data is retrieved.

Aggregating detailed atomic fact tables improves query performance. However, there are costs associated with aggregation:
  • Storage
  • Cost to build and maintain the processes to handle the aggregated tables

Aggregation is not a substitute for reducing the size of large, detailed fact tables. If the data in the fact table is summarized, the detailed information that is in the form of dimensions and measures is often lost. If the business needs detailed data from a summarized fact table, that data is not available. The user would need to look for the details in the source OLTP system that provided the aggregated fact table data. If the business must use the source OLTP systems to get the answers, then reevaluate the need for a dimensional model.

Avoid mixing aggregated data and detailed data by including year-to-date aggregated measures with the detailed measures. Year-to-date measures are additive, and if you mix the data types, miscalculations can occur.

To prepare aggregate tables, perform the following steps:
  1. Identify all dimensions and their hierarchies from the base level atomic dimensional model. These dimensions and hierarchies are identified from the base-level atomic dimensional model.
  2. Identify all possible combinations of these hierarchy attributes which are used together by business for reporting. Identify all attributes from the hierarchies to determine which of the attributes are used together frequently. This step is especially important if there are a huge number of dimensions with several hierarchies that have several attributes in them.
  3. Calculate the number of values that each attribute has. The number of values each attribute has is indicative of whether the attribute is aggregated. For example, if you include a low-level member that has many members, then you may drop that attribute and choose a higher level attribute, which would have fewer values.
  4. Validate the final set of attribute candidates and build the aggregated dimensional model.

Create indexes

You can improve query performance by creating indexes. To learn more about indexes, see Indexes.

Partition tables

When you partition a table, you divide the table by row, by column, or both. If a table is divided by column, it is said to be vertically partitioned. If you divide by row, it is said to be horizontally partitioned. Partitioning large fact tables improves performance because each partition is more manageable. Partition a table based on the transaction date dimension in a dimensional model. For example, if a huge fact table has billions of rows, it would be ideal for one month of data to be assigned its own partition.

You achieve several goals by partitioning the data in a data warehouse, including the following goals:
  • Providing flexible access to data
  • Providing easy, efficient data management services
  • Ensuring data warehouse scalability
  • Enabling elements of the data warehouse to be portable, which allows those elements to be shared across data warehouses or archived easily
  • Improving query response times
  • Making maintenance easier, since partitions are easier to maintain than large tables

Feedback