Star Schema Design in Oracle: Partitioning

Partitioning is one of the most powerful features for data warehouses in Oracle databases. In this blog post, I explain how it can be used for the physical design of star schemas. What is the recommended partitioning strategy for a star schema, and what are the advantages of partitioning?

In my blog post series about star schema design in Oracle databases, I started with the fundamental rules about constraints and indexes. For details, read the previous blog post. In this post, I will write about an important feature in many data warehouses: Partitioning.

Preliminary Remarks

  1. Partitioning is not available in all database offerings of Oracle. It requires Enterprise Edition and the Oracle Partitioning option. If you don’t have a license for this option, you can either skip this blog post or – my personal recommendation – think about purchasing the Oracle Partitioning option. If you already work with the Partitioning option, you are lucky. Partitioning is one of the key features for data warehousing with Oracle databases.
  2. The star schema I will use for the examples in this post is the same one I introduced in the last post. We have one fact table FACT_BEER_DELIVERY referring four different dimension tables DIM_DATE, DIM_CUSTOMER, DIM_BEER and DIM_BOTTLING.

Star schema example
Figure 1: Star schema for a craft beer brewery

Partitioning Strategy for Star Schemas

The common partitioning approach in a star schema is to partition the fact table by date. This is one of the reasons why the dimension key of the calendar dimension is usually a DATE column. The foreign key to the calendar dimension (in our example, column DELIVERY_DATE) is the partition key in the fact table. As partitioning method, we use RANGE partitioning. To reduce the maintenance overhead to create new partitions, the easiest way is to use INTERVAL partitioning, an extension of RANGE partitioning. The interval size depends on the granularity and size of the fact table. Often, monthly partitions are used, but other intervals (e.g. year or quarter) are also possible. For very large fact tables, it can even be useful to create daily partitions. Here it is highly recommended to work with INTERVAL partitioning, otherwise a high number of partitions must be created periodically.

In our example, we want to use monthly partitions for the fact table FACT_BEER_DELIVERY. The oldest available facts are from 2016, therefore we use INTERVAL partitioning starting from January 2016. In the current versions of Oracle, it is possible to change a non-partitioned table into a partitioned table with one command:


ALTER TABLE fact_beer_delivery
MODIFY PARTITION BY RANGE(delivery_date)
INTERVAL(NUMTOYMINTERVAL(1, ‘MONTH’))
(PARTITION old_data VALUES LESS THAN (DATE’2016-01-01′));

 

 

Special Cases

This partitioning strategy sounds straight forward and easy to implement. But there are situations where we have to think carefully about the partition key. Let’s assume we have more than one date column in the fact table, e.g. the delivery date and the billing date. Which one is the right partition key? It depends on the type of questions that the data mart users ask – or in other words, what queries are the most common. If most of the queries are filtered or aggregated by delivery date ranges, delivery date is the recommended partition key. If the queries are mainly about billing dates, this date should be used for partitioning. If both type of queries are typical, then eventually two separate star schemas (with conformed dimensions) may be a solution, or you can use Composite RANGE-RANGE partitioning for both dates.

Another advanced partitioning strategy is when you have large dimension tables. Let’s assume our craft beer brewery has millions of customers (which is of course not realistic). In this case, a possible solution could be to partition the largest dimension table DIM_CUSTOMER with HASH partitioning on the dimension key. For the fact table, we would then use Composite RANGE-HASH partitioning. This enables the usage of a full partition-wise join between the two tables.

For further information and practical examples for such special cases, have a look at my presentation Partitioning Your Oracle Data Warehouse – Just a Simple Task? The presentation is quite old (I presented this session at Oracle Open World 2009), but the partition strategies can still be used.

Indexes on Partitioned Tables

It is highly recommended to use local partitioned indexes in a data warehouse environment. They are easier to maintain when old partitions are dropped (see section “Information Lifecycle Management” below). For the standard indexing strategy in a star schema, the choice between global and local index is simple: Bitmap indexes must always be local indexes.

A local index is partitioned in the same way as the table. For each partition (or subpartition) of the table, a corresponding index partition (or subpartition) is created. If we add a new partition – either manually or automatically with INTERVAL partitioning – new partitions are created for each of the local indexes. There is even an additional goodie: If we convert a non-partitioned table into a partitioned table with an ALTER TABLE … MODIFY PARTITION command as described above, the existing indexes are automatically converted into local indexes.

For our example star schema in figure 1, we created four bitmap indexes on the fact table – one for each dimension key. When we convert the fact table into a partitioned table with monthly partitions, we will have local bitmap indexes with monthly partitions, too. When we already created a partitioned fact table, we just have to add the keyword LOCAL when creating the indexes:

 
CREATE BITMAP INDEX FACT_BD_BEER_ID_BITMAP
ON FACT_BEER_DELIVERY (BEER_ID) LOCAL;

CREATE BITMAP INDEX FACT_BD_BOTTLING_ID_BITMAP
ON FACT_BEER_DELIVERY (BOTTLING_ID) LOCAL;

CREATE BITMAP INDEX FACT_BD_CUSTOMER_ID_BITMAP
ON FACT_BEER_DELIVERY (CUSTOMER_ID) LOCAL;

CREATE BITMAP INDEX FACT_BD_DELIVERY_DATE_BITMAP
ON FACT_BEER_DELIVERY (DELIVERY_DATE) LOCAL;
 

 

Advantages of Partitioning

Partitioning the fact table by date has several advantages for data warehouse environments: partition pruning, information lifecycle management and partition exchange load. Let’s have a look at these concepts in the following sections.

Partition Pruning

Many reports and queries on a star schema filter the facts by a date range. As soon as we add a filter on the partition key (either directly on the fact table, or – more typical – with a WHERE condition on the calendar dimension), the query performance can be improved by the usage of Partition Pruning:  Instead of reading all data in the fact table, only the required partitions must be scanned. The smaller the date range, the less partitions must be read.

Let’s explain this with a simple query: We want to know how many beer bottles were delivered in the first quarter of 2020. As we can see in columns “Pstart” and Pstop” of the execution plan, only the partitions from position 50 to 52 are scanned. These are the monthly partitions for January, February and March 2020:

 
EXPLAIN PLAN FOR
SELECT SUM(f.number_of_bottles)
  FROM fact_beer_delivery f
 WHERE f.delivery_date BETWEEN DATE'2020-01-01' AND DATE'2020-03-31';
 
SELECT * FROM TABLE(dbms_xplan.display(format => 'basic +partition'));
 
--------------------------------------------------------------------------
| Id  | Operation                   | Name               | Pstart| Pstop |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |                    |       |       |
|   1 |  SORT AGGREGATE             |                    |       |       |
|   2 |   PARTITION RANGE ITERATOR  |                    |    50 |    52 |
|   3 |    TABLE ACCESS STORAGE FULL| FACT_BEER_DELIVERY |    50 |    52 |
--------------------------------------------------------------------------
 

This is not the common way to query a star schema. Typically, we join the fact table with the calendar dimension. In the following example, we filter for the first quarter of 2020, but want to display the number of bottles per month. Partition pruning will take place again and only read the three monthly partitions that are needed. But the partition numbers are not displayed in “Pstart” and “Pstop” columns of the execution plan, because they are not known at parse time:

 
EXPLAIN PLAN FOR
SELECT d.calendar_month_desc
     , SUM(f.number_of_bottles)
  FROM fact_beer_delivery f
  JOIN dim_date d ON (d.date_id = f.delivery_date)
 WHERE d.calendar_quarter_desc = '2020-1'
GROUP BY d.calendar_month_desc;
 
SELECT * FROM TABLE(dbms_xplan.display(format => 'basic +partition'));
 
---------------------------------------------------------------------------------------
| Id  | Operation                                | Name               | Pstart| Pstop |
---------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                         |                    |       |       |
|   1 |  HASH GROUP BY                           |                    |       |       |
|   2 |   HASH JOIN                              |                    |       |       |
|   3 |    JOIN FILTER CREATE                    | :BF0001            |       |       |
|   4 |     PART JOIN FILTER CREATE              | :BF0000            |       |       |
|   5 |      TABLE ACCESS STORAGE FULL FIRST ROWS| DIM_DATE           |       |       |
|   6 |    JOIN FILTER USE                       | :BF0001            |       |       |
|   7 |     PARTITION RANGE JOIN-FILTER          |                    |:BF0000|:BF0000|
|   8 |      TABLE ACCESS STORAGE FULL FIRST ROWS| FACT_BEER_DELIVERY |:BF0000|:BF0000|
---------------------------------------------------------------------------------------
 

In this case, Join-Filter Pruning is used by the optimizer. A Bloom filter is created based on the filter on the calendar dimension DIM_DATE. This Bloom filter is then used for dynamic partition pruning on the fact table FACT_BEER_DELIVERY.

Partition Pruning is usually the main reason for partitioning the fact tables. Especially if you have several years of historic data in the data mart, performance can be improved for queries that filter only a small date range.

Information Lifecycle Management

In data warehouses, we often use rolling time windows. For example, when the users of a data mart are only interested in facts of the last few years, we can delete all data from the fact table that is not required anymore. Oracle offers several techniques of Information Lifecycle Management (ILM), based on time-based partitioning of the data.

For our example star schema, we want to keep seven years of history. Now, in January 2023, we want to delete all facts from January 2016. This can be done with one command that drops the corresponding partition:

 
ALTER TABLE fact_beer_delivery
DROP PARTITION FOR (TO_DATE('2016-01-01', 'YYYY-MM-DD'));
 

Of course, we usually don’t do that manually, an automatic cleanup job is more common. How we can get rid of old data by dropping the outdated partitions is described my blog post Housekeeping in Oracle: How to Get Rid of Old Data.

Information Lifecycle Management is much more than just deleting old data. There are other use cases that can be useful for historic data in a fact table. For example, we can compress partitions, set them to read-only, move them to other tablespaces, etc. A practical example of monthly partition maintenance is shown in the presentation Partitioning Your Oracle Data Warehouse – Just a Simple Task? In combination with Oracle Database In-Memory, it is possible to populate the newest (i.e. often used) partitions into the In-Memory column store. More about this in fa future blog post of this series.

Partition Exchange Load

Even for loading data, time-based partitioning has some benefits. If we load the data mart incrementally, we can choose the load frequency as interval size for the fact table. For example, for daily loads we create daily partitions. In this case, the facts can be loaded with Partition Exchange. The steps to do this, I already explained in previous blog posts. Here just a short summary of the different steps:

Partition exchange
Figure 2: Steps for Partition Exchange Load

  1. Create a stage table with the same structure as the partitioned target table. In the current Oracle releases, this is very easy, as explained in blog post Partition Exchange in Oracle 12.2.
  2. Load the data for the next time period (e.g. one day or one month) into the stage table.
  3. Create the same indexes as the indexes on the partitioned table. This is possible if the target table contains local indexes only.
  4. Create a new (empty) partition on the partitioned target table. For INTERVAL partition, this is a bit tricky. A possible solution is described in blog post Partition Exchange and Interval Partitioning.
  5. Do a Partition Exchange between the new partition and the stage table. This step just swaps the references in the data dictionary, The stage table with the indexes is now the new partition.
But what are the benefits of Partition Exchange? Why is it useful to load data into a new partition via an intermediate stage table? There are several advantages of this approach. If we can load an empty stage table without indexes and create the indexes afterwards, this reduces the load times. Additionally, the optimizer statistics on the stage table are gathered automatically since Oracle 12c. But the main advantage is the restartability of the load process. If the load job must be repeated, for example because of data quality issues, the same steps can be repeated without any cleanup or delete process at the beginning. And during the load, the previous version of the data is still available for the users.
 

Summary

Partitioning is a powerful feature for data warehouses. If you partition the fact table in a star schema by date, you can benefit from several advantages:
  • Query performance: Partition Pruning helps to read only the required partitions for time-based queries and reduces the response time of the queries.
  • Maintenance of historical data: Partitioning helps to get rid of old data, but also supports other maintenance tasks of Information Lifecycle Management.
  • Load performance: With Partition Exchange Load, it is possible to load a complete partition. This improves load performance and reduces the downtime of the data marts.
 

1 thought on “Star Schema Design in Oracle: Partitioning

Leave a comment