Star Schema Design in Oracle: In-Memory

In the third part of this blog post series about star schema design, let’s have a look at a powerful, but for many people still unknown feature: Oracle Database In-Memory enables massive performance improvements for typical data warehouse queries.

Oracle Database In-Memory (aka “In-Memory option”) was introduced with Oracle 12c. Although this is ten years ago, the feature is still not very widely used in many Oracle data warehouses. This is sad, because it is a powerful performance features, especially for star schemas.

It would go too far to describe all the technical details and configuration options of Oracle Database In-Memory in this blog post. This is not necessary anyway, because there is a very good Oracle White Paper about it, which has already been updated and extended several times. In this post I will only explain the basic concept of In-Memory and how it can be successfully used for Star Schemas.

In-Memory Column Store

Caching data in memory to reduce disk I/O is nothing new. Oracle (and all other database systems) use this concepts since decades. In the Oracle database, data blocks are stored in the buffer cache, a part of the System Global Area (SGA) of the database instance. So, blocks that are read or written several times, are kept in memory. An old, but still useful concept.

So why do we need an additional In-Memory option when caching is already in place? With the In-Memory feature, a separate memory area is assigned in the SGA: The In-Memory Column Store. In this memory structure, the data is not stored column-wise (or correctly: block-wise), but in columnar format. Each column of a table is stored separately in memory in a compressed form. This allows very fast read access for queries that aggregate the columns – what we typically do in data warehouses.

Oracle Database In-Memory Architecure
Figure 1: In-Memory Column Store in Oracle Architecture

One big advantage of the In-Memory Column Store is that data can be stored with a much higher compression ratio than in normal data blocks. This allows to populate lots of data into memory.

Oracle Database In-Memory requires Oracle Database Enterprise Edition (like most other data warehouse features) plus an additional license for the In-Memory option. But there are some good news: If your data marts are not very big, it is even possible to use the In-Memory option without additional costs. Since Oracle 20c/21c, the In-Memory Base Level feature allows to use Oracle Database In-Memory for up to 16 GB without an additional license.

Enabling In-Memory Column Store

Enabling the features of the In-Memory option is very easy: With the configuration parameter INMEMORY_SIZE we can define the size of the In-Memory Column Store. If the parameter is set to a value greater than 0, Oracle Database In-Memory is enabled. The default is 0 (disabled), the minimum value to enable it is 100M. For my test cases, I configured 300 megabytes:

ALTER SYSTEM SET INMEMORY_SIZE = 300M SCOPE=BOTH;

A bit more complicated it is when you don’t have an additional license for the In-Memory option. You can still use the feature, as long as you use at least version 21c of the database and do not exceed the limit of 16 gigabytes. For this, the parameter INMEMORY_FORCE must be set to BASE_LEVEL. This is not possible with an ALTER SYSTEM command, you must modify the initialization file on CDB level and restart the database. With In-Memory Base Level, the parameter INMEMORY_SIZE can be set to a value up to 16 GB. For details refer the Oracle Database Reference documentation.

Impact on Physical Design of Star Schemas

When we want to use the In-Memory features for our data marts, how can we configure it and what is the impact on the star schema design? Let’s look at this using the craft beer brewery example from the previous blog posts of this series:

Star Schema Example
Figure 2: Star schema for a craft beer brewery (example from previous blog post)

The straight-forward approach that works for most star schemas is populating the In-Memory Column Store with the fact table and all dimension tables. For our example, we execute the following five commands:

ALTER TABLE dim_dates INMEMORY;
ALTER TABLE dim_customer INMEMORY;
ALTER TABLE dim_beer INMEMORY;
ALTER TABLE dim_bottling INMEMORY;
ALTER TABLE fact_beer_delivery INMEMORY;

By default, the column store is not populated immediately, but when the tables are accessed the first time. This can be configured with an optional PRIORITY clause. Also the compression level can be defined for each table with the MEMCOMPRESS clause, but the default FOR QUERY LOW is appropriate for most cases. Further details about the different configuration settings are explained in detail in the mentioned white paper Oracle Database In-Memory with Oracle Database 19c.

What other impact does In-Memory have on the design of the star schema and on the SQL queries of the reports? The good news: Almost none. For the logical design, the ETL jobs as well as any reports or SQL queries on the dimension and fact tables, nothing changes. The fact and dimension tables are still the same, and it is still recommended to define primary key constraints on the dimension tables and foreign key constraints on the fact table.

What changes is the physical design: The index recommendations described in the post Star Schema Design in Oracle: Fundamentals are not relevant anymore when you use Oracle Database In-Memory. Any additional indexes except the primary key indexes on the dimension tables can be omitted. It is much faster to read the facts and dimension data from the column store. The concept of Star Transformation is replaced with a similar approach (Vector Transformation, see below), so there is no need for bitmap indexes on the fact table anymore. Or in other words: With the In-Memory option, you don’t need indexes anymore for performance improvements.

But what can we do if the amount of available memory is too small to populate all dimension and fact tables of our data warehouse into the In-Memory Column Store? In this case, we have to decide what data should be populated, dependent on the frequency of use. Because the biggest tables in star schemas are usually the fact tables, a recommended strategy is:

  • Populate all dimension tables into the In-Memory Column Store. Because there are (usually) much smaller than the fact tables, they do not need a lot of memory.
  • Fact tables should be partitioned by date, as described in blog post Star Schema Design in Oracle: Partitioning. Then you have the possibility to populate only the most often used partitions (typically the newest ones) into the column store. The historical partitions are still accessible, but not as fast as the current ones.
  • To improve query performance on the historic partitions, it is even possible to create indexes on the fact table for only these partitions. The concept of Local Partial Indexes is suitable for this approach.

In-Memory strategy for star schema
Figure 3: Star schema with partial In-Memory fact table

In-Memory Performance Features

After all this details about configuration and design strategies with Oracle Database In-Memory, let’s get to the key question: Why does the In-Memory option improves query performance?

Three performance features are mainly responsible for the fast response time, when we use the In-Memory option for our star schemas:

  • In-Memory Scans: Because data is stored in a columnar format in the In-Memory Column Store, only the required columns of a query must be read from each table. If only a subset of the columns are used, e.g. two dimension keys and one measurement in a fact table, only the corresponding In-Memory Compression Units (IMCU) must be scanned.
  • In-Memory Joins: An extended version of a hash join, using Bloom Filters, improves the performance of joins between tables in the In-Memory Column Store. In a star schema, this improves the join performance between dimension and fact tables.
  • In-Memory Aggregations: Like the Star Transformation (see post Star Schema Design in Oracle: Fundamentals), Oracle introduces a similar approach to join multiple dimensions with one fact table. This so-called Vector Transformation creates key vectors for each of the dimensions and combines them to access the rows in the fact table.

Vector Transformation
Figure 4: Principle of vector transformation (with two dimensions)

Summary

If you have high requirements on the query performance of your star schemas, the usage of Oracle Database In-Memory is recommendable. For small data volumes this is possible with Oracle Database Enterprise Edition. For larger data warehouses the In-Memory Option must be licensed. If your data marts are too large to load them completely into the In-Memory Column Store, there are various strategies that combine In-Memory with other performance features.

I had the chance to use Oracle Database In-Memory in several projects so far, and in all cases, the performance improvements were impressive.

Leave a comment