Virtual Columns in Oracle are a convenient feature to implement derived measures of a fact table. But in combination with Oracle Database In-Memory, it has an impact on performance – unless you use Oracle 12c Release 2.
Derived measures in a fact table are measures that can be calculated from other measures of the table. A simple example is the calculation of the gross amount as total of net amount plus taxes:
gross amount = net amount + tax amount
The fact table below contains three dimension keys (customer, product and sales date) as well as the three measure columns (net amount, tax amount and gross amount). In this case, the derived measure gross amount was calculated as part of the ETL process and stored in a separate column of the fact table.
CUST_ID PROD_ID TIME_ID NET_AMOUNT TAX_AMOUNT GROSS_AMOUNT
——————— ——————— —————————— —————————— —————————— ------------------------
4711 123 10/04/2017 23.40 1.90 25.30
4711 456 10/04/2017 57.30 4.60 61.90
9876 777 10/04/2017 128.00 10.25 138.25
Especially when there are many derived measures in a fact table, you may think of calculating these measures at query time instead of storing all of them as separate columns in the fact table. There are different approaches to implement such a solution:
- The calculations can be defined in the BI tool that is used to run queries and reports on the star schema. This is a typical and recommended approach if the tool consists of a semantic metadata layer (e.g. the Business Model and Mapping Layer in OBIEE). The calculation can be implemented once and used in all reports and queries based on the metadata layer. Depending on the tool, the derived measures may have different names, e.g. calculated measures or logical columns.
- If the derived measures must be available for different query tools or if no metadata layer is available for a particular tool, it is possible to calculate the derived measures at query time in the database. A straight-forward approach is a view layer on top of the Data Mart tables. Calculations can be implemented as additional columns in views. This solution is quite flexible and allows calculations based on attributes of multiple tables. For example, if you want to calculate an amount in a different currency, you can join the fact table with an exchange rate table in the view.
- When the formula of a derived measure is based on columns of one table only, an easy and elegant way is to define it as a Virtual Column. A Virtual Column, available since Oracle 11g, is an expression that is calculated at the time when the column is selected from the table. The Virtual Column is part of the table definition, but is not physically stored in the table segment.
I discussed these different options with one of my customers, and for the particular project we decided to use Virtual Columns for a fact table with many derived measures. There were several reasons for this decision. First, the BI tool does not support a metadata layer, so that the formulas had to be implemented for each individual report. Second, it is useful for the developers if all measures are also available in SQL Developer, i.e. that they can be queried directly from the database. Third, the calculations are always derived from base measures of the same fact table. A perfect scenario for Virtual Columns!
For the simplified example above, the column GROSS_AMOUNT can be added to the existing fact table as a Virtual Column with the following DDL statement:
ALTER TABLE sales ADD
(gross_amount NUMBER(10,2) GENERATED ALWAYS
AS (net_amount + tax_amount) VIRTUAL)
With Virtual Columns, is was easy to implement around a dozen of derived measures without any impact on the size of the (already huge) fact table. The calculation of the derived measures, usually an arithmetic expression of two or three columns, was not an issue, even the performance overhead was minimal. The time to read the data from disk was much longer than the evaluation of the arithmetic expressions. But there is one little issue…
Unfortunately, my customer uses Oracle Database In-Memory (“unfortunately” is the wrong wording, because it is very convenient to work with this option). Because data is stored in a columnar compressed format in the In-Memory Column Store, it is very fast to read individual columns. But Virtual Columns are not populated to the Column Store in Oracle 12.1. When the whole table is set to INMEMORY, the Virtual Columns are ignored. If you try to populate the columns explicitly, this causes an error message:
SQL> ALTER TABLE sales INMEMORY (gross_amount);
Error starting at line : 1 in command -
ALTER TABLE sales INMEMORY (gross_amount)
Error report -
ORA-64359: INMEMORY clause may not be specified for Virtual Columns
*Cause: An attempt was made to enable the in-memory selective columns
feature for a Virtual Column.
Because of this restriction, the expression of a Virtual Column must be evaluated after reading all affected attributes. For our example, the columns NET_AMOUNT and TAX_AMOUNT must be read from the IM Column Store and “joined” before the expression can be calculated. The performance impact is that reading a Virtual Column is 2-3 times slower than reading a regular column from the Column Store. Here, the evaluation of the expression is not negligible anymore. By the way: The same problem occurs if the expression is part of a view or if the derived measures are calculated in the BI tool.
Fortunately, this problem is solved with Oracle 12c Release 2 (here, “fortunately” is the correct wording). With Oracle 12.2 it is possible now to populate Virtual Columns in the In-Memory Column Store, too:
SQL> ALTER TABLE sales INMEMORY (gross_amount);
Table SALES altered.
This improves the query performance on Virtual Columns, the response time will be the same as for regular columns. For our use case, this is exactly what we need! I hope my customer will upgrade to Oracle 12.2 soon 🙂