Bitmap-based COUNT(DISTINCT) Functions in Oracle 19c

Oracle 19c provides a set of new SQL functions for bitmap operations. They allow fast computation of COUNT(DISTINCT) expressions. What is the purpose of these functions? The Oracle documentation is very sparse, but at least it tells us that the main focus are Materialized Views on aggregated data. Let’s have a detailed look at this new feature.

Have you ever tried to improve the query performance of COUNT(DISTINCT) queries with a Materialized View, using Query Rewrite? It works, as long as you aggregate the data on only one specific aggregation level. But when you have different dimensions or multiple hierarchy levels, a separate Materialized View is required for each aggregation. The goal is to create few Materialized Views, but use them for as many different queries as possible.

To explain this goal, I describe a very simple use case. I want to create one Materialized View that can be used for Query Rewrite of the following two queries. I dare say this was not possible before Oracle 19c with only one Materialized View (if someone has a solution, let me know).

 

SELECT promo_id, COUNT(DISTINCT prod_id)

  FROM sales

GROUP BY promo_id

ORDER BY promo_id;

 

 

SELECT channel_id, COUNT(DISTINCT prod_id)

  FROM sales

GROUP BY channel_id

ORDER BY channel_id;

 

 

With the new bitmap operations of Oracle 19c, one Materialized View can support multiple dimensions as well as different aggregation levels within one dimension. But first let’s get a look at the new bitmap functions.

New Functions for Bitmap Operations

Five new internal functions are provided for bitmap operations:

  • BITMAP_BIT_POSITION: Mapping of a number to the absolute bit position in a bitmap. The input parameter must be of type NUMBER.
  • BITMAP_BUCKET_NUMBER: Bucket number within the bitmap. Each bucket contains 16000 numbers. The input parameter must be of type NUMBER.
  • BITMAP_CONSTRUCT_AGG: Bitmap array of aggregated numbers. For each existing number, the bit at the corresponding position is set. The first input parameter is the bit position returned by BITMAP_BIT_POSITION. The second (not documented) parameter is the representation of the return value (default: BLOB; other possible value: RAW)
  • BITMAP_COUNT: Counts the bits set to “1” in a bitmap array. The input parameter is the result of the function BITMAP_CONSTRUCT_AGG. The return value is the number of bits set to “1”.
  • BITMAP_OR_AGG: Aggregation of bitmap arrays of multiple rows. The input parameter is the result of the function BITMAP_CONSTRUCT_AGG. The result is a bitmap array that combines all bitmaps with an OR predicate. This is useful for aggregations on higher hierarchy levels.

To show the behavior of the functions, here a simple example with numbers 1 to 15. Because the row set is too small, BITMAP_BUCKET_NUMBER is always 1. In each of the bitmaps, only one bit is set. Therefore, BITMAP_COUNT is always 1. 

 

SELECT rownum

     , BITMAP_BIT_POSITION(rownum) bitpos

     , BITMAP_BUCKET_NUMBER(rownum) bucket

     , BITMAP_CONSTRUCT_AGG(rownum, ‘RAW’) bitmap

     , BITMAP_COUNT(BITMAP_CONSTRUCT_AGG(BITMAP_BIT_POSITION(rownum))) bitcnt

  FROM dual CONNECT BY rownum <= 15

GROUP BY rownum ORDER BY 1;

 

    ROWNUM     BITPOS     BUCKET BITMAP           BITCNT

———- ———- ———- ———— ———-

         1          0          1 0002                  1

         2          1          1 0004                  1

         3          2          1 0008                  1

         4          3          1 0010                  1

         5          4          1 0020                  1

         6          5          1 0040                  1

         7          6          1 0080                  1

         8          7          1 000001                1

         9          8          1 000002                1

        10          9          1 000004                1

        11         10          1 000008                1

        12         11          1 000010                1

        13         12          1 000020                1

        14         13          1 000040                1

        15         14          1 000080                1

 

15 rows selected.

 

 

If the result is aggregated, the total bitmap is combined with BITMAP_OR_AGG. A BITMAP_COUNT on this bitmap array returns 15. This corresponds the number of “1” bits in the bit array, or in other words, the number of distinct values in the aggregated result.

 

WITH input AS (

     SELECT rownum num

          , BITMAP_BIT_POSITION(rownum) bitpos

          , BITMAP_BUCKET_NUMBER(rownum) bucket

          , BITMAP_CONSTRUCT_AGG(rownum, ‘RAW’) bitmap

          , BITMAP_COUNT(BITMAP_CONSTRUCT_AGG(BITMAP_BIT_POSITION(rownum))) bitcnt

       FROM dual CONNECT BY rownum <= 15

     GROUP BY rownum

     )

SELECT BITMAP_OR_AGG(bitmap) total_bitmap

     , BITMAP_COUNT(BITMAP_OR_AGG(bitmap)) total_bitcnt

  FROM input;

 

TOTAL_BITMAP TOTAL_BITCNT

———— ————

00FEFF                 15

 

 

To be honest, this is a very complicated way to get the number of distinct values, and it is definitely not the right approach for some adhoc queries. But that is not the idea of these new functions. The purpose of this bit array operations is to use the same results for aggregations on different levels. And this brings us back to our original example: How can I write one Materialized Views that can be used for COUNT(DISTINCT) queries across different dimensions or hierarchy levels?

A Flexible Materialized View for COUNT(DISTINCT)

The following Materialized View uses the new bitmap functions to calculate a bitmap array for the distinct values on PROD_ID for the two different aggregations on PROMO_ID and CHANNEL_ID. For this purpose, the two functions BITMAP_BIT_POSITION and BITMAP_CONSTRUCT_AGG are used. Additionally, the function BITMAP_BUCKET_NUMBER is used as an addtional aggregation column. The Materialized View MV_SALES is enabled for Query Rewrite.

 

CREATE MATERIALIZED VIEW mv_sales

ENABLE QUERY REWRITE

AS

SELECT promo_id

     , channel_id

     , BITMAP_BUCKET_NUMBER(prod_id)

     , BITMAP_CONSTRUCT_AGG(BITMAP_BIT_POSITION(prod_id))

  FROM sales

GROUP BY promo_id, channel_id, BITMAP_BUCKET_NUMBER(prod_id);

 

Materialized view MV_SALES created.

 

 

Both queries of our initial example can use the same Materialized View MV_SALES to get the number of distinct products per PROMO_ID or per CHANNEL_ID. Without the new bitmap functions of Oracle 19c, two separate Materialized Views were required.

 

EXPLAIN PLAN FOR

SELECT promo_id, COUNT(DISTINCT prod_id)

  FROM sales

GROUP BY promo_id

ORDER BY promo_id;

 

Explained.

 

SELECT * FROM TABLE(dbms_xplan.display);

 

————————————————————————————

| Id  | Operation               | Name     | Rows  | Bytes | Cost (%CPU)| Time     |

————————————————————————————

|   0 | SELECT STATEMENT        |          |     3 |    51 |     4  (25)| 00:00:01 |

|   1 |  SORT GROUP BY          |          |     3 |    51 |     4  (25)| 00:00:01 |

|   2 |   VIEW                  |          |     3 |    51 |     4  (25)| 00:00:01 |

|   3 |    HASH GROUP BY        |          |     3 |   144 |     4  (25)| 00:00:01 |

|   4 |     MAT_VIEW ACCESS FULL| MV_SALES |     9 |   432 |     3   (0)| 00:00:01 |

————————————————————————————

 

EXPLAIN PLAN FOR

SELECT channel_id, COUNT(DISTINCT prod_id)

  FROM sales

GROUP BY channel_id

ORDER BY channel_id;

 

Explained.

 

SELECT * FROM TABLE(dbms_xplan.display);

 

————————————————————————————

| Id  | Operation               | Name     | Rows  | Bytes | Cost (%CPU)| Time     |

————————————————————————————

|   0 | SELECT STATEMENT        |          |     3 |    48 |     4  (25)| 00:00:01 |

|   1 |  SORT GROUP BY          |          |     3 |    48 |     4  (25)| 00:00:01 |

|   2 |   VIEW                  |          |     3 |    48 |     4  (25)| 00:00:01 |

|   3 |    HASH GROUP BY        |          |     3 |   141 |     4  (25)| 00:00:01 |

|   4 |     MAT_VIEW ACCESS FULL| MV_SALES |     9 |   423 |     3   (0)| 00:00:01 |

————————————————————————————

 

 

Additional Information

The Oracle Database 19c documentation currently contains only few information about the new bitmap-based functions. In the Data Warehousing Guide, there are two small sections:

The five new functions are described in the SQL Language Reference:

2 thoughts on “Bitmap-based COUNT(DISTINCT) Functions in Oracle 19c

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google photo

You are commenting using your Google account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s