How to Replace a Bitmap Index

Bitmap indexes are typically used for columns with low cardinality, i.e. with a small number of distinct values. This is not always the best solution. The following use case describes a situation from a real-life project and explains why and how we replaced a bitmap index with a combination of virtual column and b-tree index.

Two comments for clarification at the beginning:

  • Bitmap indexes are designed for Data Warehouses, and they are really helpful in such environments. I’m a big fan of bitmap indexes and use them for almost all indexes in DWH databases, even for columns with high cardinality. But the situation described below happened in an OLTP system and caused massive locking problems. It is not recommended to use bitmap indexes in multi-user OLTP systems because of their locking behavior.
  • Many people think that a bitmap index is always a good choice for columns with low cardinality. This is not the case. If the values are equally distributed, a full table scan is usually faster than a bitmap index access. Bitmap indexes are helpful when multiple filter conditions can be combined, or when the data of the indexed column is skewed – like in the following example.

“Active” Flag with Skewed Data Distribution

The use case is a typical example that can be found in many database applications. On a table ORDERS, we have a column ACTIVE with two distinct values ‘Y’ and ‘N’, but skewed data distribution: Only 1% of the orders are active (ACTIVE = ‘Y’), and the other 99% percent are inactive (ACTIVE = ‘N’). To simulate this use case, we create the following table:

CREATE TABLE orders AS
SELECT ROWNUM id
, DECODE (MOD (ROWNUM, 100), 0, 'Y', 'N') active
, dbms_random.string('x', 1000) pad
FROM dual
CONNECT BY ROWNUM <= 100000;
SELECT active, COUNT(*)
FROM orders
GROUP BY active;
ACTIVE COUNT(*)
------ --------
Y 1000
N 99000

When selecting all active orders, an index should be used because the query has a very strong selectivity (i.e. only a small percentage of rows is returned in the result set). If we select all inactive orders, the selectivity is weak. Therefore a full table scan is a much better access method. To make sure that the optimizer can generate two different execution plans, histograms must be available. So, we gather the table statistics including histograms:

BEGIN
dbms_stats.gather_table_stats (
ownname => USER,
tabname => 'ORDERS',
method_opt => 'FOR ALL COLUMNS SIZE SKEWONLY');
END;
/

Now we have to create an index on the column ACTIVE. But what kind of index? It depends on the type of application.

The Data Warehouse Approach

In a Data Warehouse, the solution is simple and elegant. We create a bitmap index on column ORDERS. Because of the skewed data distribution and the histogram on column ACTIVE, the optimizer is able to generate the perfect execution plans for queries on active as well as on inactive orders. For the selective query, the bitmap index is used. For the non-selective query, a full table scan is performed.

CREATE BITMAP INDEX orders_bm ON orders (active);
SELECT * FROM orders WHERE active = 'Y';
--------------------------------------------------------
| Id | Operation | Name | Rows |
--------------------------------------------------------
| 0 | SELECT STATEMENT | | 1000 |
| 1 | TABLE ACCESS BY INDEX ROWID | ORDERS | 1000 |
| 2 | BITMAP CONVERSION TO ROWIDS| | |
|* 3 | BITMAP INDEX SINGLE VALUE | ORDERS_BM | |
--------------------------------------------------------
SELECT * FROM orders WHERE active = 'N';
-------------------------------------------
| Id | Operation | Name | Rows |
-------------------------------------------
| 0 | SELECT STATEMENT | | 99000 |
|* 1 | TABLE ACCESS FULL| ORDERS | 99000 |
-------------------------------------------

A nice benefit of bitmap indexes is that they need less disk space than corresponding b-tree indexes. When we look a the statistics of our bitmap index, we see that the index has 7 leaf blocks. Let’s notice this information for the next steps of our use case.

SELECT blevel, leaf_blocks
FROM user_ind_statistics
WHERE index_name = 'ORDERS_BM';
    BLEVEL LEAF_BLOCKS
---------- -----------
1 7

The OLTP Approach

In a multi-user OLTP system, it is a very bad idea to use bitmap indexes – especially on a status column that is frequently updated. An update of one row from active = ‘Y’ to active = ‘N’ would lock many rows of the table. Therefore bitmap indexes have to be avoided on an OLTP database.

But what is the alternative for our use case? The easiest solution is to replace the bitmap index by an ordinary b-tree index. Let’s do this:

DROP INDEX orders_bm;
CREATE INDEX orders_idx ON orders (active);

As we can see in the execution plans for the following queries, the optimizer is still able to find the best execution plans for both queries. To find the 1% of active orders, an index range scan is used. To read the 99% of inactive orders, a full table scan is the best choice.

SELECT * FROM orders WHERE active = 'Y';
--------------------------------------------------------
| Id | Operation | Name | Rows |
--------------------------------------------------------
| 0 | SELECT STATEMENT | | 1000 |
| 1 | TABLE ACCESS BY INDEX ROWID| ORDERS | 1000 |
|* 2 | INDEX RANGE SCAN | ORDERS_IDX | 1000 |
--------------------------------------------------------
SELECT * FROM orders WHERE active = 'N';
-------------------------------------------
| Id | Operation | Name | Rows |
-------------------------------------------
| 0 | SELECT STATEMENT | | 99000 |
|* 1 | TABLE ACCESS FULL| ORDERS | 99000 |
-------------------------------------------

Everything looks perfect, so what’s the issue of this solution? When we look at the index statistics, we see that the index is much bigger than the bitmap index before. Instead of only 7 leaf blocks, the b-tree index contains 182 leaf blocks.

SELECT blevel, leaf_blocks
FROM user_ind_statistics
WHERE index_name = 'ORDERS_IDX';
    BLEVEL LEAF_BLOCKS
---------- -----------
1 182

For a small table like here (100’000 rows), this is usually not a real problem. But for large tables, we want to avoid such kind of indexes. It is useless to store all the ROWIDs for the value ‘N’ because they will never be used in a query.

An Alternative OLTP Approach

What we actually need is an index on column ACTIVE, but only for the value ‘Y’. The value ‘N’ should not be stored in the index. This is possible with a little trick: We create a virtual column on the table ORDERS that contains either the values ‘Y’ if an order is active – or NULL if the order is inactive. This can be implemented easily with a NULLIF expression. On this virtual column, we create a b-tree index. (For Oracle versions older than 11g, virtual columns are not available yet. In this case, a function-based index can be created instead).

DROP INDEX orders_idx;
ALTER TABLE orders
ADD active_virtual GENERATED ALWAYS AS (NULLIF(active, 'N'));
CREATE INDEX orders_idx ON orders (active_virtual);

Because NULL values are not stored in a b-tree index with only one column, the index contains only the ROWIDs for value ‘Y’. Therefore, the index is much smaller – even smaller than the bitmap index we used before.

SELECT blevel, leaf_blocks
FROM user_ind_statistics
WHERE index_name = 'ORDERS_IDX'; 
BLEVEL     LEAF_BLOCKS
---------- -----------
1 2

To use the virtual column and the index, we have to slightly adapt our queries. Instead of the column ACTIVE, we use the virtual column ACTIVE_VIRTUAL in our test queries. And instead of check for ‘N’, we have to check for NULL values. As we can see, the optimizer is still able to find the perfect execution plans. For the active orders, the index on the virtual column is used. For the inactive orders, a full table scan is performed – not only because of the weak selectivity, but also because the index does not contain the NULL values.

SELECT * FROM orders WHERE active_virtual = 'Y';
--------------------------------------------------------
| Id | Operation | Name | Rows |
--------------------------------------------------------
| 0 | SELECT STATEMENT | | 1000 |
| 1 | TABLE ACCESS BY INDEX ROWID| ORDERS | 1000 |
|* 2 | INDEX RANGE SCAN | ORDERS_IDX | 1000 |
--------------------------------------------------------
SELECT * FROM orders WHERE active_virtual IS NULL;
------------------------------------------
| Id | Operation | Name | Rows |
------------------------------------------
| 0 | SELECT STATEMENT | | 99000 |
|* 1 | TABLE ACCESS FULL| ORDERS | 99000 |
------------------------------------------

Conclusion

With a b-tree index and a histogram on a virtual column it is possible to create a specific index for the selective values of a column with low cardinality. The non-selective value (in this case ‘N’) is replaced by NULL and therefore not stored in the b-tree index. The index can be used for all selective queries, but does not contain useless data.

Advertisements

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s