Using Temporary Tables for Complex Reports

Global temporary tables (or private temporary tables, if you use Oracle 18c) are a practical feature for multi-step SQL queries, for example complex reports in a data warehouse. If you want to use them, there are some details to consider.

For complex SQL statements in ETL jobs or reports, it can be useful to split them into multiple, less complex parts. This can be done in several ways:

  • With a subquery factoring clause (“WITH clause”). I like this approach because the whole query can be executed in one SQL statement or even implemented in a view. The query optimizer is able to execute these statements either as an inline view or by using a temporary table. See tip 7 of the blog post 10 Tips to Improve ETL Performance.
  • With an intermediate table that is created by a CREATE TABLE AS SELECT (CTAS) statement. When you drop the table at the end of the job, don’t forget the PURGE option – otherwise all the intermediate tables will stay in the recycle bin.
  • With global temporary tables (or even private temporary tables in Oracle 18c). The result of one step is stored in a temporary table, which can be used as input for the next step. This approach is the topic of this blog post.
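As an illustration of the first approach, the two intermediate steps could be written as one statement with a subquery factoring clause. This is only a sketch (using the same sample tables as the use case below); the optimizer decides whether to materialize the named subqueries as temporary tables or to merge them as inline views:

```sql
-- Sketch: the intermediate steps as named subqueries instead of temporary tables
WITH cust_de AS (
  SELECT cu.*
    FROM customers cu
    JOIN countries co ON (co.country_id = cu.country_id)
   WHERE co.country_iso_code = 'DE'
), sales_de AS (
  SELECT sa.*
    FROM sales sa
    JOIN cust_de cu ON (cu.cust_id = sa.cust_id)
)
SELECT COUNT(*) FROM sales_de;
```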

A Customer Use Case with Temporary Tables

Last week, I had to do some performance tuning for a data warehouse of one of my customers. For a complex report, several global temporary tables are used. I don’t describe the whole case and all the SQL statements here – that would exceed the length of a blog post. But the principle is simple. The report contains the following steps:

  1. In a first step, the parameters provided by the user (e.g. business unit, date range, customer class, etc.) are written to a global temporary table (let’s call it TMP_PARAM).
  2. A complex SQL query is executed on the data mart tables, using the parameters from the temporary table. The result of this query is stored in another global temporary table TMP_1. To improve performance, the INSERT statement runs with a /*+ PARALLEL */ hint.
  3. The result of the first query is used as input for the next step. In this step, the temporary table TMP_1 is joined with some other data warehouse tables to load a global temporary table TMP_2. This is also done with a /*+ PARALLEL */ hint.
  4. Finally, the report joins the two temporary tables several times (let’s say, for different hierarchy levels) and concatenates the results with UNION ALL.
  5. All these steps are executed in one transaction. At the end of the report, a ROLLBACK is performed, so the temporary tables are empty again.

Because each session has its own copy of the global temporary tables, multiple users can run the same report concurrently. This is a common and elegant way to run complex reports in a data warehouse environment. But there is one little issue here. Do you see it?
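This session isolation can be illustrated with a small sketch (assuming the TMP_PARAM table described above):

```sql
-- Session 1: load a parameter row into the session-private copy
INSERT INTO tmp_param VALUES ('DE');
SELECT COUNT(*) FROM tmp_param;  -- 1 row in this session

-- Session 2: same table name, but an empty private copy
SELECT COUNT(*) FROM tmp_param;  -- 0 rows, session 1's data is not visible
```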

Temp table report

A Simplified Use Case

To show what happens, I created a simplified use case for this blog post. Of course, the query is much too simple for such a solution; it is only for educational purposes.

 

-- ------------------------------------------------------------------
-- Create three temporary tables
-- ------------------------------------------------------------------

CREATE GLOBAL TEMPORARY TABLE tmp_param
(country_code VARCHAR2(2));

CREATE GLOBAL TEMPORARY TABLE tmp_1
AS SELECT * FROM customers WHERE 1 = 2;

CREATE GLOBAL TEMPORARY TABLE tmp_2
AS SELECT * FROM sales WHERE 1 = 2;

-- ------------------------------------------------------------------
-- Load temporary tables
-- ------------------------------------------------------------------

-- Set parameter to 'Germany':
INSERT INTO tmp_param VALUES ('DE');

-- Load all customers from Germany to TMP_1:
INSERT /*+ parallel */ INTO tmp_1
SELECT cu.*
  FROM customers cu
  JOIN countries co ON (co.country_id = cu.country_id)
  JOIN tmp_param pa ON (pa.country_code = co.country_iso_code);

-- Load all sales transactions for customers in Germany to TMP_2:
INSERT /*+ parallel */ INTO tmp_2
SELECT sa.*
  FROM sales sa
  JOIN tmp_1 cu ON (cu.cust_id = sa.cust_id);

-- Select number of sales transactions:
SELECT COUNT(*) FROM tmp_2;

-- Rollback at the end of the report
ROLLBACK;

 

All steps of this simple use case work as expected, so what is the problem?

Let’s Look at the Details

When looking at the execution plan to load table TMP_1, we can see that a conventional INSERT in serial mode is performed. Although the query is executed in parallel, the final step to insert the data into the temporary table runs serially:

 

------------------------------------------------------------------------------------
| Id  | Operation                        | Name      |    TQ  |IN-OUT| PQ Distrib |
------------------------------------------------------------------------------------
|   0 | INSERT STATEMENT                 |           |        |      |            |
|   1 |  LOAD TABLE CONVENTIONAL         | TMP_1     |        |      |            |
|   2 |   PX COORDINATOR                 |           |        |      |            |
|   3 |    PX SEND QC (RANDOM)           | :TQ10002  |  Q1,02 | P->S | QC (RAND)  |
|*  4 |     HASH JOIN                    |           |  Q1,02 | PCWP |            |
|* 19 |        TABLE ACCESS INMEMORY FULL| CUSTOMERS |  Q1,02 | PCWP |            |
------------------------------------------------------------------------------------

                                                                                                                                     

The same happens for table TMP_2. This has nothing to do with the global temporary table; it is the standard behavior in Oracle when parallel DML (PDML) is disabled, which it is by default. A note at the end of the execution plan tells us the reason for the serial INSERT:

   - PDML is disabled in current session
                                                                                    

In most cases, this will not be a real issue. If the result set of the query is small, the insert into the temporary table is fast even without PDML. But let’s see what happens if we want to execute the whole statement in parallel. For this, we enable PDML and run the statements again.
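As a side note, from Oracle 12c onwards PDML can also be enabled for a single statement with the ENABLE_PARALLEL_DML hint instead of the session-level switch used here. A sketch, using the same query as in the use case:

```sql
-- Statement-level PDML (Oracle 12c and later), no ALTER SESSION required:
INSERT /*+ ENABLE_PARALLEL_DML PARALLEL */ INTO tmp_1
SELECT cu.*
  FROM customers cu
  JOIN countries co ON (co.country_id = cu.country_id)
  JOIN tmp_param pa ON (pa.country_code = co.country_iso_code);
```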

 

SQL> ALTER SESSION ENABLE PARALLEL DML;

Session altered.

SQL> -- Set parameter to 'Germany':
SQL> INSERT INTO tmp_param VALUES ('DE');

1 row inserted.

SQL> -- Load all customers from Germany to TMP_1:
SQL> INSERT /*+ parallel */ INTO tmp_1
  2  SELECT cu.*
  3    FROM customers cu
  4    JOIN countries co ON (co.country_id = cu.country_id)
  5    JOIN tmp_param pa ON (pa.country_code = co.country_iso_code);

8'173 rows inserted.

 

The INSERT statement to load table TMP_1 now runs in parallel. Additionally, we have another advantage in Oracle 12c: the table statistics are gathered automatically, even for temporary tables.

                                                                                                                                     

---------------------------------------------------------------------------------------
| Id  | Operation                             | Name      |    TQ  |IN-OUT| PQ Distrib |
---------------------------------------------------------------------------------------
|   0 | INSERT STATEMENT                      |           |        |      |            |
|   1 |  PX COORDINATOR                       |           |        |      |            |
|   2 |   PX SEND QC (RANDOM)                 | :TQ10002  |  Q1,02 | P->S | QC (RAND)  |
|   3 |    LOAD AS SELECT (TEMP SEGMENT MERGE)| TMP_1     |  Q1,02 | PCWP |            |
|   4 |     OPTIMIZER STATISTICS GATHERING    |           |  Q1,02 | PCWP |            |
|*  5 |      HASH JOIN                        |           |  Q1,02 | PCWP |            |
|  19 |        PX BLOCK ITERATOR              |           |  Q1,02 | PCWC |            |
|* 20 |         TABLE ACCESS INMEMORY FULL    | CUSTOMERS |  Q1,02 | PCWP |            |
---------------------------------------------------------------------------------------

   - Global temporary table session private statistics used

 

Loading table TMP_2 now causes a new problem. After a PDML statement, a COMMIT is required before the modified table can be read again. In our example, the second query reads table TMP_1, which was just loaded with a parallel INSERT statement. This causes an ORA-12838 error:


SQL> — Load all sales transactions for customers in Germany to TMP_2:

SQL> INSERT /*+ parallel */ INTO tmp_2

  2  SELECT sa.*

  3    FROM sales sa

  4    JOIN tmp_1 cu ON (cu.cust_id = sa.cust_id);

 

 

ORA-12838: cannot read/modify an object after modifying it in parallel

 

Global Temporary Tables and Parallel DML

To avoid the ORA-12838 error, a COMMIT must be performed after each PDML statement. Because the data in a global temporary table is, by default, only available within one transaction, the tables must be created as session-specific global temporary tables with the additional clause ON COMMIT PRESERVE ROWS. The final version of our simplified use case is extended as follows:

 

-- ------------------------------------------------------------------
-- Create three temporary tables
-- ------------------------------------------------------------------

CREATE GLOBAL TEMPORARY TABLE tmp_param
(country_code VARCHAR2(2))
ON COMMIT PRESERVE ROWS;

CREATE GLOBAL TEMPORARY TABLE tmp_1
ON COMMIT PRESERVE ROWS
AS SELECT * FROM customers WHERE 1 = 2;

CREATE GLOBAL TEMPORARY TABLE tmp_2
ON COMMIT PRESERVE ROWS
AS SELECT * FROM sales WHERE 1 = 2;

-- ------------------------------------------------------------------
-- Load temporary tables
-- ------------------------------------------------------------------

ALTER SESSION ENABLE PARALLEL DML;

INSERT INTO tmp_param VALUES ('DE');

INSERT /*+ parallel */ INTO tmp_1
SELECT cu.*
  FROM customers cu
  JOIN countries co ON (co.country_id = cu.country_id)
  JOIN tmp_param pa ON (pa.country_code = co.country_iso_code);

COMMIT;

INSERT /*+ parallel */ INTO tmp_2
SELECT sa.*
  FROM sales sa
  JOIN tmp_1 cu ON (cu.cust_id = sa.cust_id);

COMMIT;

-- Select number of sales transactions:
SELECT COUNT(*) FROM tmp_2;

 

With this version, the two INSERT statements to load TMP_1 and TMP_2 run in PDML mode, and the data in the temporary tables is available to the following queries after the COMMIT statements. The only detail to keep in mind: if the same script runs more than once within the same session, the temporary tables must be truncated or their rows deleted before the next execution.
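Such a cleanup step could look like the following sketch. Note that on a global temporary table, TRUNCATE only removes the rows of the current session:

```sql
-- Clean up the session-private contents before the next run:
TRUNCATE TABLE tmp_param;
TRUNCATE TABLE tmp_1;
TRUNCATE TABLE tmp_2;
```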

How About Private Temporary Tables?

Oracle 18c introduced a new type of temporary tables: private temporary tables. They can also be used for the approach described here. I will cover the differences between global temporary tables and private temporary tables in another blog post.
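As a small teaser, a private temporary table version of TMP_1 could look like this sketch. The ORA$PTT_ name prefix is an assumption based on the default value of the PRIVATE_TEMP_TABLE_PREFIX initialization parameter:

```sql
-- Oracle 18c: private temporary table, visible only to the creating session
-- and dropped automatically at the end of the session:
CREATE PRIVATE TEMPORARY TABLE ora$ptt_tmp_1
ON COMMIT PRESERVE DEFINITION
AS SELECT * FROM customers WHERE 1 = 2;
```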

3 thoughts on “Using Temporary Tables for Complex Reports”

  1. Hello!

    Thank you for the post. I have a few questions:
    – What was the ROLLBACK supposed to do in the original version of the workflow? I see it as extra work with no benefit. (When you were asking about the “little issue here”, I thought you were referring to the ROLLBACK in point 5.)
    – If I am wrong and the ROLLBACK was useful in the original version, then why did it disappear in the final version? Also, the newly introduced COMMIT would interfere with the ROLLBACK at the end.
    – As this was a performance tuning exercise for you, please share the results of the tuning itself. I am really curious how much faster the TMP table creation went with the parallel INSERT.

    Thanks a lot
    Rob


    • Hi Rob

      Thank you for your comment. It is always a pleasure to see how closely my blog posts are read. Here are quick answers to your questions:
      – The ROLLBACK is needed to end the transaction and to delete the contents of the temporary tables. A COMMIT would have the same effect when the default setting of global temporary tables is used (ON COMMIT DELETE ROWS).
      – In the final version, this is not needed anymore because the last transaction (after the COMMIT for TMP_2) only selects data. Instead, the temporary tables must be cleaned up before the next rerun, as I wrote in a short note at the end of the post.
      – The results are hard to compare because I changed some of the queries as well. By adapting the query for TMP_2 (in the real project), the elapsed time went down from several minutes to 9 seconds. After enabling PDML, it runs in 7 seconds. So the performance benefit is not very high, but it depends on the size of the result set. What is much more important is that the queries selecting the data are well-written and fast.

      Cheers, Dani.


      • Hi Dani!

        Thanks for the reply. I really do not want to shift the focus away from the intention of your post, but the idea of a “rollback on a temp table” kept bothering me, and I wanted to see the cost of rollback vs. commit on a temp table. I also wanted to understand whether the amount of work for a rollback is different for a temp table than for a normal table.

        So I tested. I did three tests:
        TEST1: normal table: commit vs. rollback
        TEST2: temporary table: commit vs. rollback
        TEST3: rollback: temp table vs. normal table

        DB version for the tests was 12.2.

        Can you render this readable (monotype, linesize 200)?

        ===============================================================================================
        TEST1: normal table: commit vs. rollback
        ===============================================================================================

        drop table test_rollback_normal purge;
        create table test_rollback_normal
        as select * from dba_objects where 1=2;
        insert into test_rollback_normal (select * from dba_objects);
        insert into test_rollback_normal (select * from dba_objects);
        insert into test_rollback_normal (select * from dba_objects);
        insert into test_rollback_normal (select * from dba_objects);
        insert into test_rollback_normal (select * from dba_objects);
        @runstats start
        commit;
        @runstats pause
        drop table test_rollback_normal purge;
        create table test_rollback_normal
        as select * from dba_objects where 1=2;
        insert into test_rollback_normal (select * from dba_objects);
        insert into test_rollback_normal (select * from dba_objects);
        insert into test_rollback_normal (select * from dba_objects);
        insert into test_rollback_normal (select * from dba_objects);
        insert into test_rollback_normal (select * from dba_objects);
        @runstats resume
        rollback;
        @runstats stop t=1000

        ===============================================================================================
        RunStats report : 18-MAY-2018 11:09:41
        ===============================================================================================

        -----------------------------------------------------------------------------------------------
        1. Summary timings
        -----------------------------------------------------------------------------------------------

        Type  Name                                                Run1          Run2          Diff
        ----- --------------------------------------------- ------------- ------------- -------------
        TIMER elapsed time (hsecs)                                     14            29            15
        TIMER cpu time (hsecs)                                          8            24            16

        Comments:
        1) Run1 was 51.7% quicker than Run2
        2) Run1 used 51.7% less CPU time than Run2

        -----------------------------------------------------------------------------------------------
        2. Statistics report
        -----------------------------------------------------------------------------------------------

        Type  Name                                                Run1          Run2          Diff
        ----- --------------------------------------------- ------------- ------------- -------------
        LATCH simulator hash latch                                     99         1,780         1,681
        STAT  redo synch time (usec)                               33,093        37,652         4,559
        STAT  rollback changes - undo records applied                   0        13,295        13,295
        STAT  redo entries                                              1        20,426        20,425
        LATCH corrupted undo seg latch                                  0        39,886        39,886
        STAT  db block changes                                      1,131        41,984        40,853
        STAT  db block gets from cache (fastpath)                   2,725        44,696        41,971
        STAT  db block gets                                         3,310        45,282        41,972
        STAT  db block gets from cache                              3,310        45,282        41,972
        STAT  session logical reads                                 3,831        45,937        42,106
        LATCH cache buffers chains                                 18,909       176,270       157,361
        STAT  redo size                                               132     4,057,308     4,057,176
        STAT  logical read bytes from cache                    31,383,552   376,315,904   344,932,352

        -----------------------------------------------------------------------------------------------
        3. Latching report
        -----------------------------------------------------------------------------------------------

        Type  Name                                                Run1          Run2          Diff
        ----- --------------------------------------------- ------------- ------------- -------------
        LATCH total latches used                                   19,852       219,409       199,557

        Comments:
        1) Run1 used 91% fewer latches than Run2

        -----------------------------------------------------------------------------------------------
        4. Time model report
        -----------------------------------------------------------------------------------------------

        Type  Name                                                Run1          Run2          Diff
        ----- --------------------------------------------- ------------- ------------- -------------
        TIME  sql execute elapsed time                             65,965        66,920           955
        TIME  DB CPU                                               72,989       309,280       236,291
        TIME  DB time                                             107,769       346,724       238,955

        -----------------------------------------------------------------------------------------------
        5. About
        -----------------------------------------------------------------------------------------------
        - RunStats v2.01 by Adrian Billington (http://www.oracle-developer.net)
        - Based on the original RUNSTATS utility by Tom Kyte

        ===============================================================================================
        End of report (TEST1)
        ===============================================================================================

        ===============================================================================================
        TEST2: temporary table: commit vs. rollback
        ===============================================================================================

        drop table test_rollback_tmp purge;
        create global temporary table test_rollback_tmp on commit delete rows
        as select * from dba_objects where 1=2;
        insert into test_rollback_tmp (select * from dba_objects);
        insert into test_rollback_tmp (select * from dba_objects);
        insert into test_rollback_tmp (select * from dba_objects);
        insert into test_rollback_tmp (select * from dba_objects);
        insert into test_rollback_tmp (select * from dba_objects);
        @runstats start
        commit;
        @runstats pause
        drop table test_rollback_tmp purge;
        create global temporary table test_rollback_tmp on commit delete rows
        as select * from dba_objects where 1=2;
        insert into test_rollback_tmp (select * from dba_objects);
        insert into test_rollback_tmp (select * from dba_objects);
        insert into test_rollback_tmp (select * from dba_objects);
        insert into test_rollback_tmp (select * from dba_objects);
        insert into test_rollback_tmp (select * from dba_objects);
        @runstats resume
        rollback;
        @runstats stop t=1000

        ===============================================================================================
        RunStats report : 18-MAY-2018 11:05:33
        ===============================================================================================

        -----------------------------------------------------------------------------------------------
        1. Summary timings
        -----------------------------------------------------------------------------------------------

        Type  Name                                                Run1          Run2          Diff
        ----- --------------------------------------------- ------------- ------------- -------------
        TIMER elapsed time (hsecs)                                     11            25            14
        TIMER cpu time (hsecs)                                          8            17             9

        Comments:
        1) Run1 was 56% quicker than Run2
        2) Run1 used 56% less CPU time than Run2

        -----------------------------------------------------------------------------------------------
        2. Statistics report
        -----------------------------------------------------------------------------------------------

        Type  Name                                                Run1          Run2          Diff
        ----- --------------------------------------------- ------------- ------------- -------------
        LATCH simulator hash latch                                    105         1,797         1,692
        STAT  redo entries                                              1        13,296        13,295
        STAT  rollback changes - undo records applied                   0        13,295        13,295
        STAT  consistent changes                                    1,128        28,688        27,560
        LATCH corrupted undo seg latch                                  0        39,886        39,886
        STAT  db block changes                                      1,129        41,984        40,855
        STAT  db block gets from cache (fastpath)                   2,721        44,696        41,975
        STAT  db block gets                                         3,305        45,282        41,977
        STAT  db block gets from cache                              3,305        45,282        41,977
        STAT  session logical reads                                 3,825        45,937        42,112
        STAT  redo synch time (usec)                               13,211        67,260        54,049
        LATCH cache buffers chains                                 20,341       183,389       163,048
        STAT  redo size                                               132     1,329,612     1,329,480
        STAT  logical read bytes from cache                    31,334,400   376,315,904   344,981,504

        -----------------------------------------------------------------------------------------------
        3. Latching report
        -----------------------------------------------------------------------------------------------

        Type  Name                                                Run1          Run2          Diff
        ----- --------------------------------------------- ------------- ------------- -------------
        LATCH total latches used                                   35,702       240,730       205,028

        Comments:
        1) Run1 used 85.2% fewer latches than Run2

        -----------------------------------------------------------------------------------------------
        4. Time model report
        -----------------------------------------------------------------------------------------------

        Type  Name                                                Run1          Run2          Diff
        ----- --------------------------------------------- ------------- ------------- -------------
        TIME  DB CPU                                               83,272       235,590       152,318
        TIME  DB time                                              96,947       304,676       207,729

        -----------------------------------------------------------------------------------------------
        5. About
        -----------------------------------------------------------------------------------------------
        - RunStats v2.01 by Adrian Billington (http://www.oracle-developer.net)
        - Based on the original RUNSTATS utility by Tom Kyte

        ===============================================================================================
        End of report (TEST2)
        ===============================================================================================

        ===============================================================================================
        TEST3: rollback: temp table vs. normal table
        ===============================================================================================

        drop table test_rollback_tmp purge;
        create global temporary table test_rollback_tmp on commit delete rows
        as select * from dba_objects where 1=2;
        insert into test_rollback_tmp (select * from dba_objects);
        insert into test_rollback_tmp (select * from dba_objects);
        insert into test_rollback_tmp (select * from dba_objects);
        insert into test_rollback_tmp (select * from dba_objects);
        insert into test_rollback_tmp (select * from dba_objects);
        @runstats start
        rollback;
        @runstats pause
        drop table test_rollback_normal purge;
        create table test_rollback_normal
        as select * from dba_objects where 1=2;
        insert into test_rollback_normal (select * from dba_objects);
        insert into test_rollback_normal (select * from dba_objects);
        insert into test_rollback_normal (select * from dba_objects);
        insert into test_rollback_normal (select * from dba_objects);
        insert into test_rollback_normal (select * from dba_objects);
        @runstats resume
        rollback;
        @runstats stop t=1000

        ===============================================================================================
        RunStats report : 18-MAY-2018 11:45:36
        ===============================================================================================

        -----------------------------------------------------------------------------------------------
        1. Summary timings
        -----------------------------------------------------------------------------------------------

        Type  Name                                                Run1          Run2          Diff
        ----- --------------------------------------------- ------------- ------------- -------------
        TIMER elapsed time (hsecs)                                     39            36            -3
        TIMER cpu time (hsecs)                                         25            25             0

        Comments:
        1) Run2 was 7.7% quicker than Run1
        2) Run1 used the same amount of CPU time as Run2

        -----------------------------------------------------------------------------------------------
        2. Statistics report
        -----------------------------------------------------------------------------------------------

        Type  Name                                                Run1          Run2          Diff
        ----- --------------------------------------------- ------------- ------------- -------------
        STAT  redo entries                                         13,296        20,426         7,130
        STAT  IMU undo allocation size                                 72        13,280        13,208
        STAT  IMU Redo allocation size                                  0       101,940       101,940
        STAT  redo size                                         1,329,656     4,057,264     2,727,608
        STAT  logical read bytes from cache                   368,795,648   376,315,904     7,520,256
        STAT  temp space allocated (bytes)                    -58,720,256             0    58,720,256

        -----------------------------------------------------------------------------------------------
        3. Latching report
        -----------------------------------------------------------------------------------------------

        Type  Name                                                Run1          Run2          Diff
        ----- --------------------------------------------- ------------- ------------- -------------
        LATCH total latches used                                  240,235       219,299       -20,936

        Comments:
        1) Run2 used 8.7% fewer latches than Run1

        -----------------------------------------------------------------------------------------------
        4. Time model report
        -----------------------------------------------------------------------------------------------

        Type  Name                                                Run1          Run2          Diff
        ----- --------------------------------------------- ------------- ------------- -------------
        TIME  DB time                                             385,685       416,051        30,366
        TIME  DB CPU                                              266,681       314,773        48,092

        -----------------------------------------------------------------------------------------------
        5. About
        -----------------------------------------------------------------------------------------------
        - RunStats v2.01 by Adrian Billington (http://www.oracle-developer.net)
        - Based on the original RUNSTATS utility by Tom Kyte

        ===============================================================================================
        End of report (TEST3)
        ===============================================================================================

        Looking at the results, my conclusion is that a rollback on a temp table is almost as expensive as a rollback on a normal table, and a commit is always cheaper than a rollback. So one should clean up a temp table with a commit.
        The difference may be marginal in the overall processing, but it may well be noticeable.

        Cheers,
        Rob

