Direct-Path INSERT is a very efficient way to load data into data warehouse tables. The load performance can even be improved with NOLOGGING on the target table. In Oracle 12c Release 2, this is even possible for Oracle Data Guard environments. But is it a good idea?
One of the new features in Oracle 12.2 is the support of NOLOGGING in Oracle Data Guard. The title of this enhancement in the new features documentation is “Oracle Data Guard for Data Warehouses”. What is the benefit of this feature, and how useful is it for data warehouses? To explain this, let’s first look at two important database features for data warehousing that are available since years: Direct-Path INSERT and NOLOGGING.
A very efficient way to load data into a table is Direct-Path INSERT. Unlike a conventional INSERT, new rows are not applied to the database buffer cache row by row, but written directly to the data files in new data blocks at the end of the table. Because a Direct-Path INSERT bypasses the space management to reuse free space in the existing blocks, it is much faster to insert multiple rows into a target table. It is highly recommended to use Direct-Path INSERT for loading large data sets into a data warehouse database.
The drawback of this feature is that free space in existing table blocks is not used for the new data. For example, if existing rows were deleted (which usually does not occur in a data warehouse), the free parts in the table block are lost and cannot be reused for new rows. This is also the case for the last block of a Direct-Path INSERT. The free space after the last row will not be used anymore for the next Direct-Path INSERT, because then a new block will be written. For ETL jobs that insert only very few rows at a time, a conventional INSERT is a better choice. But for ETL jobs that load a large amount of rows, for example a daily load into a fact table, the advantage of better load performance is much more important than the (small) loss of empty space in the blocks.
Direct-Path INSERT is used in the following situations:
- When an append hint is added to an INSERT or MERGE statement
- When an append_values hint is added to a PL/SQL FORALL statement
- When an INSERT statement is executed with Parallel DML
- When a CREATE TABLE AS SELECT is executed
INSERT /*+ append */ INTO fact_sales
SELECT * from stage_sales;
100’000 rows inserted.
There are several restrictions with Direct-Path INSERT that must be considered. Further information can be found in the SQL Language Reference and in the Database Administrator’s Guide of the Oracle documentation.
The Impact of NOLOGGING
The load performance of Direct-Path INSERT can be improved when the target table is set to NOLOGGING, as shown in the following example:
ALTER TABLE fact_sales NOLOGGING;
Here a short summary what this setting means and when it takes place:
- When a target table is set to NOLOGGING, only minimal redo log information is written during a Direct-Load INSERT. This reduces the load time of large data loads because no overhead for redo logging is used anymore.
- For any other DML statements, i.e. conventional INSERT, UPDATE or DELETE statements, NOLOGGING has no impact at all. In all these cases, the normal redo logging mechanism is used, independent whether the target table is set no LOGGING or NOLOGGING.
- If the database is set to FORCE LOGGING (which must be the case in a Data Guard environment), the NOLOGGING parameter is ignored, i.e. normal redo log information is written even for Direct-Load INSERT.
The combination of Direct-Path INSERT and NOLOGGING supports a very efficient load performance. But the price for it is that you are not able to recover lost data from the archive redo log files. This sounds dramatically, but for many data warehouses, this is not a real issue. As long as the data can be reloaded from the source systems, from delivered flat files or from a preceding data warehouse layer (e.g. a persistent staging area), setting all target tables to NOLOGGING is a feasible and often used option.
Real Life Story:
In a data warehouse of my first customer at Trivadis, we created all tables in the DWH database with NOLOGGING. This improved the ETL performance dramatically, and in this case it was possible because all data of the nightly loads was delivered in flat files. After loading these files into the staging area of the data warehouse, the files were moved to an archive directory on the file system.
One day, the “worst case scenario” happened. One of the developers (fortunately it was not me) started a schema cleanup script on the wrong database and dropped all fact tables in the production database instead of the test environment. Oops… Because of the NOLOGGING setting of all target tables, a point-in-time recovery was not possible. So, the DBA had to restore the last full backup, and then we had to reload all data since the last backup from the archived flat files. The accident happened on a Thursday evening, the whole night and Friday was used to restore the database from the backup tapes of Tuesday. On Saturday morning we were able to start reloading the missing data (Tuesday to Saturday), and on Monday morning, the system was online again. So, we were lucky that the system was offline only one working day (Friday).
But even when it is possible to reload the data from files or any other sources, there is still another restriction with NOLOGGING; When Oracle Data Guard is in place, the redo log information is used to apply all data changes from the primary database to the standby database. If some of the tables on the primary database are set to NOLOGGING, this is not possible. Therefore, it is highly recommended to run the primary database in FORCE LOGGING mode. Or in other words: If a data warehouse is installed on an Oracle Data Guard environment, NOLOGGING cannot be used to improve load performance.
Oracle Data Guard and NOLOGGING in Oracle 12.2
In Oracle 12c Release 2, a new mechanism has been introduced in Oracle Data Guard. Blocks written by NOLOGGING operations on the primary database are recorded in the control file of the physical standby database and can be recovered with a new RMAN command: RECOVER DATABASE NONLOGGED BLOCK. Details about this scenario are described in the Oracle Data Guard Concepts and Administration manual.
Franck Pachot did some tests of this feature and explained the results in his blog post 12cR2: Recover nonlogged blocks after NOLOGGING in Data Guard. His recommendation: As long as a recovery of the non-logged blocks is done on the standby database directly after the NOLOGGING operation on the primary database, the features is “acceptable”.
When I read the sparse information in the Oracle 12.2 documentation and the comments of Franck’s blog post, I’m not sure whether I would recommend to use Direct-Path INSERT with NOLOGGING for a data warehouse in an Oracle Data Guard environment. Although it is possible to apply the non-logged blocks on a physical standby database, a lot of details have to be considered to avoid bad surprises in case of a media failure or a switchover. So, if you plan to use this feature in your data warehouse, it is important to evaluate the pros and cons of this configuration.
My recommendations are:
- Use Direct-Path INSERT in your ETL processes to improve load performance of large data sets. See tip 9 in my blog post 10 Tips to Improve ETL Performance.
- If data can be reloaded from the source systems, flat files or previous DWH layers, the tables can additionally be configured with NOLOGGING – if you are not on Oracle Data Guard.
- For Data Guard on Oracle 12.2, you can use NOLOGGING. But check carefully the impact on all levels (performance, high availability, backup/recovery) with all involved parties (DWH architects, ETL developers, database administrators).