Oracle Initialization Parameters for Data Warehouses

The configuration of a DWH database differs from that of an OLTP database. This has an impact on several initialization parameters of the Oracle database. The following list gives an overview of some important parameters that should be set correctly in a data warehouse environment.

Oracle 11g contains more than 340 initialization parameters to configure the database, and Oracle 12c added about 40 more. Additionally, there are more than 3000 undocumented parameters that influence internal functionality and enable or disable individual database features. A complete list of recommendations for a “typical” DWH database is almost impossible, but there are a couple of parameters that it is important to set properly. Some of the most important ones are described in this blog post.

Optimization for Non-selective Queries

Most queries in a data warehouse are less selective than in an OLTP system. They read and aggregate many rows instead of reading a small subset of the data. Therefore, full table scans are much more important in a DWH environment than in an operational system. This is especially the case for all ETL jobs that load the data into the data warehouse. To increase the performance of full table scans, the following parameters should be set correctly.

  • The parameter db_file_multiblock_read_count should be increased to read more blocks during a multi-block read. The recommended number depends on the block size and should be set to a value that allows reading 1 MB with one multi-block read. For example, if db_block_size is 8K, the value of db_file_multiblock_read_count can be set to 128. If db_block_size is 16K, a value of 64 is recommended.
  • For the parameter db_block_size, a typical recommendation is to use a block size of 16K or 32K in data warehouses. In practice, the impact on the performance of full table scans is not very high. In most cases, the default value of 8K is sufficient, but for large databases it can be helpful to increase the block size.
  • The default value for the parameter optimizer_mode is ALL_ROWS, and it should not be changed in a DWH environment. For the ETL processes and for most of the queries on a data mart, the throughput of all rows is more important than the response time for the first rows. The FIRST_ROWS_n values are adequate for OLTP systems, but not for data warehouses.
  • The parameters optimizer_index_cost_adj and optimizer_index_caching are obsolete if you gather system statistics. Although the default values are not perfect in all situations, they are good for typical DWH queries and should not be changed.
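The settings above could be applied as sketched below. This is illustrative only: the multi-block read count assumes an 8K block size, and db_block_size itself can only be chosen at database creation, not changed afterwards.

```sql
-- Assumption: db_block_size = 8192 (8K), so 128 blocks = 1 MB per multi-block read
ALTER SYSTEM SET db_file_multiblock_read_count = 128 SCOPE = BOTH;

-- Keep the optimizer default for DWH workloads (shown here only for completeness;
-- ALL_ROWS is already the default and should not be overridden)
ALTER SYSTEM SET optimizer_mode = ALL_ROWS SCOPE = BOTH;
```

With SCOPE = BOTH, the change takes effect immediately and is also persisted in the server parameter file (spfile).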

Enough Memory for Large Data Sets

For aggregations, hash joins and sort operations it is important to have enough memory available to avoid additional I/O on the temporary tablespace. This can be configured with several initialization parameters.

  • The strategy to assign work areas in the Program Global Area (PGA) can be defined with the parameter workarea_size_policy. At the database level, it is reasonable to use the default value AUTO. In this case, the work areas for aggregations, hash joins and sort operations are allocated dynamically within the memory limit defined by the parameter pga_aggregate_target. In Oracle 12c, the upper limit of the PGA size can be defined with the parameter pga_aggregate_limit.
  • With Automatic Memory Management (AMM), the maintenance of the System Global Area (SGA) and the Program Global Area (PGA) can be simplified. If the parameter memory_target is set to a value greater than 0, the memory allocation of SGA and PGA is done dynamically. With the parameter memory_max_target, the maximum size of SGA+PGA can be specified.
  • For large ETL jobs, it might be useful to allocate enough PGA memory for individual operations to reduce I/O on the temporary tablespace. In this case, the parameter workarea_size_policy can be set to MANUAL at session level. With the parameters hash_area_size (for hash joins) and sort_area_size (for sort operations), the amount of memory for the work areas can be defined. This should be done at session level only; it is not recommended to use this policy for the whole database.
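As a sketch, the memory configuration described above could look like this. The sizes are illustrative assumptions, not recommendations for a concrete system:

```sql
-- Database level: automatic PGA management (AUTO is the default policy)
ALTER SYSTEM SET workarea_size_policy = AUTO SCOPE = BOTH;
ALTER SYSTEM SET pga_aggregate_target = 8G SCOPE = BOTH;
-- Oracle 12c only: hard upper limit for the total PGA memory
ALTER SYSTEM SET pga_aggregate_limit = 16G SCOPE = BOTH;

-- Session level, e.g. for a large ETL job: manual work areas
ALTER SESSION SET workarea_size_policy = MANUAL;
ALTER SESSION SET hash_area_size = 1073741824;  -- 1 GB for hash joins
ALTER SESSION SET sort_area_size = 1073741824;  -- 1 GB for sort operations
```

The ALTER SESSION statements affect only the current session, so the MANUAL policy does not leak into the rest of the database.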

Specific Parameters for Data Warehouses

Some additional settings are required to use specific DWH features of the Oracle database. 

  • If you use materialized views in your data warehouse and want them to be used for query rewrite, the parameter query_rewrite_enabled must be set to TRUE (the default since Oracle 10g). Additionally, it is recommended to set the parameter query_rewrite_integrity to TRUSTED. This allows query rewrite to be used even when foreign key constraints are not validated or when dimensions (a kind of hierarchy constraint on dimension tables) are defined.
  • The parameter star_transformation_enabled should be set to TRUE. This allows the Oracle optimizer to generate star transformations, i.e. efficient execution plans for star schemas with filters on multiple dimension tables. Because there were some issues with temporary tables in combination with star transformation in previous Oracle releases, it was sometimes recommended to use the value TEMP_DISABLE instead. This enables the star transformation, but without the use of temporary tables.
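Taken together, the DWH-specific settings above amount to three statements (a sketch; query_rewrite_enabled is shown for completeness since TRUE is already the default):

```sql
-- Enable query rewrite against materialized views (default since Oracle 10g)
ALTER SYSTEM SET query_rewrite_enabled = TRUE SCOPE = BOTH;
-- Allow rewrite based on non-validated constraints and dimensions
ALTER SYSTEM SET query_rewrite_integrity = TRUSTED SCOPE = BOTH;

-- Enable star transformation for star schema queries
ALTER SYSTEM SET star_transformation_enabled = TRUE SCOPE = BOTH;
```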

Parallel Execution

For queries on huge amounts of data as well as for long-running ETL jobs, the performance can be increased with parallel execution of the SQL statements (parallel query and parallel DML). Several initialization parameters are available to configure parallelization in the Oracle Database Enterprise Edition. The following recommendations are based on the explanations in the book Troubleshooting Oracle Performance (2nd edition) by Christian Antognini (pp. 605 to 640).

  • The number of parallel slave processes can be configured with the parameters parallel_min_servers (number of processes started at instance startup) and parallel_max_servers (maximum number of processes that can run in parallel). The default value 0 for parallel_min_servers is usually sufficient and does not need to be changed. For parallel_max_servers, a recommended starting point is 10 to 20 times the number of CPU cores.
  • The degree of parallelism (DOP) can be defined manually (with a DOP definition on tables and indexes or with the parallel hint), or the optimal DOP can be derived automatically by Oracle. This is configured with the parameter parallel_degree_policy. The default value MANUAL should not be changed in versions before Oracle 11.2. From Oracle 11.2 onwards, it can be set to AUTO. In Oracle 12.1, it can be set to ADAPTIVE instead, which, in addition to the automatic calculation of the DOP, also enables performance feedback.
  • The parameter parallel_automatic_tuning is deprecated and should no longer be used in Oracle 11.2 and 12.1.
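A minimal sketch of the parallel execution settings discussed above; the number of CPU cores is an assumption and the values must be adapted to your hardware:

```sql
-- Assumption: a server with 16 CPU cores; 20 x cores as a starting point
ALTER SYSTEM SET parallel_max_servers = 320 SCOPE = BOTH;

-- Oracle 11.2 or later: let the database derive the DOP automatically
ALTER SYSTEM SET parallel_degree_policy = AUTO SCOPE = BOTH;
```

Before switching parallel_degree_policy to AUTO on a production system, the effect on concurrent workloads should be tested, since automatic DOP can change execution plans significantly.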

The configuration of parallel processing can be influenced with some more parameters, but that would go too far for this overview. Also, in specific environments, e.g. RAC configurations or data warehouses on Exadata, there are additional configuration settings to be considered. And depending on the environment and the Oracle versions and options used, some of the recommendations can differ. For example, if you use the Oracle Database In-Memory option, you will probably prefer vector transformation instead of star transformation. But this is another story.

Although this blog post contains just a short overview of some important configuration parameters, I hope it helps you to set up your DWH databases properly and efficiently. If you need more information, the Oracle Database Reference manual (11.2 or 12.1) is still a good and complete source for detailed information about each initialization parameter.

3 thoughts on “Oracle Initialization Parameters for Data Warehouses”

  1. Hi Dani,

    I just came across this and, though I have only read chapter 2, I like your teaching style and will be reading it in its entirety.

    *”For the ETL processes and for most of the queries on a data mart, the throughput of the rows is more important than the response time for the first rows.”* I know this was a general statement that is, by and large, true, but I’d like to point out an exception:

    It’s 100% correct to leave the default to ALL_ROWS. However, in the case of ETL, it’s not necessarily the case that throughput is more important than response time of the initial rows.

    Pipelined table functions, which I use often, begin returning the initial rows well before all rows for the query have been returned. In fact, though they help enormously with certain queries, Oracle actually created them as a tool to speed up ETL because they allow processing of data earlier. They allow the ETL to process data more like a stream or a pipeline.

    Thank you for taking the time to write this. I look forward to reading the rest.

    Michael Milligan
    Hooper Utah

