Complex SQL statements are hard to understand – not only for us, but for the query optimizer, too. Sometimes, it is a good idea to write a simpler SQL query to do the same work. Here a nice story about a performance improvement by rewriting the SQL statement.

In my job as consultant, I often see complex SQL queries with nested inline views, subqueries, function calls, complex expressions, and much more. Of course, SQL is a powerful query language with many features and functions. The good thing is that it allows to write complex data requirements in one query. The bad thing is that such queries are often hard to understand – and hard to estimate by the query optimizer.

An approach that I often use to improve the query performance is to reduce complexity by rewriting the SQL statement, or by splitting into several pieces, e.g. with subquery factoring clauses (see blog post 10 Tips to Improve ETL Performance). Of course, this is not always possible, but it may be more efficient and sustainable than to add hints, create SQL profiles, or create additional indexes.

A Current Example

A few days ago, I got an e-mail from a customer:

“We have a problem with the following SQL query. Is it possible to improve it?”

Attached was a SQL statement and the screenshot of an execution plan. The SQL query was a hierarchical query with CONNECT BY on an inline view with some analytical functions. The following statement is more or less what I received from the customer, except that I renamed the table and column names to “anonymize” the SQL code:

SELECT (CASE

WHEN v.code = ‘ABC’ THEN 1

WHEN v.code = ‘DEF’ THEN 2

WHEN v.code = ‘GHI’ THEN 3

WHEN v.code = ‘YYZ’ THEN 4

ELSE 5

END) AS code,

v.fk_column AS fk_column,

v.fk_list AS fk_list,

v.grp_id AS grp_id

FROM (SELECT S1.grp_id,

S1.code,

MIN (S1.fk_column) AS fk_column,

LTRIM (MAX (SYS_CONNECT_BY_PATH (S1.fk_column, ‘;’))

KEEP (DENSE_RANK LAST ORDER BY S1.curr),

‘;’) AS fk_list

FROM (SELECT grp_id,

code,

fk_column,

ROW_NUMBER ()

OVER (PARTITION BY grp_id, code

ORDER BY fk_column) AS curr,

ROW_NUMBER ()

OVER (PARTITION BY grp_id, code

ORDER BY fk_column) – 1 AS prev

FROM quite_a_big_table

WHERE grp_id > 0) S1

GROUP BY S1.grp_id, S1.code

CONNECT BY S1.prev = PRIOR S1.curr

AND S1.grp_id = PRIOR S1.grp_id

START WITH curr = 1) v

ORDER BY grp_id, code

The execution plan didn’t look too bad – as far as I can judge without knowing the application and without access to the database. It contained two full table scans on table QUITE_A_BIG_TABLE and a hash join to do a self-join on the table. The table is quite big (as the name says in this example) and has more than 150 million rows. In the OEM screenshot of the execution plan, I saw that the statement was running for more than half an hour, but did not finish yet.

So, my first answer to the customer was:

“I’m sure it is possible to improve the statement, but I don’t have a fast solution for your problem because I have no remote access to the database. I can come to your office next week if you need some assistance”.

After replying the e-mail, I was not happy with my answer and wanted to understand what this SQL query does. It took me some time to find out that the purpose of the statement is, and after rebuilding the same query on the good old SCOTT.EMP table, I realized that the statement is used to extract a semicolon-separated list of foreign keys. Do I really need a CONNECT BY clause for this? I don’t think so! With the LISTAGG function it is much easier to get the same result:

SELECT (CASE

WHEN v.code = ‘ABC’ THEN 1

WHEN v.code = ‘DEF’ THEN 2

WHEN v.code = ‘GHI’ THEN 3

WHEN v.code = ‘YYZ’ THEN 4

ELSE 5

END) AS code,

v.fk_column AS fk_column,

v.fk_list AS fk_list,

v.grp_id AS grp_id

FROM (SELECT grp_id,

code,

MIN (fk_column) AS fk_column,

LISTAGG(fk_column, ‘;’)

WITHIN GROUP (ORDER BY fk_column) AS fk_list

FROM quite_a_big_table

WHERE grp_id > 0

GROUP BY grp_id, code) v

ORDER BY grp_id, code

Of course, I could not do any performance tests with my example query on the small EMP table. But because the table has to be scanned only once and no self-join is required, the same statement on QUITE_A_BIG_TABLE should definitely be faster. I rewrote the original query and sent a second e-mail to my customer:

I cannot guarantee that my query is executable and actually returns the same results. I adapted your example to the table SCOTT.EMP, then rewrote it, and finally replaced the columns with the original attribute names. I don’t know if I succeeded without errors. But it would be worth a try to execute the query.

The answer I got a few day later from the customer showed me that the rewritten SQL statement was successful. The e-mail contained many smileys and exclamation marks:

We tested your statement and it’s not only much the hell faster but also returns the same results 😉
It already ran on production and we are happy!!!!! 😊 😊

Why do I write this story here? One reason is that I like this kind of customer feedback, and it makes me happy as well. Another reason is that this example shows once more how important it is to know the different options to write a SQL statement. Often, there are many ways leading to the goal, but not all of them are simple and efficient.

Conclusion

It would be a hard rule to say: The simpler a SQL statement is, the faster it runs. But as I can see in many projects and performance tuning sessions, it is often the case that a SQL statement that can be simplified or written in a more elegant way is easier to optimize than a complex query. So, I recommend a special form of the KISS approach (“keep it simple and stupid”): Keep your SQL simple and fast.

Data Warehousing with Oracle

Dani Schnider's Blog

Keep Your SQL Simple (and Fast)

A Current Example

Conclusion

Leave a comment Cancel reply

A Current Example

Conclusion

Share this:

Related

Leave a comment Cancel reply