Knowledge increases exponentially. Today, you probably own more books than great universities of times past—Cambridge University owned less than two hundred books in the fifteenth century. First came the invention of writing, then alphabets, then paper, then the printing press, then mechanization. Each step caused an exponential increase in the collective human knowledge. In our generation, Al Gore invented the internet and the last barriers to the spread of knowledge have been broken. Today, everybody has the ability to contribute, communicate, and collaborate. We are all caught up in a tsunami, an avalanche, a conflagration, a veritable explosion of knowledge for the betterment of humankind. This is the blog of the good folks at Database Specialists, a brave band of Oracle database administrators from the great state of California. We bid you greeting, traveler. We hope you find something of value on these pages and we wish you good fortune in your journey.

When Two Joins Are Better Than One

I was asked to tune a “Top N” query; a simplified version is shown below. Table t1 is first filtered by group (WHERE group_id = :group_id) then partitioned by subgroup (PARTITION BY subgroup_id) and sorted by timestamp (ORDER BY timestamp DESC). The record with the most recent timestamp in each partition is the one required (WHERE rank = 1).

SELECT
  t1.*,
  t2.*
FROM
  (
    SELECT
      t1.*,
      RANK () OVER (PARTITION BY subgroup_id ORDER BY timestamp DESC) as rank
    FROM t1
    WHERE group_id = :group_id
  ) t1
  LEFT JOIN t2 ON t2.id = t1.t2_id
WHERE rank = 1;

The query plan shows that most of the time is spent retrieving 192,000 records from table t1 using index lookups (TABLE ACCESS BY INDEX ROWID). However, only 479 of those records are retained after partitioning and sorting.

----------------------------------------------------------------------------------------------------
| Id  | Operation                       | Name   | Starts | A-Rows |   A-Time   | Buffers | Reads  |
----------------------------------------------------------------------------------------------------
|*  1 |  HASH JOIN OUTER                |        |      1 |    479 |00:00:16.66 |   16051 |  16033 |
|*  2 |   VIEW                          |        |      1 |    479 |00:00:16.33 |   15424 |  15413 |
|*  3 |    WINDOW SORT PUSHED RANK      |        |      1 |    192K|00:00:16.06 |   15424 |  15413 |
|*  4 |     FILTER                      |        |      1 |    192K|00:00:14.23 |   15424 |  15413 |
|   5 |      TABLE ACCESS BY INDEX ROWID| T1     |      1 |    192K|00:00:14.03 |   15424 |  15413 |
|*  6 |       INDEX RANGE SCAN          | T1_IX1 |      1 |    192K|00:00:04.80 |    1021 |   1021 |
|   7 |   TABLE ACCESS FULL             | T2     |      1 |  27142 |00:00:00.26 |     627 |    620 |
----------------------------------------------------------------------------------------------------

The solution was to create an index t1_ix2 on the triplet (group_id ASC, subgroup_id ASC, timestamp DESC). Note that the timestamp values are stored in descending order in this index instead of the default ascending order. An extra join was then introduced at the outset of the query. It used the new index to collect only the ROWIDs of the 479 records but not any data from table t1. As a bonus, sorting was no longer required (WINDOW NOSORT) for the “Top N” calculation since the new index was sorted in the required order. In the next join, we retrieved the required 479 records from table t1 using the ROWIDs collected in the first join (TABLE ACCESS BY USER ROWID). We did have to use several SQL hints (LEADING, USE_NL, USE_HASH, and NO_SWAP_JOIN_INPUTS) to make the optimizer see things our way but the modified query was five times faster than the original (3.11 seconds v/s 16.6 seconds).

SELECT
  /*+ LEADING(temp t1 t2) USE_NL(t1) USE_HASH(t2) NO_SWAP_JOIN_INPUTS(t2) */
  t1.*,
  t2.*
FROM
  (
    SELECT
      /*+ NO_MERGE */
      ROWID AS t1_rowid,
      RANK () OVER (PARTITION BY subgroup_id ORDER BY timestamp DESC) as rank
    FROM t1
    WHERE group_id = :group_id
  ) temp
  LEFT JOIN t1 ON (t1.ROWID = temp.t1_rowid),
  LEFT JOIN t2 ON (t2.id = t1.t2_id)
WHERE temp.rank = 1;
----------------------------------------------------------------------------------------------------
| Id  | Operation                       | Name   | Starts | A-Rows |   A-Time   | Buffers | Reads  |
----------------------------------------------------------------------------------------------------
|*  1 |  HASH JOIN OUTER                |        |      1 |    479 |00:00:03.11 |    2129 |   1802 |
|   2 |   NESTED LOOPS OUTER            |        |      1 |    479 |00:00:02.82 |    1502 |   1182 |
|*  3 |    VIEW                         |        |      1 |    479 |00:00:02.37 |    1023 |   1016 |
|*  4 |     WINDOW NOSORT               |        |      1 |    192K|00:00:02.14 |    1023 |   1016 |
|*  5 |      INDEX RANGE SCAN           | T1_IX2 |      1 |    192K|00:00:01.89 |    1023 |   1016 |
|   6 |    TABLE ACCESS BY USER ROWID   | T1     |    479 |    479 |00:00:00.45 |     479 |    166 |
|   7 |   TABLE ACCESS FULL             | T2     |      1 |  27142 |00:00:00.23 |     627 |    620 |
----------------------------------------------------------------------------------------------------

P.S. I first saw the ROWID-to-ROWID trick used in SQL Tuning by Dan Tow.

Leave a Reply

 

 

 

You can use these HTML tags

<a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <strike> <strong>