Intermediate SQL Color Coded SQL, UNIX and Database Essays

18Jun/103

ORACLE 11g SQL Plan Management: The Dark Side of SPM. Part 4

With SQL Plan Management being relatively new, it is inevitable that many people run into problems using it.

Most of those problems are caused by our (relative) ignorance: SPM does change the way how ORACLE runs SQL statements and it simply takes time to get used to how it works. Yet, some of the consequences of using SPM are truly bizarre and will surprise many people (including, probably, some ORACLE developers).

In this post I will describe the 3 scenarios where we have recently run into problems using SPM and you can judge for yourself …


 

Scenario 1: The WEIRD: Another way for ORACLE NOT to use your index

This is essentially the problem that I described in the first post, so here I will provide a simplified description of what happened.

Imagine that you have a large table in your schema, say the one that is created by the statement below:

SQL> CREATE TABLE t (n, c) NOLOGGING PARALLEL PCTFREE 99 PCTUSED 1
  AS SELECT level, CAST (to_char(level) AS varchar2(10)) FROM dual
  CONNECT BY level <= 1000000
/

SQL> ALTER TABLE t LOGGING NOPARALLEL;

SQL> EXEC dbms_stats.gather_table_stats(USER, 't');

You set up implicit SPM capture and run your workload.

SQL> ALTER SESSION SET optimizer_capture_sql_plan_baselines=TRUE;

-- You are a good ORACLE programmer and, of course, you use bind variables
SQL> var n NUMBER
SQL> EXEC :n := 200;

SQL> SELECT * FROM t WHERE n=:n;

SQL> SELECT * FROM t WHERE n=:n;

After a while, all of your major SQLs are implicitly captured and SPM baselines are created for them.

SET linesize 180
colu sql_text format A30
colu optimizer_cost format 999,999 heading Cost
SELECT plan_name, sql_text, optimizer_cost, accepted
FROM dba_sql_plan_baselines
WHERE to_char(sql_text) LIKE 'SELECT * FROM t WHERE n=:n'
/

PLAN_NAME                      SQL_TEXT                           Cost ACC
------------------------------ ------------------------------ -------- ---
SQL_PLAN_4wm24mwmr8n9z94ecae5c SELECT * FROM t WHERE n=:n       68,764 YES

Since baselines are the first to be created, they are ACCEPTED and are now being used for your statements.

SQL> SET autotrace traceonly
SQL> SELECT * FROM t WHERE n=:n;

Execution Plan
--------------------------------------------------------------------------
| Id  | Operation         | Name | ROWS  | Bytes | Cost (%CPU)| TIME     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      |     1 |    12 | 68793   (1)| 00:13:46 |
|*  1 |  TABLE ACCESS FULL| T    |     1 |    12 | 68793   (1)| 00:13:46 |
--------------------------------------------------------------------------

Predicate Information (IDENTIFIED BY operation id):
---------------------------------------------------

   1 - FILTER("N"=TO_NUMBER(:N))

Note
-----
   - SQL plan baseline "SQL_PLAN_4wm24mwmr8n9z94ecae5c" used FOR this statement

Somewhere along the line you realize that the FULL TABLE SCAN that this SQLs is running is NOT very efficient. An INDEX might help things a lot! So, you create the index:

SQL> CREATE UNIQUE INDEX t_pk ON t (n) NOLOGGING PARALLEL;

SQL> ALTER INDEX t_pk LOGGING NOPARALLEL;

Since the index is UNIQUE and your data distribution is uniform, the above query should switch to using the index, right ? It seems obvious, but let’s find out.

SQL> SELECT * FROM t WHERE n=:n;

Execution Plan
--------------------------------------------------------------------------
| Id  | Operation         | Name | ROWS  | Bytes | Cost (%CPU)| TIME     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      |     1 |    12 | 68793   (1)| 00:13:46 |
|*  1 |  TABLE ACCESS FULL| T    |     1 |    12 | 68793   (1)| 00:13:46 |
--------------------------------------------------------------------------

Predicate Information (IDENTIFIED BY operation id):
---------------------------------------------------

   1 - FILTER("N"=TO_NUMBER(:N))

Note
-----
   - SQL plan baseline "SQL_PLAN_4wm24mwmr8n9z94ecae5c" used FOR this statement

Statistics
----------------------------------------------------------
        221  recursive calls
         34  db block gets
     250055  consistent gets
     249986  physical reads

Well, the reality is – it didn’t – the query is still running FULL TABLE SCAN and performing ~ 250k logical reads while at that.

What happened here ? Why is our query NOT using the index ?

In a word, what happened is – SPM. Since ACCEPTED SPM Baseline already exists for this statement, even if the statement generates a NEW execution plan (and it DOES, because it can now use the index), the new plan/baseline, although created, is NOT ACCEPTED and hence cannot be used.

Let’s see for ourselves:

SQL> SELECT plan_name, sql_text, optimizer_cost, accepted
FROM dba_sql_plan_baselines
WHERE to_char(sql_text) LIKE 'SELECT * FROM t WHERE n=:n'
/

PLAN_NAME                      SQL_TEXT                           Cost ACC
------------------------------ ------------------------------ -------- ---
SQL_PLAN_4wm24mwmr8n9z0efda8a7 SELECT * FROM t WHERE n=:n            3 NO
SQL_PLAN_4wm24mwmr8n9z94ecae5c SELECT * FROM t WHERE n=:n       68,764 YES

Notice the new plan for our statement. It comes with a much better cost and yet, it cannot be used because it is NOT ACCEPTED (yet).

I’ll repeat it again in a slightly more dramatic fashion:

If ACCEPTED SPM baseline already exists for SQL statement – IT WILL BE USED regardless of whether a new (but as of yet unverified) plan is better

Even though it feels weird, I believe this is a correct behavior. You wanted plan stability and you GOT IT (remember, this is what SPM is really about).

But still, running FULL TABLE SCANs for this statement is obviously inefficient … So, how can we make SPM accept the new (and, in our opinion, better) execution plan ?

Simple – we should prove it to ORACLE that it is better! We can do it by “evolving” the baseline (while creating baselines does not require special permissions, to “evolve” one you need to have ADMINISTER SQL MANAGEMENT OBJECT privilege).

SQL> SELECT sql_handle FROM dba_sql_plan_baselines
WHERE plan_name='SQL_PLAN_4wm24mwmr8n9z0efda8a7';

SQL_HANDLE
------------------------------
SYS_SQL_4e4c449f2774513f

SQL> SET long 1000000
SQL> SET longchunksize 180

SELECT dbms_spm.evolve_sql_plan_baseline(
  'SYS_SQL_4e4c449f2774513f', 'SQL_PLAN_4wm24mwmr8n9z0efda8a7') FROM dual;

The “evolution” process runs pretty long and when you look at ORACLE you see that both FULL TABLE SCAN (currently ACCEPTED) and INDEX UNIQUE ACCESS (REQUESTED) paths are tried.

At the end you see the following report and YES, your new plan has been tried, deemed MUCH BETTER than the original one and ACCEPTED.

Inputs:
-------
  SQL_HANDLE = SYS_SQL_4e4c449f2774513f
  PLAN_NAME  =
  TIME_LIMIT = DBMS_SPM.AUTO_LIMIT
  VERIFY     = YES
  COMMIT     = YES

Plan: SQL_PLAN_4wm24mwmr8n9z0efda8a7
------------------------------------
  Plan was verified: TIME used 107.754 seconds.
  Plan passed performance criterion: 63031 times better than baseline plan.
  Plan was changed TO an accepted plan.

                            Baseline Plan      Test Plan       Stats Ratio
                            -------------      ---------       -----------
  Execution STATUS:              COMPLETE       COMPLETE
  ROWS Processed:                       1              1
  Elapsed TIME(ms):             51584.554           .207         249200.74
  CPU TIME(ms):                     20540              0
  Buffer Gets:                     250070              4           62517.5
  Physical READ Requests:            3259              0
  Physical WRITE Requests:              0              0
  Physical READ Bytes:         1999519744              0
  Physical WRITE Bytes:                 0              0
  Executions:                           1              1

-------------------------------------------------------------------------------
                                 Report Summary
-------------------------------------------------------------------------------
NUMBER OF plans verified: 1
NUMBER OF plans accepted: 1

So, now when you run the statement again, it is using a new, much more efficient SPM Baseline:

SQL> SELECT * FROM t WHERE n=:n;

Execution Plan
------------------------------------------------------------------------------------
| Id  | Operation                   | Name | ROWS  | Bytes | Cost (%CPU)| TIME     |
------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |      |     1 |    12 |     3   (0)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID| T    |     1 |    12 |     3   (0)| 00:00:01 |
|*  2 |   INDEX UNIQUE SCAN         | T_PK |     1 |       |     2   (0)| 00:00:01 |
------------------------------------------------------------------------------------

Predicate Information (IDENTIFIED BY operation id):
---------------------------------------------------

   2 - access("N"=TO_NUMBER(:N))

Note
-----
   - SQL plan baseline "SQL_PLAN_4wm24mwmr8n9z0efda8a7" used FOR this statement

You might wonder, what happened to the original baseline ? Did it get demoted and changed its status to NOT ACCEPTED ?

Strangely, NO, both old and new baselines are now ACCEPTED (and, according to ORACLE, a more efficient baseline will be chosen in this case).

SQL> SELECT plan_name, sql_text, optimizer_cost, accepted
FROM dba_sql_plan_baselines
WHERE to_char(sql_text) LIKE 'SELECT * FROM t WHERE n=:n'
/

PLAN_NAME                      SQL_TEXT                           Cost ACC
------------------------------ ------------------------------ -------- ---
SQL_PLAN_4wm24mwmr8n9z0efda8a7 SELECT * FROM t WHERE n=:n            3 YES
SQL_PLAN_4wm24mwmr8n9z94ecae5c SELECT * FROM t WHERE n=:n       68,764 YES

Despite this little odd thing, I hope that you can see now that SPM behavior is very logical and SPM is actually working as advertised to enforce plan stability.

Scenario 2: The UGLY: How to screw up your neighbor

The previous scenario displayed things about SPM that were pretty weird, but let me tell you – what you are about to see is much MUCH weirder …

All the setup from the previous scenario remains the same, except we have dropped the index and removed all collected SPM baselines to provide the clean slate:

SQL> DROP INDEX t_pk;

DECLARE
  cursor c_s IS SELECT sql_handle FROM dba_sql_plan_baselines
  WHERE parsing_schema_name='TEST';

  nRet NUMBER;
BEGIN
  FOR rec IN c_s loop
    BEGIN
      nRet := dbms_spm.drop_sql_plan_baseline(rec.sql_handle);
    exception
      -- I know, this is BAD, so, do not use it in your production code
      WHEN others THEN NULL;
    END;
  END loop;

  commit;
END;
/

SQL> SELECT * FROM dba_sql_plan_baselines
  WHERE parsing_schema_name='TEST';

no ROWS selected

We still have a big table, now with no indexes and we would like to collect some SPM baselines for the same FULL TABLE SCAN SQL that we ran in Scenario 1.

SQL> ALTER SESSION SET optimizer_capture_sql_plan_baselines=TRUE;

SQL> var n NUMBER
SQL> EXEC :n := 200;

SQL> SELECT * FROM t WHERE n=:n;

SQL> SELECT * FROM t WHERE n=:n;

-- And, now we have our (FULL TABLE SCAN) baseline
SQL> SELECT plan_name, sql_text, optimizer_cost, accepted
FROM dba_sql_plan_baselines
WHERE to_char(sql_text) LIKE 'SELECT * FROM t WHERE n=:n'
/

PLAN_NAME                      SQL_TEXT                           Cost ACC
------------------------------ ------------------------------ -------- ---
SQL_PLAN_4wm24mwmr8n9z94ecae5c SELECT * FROM t WHERE n=:n       68,764 YES

To this point we ran all our statements under the same database user: TEST. For the purpose of this exercise, we will now create a separate database user and give it some basic database permissions:

SQL> CREATE USER test2 IDENTIFIED BY test2
  DEFAULT TABLESPACE users QUOTA unlimited ON users;

SQL> GRANT CREATE SESSION, ALTER SESSION, CREATE TABLE, SELECT any dictionary TO test2;

We now connect to TEST2 and create table T with exactly the same structure as in our original user: TEST. In addition to that, we also create a UNIQUE INDEX on column N. The table, by the way, does not need to be of the same size, so we will create it slightly smaller (but still large enough to see the point).

SQL> CREATE TABLE t (n, c) NOLOGGING PARALLEL PCTFREE 99 PCTUSED 1
  AS SELECT level, CAST (to_char(level) AS varchar2(10)) FROM dual
  CONNECT BY level <= 100000
/

SQL> ALTER TABLE t LOGGING NOPARALLEL;

SQL> EXEC dbms_stats.gather_table_stats(USER, 't');

SQL> CREATE UNIQUE INDEX t_pk ON t (n);

Now for the interesting part. Let’s run the same SQL in TEST2 that we ran in TEST. Since we have index in place, we should expect that our query should be very efficient.

SQL> SHOW USER
TEST2

SQL> SELECT * FROM t WHERE n=:n;

Execution Plan
--------------------------------------------------------------------------
| Id  | Operation         | Name | ROWS  | Bytes | Cost (%CPU)| TIME     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      |     1 |    11 |  6951   (1)| 00:01:24 |
|*  1 |  TABLE ACCESS FULL| T    |     1 |    11 |  6951   (1)| 00:01:24 |
--------------------------------------------------------------------------

Predicate Information (IDENTIFIED BY operation id):
---------------------------------------------------

   1 - FILTER("N"=TO_NUMBER(:N))

Note
-----
   - SQL plan baseline "SQL_PLAN_4wm24mwmr8n9z94ecae5c" used FOR this statement

Statistics
----------------------------------------------------------
          7  recursive calls
          0  db block gets
      25004  consistent gets
       2098  physical reads

A full table scan and a baseline! All I can say here is: WOW!

ORACLE is using SPM baseline that was collected for a different schema (and obviously different object) and is applying it to this new statement.

Unbelievable!

But there is more! Let’s exchange bind variable N with bind variable M (everybody who studied mathematics in college would recall that in most cases M >> N ;-) )

SQL> var m NUMBER
SQL> EXEC :m := 200;

SQL> SELECT * FROM t WHERE n=:m;

Execution Plan
------------------------------------------------------------------------------------
| Id  | Operation                   | Name | ROWS  | Bytes | Cost (%CPU)| TIME     |
------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |      |     1 |    11 |     2   (0)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID| T    |     1 |    11 |     2   (0)| 00:00:01 |
|*  2 |   INDEX UNIQUE SCAN         | T_PK |     1 |       |     1   (0)| 00:00:01 |
------------------------------------------------------------------------------------

Predicate Information (IDENTIFIED BY operation id):
---------------------------------------------------

   2 - access("N"=TO_NUMBER(:M))

Statistics
----------------------------------------------------------
          1  recursive calls
          0  db block gets
          3  consistent gets
          1  physical reads

Again, unbelievable and confirms our worst suspicions – SPM “match” is done entirely by text of the query without reference to the specific object!

But wait, there is even more … Let’s modify our new table by adding some additional columns:

SQL> ALTER TABLE t ADD (d DATE);

-- Just so that we are not cheating,
-- let's populate this new column with some data ...
SQL> UPDATE t SET d = sysdate;
SQL> COMMIT;

And now let’s repeat our original statement:

SQL> SELECT * FROM t WHERE n=:n;

Execution Plan
--------------------------------------------------------------------------
| Id  | Operation         | Name | ROWS  | Bytes | Cost (%CPU)| TIME     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      |     1 |    11 |  6951   (1)| 00:01:24 |
|*  1 |  TABLE ACCESS FULL| T    |     1 |    11 |  6951   (1)| 00:01:24 |
--------------------------------------------------------------------------

Predicate Information (IDENTIFIED BY operation id):
---------------------------------------------------

   1 - FILTER("N"=TO_NUMBER(:N))

Note
-----
   - SQL plan baseline "SQL_PLAN_4wm24mwmr8n9z94ecae5c" used FOR this statement

Statistics
----------------------------------------------------------
          7  recursive calls
          0  db block gets
      42013  consistent gets
          0  physical reads

And, yet again, even though we have table with a different size and STRUCTURE our query is matched (simply by text) to the same “external” (that is – created for different schema) SPM baseline.

Notice that neither creating the baselines in the original schema nor “assigning” them in the target schema required any special database permissions. And thus, if nothing more, ORACLE gave us a new major tool for practical jokes in the database.

You want to screw up database performance for your neighbor ? There is nothing easier than that: Create similar objects in your own schema (remember, they do not even have to be the same size or structure), run there your neighbor’s SQLs (which are, of course, available in v$SQL view), collect baselines and voila – your neighbor’s SQLs are suffering (and she probably does not have a clue what is going on …).

I consider it a major bug as I do not see any reasonable explanation why SPM works this way. Just to repeat:

SPM Baselines created in one schema can be IMPLICITLY used by SQLs IN ANOTHER SCHEMA

The match is done entirely BY TEXT (and thus “matched” tables can have DIFFERENT STRUCTURE OR SIZE)

Scenario 3: The PRECAUTION: Don’t like binds ? Watch out for space!

One can probably argue that SPM is mostly useful for OLTP systems. Typical OLTP queries are designed for sub second response times and changes in their behavior and timing will be highlighted immediately (and therefore keeping execution plans stable is a good technique there).

One thing that everyone knows about OLTP queries is that the vast majority of them MUST use bind variables. A common wisdom (not to mention multiple horror stories) says that if binds are NOT used, the applications will likely run into serious performance and serialization problems.

And yet, many OLTP systems are still suffering from the literal SQL problem (if this was NOT true, articles about using binds would NOT be SO popular … :-) )

While literals in OLTP queries are BAD for many reasons, SPM introduced yet another dimension into this equation. Consider, for example, what would happen if implicit SPM capture is turned ON (which I would argue is generally a GOOD thing for OLTP system) and the system is generating A LOT of literal SQLs.

At the first glance, you could probably guess that nothing BAD would happen. Since:

  1. Only REPEATABLE SQLs are captured and converted into baselines
  2. Literal SQLs are UNIQUE by definition

Therefore, SPM baselines would NEVER be created for literal SQLs.

But this is a bit naive.

First of all, literal SQLs are rarely truly unique. In many systems, the number of distinct literal SQLs is limited by the number of records in the target table (which might be large, but certainly not infinite).

Second, most applications have a very specific (and rather small) “active data window”, which further encourages literal SQLs to be repeated, somewhat …

And lastly, SQLs do not need to be repeated a lot, remember that TWICE is sufficient for a baseline to be created.

As such, over time OLTP systems with many literal SQLs are likely to generate and store a LOT of SPM baselines. And the thing that I would like to focus here is: storage.

The use of SPM baselines does consume additional space as baselines record additional information about (in this case, individual) SQLs. This usage is fairly significant: baselines “for real production SQLs” created in one of our QA systems averaged ~ 4 Kb each, not including indexes (your mileage may vary, of course).

SQL> SELECT avg_row_len FROM user_tab_statistics
WHERE TABLE_NAME='STAGE1';

AVG_ROW_LEN
-----------
        3801

Hence, on a busy system, these literal baseline ‘space leaks’ may add up fast and will have to be eventually (or in some cases, rather quickly) dealt with.

The bottom line is that:

IMPLICIT CAPTURE does NOT mix very well with LITERAL SQLS

Final Words: Don’t be discouraged (but Do be warned)

Isn’t it funny that the tool that was designed to reduce nasty surprises in application performance comes loaded with (some would say even nastier) surprises of its own ?

Well, at least it is not entirely unexpected … ORACLE 11g SQL Plan Management is a new feature and as anything new it needs to go through some growing pains. Granted, some of these “pains” may look pretty ugly right now (as we’ve seen in this post), but this should NOT keep us from using it.

Or should it ? …

Let’s look at the big picture: It is fair to say that everybody hates performance surprises caused by SQL plan “mood swings”.

End users hate them because they disrupt their workflow.

Developers hate them because they make it much more difficult to design well performing applications.

DBAs hate them because it’s no fun working in the middle of the night scrambling for solutions.

And CIOs … Well, CIOs hate those developers and DBAs that cannot get performance surprises under control …

In other words, being able to provide a stable performance for database applications is not only a worthy goal by itself but it also directly affects a quality of life (not to mention job security) for many people.

And,

SQL Plan Management makes SQL performance STABLE

Yes, SPM is somewhat unorthodox and yes, it requires some extra effort to learn how it works and avoid certain pitfalls, but would I make that effort given all that was said before ?

Hell, yeah!

And I’m sure that my developers, my users and my CIO will thank me for that.

Comments (3) Trackbacks (3)
  1. Some excellent points raised here, presented with clarity.

    I hadn’t even considered Scenario 2 but based on the signature match I suppose it’s obvious. Although I imagine that’s a loophole that will get closed somehow in the not too distant future

  2. Nice article. However I have seen most of the large OLTP database system never goes for all these automatic tuning features provided by latest version including automatic memory management.
    Intact we never gather stats on our OLTP system to maintain plan stability & and performance.

  3. Thanks Amit. I agree: large OLTP systems should almost never use automatic tuning features. However, SQL plan management is not an automatic tuning feature (unless you want it to be), it is rather the tool that prevents automated management from kicking in by “freezing” SQL plans in place. We use it extensively to make sure plans are managed by humans … but as with any tool – you need to be aware of all the little quirks that it has to prevent weird problems (and yes, some of them are pretty weird)


Leave a comment

(required)