ORACLE 11g SQL Plan Management: The Dark Side of SPM. Part 4
With SQL Plan Management being relatively new, it is inevitable that many people run into problems using it.
Most of those problems are caused by our (relative) ignorance: SPM does change the way how ORACLE runs SQL statements and it simply takes time to get used to how it works. Yet, some of the consequences of using SPM are truly bizarre and will surprise many people (including, probably, some ORACLE developers).
In this post I will describe the 3 scenarios where we have recently run into problems using SPM and you can judge for yourself …
Contents
Scenario 1: The WEIRD: Another way for ORACLE NOT to use your index
This is essentially the problem that I described in the first post, so here I will provide a simplified description of what happened.
Imagine that you have a large table in your schema, say the one that is created by the statement below:
AS SELECT level, CAST (to_char(level) AS varchar2(10)) FROM dual
CONNECT BY level <= 1000000
/
SQL> ALTER TABLE t LOGGING NOPARALLEL;
SQL> EXEC dbms_stats.gather_table_stats(USER, 't');
You set up implicit SPM capture and run your workload.
-- You are a good ORACLE programmer and, of course, you use bind variables
SQL> var n NUMBER
SQL> EXEC :n := 200;
SQL> SELECT * FROM t WHERE n=:n;
SQL> SELECT * FROM t WHERE n=:n;
After a while, all of your major SQLs are implicitly captured and SPM baselines are created for them.
colu sql_text format A30
colu optimizer_cost format 999,999 heading Cost
SELECT plan_name, sql_text, optimizer_cost, accepted
FROM dba_sql_plan_baselines
WHERE to_char(sql_text) LIKE 'SELECT * FROM t WHERE n=:n'
/
PLAN_NAME SQL_TEXT Cost ACC
------------------------------ ------------------------------ -------- ---
SQL_PLAN_4wm24mwmr8n9z94ecae5c SELECT * FROM t WHERE n=:n 68,764 YES
Since baselines are the first to be created, they are ACCEPTED and are now being used for your statements.
SQL> SELECT * FROM t WHERE n=:n;
Execution Plan
--------------------------------------------------------------------------
| Id | Operation | Name | ROWS | Bytes | Cost (%CPU)| TIME |
--------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 12 | 68793 (1)| 00:13:46 |
|* 1 | TABLE ACCESS FULL| T | 1 | 12 | 68793 (1)| 00:13:46 |
--------------------------------------------------------------------------
Predicate Information (IDENTIFIED BY operation id):
---------------------------------------------------
1 - FILTER("N"=TO_NUMBER(:N))
Note
-----
- SQL plan baseline "SQL_PLAN_4wm24mwmr8n9z94ecae5c" used FOR this statement
Somewhere along the line you realize that the FULL TABLE SCAN that this SQLs is running is NOT very efficient. An INDEX might help things a lot! So, you create the index:
SQL> ALTER INDEX t_pk LOGGING NOPARALLEL;
Since the index is UNIQUE and your data distribution is uniform, the above query should switch to using the index, right ? It seems obvious, but let’s find out.
Execution Plan
--------------------------------------------------------------------------
| Id | Operation | Name | ROWS | Bytes | Cost (%CPU)| TIME |
--------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 12 | 68793 (1)| 00:13:46 |
|* 1 | TABLE ACCESS FULL| T | 1 | 12 | 68793 (1)| 00:13:46 |
--------------------------------------------------------------------------
Predicate Information (IDENTIFIED BY operation id):
---------------------------------------------------
1 - FILTER("N"=TO_NUMBER(:N))
Note
-----
- SQL plan baseline "SQL_PLAN_4wm24mwmr8n9z94ecae5c" used FOR this statement
Statistics
----------------------------------------------------------
221 recursive calls
34 db block gets
250055 consistent gets
249986 physical reads
Well, the reality is – it didn’t – the query is still running FULL TABLE SCAN and performing ~ 250k logical reads while at that.
What happened here ? Why is our query NOT using the index ?
In a word, what happened is – SPM. Since ACCEPTED SPM Baseline already exists for this statement, even if the statement generates a NEW execution plan (and it DOES, because it can now use the index), the new plan/baseline, although created, is NOT ACCEPTED and hence cannot be used.
Let’s see for ourselves:
FROM dba_sql_plan_baselines
WHERE to_char(sql_text) LIKE 'SELECT * FROM t WHERE n=:n'
/
PLAN_NAME SQL_TEXT Cost ACC
------------------------------ ------------------------------ -------- ---
SQL_PLAN_4wm24mwmr8n9z0efda8a7 SELECT * FROM t WHERE n=:n 3 NO
SQL_PLAN_4wm24mwmr8n9z94ecae5c SELECT * FROM t WHERE n=:n 68,764 YES
Notice the new plan for our statement. It comes with a much better cost and yet, it cannot be used because it is NOT ACCEPTED (yet).
I’ll repeat it again in a slightly more dramatic fashion:
If ACCEPTED SPM baseline already exists for SQL statement – IT WILL BE USED regardless of whether a new (but as of yet unverified) plan is better
Even though it feels weird, I believe this is a correct behavior. You wanted plan stability and you GOT IT (remember, this is what SPM is really about).
But still, running FULL TABLE SCANs for this statement is obviously inefficient … So, how can we make SPM accept the new (and, in our opinion, better) execution plan ?
Simple – we should prove it to ORACLE that it is better! We can do it by “evolving” the baseline (while creating baselines does not require special permissions, to “evolve” one you need to have ADMINISTER SQL MANAGEMENT OBJECT privilege).
WHERE plan_name='SQL_PLAN_4wm24mwmr8n9z0efda8a7';
SQL_HANDLE
------------------------------
SYS_SQL_4e4c449f2774513f
SQL> SET long 1000000
SQL> SET longchunksize 180
SELECT dbms_spm.evolve_sql_plan_baseline(
'SYS_SQL_4e4c449f2774513f', 'SQL_PLAN_4wm24mwmr8n9z0efda8a7') FROM dual;
The “evolution” process runs pretty long and when you look at ORACLE you see that both FULL TABLE SCAN (currently ACCEPTED) and INDEX UNIQUE ACCESS (REQUESTED) paths are tried.
At the end you see the following report and YES, your new plan has been tried, deemed MUCH BETTER than the original one and ACCEPTED.
-------
SQL_HANDLE = SYS_SQL_4e4c449f2774513f
PLAN_NAME =
TIME_LIMIT = DBMS_SPM.AUTO_LIMIT
VERIFY = YES
COMMIT = YES
Plan: SQL_PLAN_4wm24mwmr8n9z0efda8a7
------------------------------------
Plan was verified: TIME used 107.754 seconds.
Plan passed performance criterion: 63031 times better than baseline plan.
Plan was changed TO an accepted plan.
Baseline Plan Test Plan Stats Ratio
------------- --------- -----------
Execution STATUS: COMPLETE COMPLETE
ROWS Processed: 1 1
Elapsed TIME(ms): 51584.554 .207 249200.74
CPU TIME(ms): 20540 0
Buffer Gets: 250070 4 62517.5
Physical READ Requests: 3259 0
Physical WRITE Requests: 0 0
Physical READ Bytes: 1999519744 0
Physical WRITE Bytes: 0 0
Executions: 1 1
-------------------------------------------------------------------------------
Report Summary
-------------------------------------------------------------------------------
NUMBER OF plans verified: 1
NUMBER OF plans accepted: 1
So, now when you run the statement again, it is using a new, much more efficient SPM Baseline:
Execution Plan
------------------------------------------------------------------------------------
| Id | Operation | Name | ROWS | Bytes | Cost (%CPU)| TIME |
------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 12 | 3 (0)| 00:00:01 |
| 1 | TABLE ACCESS BY INDEX ROWID| T | 1 | 12 | 3 (0)| 00:00:01 |
|* 2 | INDEX UNIQUE SCAN | T_PK | 1 | | 2 (0)| 00:00:01 |
------------------------------------------------------------------------------------
Predicate Information (IDENTIFIED BY operation id):
---------------------------------------------------
2 - access("N"=TO_NUMBER(:N))
Note
-----
- SQL plan baseline "SQL_PLAN_4wm24mwmr8n9z0efda8a7" used FOR this statement
You might wonder, what happened to the original baseline ? Did it get demoted and changed its status to NOT ACCEPTED ?
Strangely, NO, both old and new baselines are now ACCEPTED (and, according to ORACLE, a more efficient baseline will be chosen in this case).
FROM dba_sql_plan_baselines
WHERE to_char(sql_text) LIKE 'SELECT * FROM t WHERE n=:n'
/
PLAN_NAME SQL_TEXT Cost ACC
------------------------------ ------------------------------ -------- ---
SQL_PLAN_4wm24mwmr8n9z0efda8a7 SELECT * FROM t WHERE n=:n 3 YES
SQL_PLAN_4wm24mwmr8n9z94ecae5c SELECT * FROM t WHERE n=:n 68,764 YES
Despite this little odd thing, I hope that you can see now that SPM behavior is very logical and SPM is actually working as advertised to enforce plan stability.
Scenario 2: The UGLY: How to screw up your neighbor
The previous scenario displayed things about SPM that were pretty weird, but let me tell you – what you are about to see is much MUCH weirder …
All the setup from the previous scenario remains the same, except we have dropped the index and removed all collected SPM baselines to provide the clean slate:
DECLARE
cursor c_s IS SELECT sql_handle FROM dba_sql_plan_baselines
WHERE parsing_schema_name='TEST';
nRet NUMBER;
BEGIN
FOR rec IN c_s loop
BEGIN
nRet := dbms_spm.drop_sql_plan_baseline(rec.sql_handle);
exception
-- I know, this is BAD, so, do not use it in your production code
WHEN others THEN NULL;
END;
END loop;
commit;
END;
/
SQL> SELECT * FROM dba_sql_plan_baselines
WHERE parsing_schema_name='TEST';
no ROWS selected
We still have a big table, now with no indexes and we would like to collect some SPM baselines for the same FULL TABLE SCAN SQL that we ran in Scenario 1.
SQL> var n NUMBER
SQL> EXEC :n := 200;
SQL> SELECT * FROM t WHERE n=:n;
SQL> SELECT * FROM t WHERE n=:n;
-- And, now we have our (FULL TABLE SCAN) baseline
SQL> SELECT plan_name, sql_text, optimizer_cost, accepted
FROM dba_sql_plan_baselines
WHERE to_char(sql_text) LIKE 'SELECT * FROM t WHERE n=:n'
/
PLAN_NAME SQL_TEXT Cost ACC
------------------------------ ------------------------------ -------- ---
SQL_PLAN_4wm24mwmr8n9z94ecae5c SELECT * FROM t WHERE n=:n 68,764 YES
To this point we ran all our statements under the same database user: TEST. For the purpose of this exercise, we will now create a separate database user and give it some basic database permissions:
DEFAULT TABLESPACE users QUOTA unlimited ON users;
SQL> GRANT CREATE SESSION, ALTER SESSION, CREATE TABLE, SELECT any dictionary TO test2;
We now connect to TEST2 and create table T with exactly the same structure as in our original user: TEST. In addition to that, we also create a UNIQUE INDEX on column N. The table, by the way, does not need to be of the same size, so we will create it slightly smaller (but still large enough to see the point).
AS SELECT level, CAST (to_char(level) AS varchar2(10)) FROM dual
CONNECT BY level <= 100000
/
SQL> ALTER TABLE t LOGGING NOPARALLEL;
SQL> EXEC dbms_stats.gather_table_stats(USER, 't');
SQL> CREATE UNIQUE INDEX t_pk ON t (n);
Now for the interesting part. Let’s run the same SQL in TEST2 that we ran in TEST. Since we have index in place, we should expect that our query should be very efficient.
TEST2
SQL> SELECT * FROM t WHERE n=:n;
Execution Plan
--------------------------------------------------------------------------
| Id | Operation | Name | ROWS | Bytes | Cost (%CPU)| TIME |
--------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 11 | 6951 (1)| 00:01:24 |
|* 1 | TABLE ACCESS FULL| T | 1 | 11 | 6951 (1)| 00:01:24 |
--------------------------------------------------------------------------
Predicate Information (IDENTIFIED BY operation id):
---------------------------------------------------
1 - FILTER("N"=TO_NUMBER(:N))
Note
-----
- SQL plan baseline "SQL_PLAN_4wm24mwmr8n9z94ecae5c" used FOR this statement
Statistics
----------------------------------------------------------
7 recursive calls
0 db block gets
25004 consistent gets
2098 physical reads
A full table scan and a baseline! All I can say here is: WOW!
ORACLE is using SPM baseline that was collected for a different schema (and obviously different object) and is applying it to this new statement.
Unbelievable!
But there is more! Let’s exchange bind variable N with bind variable M (everybody who studied mathematics in college would recall that in most cases M >> N
)
SQL> EXEC :m := 200;
SQL> SELECT * FROM t WHERE n=:m;
Execution Plan
------------------------------------------------------------------------------------
| Id | Operation | Name | ROWS | Bytes | Cost (%CPU)| TIME |
------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 11 | 2 (0)| 00:00:01 |
| 1 | TABLE ACCESS BY INDEX ROWID| T | 1 | 11 | 2 (0)| 00:00:01 |
|* 2 | INDEX UNIQUE SCAN | T_PK | 1 | | 1 (0)| 00:00:01 |
------------------------------------------------------------------------------------
Predicate Information (IDENTIFIED BY operation id):
---------------------------------------------------
2 - access("N"=TO_NUMBER(:M))
Statistics
----------------------------------------------------------
1 recursive calls
0 db block gets
3 consistent gets
1 physical reads
Again, unbelievable and confirms our worst suspicions – SPM “match” is done entirely by text of the query without reference to the specific object!
But wait, there is even more … Let’s modify our new table by adding some additional columns:
-- Just so that we are not cheating,
-- let's populate this new column with some data ...
SQL> UPDATE t SET d = sysdate;
SQL> COMMIT;
And now let’s repeat our original statement:
Execution Plan
--------------------------------------------------------------------------
| Id | Operation | Name | ROWS | Bytes | Cost (%CPU)| TIME |
--------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 11 | 6951 (1)| 00:01:24 |
|* 1 | TABLE ACCESS FULL| T | 1 | 11 | 6951 (1)| 00:01:24 |
--------------------------------------------------------------------------
Predicate Information (IDENTIFIED BY operation id):
---------------------------------------------------
1 - FILTER("N"=TO_NUMBER(:N))
Note
-----
- SQL plan baseline "SQL_PLAN_4wm24mwmr8n9z94ecae5c" used FOR this statement
Statistics
----------------------------------------------------------
7 recursive calls
0 db block gets
42013 consistent gets
0 physical reads
And, yet again, even though we have table with a different size and STRUCTURE our query is matched (simply by text) to the same “external” (that is – created for different schema) SPM baseline.
Notice that neither creating the baselines in the original schema nor “assigning” them in the target schema required any special database permissions. And thus, if nothing more, ORACLE gave us a new major tool for practical jokes in the database.
You want to screw up database performance for your neighbor ? There is nothing easier than that: Create similar objects in your own schema (remember, they do not even have to be the same size or structure), run there your neighbor’s SQLs (which are, of course, available in v$SQL view), collect baselines and voila – your neighbor’s SQLs are suffering (and she probably does not have a clue what is going on …).
I consider it a major bug as I do not see any reasonable explanation why SPM works this way. Just to repeat:
SPM Baselines created in one schema can be IMPLICITLY used by SQLs IN ANOTHER SCHEMA
The match is done entirely BY TEXT (and thus “matched” tables can have DIFFERENT STRUCTURE OR SIZE)
Scenario 3: The PRECAUTION: Don’t like binds ? Watch out for space!
One can probably argue that SPM is mostly useful for OLTP systems. Typical OLTP queries are designed for sub second response times and changes in their behavior and timing will be highlighted immediately (and therefore keeping execution plans stable is a good technique there).
One thing that everyone knows about OLTP queries is that the vast majority of them MUST use bind variables. A common wisdom (not to mention multiple horror stories) says that if binds are NOT used, the applications will likely run into serious performance and serialization problems.
And yet, many OLTP systems are still suffering from the literal SQL problem (if this was NOT true, articles about using binds would NOT be SO popular …
)
While literals in OLTP queries are BAD for many reasons, SPM introduced yet another dimension into this equation. Consider, for example, what would happen if implicit SPM capture is turned ON (which I would argue is generally a GOOD thing for OLTP system) and the system is generating A LOT of literal SQLs.
At the first glance, you could probably guess that nothing BAD would happen. Since:
- Only REPEATABLE SQLs are captured and converted into baselines
- Literal SQLs are UNIQUE by definition
Therefore, SPM baselines would NEVER be created for literal SQLs.
But this is a bit naive.
First of all, literal SQLs are rarely truly unique. In many systems, the number of distinct literal SQLs is limited by the number of records in the target table (which might be large, but certainly not infinite).
Second, most applications have a very specific (and rather small) “active data window”, which further encourages literal SQLs to be repeated, somewhat …
And lastly, SQLs do not need to be repeated a lot, remember that TWICE is sufficient for a baseline to be created.
As such, over time OLTP systems with many literal SQLs are likely to generate and store a LOT of SPM baselines. And the thing that I would like to focus here is: storage.
The use of SPM baselines does consume additional space as baselines record additional information about (in this case, individual) SQLs. This usage is fairly significant: baselines “for real production SQLs” created in one of our QA systems averaged ~ 4 Kb each, not including indexes (your mileage may vary, of course).
WHERE TABLE_NAME='STAGE1';
AVG_ROW_LEN
-----------
3801
Hence, on a busy system, these literal baseline ‘space leaks’ may add up fast and will have to be eventually (or in some cases, rather quickly) dealt with.
The bottom line is that:
IMPLICIT CAPTURE does NOT mix very well with LITERAL SQLS
Final Words: Don’t be discouraged (but Do be warned)
Isn’t it funny that the tool that was designed to reduce nasty surprises in application performance comes loaded with (some would say even nastier) surprises of its own ?
Well, at least it is not entirely unexpected … ORACLE 11g SQL Plan Management is a new feature and as anything new it needs to go through some growing pains. Granted, some of these “pains” may look pretty ugly right now (as we’ve seen in this post), but this should NOT keep us from using it.
Or should it ? …
Let’s look at the big picture: It is fair to say that everybody hates performance surprises caused by SQL plan “mood swings”.
End users hate them because they disrupt their workflow.
Developers hate them because they make it much more difficult to design well performing applications.
DBAs hate them because it’s no fun working in the middle of the night scrambling for solutions.
And CIOs … Well, CIOs hate those developers and DBAs that cannot get performance surprises under control …
In other words, being able to provide a stable performance for database applications is not only a worthy goal by itself but it also directly affects a quality of life (not to mention job security) for many people.
And,
SQL Plan Management makes SQL performance STABLE
Yes, SPM is somewhat unorthodox and yes, it requires some extra effort to learn how it works and avoid certain pitfalls, but would I make that effort given all that was said before ?
Hell, yeah! …
And I’m sure that my developers, my users and my CIO will thank me for that.
July 28th, 2011 - 06:25
Some excellent points raised here, presented with clarity.
I hadn’t even considered Scenario 2 but based on the signature match I suppose it’s obvious. Although I imagine that’s a loophole that will get closed somehow in the not too distant future
May 12th, 2012 - 15:40
Nice article. However I have seen most of the large OLTP database system never goes for all these automatic tuning features provided by latest version including automatic memory management.
Intact we never gather stats on our OLTP system to maintain plan stability & and performance.
May 13th, 2012 - 13:09
Thanks Amit. I agree: large OLTP systems should almost never use automatic tuning features. However, SQL plan management is not an automatic tuning feature (unless you want it to be), it is rather the tool that prevents automated management from kicking in by “freezing” SQL plans in place. We use it extensively to make sure plans are managed by humans … but as with any tool – you need to be aware of all the little quirks that it has to prevent weird problems (and yes, some of them are pretty weird)