Read inconsistent queries in ORACLE database ? Sure, why not!
Read consistency is one of the coolest features of ORACLE database.
In a nutshell, it means that your queries (at least, logically) do not pay attention to the outside world. I.e. your tables may be hit with thousands of changes per second, but your query will ignore all the hoopla and will always see only the data that existed when the query started.
Read consistency is quite addictive, because in ORACLE it is so trivial. Once you figure out that you don’t need to do much to get it (i.e. deal with locks or synchronize with other users), running queries and expecting consistent results every time quickly becomes a second nature.
And … it’s quite easy to to fall into a trap of thinking that working any other way for a query is impossible.
My job today is to show that this expectation is not entirely correct
What does it mean to be read consistent
So, what exactly does it mean for a query to be read consistent ?
I’m going to explore this point using a slightly convoluted (and specially crafted) example, that will query this simple table:
INSERT INTO t VALUES (1);
commit;
SELECT * FROM t;
ORDER_ID
----------
1
With this slightly complicated query:
SELECT level l, sleep(1) s FROM dual CONNECT BY level <= 5
) SELECT /*+ ordered no_merge use_nl(t) */ int_g.l, int_g.s, t.order_id
FROM int_g, t
/
L S ORDER_ID
---------- ---------- ----------
1 1 1
2 1 1
3 1 1
4 1 1
5 1 1
Because I’m using nested loops the query above implements this fairly simple algorithm during its execution:
SELECT ...
sleep(1)
sleep(1) function here is a simple wrap around dbms_lock.sleep:
RETURN NUMBER AS
BEGIN
sys.dbms_lock.sleep(seconds);
RETURN seconds;
END;
/
that causes the query to sleep for 1 second.
In other words, I’m probing inner table: T 5 times with 1 second delay between each try.
To make matters interesting, once the above query starts, I’m going to also start changing data in T from a separate session with this simple script:
FOR i IN 1 .. 10 loop
INSERT INTO t (order_id) VALUES (i);
commit;
dbms_lock.sleep(1);
END loop;
END;
/
The bottom line is: each probe in the query should deal with a slightly different data in the table and we can now see how ORACLE reacts to that.
As most people would expect, the result of the query, even with concurrent updates still looks like this:
---------- ---------- ----------
1 1 1
2 1 1
3 1 1
4 1 1
5 1 1
In other words, the original query did NOT see any data changes coming from session 2. We queried table T 5 times over 5 seconds interval. Every time the data in T was slightly different, but our query always saw the same image, as it existed when query started. This is how read consistency rules the day.
How to be read inconsistent
Still, if this is what you’ve come to expect from ALL ORACLE queries, prepare to be surprised.
Let’s slightly modify our query, replacing our regular table with one of ORACLE dictionary views – v$session. In a nutshell, we’ll query our session status as we move along in our query.
SELECT level l, sleep(1) s FROM dual
CONNECT BY level <= 5
)
SELECT /*+ ordered no_merge use_nl(se) */
se.event, se.wait_time_micro, int_g.s
FROM int_g, v$session se
WHERE sid=sys_context('userenv', 'sid')
/
EVENT WAIT_TIME_MICRO S
-------------------- --------------- ---
PL/SQL LOCK timer 1,000,760 1
PL/SQL LOCK timer 1,000,610 1
PL/SQL LOCK timer 1,000,765 1
PL/SQL LOCK timer 1,000,755 1
PL/SQL LOCK timer 1,000,688 1
Surprise! Look at the “wait times” – all of them are different.
We still query underlying table (v$session in this case) 5 times in a nested loop, but now every single time we look at the table we see different results! It’s almost like read consistency is not in the picture here.
Well, the truth is:
When it comes to v$ views read consistency is REALLY not in the picture
v$ views are mostly based on x$ objects, which, although look like tables to us and are even called (“fixed”) tables are something else entirely: x$ objects are kernel memory arrays that are exposed to us (for our convenience) through a table interface. And being arrays, they do NOT have read consistency!
To prove the point even further I’m going to slightly modify this example by changing order of tables in the FROM clause:
With SQL being declarative language, this shouldn’t affect the results, right ? Well, see for yourself:
SELECT level l, sleep(1) s FROM dual
CONNECT BY level <= 5
)
SELECT /*+ ordered no_merge use_nl(se) */
se.event, se.wait_time_micro, int_g.s
FROM v$session se, int_g -- <-- ORDER IS REVERSED
WHERE sid=sys_context('userenv', 'sid')
/
EVENT WAIT_TIME_MICRO S
------------------------- --------------- ---
SQL*Net message TO client 3 1
SQL*Net message TO client 3 1
SQL*Net message TO client 3 1
SQL*Net message TO client 3 1
SQL*Net message TO client 3 1
The picture is completely different! Since we start with v$session this time, we read it only once and, of course, that means that wait times are the same everywhere (I guess, everything becomes consistent if you have one sample
)
And there you have it folks! Read consistency is NOT a natural property of any ORACLE query. It requires work to enforce and sometimes ORACLE chooses not to do it.
This is cool, but how we make it useful ?
In the next article, I’ll talk about precisely that.
September 17th, 2012 - 18:04
What kind of color-coding tool that you use in SQL*PLUS?
Thanks
September 17th, 2012 - 19:11
It’s not really sqlplus coloring. What you see is CodeColorer wordpress plugin.
October 11th, 2012 - 03:33
Nice writeup. Much more entertaining than the second bullet point in the note here
http://docs.oracle.com/cd/E11882_01/server.112/e25513/dynviews_1001.htm#sthref3086
“Because the information in the V$ views is dynamic, read consistency is not guaranteed for SELECT operations on these views.”
By the way, have you noticed that some functions are “read consistent” while others are not? Just add a SYSDATE column to your query and then wrap the SYSDATE call in a PL/SQL function to see the difference.
http://jonathanlewis.wordpress.com/2010/01/07/testing-testing/#comment-35233
October 12th, 2012 - 02:35
Thanks, Flado!
I did not know about systimestamp/f(systimestamp) trick. Now I have another tool in my arsenal.
Much appreciated,
Maxym Kharchenko
November 10th, 2012 - 02:09
Can you please convert your wordpress blog rss to FULL TEXT RSS , but not summary? I’d like to subscribe your blog, and access it frequently !
November 10th, 2012 - 08:24
As soon as I figure out how to do it. This seems to be a fairly recent problem and poking around in WP options, did not solve it (even though rss settings are set at “full text”) – must be a bug somewhere in php code …
January 28th, 2013 - 00:21
Tanel Poder’s latchprofx based on this feature. Simple example:
[sourcecode language="sql"]
with gen as (select/*+ inline no_merge */ level l from dual connect by level<=1000)
,timer1 as (select hsecs from v$timer)
,timer2 as (select hsecs from v$timer)
select–+ leading(timer1 g1 g2 timer2) use_merge(timer1) use_nl(timer2)
min(timer1.HSECS),max(timer2.HSECS)
from timer1
,gen g1
,gen g2
,timer2
[/sourcecode]