Intermediate SQL Color Coded SQL, UNIX and Database Essays

24Jan/100

How ORACLE Implicit Type Conversion Works. Part 1

It is said that when you hear the words this is common sense or everybody knows that you should take notice because when enough people say it – it is likely that the results are exactly the opposite. Especially, when you are dealing with ORACLE 🙂

Here is a short puzzle to prove the point.

Imagine that you have a table with ID column that stores, say, 100000 unique numbers from 1 to 100000. The column has a unique constraint (and a unique underlying index to enforce it).

Assume also that the hardware is regular (we are not yet on Exadata), table is large enough, all ORACLE metadata is correct, statistics for both table and indexes are collected and are up-to-date, there are no unusual optimizer directives or nasty bugs etc

Here is the question:

What would be the expected execution plan for the following query:

SELECT * FROM TABLE WHERE id=45;

Most people would say: an INDEX UNIQUE SCAN on the underlying index, followed by table access, and would, of course, be correct.

Something like this:

----------------------------------------------------------------------------------------
| Id  | Operation                   | Name     | ROWS  | Bytes | Cost (%CPU)| TIME     |
----------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |          |     1 |     5 |     2   (0)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID| T1       |     1 |     5 |     2   (0)| 00:00:01 |
|*  2 |   INDEX UNIQUE SCAN         | T1_IDX   |     1 |       |     1   (0)| 00:00:01 |
----------------------------------------------------------------------------------------

But here is a trick question:

Is it possible that the same SQL statement would ignore the index completely and produce a vastly inefficient FULL TABLE SCAN? (without hinting the query or messing with statistics).

Surprisingly, the answer is: Yes and the only “prerequisite” for that is to store NUMBER data in a CHARACTER column.

I.e. look at the following example:

-- We are creating table with 100,000 numbers stored as characters
SQL> CREATE TABLE t1 (id PRIMARY KEY) nologging
AS SELECT CAST(level AS varchar2(10)) FROM dual CONNECT BY level <= 100000;

TABLE created.

-- Let’s collect statistics on all related objects
-- just to be sure that we are not cheating
SQL> EXEC dbms_stats.gather_table_stats(USER, 't1', cascade => TRUE,
estimate_percent => NULL);

PL/SQL PROCEDURE successfully completed.

SQL> SET autotrace ON

-- Let’s confirm that index is alive and well and accessible for queries
SQL> SELECT * FROM t1 WHERE id='25';

ID
----------
25

Execution Plan
----------------------------------------------------------
Plan hash VALUE: 551503811

-----------------------------------------------------------------------------------
| Id  | Operation         | Name          | ROWS  | Bytes | Cost (%CPU)| TIME     |
-----------------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |               |     1 |     5 |     1   (0)| 00:00:01 |
|*  1 |  INDEX UNIQUE SCAN| SYS_C00461750 |     1 |     5 |     1   (0)| 00:00:01 |
-----------------------------------------------------------------------------------

Predicate Information (IDENTIFIED BY operation id):
---------------------------------------------------

1 - access("ID"='25')

-- And now, let’s “simply drop the quotes”, transforming string literal into number …
SQL> SELECT * FROM t1 WHERE id=25;

ID
----------
25

Execution Plan
----------------------------------------------------------

--------------------------------------------------------------------------
| Id  | Operation         | Name | ROWS  | Bytes | Cost (%CPU)| TIME     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      |     1 |     5 |    63  (18)| 00:00:01 |
|*  1 |  TABLE ACCESS FULL| T1   |     1 |     5 |    63  (18)| 00:00:01 |
--------------------------------------------------------------------------

Predicate Information (IDENTIFIED BY operation id):
---------------------------------------------------

1 - FILTER(TO_NUMBER("ID")=25)

-- And, well, this is the point where people would start asking:
-- “Why ORACLE is not using my index?”

Most developers are actually taken aback by this example.

Granted, comparing numbers to characters in the same expression is not the best thing to do, but shouldn’t ORACLE be smart enough to recognize that 45 is really ’45’ and perform implicit type conversion to “match” the values? After all, isn’t that what modern programming languages are supposed to do?

The truth is – ORACLE does the conversion but NOT how most people would expect.

I.e. the common sense approach to process the following condition:

WHERE char_column = number_literal

Is to transform it into:

WHERE char_column = to_char(number_literal)

Instead, what ORACLE actually does is:

WHERE to_number(char_column) = number_literal

Why is that? Doesn’t ORACLE realize that this transformation will most likely lead to index not being used (unless it is made functional based)?

Doesn’t ORACLE care about performance?

Well, of course, ORACLE cares about performance, but what it cares about even more is returning a quality data.

First of all, when numbers are stored as characters, 45 can be stored as ’45’, but it can, as is often the case, also be stored as ‘ 45’ or ‘0000045’ and the user most likely expects any form to be valid and hence returned. When column data is converted to number, this is exactly what happens:

SQL> INSERT INTO t1 (id) VALUES ('000045');

SQL> commit;

SQL> SELECT * FROM t1 WHERE id=45;

ID
----------
45
000045

But, even more importantly, ‘converting’ the column instead of the value will allow data errors to be discovered faster.

I.e., say, someone inserts the value into ID column that is not really a number:

SQL> INSERT INTO t1 (id) VALUES ('45bad');

1 ROW created.

SQL> commit;

Commit complete.

Now, since column value is converted, the problem will be easily discovered:

SQL> SELECT * FROM t1 WHERE id=25;
ERROR:
ORA-01722: invalid NUMBER

A side effect of this is that all similar queries will now fail with “Invalid number” error (not only those that specifically look for 45). And while this can be annoying, it will lead to a faster discovery of a non-conforming data.

The good news is: this kind of problem is easily avoidable. The best way to deal with it is to:

Always store data with the correct data type

If it is really a number then why not store it as a number, after all ?

But even if you are stuck with incorrectly designed schema, there is still a way to avoid this problem

Always use EXPLICIT type conversion when comparing values of different types

I.e. the following query that uses explicit type conversion will use index without any issues:

SQL> SELECT * FROM t1 WHERE id=to_char(25);

ID
----------
25

Execution Plan
----------------------------------------------------------

-----------------------------------------------------------------------------------
| Id  | Operation         | Name          | ROWS  | Bytes | Cost (%CPU)| TIME     |
-----------------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |               |     1 |     5 |     1   (0)| 00:00:01 |
|*  1 |  INDEX UNIQUE SCAN| SYS_C00461750 |     1 |     5 |     1   (0)| 00:00:01 |
-----------------------------------------------------------------------------------

Predicate Information (IDENTIFIED BY operation id):
---------------------------------------------------

1 - access("ID"='25')

As a side note, be warned that when investigating this problem, SQLplus might show you incorrect results when bind variables are used …