You are viewing a plain text version of this content. The canonical link for it is here.

Posted to derby-commits@db.apache.org by Apache Wiki <wi...@apache.org> on 2006/09/27 21:46:33 UTC

[Db-derby Wiki] Update of "OptimizerTableNumbers" by BryanPendleton

Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Db-derby Wiki" for change notification.

The following page has been changed by BryanPendleton:
http://wiki.apache.org/db-derby/OptimizerTableNumbers

The comment on the change is:
Capture notes from DERBY-1866

New page:
In the query optimizer code, you will find that the code often manipulates information about tables by using "table numbers". Table numbers are integers which stand in as surrogates for particular tables during the optimization process.

* How are Table Numbers assigned? How do views and synonyms affect this process?

Note that at this point in the processing, constructs like views and synonyms already been transformed and replaced by their underlying "real" tables. Transformations and table resolution occur during the "binding" and "preprocessing" stages of query compilation--and both of those stages occur before optimization begins. So at this point a view will be represented by a ProjectRestrictNode whose child is a SelectNode, and a synonym will be represented by whatever FromTable it (the synonym) is actually referring to.

Table numbers are also assigned during binding/preprocessing, so by the time we get to the code involved with costing and plan selection, all FromTables (aka "Optimizables") in the entire query will have an assigned table number (if required--in some cases it's not necessary and thus will be -1). Additionally any column reference which points to one of those FromTables will have the table number for that FromTable stored locally (namely, in ColumnReference.tableNumber).

Note that when a ColumnReference is "remapped" to point to a different FromTable, its local information--including tableNumber--is updated accordingly. Note also that a "FromTable" is not restricted to base tables--anything that can be specified in the FROM list of a SELECT query will be represented by some instance of FromTable, whether it be a subquery, a base table, a union node, etc. Every FromTable has its own "table number", with the exception of ProjectRestrictNodes. For a PRN, if the PRN's child is itself a FromTable (as opposed to, say, a SelectNode) then the PRN's table number will be -1 and any attempts to "get" the PRN's table number will return the table number of the PRN's child. If the PRN's child is not a FromTable, then the PRN will have it's own table number.

* If the optimizer is choosing to access an index for a table, rather than accessing the table itself, does the table number change depending on whether it is an index or a base table which is being processed by the ProjectRestrictNode?

The thing to note here is that "table number" is strictly a language-created, compilation time value to allow binding, preprocessing, optimization, and code generation to distinguish between the various FromTables in the original query. A table number is not stored on disk and it is independent of the access path decisions (including whether or not an index is used) made by the optimizer. Furthermore, there is no link between a given table number and the actual on-disk table that it points to. Table number 0 could be for T1 in one query, T2 in another query, and T100 in a third query.

As a simple (but admittedly meaningless) example, take the following query:

{{{select t1.i, x1.j from t1, t1 x1 where t1.i = x1.j;}}}

At bind time Derby will assign every item in the FROM list a table number. So in this case, "T1" gets table number 0 and "T1 X1" gets table number 1. The fact that both FromTables are really pointing to the same base table doesn't matter. For the duration of compilation/optimization, they are represented by two different instances of FromTable and are considered two different "tables", each having its own table number. (For the record, in this particular example the different FromTables will in fact point to the same underlying tableDescriptor field).

Given that, the predicate "t1.i = x1.j" will have a left ColumnReference pointing to a FromBaseTable representing T1 with table number "0" and a right ColumnReference pointing to a different FromBaseTable representing X1 (i.e. T1 again) with table number "1".

If the optimizer then decides to use an index for T1, the table number doesn't change--the optimizer just decides that for "the FromBaseTable whose table number is 0 we will use an index". In fact, once assigned, the table number for a specific FromTable remains the same for the duration of the compilation of the statement.