You are viewing a plain text version of this content. The canonical link for it is here.
Posted to derby-commits@db.apache.org by Apache Wiki <wi...@apache.org> on 2005/11/29 23:02:35 UTC
[Db-derby Wiki] Update of "LanguageOptimize" by DanDebrunner
Dear Wiki user,
You have subscribed to a wiki page or wiki category on "Db-derby Wiki" for change notification.
The following page has been changed by DanDebrunner:
http://wiki.apache.org/db-derby/LanguageOptimize
New page:
= Derby SQL Optimization =
Two sub-phases, pre-process and then optimization.
== Pre-process ==
=== Transformations ===
Preparation, obvious rewrite/optimization
* Convert
{{{c1 = 10 or c1 = 2 ====>>>> c1 in (2, 10)}}}
* For
{{{c1 in (2, 10) ====ADD>>>> c1 >= 2 AND c1 <= 10}}}
* Eliminate {{{NOT}}} (push down, logic conversion).
* Convert predicates to normalized CNF (in a sense):
* TRUE is CNF, or Top node is AND.
* Left sub-tree is DNF, right sub-tree is CNF.
* DNF is defined symmetrically: FALSE or top is OR, left CNF, right DNF.
* LIKE
* {{{ c1 LIKE 'asdf%' ====>>>> c1 LIKE 'asdf%' AND c1 >= 'asdf' AND c1 < 'asdg‘ }}}
* {{{ c1 LIKE ? ====>>>> c1 LIKE ? and c1 >= ? }}}
* BETWEEN: convert to {{{ >= AND <=}}}
* Subquery flattening, join node flattening:
* Eg. For select subquery, conditions:
* Under top AND node (not under OR)
* IN, ANY, EXISTS, binary comparison
* Flatten to normal join, or exists join, depending on uniqueness: pull up and append to outer query level
* If not flattened, at least try to convert to binary comparison and push it down:
* {{{ expression QuantifiedOperator (select x from ...)
====>>>>
(select true from .. where expression <BinaryComparisonOperator> x) IS [NOT] NULL }}}
* For !ResultSetNode (!FromBaseTable etc.):
* Generate referenced table map.
* Convert WHERE and HAVING clauses to !PredicateLists, classify them.
* Generate !ProjectRestrictNode on top of !FromBaseTable and !FromSubquery.
* Push single table predicates to new !ProjectRestrictNodes
* Outer join reordering.
* Union (without ALL) eliminates ORDER BY (prefix columns).
* And of course, a lot more…
== Optimization ==
* Cost based optimization.
* Purpose: select join order, join method, access method, least operation/cost.
* Iterate through all permutations of join orders.
* Cost different join methods (nested loop, hash).
* For base table, cost each access method (heap, indexes).
* Row count info from store, cardinality from SYSSTATISTICS.
* Cost is based on selectivity, row count, index uniqueness, etc.
* According to join order, table reference map, push down predicates, decide if an index can be used, start/stop keys & operators, simple qualifiers further pushed down to store for heap access.
* Decide if the plan can avoid sorting (merge sort cost).
* Consider covering/non-covering index costs
* Cost index access according to start/stop keys or predicate’s estimated selectivity (or statistics for the index).
* If optimizer timeout (too many plans), choose the best one so far
== References ==
http://db.apache.org/derby/papers/optimizer.html Derby Optimizer Design