You are viewing a plain text version of this content. The canonical link for it is here.
Posted to derby-dev@db.apache.org by "Dag H. Wanvik" <Da...@Sun.COM> on 2008/10/07 16:39:10 UTC

Re: [Db-derby Wiki] Update of "QueryPlanJoinOrder" by BryanPendleton

Thanks for making this page, Bryan! I found this mail thread useful,
so it is good make a Wiki page for it!

Dag

Apache Wiki <wi...@apache.org> writes:

> Dear Wiki user,
>
> You have subscribed to a wiki page or wiki category on "Db-derby Wiki" for change notification.
>
> The following page has been changed by BryanPendleton:
> http://wiki.apache.org/db-derby/QueryPlanJoinOrder
>
> The comment on the change is:
> Edit the discussion from the derby-dev list into a wiki page
>
> New page:
> = Understanding Join Order displays in the Statement Execution Plan =
>
> One of the results of running the query optimizer is the determination of the join order.
> For a particular join there is a "left" table and a "right" table, and overall there is
> an ordering of the various intermediate joins into an overall join order.
>
> Looking at the output from the statement execution plan one might ask :
>  * what is the overall join order?
>  * and, for a particular join, which table is the left table and which is the right table?
>
> As a general rule, when looking at a given query plan the join order is reflected by the order in which you see the table names as you scroll from the top of the plan downward.
>
> In the plan "T1" appears first (i.e. closer to the beginning of the plan), then "T3" appears afterward, so that's a general indication that the join order is "T1", "T3".
>
> Note that the query plan shows the underlying base table names, 
> not correlation names or view names or other ways of aliasing base tables, so
> with respect to the top-level query in question, i.e. to:
> {{{
>   select x1.j, x2.b from
>     (select distinct i,j from t1) x1,
>     (select  distinct a,b from t3) x2
>   where x1.i = x2.a
>   order by x1.j, x2.b
> }}}
> the join order is _technically_ { X1, X2 }.  But the query plan only shows base table access, so we have to look to see what tables X1 and X2 access.  In this case X1 accesses T1 while X2 accesses T3, so when we scan the plan and see { T1, T3 }, that effectively implies a join order of { X1, X2 }. 
>
> On a more general level, the "scan downward" approach to finding the join order works because the query plan is written in terms of "left" and "right" result sets, as Knut Anders mentioned.  If I'm joining three tables T1, T2, T3 and the join order chosen by the optimizer is {T2, T3, T1} the final query tree will look something like:
> {{{
>        JOIN_0
>         /  \
>     JOIN_1  T1
>       /  \
>      T2   T3
> }}}
> Notice how each join node has a "left" and a "right" child.  The query plan is generated in depth-first traversal order (starting with the root), so the query plan for the above tree would look something like:
> {{{
> JoinResultSet_0:
> +++ LeftResultSet:
> ++++++ JoinResultSet_1:
> +++++++++ LeftResultSet:  T2
> +++++++++ RightResultSet: T3
> +++ RightResultSet:       T1
> }}}
>>>From this we can see that the order in which the tables appear in the query plan (reading top to bottom) will match the order that comes from reading the leaf nodes of the join tree left-to-right, and that in turn reflects the join order chosen by the optimizer.
>
> === Extracting join order information automatically from the RUNTIMESTATISTICS output ===
>
> Kathey proposed enhancing the RuntimeStatisticsParser to automatically determine join order by searching
> for the specified strings in order and return true if they are there in the order they are in the array.
>
> Army observed that when using this to write automated tests, the test author must be careful to ensure
> that the argument array passed to the new function includes *ALL* tables in the query, not just a subset.
> If the query is of the form:
> {{{
>   select ... from
>     (select ... from t2, t1, t3 where ...) X1
>     (select ... from t1, t2 where ...) X2
>   where ...
> }}}
> Assume a test wants to verify that the tables in subquery X2 have a join order of { T2, T1 }, but doesn't really care about the join order of the subquery in X1, nor does it care about the order of X1 w.r.t. X2.  You'd *still* have to make sure that the array passed into the ordered search method includes the join order for X1, as well, otherwise the test might incorrectly pass.
>
> For example, if we only check for the join order of the "targeted" subquery X2, meaning we pass ["T2", "T1"] into the proposed method and ignore X1 altogether, then the test would IN-correctly PASS for the following query plan:
> {{{
>   ProjectRestrict:
>   +++ JoinNode_0:
>   ++++++ LeftResultSet:                  <== This corresponds to X1
>   +++++++++ JoinNode_1:
>   ++++++++++++ LeftResultSet:
>   +++++++++++++++ JoinNode_2:
>   ++++++++++++++++++ LeftResultSet: T3
>   ++++++++++++++++++ RightResultSet: T2
>   ++++++++++++ RightResultSet: T1
>   ++++++ RightResultSet:                 <== This corresponds to X2
>   +++++++++ JoinNode_3:
>   ++++++++++++ LeftResultSet: T1
>   ++++++++++++ RightResultSet: T2
> }}}
>
> If you just search for "T1" followed by "T2", the test will pass because the join order for X1 matches--but that's wrong because it's really X2 that we wanted to check.
>
> If instead of {"T1", "T2"} you pass in {"T3", "T2", "T1", "T2", "T1"}--i.e. include *ALL* tables in the query, even the ones that aren't necessarily targeted--then I think you'd get the desired behavior. The downside to this is that the test will fail if a join order about which we "don't care" changes (ex. the join order for X1 in this case).  But that's how things work today with the canon-based test, as well, so even if it's not ideal, at least it wouldn't really be any worse...
>
> To get the ideal behavior (where the test fails if and only if the "targeted" subquery's join order is not what is expected) with the proposed orderedSearchStrings() approach, one would have to ensure that the table names used in the targeted subquery do not appear anywhere else in the query.  My guess is that you would have to rewrite a good number of tests to guarantee that, which would probably be non-trivial.
>
> This page was compiled with information posted to the derby-dev mailing list by Manjula, Kathey, Army and Knut.
>

-- 
Dag H. Wanvik, staff engineer
Sun Microsystems, Databases (JavaDB/Derby)
Haakon VII gt. 7b, N-7485 Trondheim, Norway
Tel: x43496/+47 73842196, Fax:  +47 73842101