You are viewing a plain text version of this content. The canonical link for it is here.
Posted to derby-dev@db.apache.org by "Rick Hillegas (JIRA)" <ji...@apache.org> on 2013/06/11 18:58:21 UTC

[jira] [Comment Edited] (DERBY-6211) Make Optimizer trace logic pluggable.

    [ https://issues.apache.org/jira/browse/DERBY-6211?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13680468#comment-13680468 ] 

Rick Hillegas edited comment on DERBY-6211 at 6/11/13 4:57 PM:
---------------------------------------------------------------

Attaching derby-6211-05-aa-xmlOptimizerTracer.diff. This patch adds an optimizer tracer which produces its output in xml. In addition, this patch adds an optional tool for viewing that xml output using SQL. I am running tests now.

I find that it is very hard to read the existing optimizer traces for the following reasons:

i) The trace output is not indented to indicate how facts relate to one another.

ii) In particular, it is hard to piece together the shapes of the query plans which are being evaluated.

I hope that this xml output is easier to read and understand. The output contains elements which nest like the corresponding optimizer data structures:

A) statement - This is the text of an SQL statement which needs optimization.

B) queryBlock - A statement may have many query blocks. For instance, a UNION statement contains many branches, each of which is its own query block. Most subqueries are also independent query blocks. Each query block is optimized in isolation from the others. The isolation goes so far that each query block gets its own optimizer instance.

C) joinOrder - For each query block, the optimizer may consider several join orders. These are the left-to-right orders in which tables would be accessed at execution time. The optimizer builds up a join order incrementally, starting from the leftmost position and adding more tables as it goes. The optimizer may abandon a join order before it is completely filled out. This happens if the optimizer determines that no completion of the join order can result in a plan that's cheaper than the best plan found so far.

D) decoration - As the optimizer fills out the join order, it also considers which conglomerate to use for each table and how to join the table to the table to its left. Of course, the leftmost table doesn't join to anything before it, so for the leftmost table the join strategy is always NESTED_LOOP.

The following other elements appear in the xml output:

E) planCost - The optimizer evaluates the costs of decorated join orders, including the costs of decorated but partial join orders. The planCost element nests inside the joinOrder element. In addition, each queryBlock contains a best planCost.

F) decConglomerateCost - This represents the cost of using a particular conglomerate to scan a table. This element nests inside the decoration element.

In addition to presenting cost information, the planCost element presents two descriptions of the decorated join order being evaluated:

1) summary - This is meant to be a compact, precise description of the plan which we might consider using in an optimizer override. This description identifies conglomerates by the (often cryptic) names stored in SYS.SYSCONGLOMERATES.

2) verbose - This is meant to be a more human-readable description of the plan. Tables are identified by their SQL names or by their correlation names in the query. In addition, index column names are included if the table is accessed by an index.

Although the optimizer considers how tables join leftward, English speakers will want to view the join order rightward. That is how the descriptions are written. In addition, I have introduced the following infix operators to represent the join strategies:

*               NESTED_LOOP
#               HASH_JOIN

I have also fully parenthesized the plan descriptions even though Derby only supports left-deep parentheses today. In the future, Derby may support bushy join orders, requiring different parenthesizing. Putting all of this together, here is a sample summary description:

((45b300a8-013f-33ba-d007-000003789be8 # b6b2c0ae-013f-33ba-d007-000003789be8) # 67bb80b4-013f-33ba-d007-000003789be8) # d8cd40ba-013f-33ba-d007-000003789be8

...and here is the corresponding verbose description:

((APP.TAB1 # APP.TAB2) # APP.TAB3) # APP.TAB4

For the following query...

select tablename from sys.systables t, sys.syscolumns c, sys.sysaliases a, sys.syssequences s
where t.tablename = c.columnname and c.columnname = a.alias and a.alias = s.sequencename

...here's a sample verbose plan description:

((C # T{TABLENAME,SCHEMAID}) # A{SCHEMAID,ALIAS,NAMESPACE}) # S{SCHEMAID,SEQUENCENAME}

I think that this xml output is much easier to read than traditional Derby optimizer traces. If you use a browser like Firefox, you can collapse and expand elements in order to expose just the information you want to see.

This patch also includes an optional tool (optimizerTracingViews) which gives you a SQL view of all of the planCost elements in the xml output. Here's an example of how to use xml optimizer tracing along with this optional viewing tool. Note that the tracer wants a file name argument but the viewer wants a file url argument:

-- turn on xml-based optimizer tracing
call syscs_util.syscs_register_tool( 'optimizerTracing', true, 'xml' );

-- run this query through the tracer
select * from tab1, tab2, tab3, tab4 where -tab1.b = tab2.b and tab2.a = tab3.a and tab3.a = tab4.a;

-- turn off optimizer tracing
call syscs_util.syscs_register_tool( 'optimizerTracing', false, 'z.xml' );

-- load the trace viewer
call syscs_util.syscs_register_tool( 'optimizerTracingViews', true, 'file:///Users/rh161140/derby/mainline/z.xml' );

-- view the costs of all complete plans
select estimatedCost, verbose from planCost
where complete
order by estimatedCost
;

-- unload the trace viewer
call syscs_util.syscs_register_tool( 'optimizerTracingViews', false );

Here is the output of the query against the planCost view:

ESTIMATEDCOST           |VERBOSE                                                                                                                         
------------------------------------------------------------
8595.596000000001       |((APP.TAB1 # APP.TAB2) # APP.TAB4) # APP.TAB3                                                                                   
8595.596000000001       |((APP.TAB1 # APP.TAB2) # APP.TAB3) # APP.TAB4                                                                                   
8595.596000000001       |((APP.TAB1 # APP.TAB2) # APP.TAB3) # APP.TAB4                                                                                   
9114.756000000001       |((APP.TAB1 # APP.TAB2) # APP.TAB4) # APP.TAB3                                                                                   
9114.756000000001       |((APP.TAB1 # APP.TAB2) # APP.TAB3) # APP.TAB4                                                                                   
9114.756000000001       |((APP.TAB1 # APP.TAB2) # APP.TAB3) # APP.TAB4                                                                                   
18224.740429176647      |((APP.TAB3 # APP.TAB2) # APP.TAB4) * APP.TAB1                                                                                   
18224.740429176647      |((APP.TAB2 # APP.TAB3) # APP.TAB4) * APP.TAB1                                                                                   
27806.645417006024      |((APP.TAB3 # APP.TAB4) # APP.TAB2) * APP.TAB1                                                                                   
27806.645417006024      |((APP.TAB2 # APP.TAB4) # APP.TAB3) * APP.TAB1                                                                                   
27903.296430890066      |((APP.TAB4 # APP.TAB3) # APP.TAB2) * APP.TAB1                                                                                   
27903.296430890066      |((APP.TAB4 # APP.TAB2) # APP.TAB3) * APP.TAB1                                                                                   
28336.34888640299       |((APP.TAB4 # APP.TAB3) # APP.TAB2) * APP.TAB1                                                                                   
28336.34888640299       |((APP.TAB4 # APP.TAB2) # APP.TAB3) * APP.TAB1                                                                                   
28336.34888640299       |((APP.TAB3 # APP.TAB4) # APP.TAB2) * APP.TAB1                                                                                   
28336.34888640299       |((APP.TAB3 # APP.TAB2) # APP.TAB4) * APP.TAB1                                                                                   
28336.34888640299       |((APP.TAB2 # APP.TAB4) # APP.TAB3) * APP.TAB1                                                                                   
28336.34888640299       |((APP.TAB2 # APP.TAB3) # APP.TAB4) * APP.TAB1                                                                                   

Here is the full shape of the planCost view:

(
    text varchar( 32672 ),
    stmtID    int,
    qbID   int,
    complete  boolean,
    summary   varchar( 32672 ),
    verbose   varchar( 32672 ),
    type        varchar( 50 ),
    estimatedCost        double,
    estimatedRowCount    bigint
)

I think this functionality is useful enough now that other people can test-drive it. Before writing regression tests for this patch, I would like to gather feedback from developers about how to improve this basic functionality. For instance, is this readable enough? Is there some information from optimizer traces which you often use but which is missing from the xml output?

Further improvements could include:

I) Adding more information to the xml trace. Right now, I have only implemented a subset of the trace methods.

II) Adding more SQL views for reading the xml trace output.


Touches the following files:

------------------

M       java/engine/org/apache/derby/iapi/sql/compile/OptTrace.java
M       java/engine/org/apache/derby/impl/sql/compile/DefaultOptTrace.java
M       java/engine/org/apache/derby/impl/sql/compile/Level2OptimizerImpl.java
M       java/engine/org/apache/derby/impl/sql/compile/OptimizerImpl.java
M       java/testing/org/apache/derbyTesting/functionTests/tests/lang/DummyOptTrace.java

Changes to the signatures of some of the trace methods so that they give the xml tracer the information it needs.

------------------

M       java/engine/org/apache/derby/iapi/sql/compile/JoinStrategy.java
M       java/engine/org/apache/derby/impl/sql/compile/NestedLoopJoinStrategy.java
M       java/engine/org/apache/derby/impl/sql/compile/HashJoinStrategy.java

Each JoinStrategy now has an operator symbol for use in planCost descriptions.

------------------

A       java/engine/org/apache/derby/impl/sql/compile/XMLOptTrace.java
M       java/engine/org/apache/derby/impl/sql/compile/OptimizerTracer.java

The new xml trace logic.

------------------

A       java/engine/org/apache/derby/impl/sql/compile/OptTraceViewer.java
M       java/engine/org/apache/derby/catalog/Java5SystemProcedures.java
M       tools/jar/extraDBMSclasses.properties

The new optional tool for viewing xml trace output.

                
      was (Author: rhillegas):
    Attaching derby-6211-05-aa-xmlOptimizerTracer.diff. This patch adds an optimizer tracer which produces its output in xml. In addition, this patch adds an optional tool for viewing that xml output using SQL. I am running tests now.

I find that it is very hard to read the existing optimizer traces for the following reasons:

i) The trace output is not indented to indicate how facts relate to one another.

ii) In particular, it is hard to piece together the shapes of the query plans which are being evaluated.

I hope that this xml output is easier to read and understand. The output contains elements which nest like the corresponding optimizer data structures:

A) statement - This is the text of an SQL statement which needs optimization.

B) queryBlock - A statement may have many query blocks. For instance, a UNION statement contains many branches, each of which is its own query block. Most subqueries are also independent query blocks. Each query block is optimized in isolation from the others. The isolation goes so far that each query block gets its own optimizer instance.

C) joinOrder - For each query block, the optimizer may consider several join orders. These are the left-to-right orders in which tables would be accessed at execution time. The optimizer builds up a join order incrementally, starting from the leftmost position and adding more tables as it goes. The optimizer may abandon a join order before it is completely filled out. This happens if the optimizer determines that no completion of the join order can result in a plan that's cheaper than the best plan found so far.

D) decoration - As the optimizer fills out the join order, it also considers which conglomerate to use for each table and how to join the table to the table to its left. Of course, the leftmost table doesn't join to anything before it, so for the leftmost table the join strategy is always NESTED_LOOP.

The following other elements appear in the xml output:

E) planCost - The optimizer evaluates the costs of decorated join orders, including the costs of decorated but partial join orders. The planCost element nests inside the joinOrder element. In addition, each queryBlock contains a best planCost.

F) decConglomerateCost - This represents the cost of using a particular conglomerate to scan a table. This element nests inside the decoration element.

In addition to presenting cost information, the planCost element presents two descriptions of the decorated join order being evaluated:

1) summary - This is meant to be a compact, precise description of the plan which we might consider using in an optimizer override. This description identifies conglomerates by the (often cryptic) names stored in SYS.SYSCONGLOMERATES.

2) verbose - This is meant to be a more human-readable description of the plan. Tables are identified by their SQL names or by their correlation names in the query. In addition, index column names are included if the table is accessed by an index.

Although the optimizer considers how tables join leftward, English speakers will want to view the join order rightward. That is how the descriptions are written. In addition, I have introduced the following infix operators to represent the join strategies:

*               NESTED_LOOP
#               HASH_JOIN

I have also fully parenthesized the plan descriptions even though Derby only supports left-deep parentheses today. In the future, Derby may support bushy join orders, requiring different parenthesizing. Putting all of this together, here is a sample summary description:

((45b300a8-013f-33ba-d007-000003789be8 # b6b2c0ae-013f-33ba-d007-000003789be8) # 67bb80b4-013f-33ba-d007-000003789be8) # d8cd40ba-013f-33ba-d007-000003789be8

...and here is the corresponding verbose description:

((APP.TAB1 # APP.TAB2) # APP.TAB3) # APP.TAB4

For the following query...

select tablename from sys.systables t, sys.syscolumns c, sys.sysaliases a, sys.syssequences s
where t.tablename = c.columnname and c.columnname = a.alias and a.alias = s.sequencename

...here's a sample verbose plan description:

((C # T{TABLENAME,SCHEMAID}) # A{SCHEMAID,ALIAS,NAMESPACE}) # S{SCHEMAID,SEQUENCENAME}

I think that this xml output is much easier to read than traditional Derby optimizer traces. If you use a browser like Firefox, you can collapse and expand elements in order to expose just the information you want to see.

This patch also includes an optional tool (optimizerTracingViews) which gives you a SQL view of all of the planCost elements in the xml output. Here's an example of how to use xml optimizer tracing along with this optional viewing tool. Note that the tracer wants a file name argument but the viewer wants a file url argument:

-- turn on xml-based optimizer tracing
call syscs_util.syscs_register_tool( 'optimizerTracing', true, 'xml' );

-- run this query through the tracer
select * from tab1, tab2, tab3, tab4 where -tab1.b = tab2.b and tab2.a = tab3.a and tab3.a = tab4.a;

-- turn off optimizer tracing
call syscs_util.syscs_register_tool( 'optimizerTracing', false, 'z.xml' );

-- load the trace viewer
call syscs_util.syscs_register_tool( 'optimizerTracingViews', true, 'file:///Users/rh161140/derby/mainline/z.xml' );

-- view the costs of all complete plans
select estimatedCost, verbose from planCost
where complete
order by estimatedCost
;

-- unload the trace viewer
call syscs_util.syscs_register_tool( 'optimizerTracingViews', false );

Here is the output of the query against the planCost view:

ESTIMATEDCOST           |VERBOSE                                                                                                                         
------------------------------------------------------------
8595.596000000001       |((APP.TAB1 # APP.TAB2) # APP.TAB4) # APP.TAB3                                                                                   
8595.596000000001       |((APP.TAB1 # APP.TAB2) # APP.TAB3) # APP.TAB4                                                                                   
8595.596000000001       |((APP.TAB1 # APP.TAB2) # APP.TAB3) # APP.TAB4                                                                                   
9114.756000000001       |((APP.TAB1 # APP.TAB2) # APP.TAB4) # APP.TAB3                                                                                   
9114.756000000001       |((APP.TAB1 # APP.TAB2) # APP.TAB3) # APP.TAB4                                                                                   
9114.756000000001       |((APP.TAB1 # APP.TAB2) # APP.TAB3) # APP.TAB4                                                                                   
18224.740429176647      |((APP.TAB3 # APP.TAB2) # APP.TAB4) * APP.TAB1                                                                                   
18224.740429176647      |((APP.TAB2 # APP.TAB3) # APP.TAB4) * APP.TAB1                                                                                   
27806.645417006024      |((APP.TAB3 # APP.TAB4) # APP.TAB2) * APP.TAB1                                                                                   
27806.645417006024      |((APP.TAB2 # APP.TAB4) # APP.TAB3) * APP.TAB1                                                                                   
27903.296430890066      |((APP.TAB4 # APP.TAB3) # APP.TAB2) * APP.TAB1                                                                                   
27903.296430890066      |((APP.TAB4 # APP.TAB2) # APP.TAB3) * APP.TAB1                                                                                   
28336.34888640299       |((APP.TAB4 # APP.TAB3) # APP.TAB2) * APP.TAB1                                                                                   
28336.34888640299       |((APP.TAB4 # APP.TAB2) # APP.TAB3) * APP.TAB1                                                                                   
28336.34888640299       |((APP.TAB3 # APP.TAB4) # APP.TAB2) * APP.TAB1                                                                                   
28336.34888640299       |((APP.TAB3 # APP.TAB2) # APP.TAB4) * APP.TAB1                                                                                   
28336.34888640299       |((APP.TAB2 # APP.TAB4) # APP.TAB3) * APP.TAB1                                                                                   
28336.34888640299       |((APP.TAB2 # APP.TAB3) # APP.TAB4) * APP.TAB1                                                                                   


I think this functionality is useful enough now that other people can test-drive it. Before writing regression tests for this patch, I would like to gather feedback from developers about how to improve this basic functionality. For instance, is this readable enough? Is there some information from optimizer traces which you often use but which is missing from the xml output?

Further improvements could include:

I) Adding more information to the xml trace. Right now, I have only implemented a subset of the trace methods.

II) Adding more SQL views for reading the xml trace output.


Touches the following files:

------------------

M       java/engine/org/apache/derby/iapi/sql/compile/OptTrace.java
M       java/engine/org/apache/derby/impl/sql/compile/DefaultOptTrace.java
M       java/engine/org/apache/derby/impl/sql/compile/Level2OptimizerImpl.java
M       java/engine/org/apache/derby/impl/sql/compile/OptimizerImpl.java
M       java/testing/org/apache/derbyTesting/functionTests/tests/lang/DummyOptTrace.java

Changes to the signatures of some of the trace methods so that they give the xml tracer the information it needs.

------------------

M       java/engine/org/apache/derby/iapi/sql/compile/JoinStrategy.java
M       java/engine/org/apache/derby/impl/sql/compile/NestedLoopJoinStrategy.java
M       java/engine/org/apache/derby/impl/sql/compile/HashJoinStrategy.java

Each JoinStrategy now has an operator symbol for use in planCost descriptions.

------------------

A       java/engine/org/apache/derby/impl/sql/compile/XMLOptTrace.java
M       java/engine/org/apache/derby/impl/sql/compile/OptimizerTracer.java

The new xml trace logic.

------------------

A       java/engine/org/apache/derby/impl/sql/compile/OptTraceViewer.java
M       java/engine/org/apache/derby/catalog/Java5SystemProcedures.java
M       tools/jar/extraDBMSclasses.properties

The new optional tool for viewing xml trace output.

                  
> Make Optimizer trace logic pluggable.
> -------------------------------------
>
>                 Key: DERBY-6211
>                 URL: https://issues.apache.org/jira/browse/DERBY-6211
>             Project: Derby
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 10.11.0.0
>            Reporter: Rick Hillegas
>            Assignee: Rick Hillegas
>         Attachments: derby-6211-01-aa-createPlugin.diff, derby-6211-02-aa-cleanup.diff, derby-6211-02-ab-cleanup.diff, derby-6211-03-aa-customTracer.diff, derby-6211-04-aa-moveOptimizerTracerToEngineJar.diff, derby-6211-05-aa-xmlOptimizerTracer.diff
>
>
> Right now the trace logic in the optimizer is hard-coded to produce a stream of diagnostics. It would be good to be able to plug alternative trace logic into the optimizer. This would make the following possible:
> 1) Plug in trace logic which produces formats which are easier to study and which can be analyzed mechanically. E.g., xml formatted output.
> 2) Plug in trace logic which can be used during unit testing to verify that the optimizer has picked the right plan. Over time this might make it easier to migrate canon-based tests to assertion-based tests.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira