You are viewing a plain text version of this content. The canonical link for it is here.

Posted to derby-dev@db.apache.org by Felix Beyer <s4...@mail.inf.tu-dresden.de> on 2006/11/09 19:03:04 UTC

Query plan extraction

Hi Derby community,

a while ago I´ve posted some mails, which shortly introduced my 
extensions, which I`m developing during my thesis and which I want to 
contribute to the community. These extensions will extend Derby with 
kind of a persistent workload repository and a well-designed query plan 
extraction extension. Now, I´ve got some questions and I would like to 
know what the community thinks about them:

I`m planning to store the query plans in some kind of system explain 
tables. I´m currently designing the scheme for the tables and I want to 
know, if the community is interested in a more general solution.
1. More specifically I mean, should this explain functionality include 
the plans generated during compilation phase? If yes, should the user 
have the ability to specifiy the exact position when during compilation 
(ParseTree, BindTree, OptimizedTree)?
2. Should the scheme support compilation and execution plans (This would 
mean there must be a mapping from the node tree to the resultset tree 
and vice versa and the scheme has to be more generic)?
2b. Should the approach be oriented on DB2, where the user has the 
ability to switch between, only explain, execute and explain, just 
execute without explain?
3. Another approach would be to develop two decoupled solutions, one for 
the compilation plans and one for the execution plans. The first one 
stores NodeTrees and the second one ResultSetTrees? What do you think, 
does this make sense for you?
4. Should the extension follow general derby architecture 
(FactoryInterface and Implementation) and should it therefore be so 
generic, that for example the extraction of the plans into xml files 
will also be possible with the suggested approach?
5. Should the solution extend, replace or coexist together with the 
ResultSetStatistics facility?
6. Is there some information, which is available, or easily derivable 
from current information, which is interesting for some of you and is 
currently not printed out with the current implementation of the 
ResultSetStatistics?

Have you got further ideas regarding this or some similiar extension?

My current timetable is the following:
In some weeks (before end of november) I want to post a detailed 
concept, describing my extensions in full detail. After the feedback for 
this, I want to start with the development. At the end of the year, I 
want to have a working solution. In the new year, I want to run the 
derby test suites and a couple of performance impact measurements to 
test my solution and to improve the stability. After that I`ll post a 
new thread, providing my results.

Thank you for your feedback,

Felix Beyer

Re: Query plan extraction

Posted by Army <qo...@gmail.com>.

Felix Beyer wrote:
> Hi Derby community,

Hello Felix!  Good to see you back on the lists.  While you were "gone" there 
was some discussion about the current state of Derby's query plan logging and I 
think it was generally agreed that it could use some improvement:

http://thread.gmane.org/gmane.comp.apache.db.derby.devel/30818/focus=30818

So it's great to hear back from you again!  Hopefully you will find the answers, 
resources, support, and encouragement you need to achieve your goals from those 
here on the derby-dev list...

> a while ago I´ve posted some mails, which shortly introduced my 
> extensions, which I`m developing during my thesis and which I want to 
> contribute to the community. These extensions will extend Derby with 
> kind of a persistent workload repository and a well-designed query plan 
> extraction extension.

I am not sure what is meant by "persistent workload repository", but the idea of 
"well-designed query plan extraction" certainly sounds promising to me :)

> 1. More specifically I mean, should this explain functionality include 
> the plans generated during compilation phase? 

I guess my first reaction is "start with whatever is easiest".  In the world of 
open source development you do not have to have a "perfect" solution that does 
"everything" before you contribute it.  It's usually better to start small and 
add functionality piece-by-piece.  This allows the community to see and "play" 
with the code early on, which means you will get feedback much earlier.  It also 
means that members of the community who are interested in what you are doing can 
pick up pieces and do additional development on their own, which may save you time.

Okay, so back to your question.

When I think of "query plan extraction" I think about functionality similar to 
the current "logQueryPlan" behavior in Derby, except (hopefully) better.  So 
would you consider the current output that we get when we set

   derby.language.logQueryPlan=true

to be "compilation plans"?  Or would you say that such output is "execution 
plans"?  If logQueryPlan output is considered "compilation plans" then Yes, I 
think it would be great to have this functionality.  Otherwise I think this is 
functionality that could be useful for future debugging, but it is probably not 
as immediately helpful as a better version of the logQueryPlan would be...

Of course, this is just my own opinion; you and anyone else reading this should 
certainly feel free to difer.  If you want to work on plans generated during 
compilation, then please do!  You are not required to work on any specific thing 
just because I or anyone else say(s) it's "better".  Find what interests you and 
take it as far as you'd like to.  I just hope that whatever you do eventually 
gets contributed in some form or another :)

> If yes, should the user have the ability to specifiy the exact 
> position when during compilation (ParseTree, BindTree, OptimizedTree)?

This sounds like a cool idea.  My guess is that this functionality would be 
great as "follow-up" work after you have an initial, "base" extension on which 
to build.  I certainly do not think this would have to be part of the first round...

> 2. Should the scheme support compilation and execution plans (This would 
> mean there must be a mapping from the node tree to the resultset tree 
> and vice versa and the scheme has to be more generic)?

See my comments for question #1.  Based on this question I am guessing that the 
current output from "logQueryPlan" falls into the category of "execution plans", 
so if I had to pick one I would vote for that.

> 2b. Should the approach be oriented on DB2, where the user has the 
> ability to switch between, only explain, execute and explain, just 
> execute without explain?

I am assuming that when you say "explain" you are talking about "explaining" a 
query plan, is that correct?  So I could, for example, "explain" the query plan 
for "select * from t1"?  Or is that not what you mean?

In any event, I think we would want to have some way to "disable" the explain 
functionality so that people who currently use Derby do not see a performance 
slow-down caused by the extra "explain" work.  Of course, I do not know if there 
will even *be* a slow-down--I'm just guessing that there will be...?

> 3. Another approach would be to develop two decoupled solutions, one for 
> the compilation plans and one for the execution plans. The first one 
> stores NodeTrees and the second one ResultSetTrees? What do you think, 
> does this make sense for you?

This kind of separation certainly seems like it would allow for earlier 
contribution and thus earlier feedback.  That is, you could first work on either 
execution plans or compilation plans until you have something working to your 
satisfaction.  Then you could contribute that piece so that those of us who are 
interested can "play" with it--and while we are "playing", you or someone else 
in the community can start working on the other kind of plan.  Obviously it 
would be great if the two types of plans shared a common set of functionality, 
but again, you do not have to make everything ideal before contributing...

> 4. Should the extension follow general derby architecture 
> (FactoryInterface and Implementation) and should it therefore be so 
> generic, that for example the extraction of the plans into xml files 
> will also be possible with the suggested approach?

The notion of extracting query plans into XML is one that sounds particularly 
interesting to me.  As of Derby 10.2 we have a builtin XML datatype that allows 
simple querying of XML values.  So if we could extract Derby's query plans into 
an XML format, we would (theoretically) be able to query the plans for the 
specific pieces in which we are interested.  That sounds like an excellent 
feature to me.

That said, I will again repeat myself and say "start with whatever is easiest". 
  As cool as it would be to have an XML formatted query plan, maybe that is 
going to require more effort.  In that case you could start with something more 
basic and then add an XML "piece" later on.

And speaking of XML, in an email several months back you wrote:

<begin quote>

By the way, in a former project I managed to extend Derby to extract the 
generated optimized query plans in form of XML files for visualizing them in an 
external application. I used the GXL file format for export and visualized the 
plans with the JGraph Framework. Internal changes affected the current Derby 
structure in two ways: First of all a new system function was added to toggle 
query extraction on or off and second a visitor pattern was used to collect the 
required information through a traverse of the query tree after the optimization 
step.

<end quote>

Is that work related to what you are proposing to work on in the next couple of 
months?  If not, do you have any plans/interest in contributing what you did for 
that project?

> 5. Should the solution extend, replace or coexist together with the 
> ResultSetStatistics facility?

In the interest of "backward compatibility" I think the ideal situation would be 
one in which the "default" behavior is to do what we do currently--i.e. the 
logQueryPlan behavior should remain as it is.  So I do not think we would want 
to replace the existing functionality.  Co-existence and/or some kind of 
optionally-enabled extension to the current logQueryPlan functionality is 
probably preferable.

> 6. Is there some information, which is available, or easily derivable 
> from current information, which is interesting for some of you and is 
> currently not printed out with the current implementation of the 
> ResultSetStatistics?

One thing that came up in recent months was the fact that there are certain 
queries for which the Derby optimizer cost estimates are WAY too high.  See 
DERBY-1905, for example.  So one piece of information that would be nice to have 
in a query plan is an indication of just how bad the optimizer's row and cost 
estimates are for a given query.  This would be very valuable to those 
developers who are interested in improving the optimizer's cost estimates (such 
as me, for example).

But again, start with what is easy and build on it...(have I said that enough 
yet? ;)

> Have you got further ideas regarding this or some similiar extension?

I think what you have talked about sounds excellent.  I would be very happy if 
we had the "query extraction" extension that you describe.  Further ideas and 
extensions could serve as follow-up development for you and anyone else who is 
inclined to participate...

> My current timetable is the following:
> In some weeks (before end of november) I want to post a detailed 
> concept, describing my extensions in full detail. After the feedback for 
> this, I want to start with the development. At the end of the year, I 
> want to have a working solution. In the new year, I want to run the 
> derby test suites and a couple of performance impact measurements to 
> test my solution and to improve the stability. After that I`ll post a 
> new thread, providing my results.

Wow, that is an impressive timeline :)

My one comment here is that you should consider developing and contributing your 
work in incremental fashion.  You do not have to have a complete "working 
solution" in order to post to derby-dev or to ask for feedback.  Feel free to 
post partial or unrefined code and to ask for feedback at any stage during the 
development.  I for one would rather have incomplete or "rough draft" code that 
I can play with early on.  And as I mentioned above, the earlier you contribute, 
the more feedback you will get from the community...

Thanks again for volunteering to be a part of the Apache Derby community!  We 
look forward to hearing more from you as start and complete your project(s).  As 
a developer who spends a lot of my time working in the optimizer, I am certainly 
excited to see what comes of your work...

Army