You are viewing a plain text version of this content. The canonical link for it is here.
Posted to derby-dev@db.apache.org by Felix Beyer <s4...@mail.inf.tu-dresden.de> on 2006/11/09 19:03:04 UTC
Query plan extraction
Hi Derby community,
a while ago I´ve posted some mails, which shortly introduced my
extensions, which I`m developing during my thesis and which I want to
contribute to the community. These extensions will extend Derby with
kind of a persistent workload repository and a well-designed query plan
extraction extension. Now, I´ve got some questions and I would like to
know what the community thinks about them:
I`m planning to store the query plans in some kind of system explain
tables. I´m currently designing the scheme for the tables and I want to
know, if the community is interested in a more general solution.
1. More specifically I mean, should this explain functionality include
the plans generated during compilation phase? If yes, should the user
have the ability to specifiy the exact position when during compilation
(ParseTree, BindTree, OptimizedTree)?
2. Should the scheme support compilation and execution plans (This would
mean there must be a mapping from the node tree to the resultset tree
and vice versa and the scheme has to be more generic)?
2b. Should the approach be oriented on DB2, where the user has the
ability to switch between, only explain, execute and explain, just
execute without explain?
3. Another approach would be to develop two decoupled solutions, one for
the compilation plans and one for the execution plans. The first one
stores NodeTrees and the second one ResultSetTrees? What do you think,
does this make sense for you?
4. Should the extension follow general derby architecture
(FactoryInterface and Implementation) and should it therefore be so
generic, that for example the extraction of the plans into xml files
will also be possible with the suggested approach?
5. Should the solution extend, replace or coexist together with the
ResultSetStatistics facility?
6. Is there some information, which is available, or easily derivable
from current information, which is interesting for some of you and is
currently not printed out with the current implementation of the
ResultSetStatistics?
Have you got further ideas regarding this or some similiar extension?
My current timetable is the following:
In some weeks (before end of november) I want to post a detailed
concept, describing my extensions in full detail. After the feedback for
this, I want to start with the development. At the end of the year, I
want to have a working solution. In the new year, I want to run the
derby test suites and a couple of performance impact measurements to
test my solution and to improve the stability. After that I`ll post a
new thread, providing my results.
Thank you for your feedback,
Felix Beyer
Re: Query plan extraction
Posted by Army <qo...@gmail.com>.
Felix Beyer wrote:
> Hi Derby community,
Hello Felix! Good to see you back on the lists. While you were "gone" there
was some discussion about the current state of Derby's query plan logging and I
think it was generally agreed that it could use some improvement:
http://thread.gmane.org/gmane.comp.apache.db.derby.devel/30818/focus=30818
So it's great to hear back from you again! Hopefully you will find the answers,
resources, support, and encouragement you need to achieve your goals from those
here on the derby-dev list...
> a while ago I´ve posted some mails, which shortly introduced my
> extensions, which I`m developing during my thesis and which I want to
> contribute to the community. These extensions will extend Derby with
> kind of a persistent workload repository and a well-designed query plan
> extraction extension.
I am not sure what is meant by "persistent workload repository", but the idea of
"well-designed query plan extraction" certainly sounds promising to me :)
> 1. More specifically I mean, should this explain functionality include
> the plans generated during compilation phase?
I guess my first reaction is "start with whatever is easiest". In the world of
open source development you do not have to have a "perfect" solution that does
"everything" before you contribute it. It's usually better to start small and
add functionality piece-by-piece. This allows the community to see and "play"
with the code early on, which means you will get feedback much earlier. It also
means that members of the community who are interested in what you are doing can
pick up pieces and do additional development on their own, which may save you time.
Okay, so back to your question.
When I think of "query plan extraction" I think about functionality similar to
the current "logQueryPlan" behavior in Derby, except (hopefully) better. So
would you consider the current output that we get when we set
derby.language.logQueryPlan=true
to be "compilation plans"? Or would you say that such output is "execution
plans"? If logQueryPlan output is considered "compilation plans" then Yes, I
think it would be great to have this functionality. Otherwise I think this is
functionality that could be useful for future debugging, but it is probably not
as immediately helpful as a better version of the logQueryPlan would be...
Of course, this is just my own opinion; you and anyone else reading this should
certainly feel free to difer. If you want to work on plans generated during
compilation, then please do! You are not required to work on any specific thing
just because I or anyone else say(s) it's "better". Find what interests you and
take it as far as you'd like to. I just hope that whatever you do eventually
gets contributed in some form or another :)
> If yes, should the user have the ability to specifiy the exact
> position when during compilation (ParseTree, BindTree, OptimizedTree)?
This sounds like a cool idea. My guess is that this functionality would be
great as "follow-up" work after you have an initial, "base" extension on which
to build. I certainly do not think this would have to be part of the first round...
> 2. Should the scheme support compilation and execution plans (This would
> mean there must be a mapping from the node tree to the resultset tree
> and vice versa and the scheme has to be more generic)?
See my comments for question #1. Based on this question I am guessing that the
current output from "logQueryPlan" falls into the category of "execution plans",
so if I had to pick one I would vote for that.
> 2b. Should the approach be oriented on DB2, where the user has the
> ability to switch between, only explain, execute and explain, just
> execute without explain?
I am assuming that when you say "explain" you are talking about "explaining" a
query plan, is that correct? So I could, for example, "explain" the query plan
for "select * from t1"? Or is that not what you mean?
In any event, I think we would want to have some way to "disable" the explain
functionality so that people who currently use Derby do not see a performance
slow-down caused by the extra "explain" work. Of course, I do not know if there
will even *be* a slow-down--I'm just guessing that there will be...?
> 3. Another approach would be to develop two decoupled solutions, one for
> the compilation plans and one for the execution plans. The first one
> stores NodeTrees and the second one ResultSetTrees? What do you think,
> does this make sense for you?
This kind of separation certainly seems like it would allow for earlier
contribution and thus earlier feedback. That is, you could first work on either
execution plans or compilation plans until you have something working to your
satisfaction. Then you could contribute that piece so that those of us who are
interested can "play" with it--and while we are "playing", you or someone else
in the community can start working on the other kind of plan. Obviously it
would be great if the two types of plans shared a common set of functionality,
but again, you do not have to make everything ideal before contributing...
> 4. Should the extension follow general derby architecture
> (FactoryInterface and Implementation) and should it therefore be so
> generic, that for example the extraction of the plans into xml files
> will also be possible with the suggested approach?
The notion of extracting query plans into XML is one that sounds particularly
interesting to me. As of Derby 10.2 we have a builtin XML datatype that allows
simple querying of XML values. So if we could extract Derby's query plans into
an XML format, we would (theoretically) be able to query the plans for the
specific pieces in which we are interested. That sounds like an excellent
feature to me.
That said, I will again repeat myself and say "start with whatever is easiest".
As cool as it would be to have an XML formatted query plan, maybe that is
going to require more effort. In that case you could start with something more
basic and then add an XML "piece" later on.
And speaking of XML, in an email several months back you wrote:
<begin quote>
By the way, in a former project I managed to extend Derby to extract the
generated optimized query plans in form of XML files for visualizing them in an
external application. I used the GXL file format for export and visualized the
plans with the JGraph Framework. Internal changes affected the current Derby
structure in two ways: First of all a new system function was added to toggle
query extraction on or off and second a visitor pattern was used to collect the
required information through a traverse of the query tree after the optimization
step.
<end quote>
Is that work related to what you are proposing to work on in the next couple of
months? If not, do you have any plans/interest in contributing what you did for
that project?
> 5. Should the solution extend, replace or coexist together with the
> ResultSetStatistics facility?
In the interest of "backward compatibility" I think the ideal situation would be
one in which the "default" behavior is to do what we do currently--i.e. the
logQueryPlan behavior should remain as it is. So I do not think we would want
to replace the existing functionality. Co-existence and/or some kind of
optionally-enabled extension to the current logQueryPlan functionality is
probably preferable.
> 6. Is there some information, which is available, or easily derivable
> from current information, which is interesting for some of you and is
> currently not printed out with the current implementation of the
> ResultSetStatistics?
One thing that came up in recent months was the fact that there are certain
queries for which the Derby optimizer cost estimates are WAY too high. See
DERBY-1905, for example. So one piece of information that would be nice to have
in a query plan is an indication of just how bad the optimizer's row and cost
estimates are for a given query. This would be very valuable to those
developers who are interested in improving the optimizer's cost estimates (such
as me, for example).
But again, start with what is easy and build on it...(have I said that enough
yet? ;)
> Have you got further ideas regarding this or some similiar extension?
I think what you have talked about sounds excellent. I would be very happy if
we had the "query extraction" extension that you describe. Further ideas and
extensions could serve as follow-up development for you and anyone else who is
inclined to participate...
> My current timetable is the following:
> In some weeks (before end of november) I want to post a detailed
> concept, describing my extensions in full detail. After the feedback for
> this, I want to start with the development. At the end of the year, I
> want to have a working solution. In the new year, I want to run the
> derby test suites and a couple of performance impact measurements to
> test my solution and to improve the stability. After that I`ll post a
> new thread, providing my results.
Wow, that is an impressive timeline :)
My one comment here is that you should consider developing and contributing your
work in incremental fashion. You do not have to have a complete "working
solution" in order to post to derby-dev or to ask for feedback. Feel free to
post partial or unrefined code and to ask for feedback at any stage during the
development. I for one would rather have incomplete or "rough draft" code that
I can play with early on. And as I mentioned above, the earlier you contribute,
the more feedback you will get from the community...
Thanks again for volunteering to be a part of the Apache Derby community! We
look forward to hearing more from you as start and complete your project(s). As
a developer who spends a lot of my time working in the optimizer, I am certainly
excited to see what comes of your work...
Army