You are viewing a plain text version of this content. The canonical link for it is here.
Posted to derby-dev@db.apache.org by Satheesh Bandaram <sa...@Sourcery.Org> on 2004/12/17 23:41:42 UTC

Optimizer hints?

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1
 
I have been thinking about adding optimizer hints to Derby. Though Derby
optimizer does perform a reasonable job, it may be useful to have
optimizer hints for cases 1) when updated index statistics is not
available, or even incorrect 2) Rare cases, like this one?, when the
optimizer doesn't do the job right 3) when applications issue queries
not designed for Derby (like very large number of tables).

Derby optimizer is primarily tasked to handle 1) access method (whether
to use table scan or index) 2) join order and 3) join strategy (nested
loop or HashJoin) A complete optimizer hints (should they actually be
called optimizer overrides?) should be able to provide means of
specifying all these. There are several ways to do this, including

   1. Enhance the SQL to recognize additional keywords, like properties.
These properties could specify optimizer hints. That is what Cloudscape
had before, but this causes non-portable SQL to be written. Attempts to
run these statements against any other SQL server could cause syntax
errors. Not an ideal solution, according to me.
   2. Provide optimizer hints as SQL comments. These comments are
recognized only by Derby parser and other SQL engines would simply
ignore these. There are some limitations in Derby parser currently to
implement this option.
   3. Provide an offline tool which registers hints for SQL statements.
When those SQL statements are executed, Derby could look up previously
registered hints for those statements and automatically apply them.
Advantage of this scheme is that applications don't need to be modified
to add hints and they can easily be removed when not needed. How exactly
to register the hints could be worked out. I personally prefer this
approach.

Any comments?

Satheesh

Gerald Khin (JIRA) wrote:

>HashJoinStrategy leads to java.lang.OutOfMemoryError
>----------------------------------------------------
>
>         Key: DERBY-106
>         URL: http://nagoya.apache.org/jira/browse/DERBY-106
>     Project: Derby
>        Type: Bug
>    Reporter: Gerald Khin
>
>
>My application is running out of memory: I encounterd a
java.lang.OutOfMemoryError. I used -Xmx256M. Unfortunatley, I cannot
spend an arbitrary amount of JVM memory.
>
>Then, I commented out the line in class OptimizerFactoryImpl which was
adding the HashJoinStrategy to the set of Join strategies:
>
>        if (joinStrategySet == null)
>        {
>//            JoinStrategy[] jss = new JoinStrategy[2];
>            JoinStrategy[] jss = new JoinStrategy[1];
>            jss[0] = new NestedLoopJoinStrategy();
>//            jss[1] = new HashJoinStrategy();
>            joinStrategySet = jss;
>        }
>
>And with these changes the OutOfMemoryError has gone away! And it works
even with -Xmx128M!!!
>
>So I guess that there is a major memory issue with this HashJoin
strategy implementation.
>
>If it turns out to be too complicated to make the memory consumption
more predicatble or even bounded to some configurable limit, then I need
at least as a workaround a way to turn off the HashJoin strategy
completely: I did it by patching and building my own derby.jar, but if
there would be an official solution with some kind of switch like a
system property, it would be great!
>
>
>
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.5 (MingW32)
Comment: Using GnuPG with Thunderbird - http://enigmail.mozdev.org
 
iD8DBQFBw2ClENVNIY6DZ7ERAtofAJ0Sgt4OnJPawnVlF22hI7+y7wZyvwCeKjOw
aD0AJPCaifS2ZEq50cU5TWk=
=Bq8A
-----END PGP SIGNATURE-----


Re: Optimizer hints?

Posted by Jan Hlavaty <hl...@code.cz>.
Jeremy Boynes wrote:
> I would prefer using the same mechanism as most other databases:
> embedding hints in comments.

I agree, not only because it is easiest to keep the hints in sync with
the queries, but also because existing tools can be used to make the
hints. I have used comment based hints with Oracle and it was good.

Jan

Re: Optimizer hints?

Posted by Jeremy Boynes <jb...@apache.org>.
Satheesh Bandaram wrote:
>    1. Enhance the SQL to recognize additional keywords, like properties.
> These properties could specify optimizer hints. That is what Cloudscape
> had before, but this causes non-portable SQL to be written. Attempts to
> run these statements against any other SQL server could cause syntax
> errors. Not an ideal solution, according to me.
>    2. Provide optimizer hints as SQL comments. These comments are
> recognized only by Derby parser and other SQL engines would simply
> ignore these. There are some limitations in Derby parser currently to
> implement this option.
>    3. Provide an offline tool which registers hints for SQL statements.
> When those SQL statements are executed, Derby could look up previously
> registered hints for those statements and automatically apply them.
> Advantage of this scheme is that applications don't need to be modified
> to add hints and they can easily be removed when not needed. How exactly
> to register the hints could be worked out. I personally prefer this
> approach.
> 

The challenge I see with 3) is how the hints are associated with a 
specific query. If matching is done using the query text then any slight 
modification in the query will break the match resulting in an un-hinted 
plan. So a minor change in application code could have a big impact on 
performance simply because a sub-optimal plan was used.

This could be addressed by providing a key to match on in the query text 
(e.g. in a comment) but that seems to raise the same issues as simply 
embedding the hint whilst still adding the additional complexity 
associated with administering the hint data.

I would prefer using the same mechanism as most other databases: 
embedding hints in comments.

--
Jeremy

Re: Optimizer hints?

Posted by RPost <rp...@pacbell.net>.
Jack - I have to disagree.

Gerald's post is at the end of your response and I sure don't see any query
in it.

I checked your link to jira's DERBY-106 and don't see either a query there
or an attached file that provides the query.

What am I missing?

Gerald did say that when he prevented the optimizer from considering a
hashjoin his query worked even with a memory setting of 128M.This doesn't
really tell us much. There could be a bug in the optimizer or in the
hashjoin code itself. It could also mean that the table is too big for the
property settings that are being used.

As you said: 'Sooner or later they come across performance critical queries
where the optimizer makes a bad choice, but the optimizer cannot be fixed
easily or quickly.

Exactly! The first order of business is to identify the conditions where the
Derby optimizer makes a bad choice. Until these conditions are identified
there is nothing to fix.

Here is what I would like to know:

1. What is the environment (sysinfo)?
2. What are the JVM settings being used
3. What query is being executed?
4. What execution plan is Derby using now? If we don't know what choice the
optimizer is making I don't know how it can be considered wrong.

5. How big is the table? (You ask this in your initial reply)

I think hints are great. I use them in Oracle all the time.

I just don't agree that we have enough information yet to diagnose Gerald's
problem let alone propose hints as a possible solution.

----- Original Message ----- 
From: "Jack Klebanoff" <kl...@Mutagen.Net>
To: "Derby Development" <de...@db.apache.org>
Sent: Friday, December 17, 2004 5:22 PM
Subject: Re: Optimizer hints?


> RPost wrote:
>
>  >Do you have any specific examples of queries that could benefit from an
>  >optimizer hint?
>
> This thread started with an example of a query that could benefit from
> an optimizer hint. Gerald Khin has a query that blows up because the
> Derby  optimizer made the wrong choice. See
> http://nagoya.apache.org/jira/browse/DERBY-106
>
> Most database systems that have been around for a while have implemented
> optimizer hints. Sooner or later they come across performance critical
> queries where the optimizer makes a bad choice, but the optimizer cannot
> be fixed easily or quickly. (Adding an optimizer hint is generally
> faster than waiting for a new optimizer release).
>
> Jack
>
>  >
>  >Does anyone know of any documented instances where Derby has selected an
>  >execution plan that is ineffective or suboptimal? Were these instances
> able
>  >to be corrected by tuning the system using existing
>  >properties?
>  >
>  >Oracle allows hints to be provided as SQL comments.
>  >
>  >Re providing an offline tool. Does this method imply that the statements
>  >exist independent of any given running database instance. If so are you
>  >proposing that a prepared statement be saved and made available for
future
>  >execution without preparing it again?
>  >
>  >----- Original Message -----
>  >From: "Satheesh Bandaram" <sa...@Sourcery.Org>
>  >To: "Derby Development" <de...@db.apache.org>
>  >Sent: Friday, December 17, 2004 2:41 PM
>  >Subject: Optimizer hints?
>  >
>  >
>
> > I have been thinking about adding optimizer hints to Derby. Though Derby
> > optimizer does perform a reasonable job, it may be useful to have
> > optimizer hints for cases 1) when updated index statistics is not
> > available, or even incorrect 2) Rare cases, like this one?, when the
> > optimizer doesn't do the job right 3) when applications issue queries
> > not designed for Derby (like very large number of tables).
> >
> > Derby optimizer is primarily tasked to handle 1) access method (whether
> > to use table scan or index) 2) join order and 3) join strategy (nested
> > loop or HashJoin) A complete optimizer hints (should they actually be
> > called optimizer overrides?) should be able to provide means of
> > specifying all these. There are several ways to do this, including
> >
> >    1. Enhance the SQL to recognize additional keywords, like properties.
> > These properties could specify optimizer hints. That is what Cloudscape
> > had before, but this causes non-portable SQL to be written. Attempts to
> > run these statements against any other SQL server could cause syntax
> > errors. Not an ideal solution, according to me.
> >    2. Provide optimizer hints as SQL comments. These comments are
> > recognized only by Derby parser and other SQL engines would simply
> > ignore these. There are some limitations in Derby parser currently to
> > implement this option.
> >    3. Provide an offline tool which registers hints for SQL statements.
> > When those SQL statements are executed, Derby could look up previously
> > registered hints for those statements and automatically apply them.
> > Advantage of this scheme is that applications don't need to be modified
> > to add hints and they can easily be removed when not needed. How exactly
> > to register the hints could be worked out. I personally prefer this
> > approach.
> >
> > Any comments?
> >
> > Satheesh
> >
> > Gerald Khin (JIRA) wrote:
> >
> > >HashJoinStrategy leads to java.lang.OutOfMemoryError
> > >----------------------------------------------------
> >
> > >        Key: DERBY-106
> > >        URL: http://nagoya.apache.org/jira/browse/DERBY-106
> > >    Project: Derby
> > >       Type: Bug
> > >   Reporter: Gerald Khin
> >
> >
> > >My application is running out of memory: I encounterd a
> >
> > java.lang.OutOfMemoryError. I used -Xmx256M. Unfortunatley, I cannot
> > spend an arbitrary amount of JVM memory.
> >
> > >Then, I commented out the line in class OptimizerFactoryImpl which was
> >
> > adding the HashJoinStrategy to the set of Join strategies:
> >
> > >       if (joinStrategySet == null)
> > >       {
> > >//            JoinStrategy[] jss = new JoinStrategy[2];
> > >           JoinStrategy[] jss = new JoinStrategy[1];
> > >           jss[0] = new NestedLoopJoinStrategy();
> > >//            jss[1] = new HashJoinStrategy();
> > >           joinStrategySet = jss;
> > >       }
> >
> > >And with these changes the OutOfMemoryError has gone away! And it works
> >
> > even with -Xmx128M!!!
> >
> > >So I guess that there is a major memory issue with this HashJoin
> >
> > strategy implementation.
> >
> > >If it turns out to be too complicated to make the memory consumption
> >
> > more predicatble or even bounded to some configurable limit, then I need
> > at least as a workaround a way to turn off the HashJoin strategy
> > completely: I did it by patching and building my own derby.jar, but if
> > there would be an official solution with some kind of switch like a
> > system property, it would be great!
> >
> >
> >
>


Re: Optimizer hints?

Posted by Jack Klebanoff <kl...@Mutagen.Net>.
RPost wrote:

 >Do you have any specific examples of queries that could benefit from an
 >optimizer hint?

This thread started with an example of a query that could benefit from 
an optimizer hint. Gerald Khin has a query that blows up because the 
Derby  optimizer made the wrong choice. See 
http://nagoya.apache.org/jira/browse/DERBY-106

Most database systems that have been around for a while have implemented 
optimizer hints. Sooner or later they come across performance critical 
queries where the optimizer makes a bad choice, but the optimizer cannot 
be fixed easily or quickly. (Adding an optimizer hint is generally 
faster than waiting for a new optimizer release).

Jack

 >
 >Does anyone know of any documented instances where Derby has selected an
 >execution plan that is ineffective or suboptimal? Were these instances 
able
 >to be corrected by tuning the system using existing
 >properties?
 >
 >Oracle allows hints to be provided as SQL comments.
 >
 >Re providing an offline tool. Does this method imply that the statements
 >exist independent of any given running database instance. If so are you
 >proposing that a prepared statement be saved and made available for future
 >execution without preparing it again?
 >
 >----- Original Message -----
 >From: "Satheesh Bandaram" <sa...@Sourcery.Org>
 >To: "Derby Development" <de...@db.apache.org>
 >Sent: Friday, December 17, 2004 2:41 PM
 >Subject: Optimizer hints?
 >
 >

> I have been thinking about adding optimizer hints to Derby. Though Derby
> optimizer does perform a reasonable job, it may be useful to have
> optimizer hints for cases 1) when updated index statistics is not
> available, or even incorrect 2) Rare cases, like this one?, when the
> optimizer doesn't do the job right 3) when applications issue queries
> not designed for Derby (like very large number of tables).
>
> Derby optimizer is primarily tasked to handle 1) access method (whether
> to use table scan or index) 2) join order and 3) join strategy (nested
> loop or HashJoin) A complete optimizer hints (should they actually be
> called optimizer overrides?) should be able to provide means of
> specifying all these. There are several ways to do this, including
>
>    1. Enhance the SQL to recognize additional keywords, like properties.
> These properties could specify optimizer hints. That is what Cloudscape
> had before, but this causes non-portable SQL to be written. Attempts to
> run these statements against any other SQL server could cause syntax
> errors. Not an ideal solution, according to me.
>    2. Provide optimizer hints as SQL comments. These comments are
> recognized only by Derby parser and other SQL engines would simply
> ignore these. There are some limitations in Derby parser currently to
> implement this option.
>    3. Provide an offline tool which registers hints for SQL statements.
> When those SQL statements are executed, Derby could look up previously
> registered hints for those statements and automatically apply them.
> Advantage of this scheme is that applications don't need to be modified
> to add hints and they can easily be removed when not needed. How exactly
> to register the hints could be worked out. I personally prefer this
> approach.
>
> Any comments?
>
> Satheesh
>
> Gerald Khin (JIRA) wrote:
>
> >HashJoinStrategy leads to java.lang.OutOfMemoryError
> >----------------------------------------------------
>
> >        Key: DERBY-106
> >        URL: http://nagoya.apache.org/jira/browse/DERBY-106
> >    Project: Derby
> >       Type: Bug
> >   Reporter: Gerald Khin
>
>
> >My application is running out of memory: I encounterd a
>
> java.lang.OutOfMemoryError. I used -Xmx256M. Unfortunatley, I cannot
> spend an arbitrary amount of JVM memory.
>
> >Then, I commented out the line in class OptimizerFactoryImpl which was
>
> adding the HashJoinStrategy to the set of Join strategies:
>
> >       if (joinStrategySet == null)
> >       {
> >//            JoinStrategy[] jss = new JoinStrategy[2];
> >           JoinStrategy[] jss = new JoinStrategy[1];
> >           jss[0] = new NestedLoopJoinStrategy();
> >//            jss[1] = new HashJoinStrategy();
> >           joinStrategySet = jss;
> >       }
>
> >And with these changes the OutOfMemoryError has gone away! And it works
>
> even with -Xmx128M!!!
>
> >So I guess that there is a major memory issue with this HashJoin
>
> strategy implementation.
>
> >If it turns out to be too complicated to make the memory consumption
>
> more predicatble or even bounded to some configurable limit, then I need
> at least as a workaround a way to turn off the HashJoin strategy
> completely: I did it by patching and building my own derby.jar, but if
> there would be an official solution with some kind of switch like a
> system property, it would be great!
>
>
>


Re: Optimizer hints?

Posted by RPost <rp...@pacbell.net>.
Thanks Mike and Satheesh. Your comments are getting closer to what I had in
mind when I suggested that we document a specific need that we expect the
use of hints to fix. (By the way I am not opposed to introducing hints I
just think we ought to document a precise need).

An approach that I have seen groups have success with in the past is:

1. State the problem as clearly and concisely as possible.
2. Enumerate the key issues that illustrate the problem.
3. Gather some metrics that validate the problem and verify the extent of
the problem.
4. Propose a solution to the problem.
5. Implement and test the proposed solution.

In my comments I was essentially saying that we were proposing a solution
(step 3) before we had a clear problem statement and an illustration of at
least one problem case that might benefit from a solution.

I am assuming that the purpose of a 'hint' is to nudge Derby to select an
optimization choice that it might not otherwise select. If so, this suggests
that in order to provide a proper set of hints we must first know what
choices Derby has now, the process Derby uses to make a choice and an
example query that illustrates Derby making the wrong choice. If we can't do
that then there is no need for hints.

Can someone with Derby experience try to correct and expand my (feeble)
attempts to document steps 1, 2, 3, 4 and 5 above?

1. Problem - Derby does not always make the best choice when optimizing
queries.
    a. What is the list of choices that Derby currently has available? Hash
Join, Nested Loop Join, Table Join Order, Other? Are all choices available
for all query types (insert, update, delete, ddl, batch)?
    b. What process does Derby use currently to choose among the
alternatives available? Is this processs the same for all query types
(insert, etc)?
    c. Is there currently a way to document (e.g. explain plan, log file,
debug messages) what optimization choice Derby uses for a particular query.

2. Possible key issues
    a. Queries that must sort 1 or multiple tables, queries with large
numbers of tables, queries with one or more large (what is large?) tables,
queries on tables with no indexes on join columns.
    b. Does the type of result set (readonly, updateable, scrollable, etc)
affect the optimizer's choice?
    c. Does the amount of memory affect the choice.
    d. other issues

3. Gather some metrics
    a. This step will not only define the extent of the problem but will
point us to appropriate hints or fixes.

4. Proposed solution - add a means of providing hints to the optimizer.
    a. Will the hints be 'suggestions' (Index 'A' might help or join table B
first) or 'commands' (do a full table scan, use index 'A')? That is, how
much weight will a hint carry?
    b. Will the hints be provided for all query types (insert, etc)?
    c. Will the hints be used only for prepared statements or for all
queries?
    d. How many different types of hints might be used? join strategy, index
use, full table scan use, join order?

5. Implement and test a proposed solution
    a. create some test cases - this will also include gathering metrics
(step 3) for the test cases
    b. run the test cases with and without the hints
    c. validate that the solution produces measurable results.

I think that a very useful first step might be to introduce some way of
forcing Derby to use a particular optimization. How else will we be able to
get the metrics we need to know if Derby is making a good choice?

A good example is Gerald's problem. In his query it appears that Derby was
using, or trying to use, a HashJoin strategy and running out of memory. When
he forced Derby to use a NestedLoop join strategy the query not only ran but
ran very quickly.

Unfortunately Gerald had to modify code in order to get the metrics that
identified the problem and suggested a solution. If there were a way to
force Derby to use a HashJoin, and then to rerun the same query and force
Derby to use a NestedLoop join he could have gotten the metrics needed with
little effort.

Ironically, one method of forcing Derby to use a particular join strategy is
to introduce some form of 'hint' that is either mandatory or has sufficient
weight to achieve the desired effect. This could be achieved by adding a new
property or option that could be used during integration testing.

----- Original Message ----- 
From: "Mike Matrigali" <mi...@sbcglobal.net>
To: "Derby Development" <de...@db.apache.org>
Sent: Tuesday, December 21, 2004 3:01 PM
Subject: Re: Optimizer hints?


> All the database systems I have worked on have had optimizer hints,
> while I agree it would be best if we didn't have to ever have them I
> think they are a good way to be able to provide a quick fix while we
> improve the optimizer to not need them.  I agree with all the posts that
> encourage people to provide as much detail as possible to their problems
> with query plans.
>
> I think it is a good idea to somehow introduce the hints such that it
> does not affect the standard SQL.  Seems like it might be nice to make
> it some sort of XML that might fit with an eventual XML output of query
> plans so that tools could read and write them.
>
> As a zero admin db derby should have the goal of making the hints not
> necessary, but in the real world we may not get there - so I see them as
> a necessary evil.  I also think it makes supporting the product easier
> as one can force a test case when maybe the customers data that caused
> that particular query plan shape is not available.
>
> The area of the optimizer may provide some interesting work for those
> interested out there.  I am not an expert in the optimizer but here are
> some areas I have thought may be interesting:
>
> o add a sort merge option to the optimizer choices
> o do a better job choosing when to invalidate a plan and cause another
> optimization try.  Currently we just use a row count metric and a number
> of iterations metric, and always when a ddl operation affects one of the
> tables.  Some options might be:
>     1) is it really necessary to recompile a single table query,
> non-join query other than at ddl?
>     2) can we use actual stats gathered during execution and compare
> them with estimates to see if we should change estimates and recompile?
>     3) Currently we recompile after a set number of iterations, should
> we bump the default number - or just get rid of it all together?
>     4) xml query plans
> o Update the costing information at least to current reality of the
> system (raw numbers have not been run for a few releases).  Maybe change
> units of costing so that other software products could provide costing.
> o add support in the optimizer to choose an index based on a function
> rather than a key (and add the function index support in language/store).
>
>
> Satheesh Bandaram wrote:
>
> > Right... I also remember seeing a large query (with 40+ table
> > references) where Derby optimizer was spending a lot of time figuring
> > out the right join-order. All the tables were really small in this case
> > and forcing a join-order saved optimization time and  improved total
> > query execution time significantly.
> >
> > The fact that most (if not all) database vendors have optimizer hints
> > should show its need ...
> >
> > Satheesh
> >
> > Mike Matrigali wrote:
> >
> >
> >>I don't have specific queries, but I can give examples where the
> >>optimizer may not have the right information to pick a good plan:
> >>
> >>1) queries involving tables with no indexes have no data distribution
> >>~   maintained by the system, so any guess the optimizer makes may be
> >>~   wrong for a dataset.
> >>2) queries involving indexes basically use the shape of the tree to
> >>~   estimate data distribution.  The cost information for an estimate
> >>~   of number of rows between key1 and key2, assumes uniform
distribution
> >>~   throughout the tree.
> >>3) because of #2, multi-column indexes (key1, key2) don't provide data
> >>~   distribution information on key2 - so again the optimizer just uses
> >>~   a default distribution guess.
> >>
> >>RPost wrote:
> >>| Do you have any specific examples of queries that could benefit from
an
> >>| optimizer hint?
> >>|
> >>| Does anyone know of any documented instances where Derby has selected
an
> >>| execution plan that is ineffective or suboptimal? Were these instances
> >>able
> >>| to be corrected by tuning the system using existing
> >>| properties?
> >>|
> >>| Oracle allows hints to be provided as SQL comments.
> >>|
> >>| Re providing an offline tool. Does this method imply that the
statements
> >>| exist independent of any given running database instance. If so are
you
> >>| proposing that a prepared statement be saved and made available for
> >>future
> >>| execution without preparing it again?
> >>|
> >>| ----- Original Message -----
> >>| From: "Satheesh Bandaram" <sa...@Sourcery.Org>
> >>| To: "Derby Development" <de...@db.apache.org>
> >>| Sent: Friday, December 17, 2004 2:41 PM
> >>| Subject: Optimizer hints?
> >>|
> >>|
> >>|
> >>| I have been thinking about adding optimizer hints to Derby. Though
Derby
> >>| optimizer does perform a reasonable job, it may be useful to have
> >>| optimizer hints for cases 1) when updated index statistics is not
> >>| available, or even incorrect 2) Rare cases, like this one?, when the
> >>| optimizer doesn't do the job right 3) when applications issue queries
> >>| not designed for Derby (like very large number of tables).
> >>|
> >>| Derby optimizer is primarily tasked to handle 1) access method
(whether
> >>| to use table scan or index) 2) join order and 3) join strategy (nested
> >>| loop or HashJoin) A complete optimizer hints (should they actually be
> >>| called optimizer overrides?) should be able to provide means of
> >>| specifying all these. There are several ways to do this, including
> >>|
> >>|    1. Enhance the SQL to recognize additional keywords, like
properties.
> >>| These properties could specify optimizer hints. That is what
Cloudscape
> >>| had before, but this causes non-portable SQL to be written. Attempts
to
> >>| run these statements against any other SQL server could cause syntax
> >>| errors. Not an ideal solution, according to me.
> >>|    2. Provide optimizer hints as SQL comments. These comments are
> >>| recognized only by Derby parser and other SQL engines would simply
> >>| ignore these. There are some limitations in Derby parser currently to
> >>| implement this option.
> >>|    3. Provide an offline tool which registers hints for SQL
statements.
> >>| When those SQL statements are executed, Derby could look up previously
> >>| registered hints for those statements and automatically apply them.
> >>| Advantage of this scheme is that applications don't need to be
modified
> >>| to add hints and they can easily be removed when not needed. How
exactly
> >>| to register the hints could be worked out. I personally prefer this
> >>| approach.
> >>|
> >>| Any comments?
> >>|
> >>| Satheesh
> >>|
> >>| Gerald Khin (JIRA) wrote:
> >>|
> >>|
> >>|>HashJoinStrategy leads to java.lang.OutOfMemoryError
> >>|>----------------------------------------------------
> >>|
> >>|>        Key: DERBY-106
> >>|>        URL: http://nagoya.apache.org/jira/browse/DERBY-106
> >>|>    Project: Derby
> >>|>       Type: Bug
> >>|>   Reporter: Gerald Khin
> >>|
> >>|
> >>|>My application is running out of memory: I encounterd a
> >>|
> >>| java.lang.OutOfMemoryError. I used -Xmx256M. Unfortunatley, I cannot
> >>| spend an arbitrary amount of JVM memory.
> >>|
> >>|>Then, I commented out the line in class OptimizerFactoryImpl which was
> >>|
> >>| adding the HashJoinStrategy to the set of Join strategies:
> >>|
> >>|>       if (joinStrategySet == null)
> >>|>       {
> >>|>//            JoinStrategy[] jss = new JoinStrategy[2];
> >>|>           JoinStrategy[] jss = new JoinStrategy[1];
> >>|>           jss[0] = new NestedLoopJoinStrategy();
> >>|>//            jss[1] = new HashJoinStrategy();
> >>|>           joinStrategySet = jss;
> >>|>       }
> >>|
> >>|>And with these changes the OutOfMemoryError has gone away! And it
works
> >>|
> >>| even with -Xmx128M!!!
> >>|
> >>|>So I guess that there is a major memory issue with this HashJoin
> >>|
> >>| strategy implementation.
> >>|
> >>|>If it turns out to be too complicated to make the memory consumption
> >>|
> >>| more predicatble or even bounded to some configurable limit, then I
need
> >>| at least as a workaround a way to turn off the HashJoin strategy
> >>| completely: I did it by patching and building my own derby.jar, but if
> >>| there would be an official solution with some kind of switch like a
> >>| system property, it would be great!
> >>|
> >>|
> >>|
> >
> >
> >


Re: Optimizer hints?

Posted by Mike Matrigali <mi...@sbcglobal.net>.
All the database systems I have worked on have had optimizer hints,
while I agree it would be best if we didn't have to ever have them I
think they are a good way to be able to provide a quick fix while we
improve the optimizer to not need them.  I agree with all the posts that
encourage people to provide as much detail as possible to their problems
with query plans.

I think it is a good idea to somehow introduce the hints such that it
does not affect the standard SQL.  Seems like it might be nice to make
it some sort of XML that might fit with an eventual XML output of query
plans so that tools could read and write them.

As a zero admin db derby should have the goal of making the hints not
necessary, but in the real world we may not get there - so I see them as
a necessary evil.  I also think it makes supporting the product easier
as one can force a test case when maybe the customers data that caused
that particular query plan shape is not available.

The area of the optimizer may provide some interesting work for those
interested out there.  I am not an expert in the optimizer but here are
some areas I have thought may be interesting:

o add a sort merge option to the optimizer choices
o do a better job choosing when to invalidate a plan and cause another
optimization try.  Currently we just use a row count metric and a number
of iterations metric, and always when a ddl operation affects one of the
tables.  Some options might be:
    1) is it really necessary to recompile a single table query,
non-join query other than at ddl?
    2) can we use actual stats gathered during execution and compare
them with estimates to see if we should change estimates and recompile?
    3) Currently we recompile after a set number of iterations, should
we bump the default number - or just get rid of it all together?
    4) xml query plans
o Update the costing information at least to current reality of the
system (raw numbers have not been run for a few releases).  Maybe change
units of costing so that other software products could provide costing.
o add support in the optimizer to choose an index based on a function
rather than a key (and add the function index support in language/store).


Satheesh Bandaram wrote:

> Right... I also remember seeing a large query (with 40+ table
> references) where Derby optimizer was spending a lot of time figuring
> out the right join-order. All the tables were really small in this case
> and forcing a join-order saved optimization time and  improved total
> query execution time significantly.
> 
> The fact that most (if not all) database vendors have optimizer hints
> should show its need ...
> 
> Satheesh
> 
> Mike Matrigali wrote:
> 
> 
>>I don't have specific queries, but I can give examples where the
>>optimizer may not have the right information to pick a good plan:
>>
>>1) queries involving tables with no indexes have no data distribution
>>~   maintained by the system, so any guess the optimizer makes may be
>>~   wrong for a dataset.
>>2) queries involving indexes basically use the shape of the tree to
>>~   estimate data distribution.  The cost information for an estimate
>>~   of number of rows between key1 and key2, assumes uniform distribution
>>~   throughout the tree.
>>3) because of #2, multi-column indexes (key1, key2) don't provide data
>>~   distribution information on key2 - so again the optimizer just uses
>>~   a default distribution guess.
>>
>>RPost wrote:
>>| Do you have any specific examples of queries that could benefit from an
>>| optimizer hint?
>>|
>>| Does anyone know of any documented instances where Derby has selected an
>>| execution plan that is ineffective or suboptimal? Were these instances
>>able
>>| to be corrected by tuning the system using existing
>>| properties?
>>|
>>| Oracle allows hints to be provided as SQL comments.
>>|
>>| Re providing an offline tool. Does this method imply that the statements
>>| exist independent of any given running database instance. If so are you
>>| proposing that a prepared statement be saved and made available for
>>future
>>| execution without preparing it again?
>>|
>>| ----- Original Message -----
>>| From: "Satheesh Bandaram" <sa...@Sourcery.Org>
>>| To: "Derby Development" <de...@db.apache.org>
>>| Sent: Friday, December 17, 2004 2:41 PM
>>| Subject: Optimizer hints?
>>|
>>|
>>|
>>| I have been thinking about adding optimizer hints to Derby. Though Derby
>>| optimizer does perform a reasonable job, it may be useful to have
>>| optimizer hints for cases 1) when updated index statistics is not
>>| available, or even incorrect 2) Rare cases, like this one?, when the
>>| optimizer doesn't do the job right 3) when applications issue queries
>>| not designed for Derby (like very large number of tables).
>>|
>>| Derby optimizer is primarily tasked to handle 1) access method (whether
>>| to use table scan or index) 2) join order and 3) join strategy (nested
>>| loop or HashJoin) A complete optimizer hints (should they actually be
>>| called optimizer overrides?) should be able to provide means of
>>| specifying all these. There are several ways to do this, including
>>|
>>|    1. Enhance the SQL to recognize additional keywords, like properties.
>>| These properties could specify optimizer hints. That is what Cloudscape
>>| had before, but this causes non-portable SQL to be written. Attempts to
>>| run these statements against any other SQL server could cause syntax
>>| errors. Not an ideal solution, according to me.
>>|    2. Provide optimizer hints as SQL comments. These comments are
>>| recognized only by Derby parser and other SQL engines would simply
>>| ignore these. There are some limitations in Derby parser currently to
>>| implement this option.
>>|    3. Provide an offline tool which registers hints for SQL statements.
>>| When those SQL statements are executed, Derby could look up previously
>>| registered hints for those statements and automatically apply them.
>>| Advantage of this scheme is that applications don't need to be modified
>>| to add hints and they can easily be removed when not needed. How exactly
>>| to register the hints could be worked out. I personally prefer this
>>| approach.
>>|
>>| Any comments?
>>|
>>| Satheesh
>>|
>>| Gerald Khin (JIRA) wrote:
>>|
>>|
>>|>HashJoinStrategy leads to java.lang.OutOfMemoryError
>>|>----------------------------------------------------
>>|
>>|>        Key: DERBY-106
>>|>        URL: http://nagoya.apache.org/jira/browse/DERBY-106
>>|>    Project: Derby
>>|>       Type: Bug
>>|>   Reporter: Gerald Khin
>>|
>>|
>>|>My application is running out of memory: I encounterd a
>>|
>>| java.lang.OutOfMemoryError. I used -Xmx256M. Unfortunatley, I cannot
>>| spend an arbitrary amount of JVM memory.
>>|
>>|>Then, I commented out the line in class OptimizerFactoryImpl which was
>>|
>>| adding the HashJoinStrategy to the set of Join strategies:
>>|
>>|>       if (joinStrategySet == null)
>>|>       {
>>|>//            JoinStrategy[] jss = new JoinStrategy[2];
>>|>           JoinStrategy[] jss = new JoinStrategy[1];
>>|>           jss[0] = new NestedLoopJoinStrategy();
>>|>//            jss[1] = new HashJoinStrategy();
>>|>           joinStrategySet = jss;
>>|>       }
>>|
>>|>And with these changes the OutOfMemoryError has gone away! And it works
>>|
>>| even with -Xmx128M!!!
>>|
>>|>So I guess that there is a major memory issue with this HashJoin
>>|
>>| strategy implementation.
>>|
>>|>If it turns out to be too complicated to make the memory consumption
>>|
>>| more predicatble or even bounded to some configurable limit, then I need
>>| at least as a workaround a way to turn off the HashJoin strategy
>>| completely: I did it by patching and building my own derby.jar, but if
>>| there would be an official solution with some kind of switch like a
>>| system property, it would be great!
>>|
>>|
>>|
> 
> 
> 

Re: Optimizer hints?

Posted by Satheesh Bandaram <sa...@Sourcery.Org>.
Right... I also remember seeing a large query (with 40+ table
references) where Derby optimizer was spending a lot of time figuring
out the right join-order. All the tables were really small in this case
and forcing a join-order saved optimization time and  improved total
query execution time significantly.

The fact that most (if not all) database vendors have optimizer hints
should show its need ...

Satheesh

Mike Matrigali wrote:

> I don't have specific queries, but I can give examples where the
> optimizer may not have the right information to pick a good plan:
>
> 1) queries involving tables with no indexes have no data distribution
> ~   maintained by the system, so any guess the optimizer makes may be
> ~   wrong for a dataset.
> 2) queries involving indexes basically use the shape of the tree to
> ~   estimate data distribution.  The cost information for an estimate
> ~   of number of rows between key1 and key2, assumes uniform distribution
> ~   throughout the tree.
> 3) because of #2, multi-column indexes (key1, key2) don't provide data
> ~   distribution information on key2 - so again the optimizer just uses
> ~   a default distribution guess.
>
> RPost wrote:
> | Do you have any specific examples of queries that could benefit from an
> | optimizer hint?
> |
> | Does anyone know of any documented instances where Derby has selected an
> | execution plan that is ineffective or suboptimal? Were these instances
> able
> | to be corrected by tuning the system using existing
> | properties?
> |
> | Oracle allows hints to be provided as SQL comments.
> |
> | Re providing an offline tool. Does this method imply that the statements
> | exist independent of any given running database instance. If so are you
> | proposing that a prepared statement be saved and made available for
> future
> | execution without preparing it again?
> |
> | ----- Original Message -----
> | From: "Satheesh Bandaram" <sa...@Sourcery.Org>
> | To: "Derby Development" <de...@db.apache.org>
> | Sent: Friday, December 17, 2004 2:41 PM
> | Subject: Optimizer hints?
> |
> |
> |
> | I have been thinking about adding optimizer hints to Derby. Though Derby
> | optimizer does perform a reasonable job, it may be useful to have
> | optimizer hints for cases 1) when updated index statistics is not
> | available, or even incorrect 2) Rare cases, like this one?, when the
> | optimizer doesn't do the job right 3) when applications issue queries
> | not designed for Derby (like very large number of tables).
> |
> | Derby optimizer is primarily tasked to handle 1) access method (whether
> | to use table scan or index) 2) join order and 3) join strategy (nested
> | loop or HashJoin) A complete optimizer hints (should they actually be
> | called optimizer overrides?) should be able to provide means of
> | specifying all these. There are several ways to do this, including
> |
> |    1. Enhance the SQL to recognize additional keywords, like properties.
> | These properties could specify optimizer hints. That is what Cloudscape
> | had before, but this causes non-portable SQL to be written. Attempts to
> | run these statements against any other SQL server could cause syntax
> | errors. Not an ideal solution, according to me.
> |    2. Provide optimizer hints as SQL comments. These comments are
> | recognized only by Derby parser and other SQL engines would simply
> | ignore these. There are some limitations in Derby parser currently to
> | implement this option.
> |    3. Provide an offline tool which registers hints for SQL statements.
> | When those SQL statements are executed, Derby could look up previously
> | registered hints for those statements and automatically apply them.
> | Advantage of this scheme is that applications don't need to be modified
> | to add hints and they can easily be removed when not needed. How exactly
> | to register the hints could be worked out. I personally prefer this
> | approach.
> |
> | Any comments?
> |
> | Satheesh
> |
> | Gerald Khin (JIRA) wrote:
> |
> |
> |>HashJoinStrategy leads to java.lang.OutOfMemoryError
> |>----------------------------------------------------
> |
> |>        Key: DERBY-106
> |>        URL: http://nagoya.apache.org/jira/browse/DERBY-106
> |>    Project: Derby
> |>       Type: Bug
> |>   Reporter: Gerald Khin
> |
> |
> |>My application is running out of memory: I encounterd a
> |
> | java.lang.OutOfMemoryError. I used -Xmx256M. Unfortunatley, I cannot
> | spend an arbitrary amount of JVM memory.
> |
> |>Then, I commented out the line in class OptimizerFactoryImpl which was
> |
> | adding the HashJoinStrategy to the set of Join strategies:
> |
> |>       if (joinStrategySet == null)
> |>       {
> |>//            JoinStrategy[] jss = new JoinStrategy[2];
> |>           JoinStrategy[] jss = new JoinStrategy[1];
> |>           jss[0] = new NestedLoopJoinStrategy();
> |>//            jss[1] = new HashJoinStrategy();
> |>           joinStrategySet = jss;
> |>       }
> |
> |>And with these changes the OutOfMemoryError has gone away! And it works
> |
> | even with -Xmx128M!!!
> |
> |>So I guess that there is a major memory issue with this HashJoin
> |
> | strategy implementation.
> |
> |>If it turns out to be too complicated to make the memory consumption
> |
> | more predicatble or even bounded to some configurable limit, then I need
> | at least as a workaround a way to turn off the HashJoin strategy
> | completely: I did it by patching and building my own derby.jar, but if
> | there would be an official solution with some kind of switch like a
> | system property, it would be great!
> |
> |
> |


Re: Optimizer hints?

Posted by Mike Matrigali <mi...@sbcglobal.net>.
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

I don't have specific queries, but I can give examples where the
optimizer may not have the right information to pick a good plan:

1) queries involving tables with no indexes have no data distribution
~   maintained by the system, so any guess the optimizer makes may be
~   wrong for a dataset.
2) queries involving indexes basically use the shape of the tree to
~   estimate data distribution.  The cost information for an estimate
~   of number of rows between key1 and key2, assumes uniform distribution
~   throughout the tree.
3) because of #2, multi-column indexes (key1, key2) don't provide data
~   distribution information on key2 - so again the optimizer just uses
~   a default distribution guess.

RPost wrote:
| Do you have any specific examples of queries that could benefit from an
| optimizer hint?
|
| Does anyone know of any documented instances where Derby has selected an
| execution plan that is ineffective or suboptimal? Were these instances
able
| to be corrected by tuning the system using existing
| properties?
|
| Oracle allows hints to be provided as SQL comments.
|
| Re providing an offline tool. Does this method imply that the statements
| exist independent of any given running database instance. If so are you
| proposing that a prepared statement be saved and made available for future
| execution without preparing it again?
|
| ----- Original Message -----
| From: "Satheesh Bandaram" <sa...@Sourcery.Org>
| To: "Derby Development" <de...@db.apache.org>
| Sent: Friday, December 17, 2004 2:41 PM
| Subject: Optimizer hints?
|
|
|
| I have been thinking about adding optimizer hints to Derby. Though Derby
| optimizer does perform a reasonable job, it may be useful to have
| optimizer hints for cases 1) when updated index statistics is not
| available, or even incorrect 2) Rare cases, like this one?, when the
| optimizer doesn't do the job right 3) when applications issue queries
| not designed for Derby (like very large number of tables).
|
| Derby optimizer is primarily tasked to handle 1) access method (whether
| to use table scan or index) 2) join order and 3) join strategy (nested
| loop or HashJoin) A complete optimizer hints (should they actually be
| called optimizer overrides?) should be able to provide means of
| specifying all these. There are several ways to do this, including
|
|    1. Enhance the SQL to recognize additional keywords, like properties.
| These properties could specify optimizer hints. That is what Cloudscape
| had before, but this causes non-portable SQL to be written. Attempts to
| run these statements against any other SQL server could cause syntax
| errors. Not an ideal solution, according to me.
|    2. Provide optimizer hints as SQL comments. These comments are
| recognized only by Derby parser and other SQL engines would simply
| ignore these. There are some limitations in Derby parser currently to
| implement this option.
|    3. Provide an offline tool which registers hints for SQL statements.
| When those SQL statements are executed, Derby could look up previously
| registered hints for those statements and automatically apply them.
| Advantage of this scheme is that applications don't need to be modified
| to add hints and they can easily be removed when not needed. How exactly
| to register the hints could be worked out. I personally prefer this
| approach.
|
| Any comments?
|
| Satheesh
|
| Gerald Khin (JIRA) wrote:
|
|
|>HashJoinStrategy leads to java.lang.OutOfMemoryError
|>----------------------------------------------------
|
|>        Key: DERBY-106
|>        URL: http://nagoya.apache.org/jira/browse/DERBY-106
|>    Project: Derby
|>       Type: Bug
|>   Reporter: Gerald Khin
|
|
|>My application is running out of memory: I encounterd a
|
| java.lang.OutOfMemoryError. I used -Xmx256M. Unfortunatley, I cannot
| spend an arbitrary amount of JVM memory.
|
|>Then, I commented out the line in class OptimizerFactoryImpl which was
|
| adding the HashJoinStrategy to the set of Join strategies:
|
|>       if (joinStrategySet == null)
|>       {
|>//            JoinStrategy[] jss = new JoinStrategy[2];
|>           JoinStrategy[] jss = new JoinStrategy[1];
|>           jss[0] = new NestedLoopJoinStrategy();
|>//            jss[1] = new HashJoinStrategy();
|>           joinStrategySet = jss;
|>       }
|
|>And with these changes the OutOfMemoryError has gone away! And it works
|
| even with -Xmx128M!!!
|
|>So I guess that there is a major memory issue with this HashJoin
|
| strategy implementation.
|
|>If it turns out to be too complicated to make the memory consumption
|
| more predicatble or even bounded to some configurable limit, then I need
| at least as a workaround a way to turn off the HashJoin strategy
| completely: I did it by patching and building my own derby.jar, but if
| there would be an official solution with some kind of switch like a
| system property, it would be great!
|
|
|
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.5 (MingW32)
Comment: Using GnuPG with Thunderbird - http://enigmail.mozdev.org

iD8DBQFBx3EOEpeslyHqPs0RAk9lAKC9q5QeYyb/lgCnT5/9eyV/ygL4xACfUnfx
v4P9ttZUNaejoXSs6d5tPCg=
=2gpL
-----END PGP SIGNATURE-----

Re: Optimizer hints?

Posted by RPost <rp...@pacbell.net>.
Do you have any specific examples of queries that could benefit from an
optimizer hint?

Does anyone know of any documented instances where Derby has selected an
execution plan that is ineffective or suboptimal? Were these instances able
to be corrected by tuning the system using existing
properties?

Oracle allows hints to be provided as SQL comments.

Re providing an offline tool. Does this method imply that the statements
exist independent of any given running database instance. If so are you
proposing that a prepared statement be saved and made available for future
execution without preparing it again?

----- Original Message ----- 
From: "Satheesh Bandaram" <sa...@Sourcery.Org>
To: "Derby Development" <de...@db.apache.org>
Sent: Friday, December 17, 2004 2:41 PM
Subject: Optimizer hints?


> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
>
> I have been thinking about adding optimizer hints to Derby. Though Derby
> optimizer does perform a reasonable job, it may be useful to have
> optimizer hints for cases 1) when updated index statistics is not
> available, or even incorrect 2) Rare cases, like this one?, when the
> optimizer doesn't do the job right 3) when applications issue queries
> not designed for Derby (like very large number of tables).
>
> Derby optimizer is primarily tasked to handle 1) access method (whether
> to use table scan or index) 2) join order and 3) join strategy (nested
> loop or HashJoin) A complete optimizer hints (should they actually be
> called optimizer overrides?) should be able to provide means of
> specifying all these. There are several ways to do this, including
>
>    1. Enhance the SQL to recognize additional keywords, like properties.
> These properties could specify optimizer hints. That is what Cloudscape
> had before, but this causes non-portable SQL to be written. Attempts to
> run these statements against any other SQL server could cause syntax
> errors. Not an ideal solution, according to me.
>    2. Provide optimizer hints as SQL comments. These comments are
> recognized only by Derby parser and other SQL engines would simply
> ignore these. There are some limitations in Derby parser currently to
> implement this option.
>    3. Provide an offline tool which registers hints for SQL statements.
> When those SQL statements are executed, Derby could look up previously
> registered hints for those statements and automatically apply them.
> Advantage of this scheme is that applications don't need to be modified
> to add hints and they can easily be removed when not needed. How exactly
> to register the hints could be worked out. I personally prefer this
> approach.
>
> Any comments?
>
> Satheesh
>
> Gerald Khin (JIRA) wrote:
>
> >HashJoinStrategy leads to java.lang.OutOfMemoryError
> >----------------------------------------------------
> >
> >         Key: DERBY-106
> >         URL: http://nagoya.apache.org/jira/browse/DERBY-106
> >     Project: Derby
> >        Type: Bug
> >    Reporter: Gerald Khin
> >
> >
> >My application is running out of memory: I encounterd a
> java.lang.OutOfMemoryError. I used -Xmx256M. Unfortunatley, I cannot
> spend an arbitrary amount of JVM memory.
> >
> >Then, I commented out the line in class OptimizerFactoryImpl which was
> adding the HashJoinStrategy to the set of Join strategies:
> >
> >        if (joinStrategySet == null)
> >        {
> >//            JoinStrategy[] jss = new JoinStrategy[2];
> >            JoinStrategy[] jss = new JoinStrategy[1];
> >            jss[0] = new NestedLoopJoinStrategy();
> >//            jss[1] = new HashJoinStrategy();
> >            joinStrategySet = jss;
> >        }
> >
> >And with these changes the OutOfMemoryError has gone away! And it works
> even with -Xmx128M!!!
> >
> >So I guess that there is a major memory issue with this HashJoin
> strategy implementation.
> >
> >If it turns out to be too complicated to make the memory consumption
> more predicatble or even bounded to some configurable limit, then I need
> at least as a workaround a way to turn off the HashJoin strategy
> completely: I did it by patching and building my own derby.jar, but if
> there would be an official solution with some kind of switch like a
> system property, it would be great!
> >
> >
> >
> -----BEGIN PGP SIGNATURE-----
> Version: GnuPG v1.2.5 (MingW32)
> Comment: Using GnuPG with Thunderbird - http://enigmail.mozdev.org
>
> iD8DBQFBw2ClENVNIY6DZ7ERAtofAJ0Sgt4OnJPawnVlF22hI7+y7wZyvwCeKjOw
> aD0AJPCaifS2ZEq50cU5TWk=
> =Bq8A
> -----END PGP SIGNATURE-----
>


Re: Optimizer hints?

Posted by Mike Matrigali <mi...@sbcglobal.net>.
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Currently derby does not use stats for data distribution, it uses the
existing indexes.

Dibyendu Majumdar wrote:
| Hi, I would agree with you that the third option is best.
|
|
|>   3. Provide an offline tool which registers hints for SQL statements.
|>When those SQL statements are executed, Derby could look up previously
|>registered hints for those statements and automatically apply them.
|>Advantage of this scheme is that applications don't need to be modified
|>to add hints and they can easily be removed when not needed. How exactly
|>to register the hints could be worked out. I personally prefer this
|>approach.
|
|
| How about manipulating the stats used by the optimizer and then
storing the
| execution plan in a table? This is also a technique that is available in
| Oracle.
|
| Regards
|
|
|
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.5 (MingW32)
Comment: Using GnuPG with Thunderbird - http://enigmail.mozdev.org

iD8DBQFBx3FUEpeslyHqPs0RAurWAKDSQJiZcRhobiQDylkQhKq14ZuewQCdGxXX
Vp15ZBAXsMuqAb2FZiHuROU=
=9oIy
-----END PGP SIGNATURE-----

Re: Optimizer hints?

Posted by Dibyendu Majumdar <di...@mazumdar.demon.co.uk>.
Hi, I would agree with you that the third option is best.

>    3. Provide an offline tool which registers hints for SQL statements.
> When those SQL statements are executed, Derby could look up previously
> registered hints for those statements and automatically apply them.
> Advantage of this scheme is that applications don't need to be modified
> to add hints and they can easily be removed when not needed. How exactly
> to register the hints could be worked out. I personally prefer this
> approach.

How about manipulating the stats used by the optimizer and then storing the
execution plan in a table? This is also a technique that is available in
Oracle.

Regards



Re: Optimizer hints?

Posted by Rajesh Kartha <ka...@Source-Zone.Org>.
I think (3) would be really useful.  The tool can ingest hints in the 
form of 'properties' or XML and let Derby know
accordingly what method/join to use for the execution of a particular SQL.

-Rajesh



Satheesh Bandaram wrote:

>-----BEGIN PGP SIGNED MESSAGE-----
>Hash: SHA1
> 
>I have been thinking about adding optimizer hints to Derby. Though Derby
>optimizer does perform a reasonable job, it may be useful to have
>optimizer hints for cases 1) when updated index statistics is not
>available, or even incorrect 2) Rare cases, like this one?, when the
>optimizer doesn't do the job right 3) when applications issue queries
>not designed for Derby (like very large number of tables).
>
>Derby optimizer is primarily tasked to handle 1) access method (whether
>to use table scan or index) 2) join order and 3) join strategy (nested
>loop or HashJoin) A complete optimizer hints (should they actually be
>called optimizer overrides?) should be able to provide means of
>specifying all these. There are several ways to do this, including
>
>   1. Enhance the SQL to recognize additional keywords, like properties.
>These properties could specify optimizer hints. That is what Cloudscape
>had before, but this causes non-portable SQL to be written. Attempts to
>run these statements against any other SQL server could cause syntax
>errors. Not an ideal solution, according to me.
>   2. Provide optimizer hints as SQL comments. These comments are
>recognized only by Derby parser and other SQL engines would simply
>ignore these. There are some limitations in Derby parser currently to
>implement this option.
>   3. Provide an offline tool which registers hints for SQL statements.
>When those SQL statements are executed, Derby could look up previously
>registered hints for those statements and automatically apply them.
>Advantage of this scheme is that applications don't need to be modified
>to add hints and they can easily be removed when not needed. How exactly
>to register the hints could be worked out. I personally prefer this
>approach.
>
>Any comments?
>
>Satheesh
>
>Gerald Khin (JIRA) wrote:
>
>  
>
>>HashJoinStrategy leads to java.lang.OutOfMemoryError
>>----------------------------------------------------
>>
>>        Key: DERBY-106
>>        URL: http://nagoya.apache.org/jira/browse/DERBY-106
>>    Project: Derby
>>       Type: Bug
>>   Reporter: Gerald Khin
>>
>>
>>My application is running out of memory: I encounterd a
>>    
>>
>java.lang.OutOfMemoryError. I used -Xmx256M. Unfortunatley, I cannot
>spend an arbitrary amount of JVM memory.
>  
>
>>Then, I commented out the line in class OptimizerFactoryImpl which was
>>    
>>
>adding the HashJoinStrategy to the set of Join strategies:
>  
>
>>       if (joinStrategySet == null)
>>       {
>>//            JoinStrategy[] jss = new JoinStrategy[2];
>>           JoinStrategy[] jss = new JoinStrategy[1];
>>           jss[0] = new NestedLoopJoinStrategy();
>>//            jss[1] = new HashJoinStrategy();
>>           joinStrategySet = jss;
>>       }
>>
>>And with these changes the OutOfMemoryError has gone away! And it works
>>    
>>
>even with -Xmx128M!!!
>  
>
>>So I guess that there is a major memory issue with this HashJoin
>>    
>>
>strategy implementation.
>  
>
>>If it turns out to be too complicated to make the memory consumption
>>    
>>
>more predicatble or even bounded to some configurable limit, then I need
>at least as a workaround a way to turn off the HashJoin strategy
>completely: I did it by patching and building my own derby.jar, but if
>there would be an official solution with some kind of switch like a
>system property, it would be great!
>  
>
>>
>>    
>>
>-----BEGIN PGP SIGNATURE-----
>Version: GnuPG v1.2.5 (MingW32)
>Comment: Using GnuPG with Thunderbird - http://enigmail.mozdev.org
> 
>iD8DBQFBw2ClENVNIY6DZ7ERAtofAJ0Sgt4OnJPawnVlF22hI7+y7wZyvwCeKjOw
>aD0AJPCaifS2ZEq50cU5TWk=
>=Bq8A
>-----END PGP SIGNATURE-----
>
>  
>