You are viewing a plain text version of this content. The canonical link for it is here.
Posted to derby-dev@db.apache.org by "A B (JIRA)" <de...@db.apache.org> on 2006/03/03 16:45:12 UTC

[jira] Created: (DERBY-1073) Reset optimizer timeout for subqueries on a per-round basis to allow consideration of plans that use pushed predicates.

Reset optimizer timeout for subqueries on a per-round basis to allow consideration of plans that use pushed predicates.
-----------------------------------------------------------------------------------------------------------------------

         Key: DERBY-1073
         URL: http://issues.apache.org/jira/browse/DERBY-1073
     Project: Derby
        Type: Sub-task
  Components: Performance  
    Versions: 10.2.0.0    
    Reporter: A B
    Priority: Minor


I wanted to file this as subtask to DERBY-805, but since DERBY-805 is itself a subtask to DERBY-649, I was not able to do so.  So I'm creating this issue as a(nother) subtaks for DERBY-649.

[ Based on derby-dev thread found here: http://article.gmane.org/gmane.comp.apache.db.derby.devel/16007 ]

If we have a query such as:

select <...> from
   (select t1.i, t2.j from t1, t2 where <...>) X1,
    T3
where <...>

then we would have one "outer" query and one "subquery".  The outer query would be "select <...> from X1, T3", the subquery would be "select t1.i, t2.j from t1, t2".

In this case the Derby optimizer will create two instances of OptimizerImpl: one for the outer query (call it OI_OQ) and one for the subquery (call it OI_SQ).  Each OptimizerImpl has its own timeout "clock" that it initializes at creation time--but never resets.  If timeout occurs, the OptimizerImpl will stop searching for "the" best plan and will just take the best plan found so far.

That said, for every permutation of the outer query a call will be made to optimize the subquery.  To simplify things, let's assume there are only two permutations of the outer query: one with join order {X1, T3} and another with join order {T3, X1}.

Now let's say we're looking at the first permutation {X1, T3}.  OI_OQ will make a call to optimize the subquery represented by OI_SQ.  Let's further say that the subquery tries some permutation {T1, T2} and then times out.  It then returns the plan information for {T1, T2} to the outer query.  The outer query, which has *not* yet timed out, then decides to try its second permutation {T3, X1}.  So it again makes a call to optimize the subquery.  In this case, the subquery--which has already timed out--will *immediately* return without trying to optimize anything.  The outer query will then make a decision about its second permutation based on the un-optimized subquery's plan results.

This hasn't really been an issue to date because the "best plan" chosen by the subquery is typically independent of the outer query's current permutation--with the exception of "outerCost", which is passed in from the outer query and is factored into the subquery's cost estimates.  Because of this relative independence, the plan chosen by the subquery would rarely (if ever?) change with different permutations of the outer query, so if the subquery timed out once there was no point in trying to re-optimize it again later.

With DERBY-805, though, Derby has acquired the ability to push predicates from outer queries down into subqueries--which means that the outer join order can have a very significant impact on the plan chosen by the subquery.  But because the timeout mechanism is never reset, we could end up skipping the second optimization phase of the subquery, which means we never get a chance to see how much the outer predicates can help, and thus we could end up skipping over some plans that have the potential to give us significant performance improvement.

So resolution of this issue would involve resetting the timeout state for subqueries to allow the Derby optimizer to consider plans that rely on pushed predicates.

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators:
   http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see:
   http://www.atlassian.com/software/jira


[jira] Updated: (DERBY-1073) Reset optimizer timeout for subqueries on a per-round basis to allow consideration of plans that use pushed predicates.

Posted by "A B (JIRA)" <de...@db.apache.org>.
     [ http://issues.apache.org/jira/browse/DERBY-1073?page=all ]

A B updated DERBY-1073:
-----------------------

    Attachment: d1073_v1.patch

Attaching a patch, d1073_v1.patch, that adds logic to check to see if an OptimizerImpl has predicates that have been pushed from outer queries and, if so, resets the timeout state for that OptimizerImpl.  Otherwise, since the subquery's "best plan" won't change from the previous round, we leave the timeout state as it is.

This approach allows the optimizer to consider plans that use pushed predicates, but at the same time it ensures that queries for which predicate pushdown is not an option will still timeout as they've done in the past.

Note that until DERBY-805 is fully committed, this change won't take effect (because we don't currently push any predicates during optimization).  Note also that there's not (to my knowledge) a good way to write a test for this because optimizer timeout is machine-dependent and thus a test that demonstrates the issue on one machine might not do so on another. 

I've run derbylang on Windows 2000 with IBM 1.4.2 and saw no failures.  I actually tried running the full derbyall but my firewall blocked the server tests and thus the only tests which completed were the derbylang ones.  I'll try to run derbyall with these changes when possible.

> Reset optimizer timeout for subqueries on a per-round basis to allow consideration of plans that use pushed predicates.
> -----------------------------------------------------------------------------------------------------------------------
>
>          Key: DERBY-1073
>          URL: http://issues.apache.org/jira/browse/DERBY-1073
>      Project: Derby
>         Type: Sub-task
>   Components: Performance
>     Versions: 10.2.0.0
>     Reporter: A B
>     Assignee: A B
>     Priority: Minor
>  Attachments: d1073_v1.patch
>
> I wanted to file this as subtask to DERBY-805, but since DERBY-805 is itself a subtask to DERBY-649, I was not able to do so.  So I'm creating this issue as a(nother) subtaks for DERBY-649.
> [ Based on derby-dev thread found here: http://article.gmane.org/gmane.comp.apache.db.derby.devel/16007 ]
> If we have a query such as:
> select <...> from
>    (select t1.i, t2.j from t1, t2 where <...>) X1,
>     T3
> where <...>
> then we would have one "outer" query and one "subquery".  The outer query would be "select <...> from X1, T3", the subquery would be "select t1.i, t2.j from t1, t2".
> In this case the Derby optimizer will create two instances of OptimizerImpl: one for the outer query (call it OI_OQ) and one for the subquery (call it OI_SQ).  Each OptimizerImpl has its own timeout "clock" that it initializes at creation time--but never resets.  If timeout occurs, the OptimizerImpl will stop searching for "the" best plan and will just take the best plan found so far.
> That said, for every permutation of the outer query a call will be made to optimize the subquery.  To simplify things, let's assume there are only two permutations of the outer query: one with join order {X1, T3} and another with join order {T3, X1}.
> Now let's say we're looking at the first permutation {X1, T3}.  OI_OQ will make a call to optimize the subquery represented by OI_SQ.  Let's further say that the subquery tries some permutation {T1, T2} and then times out.  It then returns the plan information for {T1, T2} to the outer query.  The outer query, which has *not* yet timed out, then decides to try its second permutation {T3, X1}.  So it again makes a call to optimize the subquery.  In this case, the subquery--which has already timed out--will *immediately* return without trying to optimize anything.  The outer query will then make a decision about its second permutation based on the un-optimized subquery's plan results.
> This hasn't really been an issue to date because the "best plan" chosen by the subquery is typically independent of the outer query's current permutation--with the exception of "outerCost", which is passed in from the outer query and is factored into the subquery's cost estimates.  Because of this relative independence, the plan chosen by the subquery would rarely (if ever?) change with different permutations of the outer query, so if the subquery timed out once there was no point in trying to re-optimize it again later.
> With DERBY-805, though, Derby has acquired the ability to push predicates from outer queries down into subqueries--which means that the outer join order can have a very significant impact on the plan chosen by the subquery.  But because the timeout mechanism is never reset, we could end up skipping the second optimization phase of the subquery, which means we never get a chance to see how much the outer predicates can help, and thus we could end up skipping over some plans that have the potential to give us significant performance improvement.
> So resolution of this issue would involve resetting the timeout state for subqueries to allow the Derby optimizer to consider plans that rely on pushed predicates.

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators:
   http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see:
   http://www.atlassian.com/software/jira


[jira] Assigned: (DERBY-1073) Reset optimizer timeout for subqueries on a per-round basis to allow consideration of plans that use pushed predicates.

Posted by "A B (JIRA)" <de...@db.apache.org>.
     [ http://issues.apache.org/jira/browse/DERBY-1073?page=all ]

A B reassigned DERBY-1073:
--------------------------

    Assign To: A B

> Reset optimizer timeout for subqueries on a per-round basis to allow consideration of plans that use pushed predicates.
> -----------------------------------------------------------------------------------------------------------------------
>
>          Key: DERBY-1073
>          URL: http://issues.apache.org/jira/browse/DERBY-1073
>      Project: Derby
>         Type: Sub-task
>   Components: Performance
>     Versions: 10.2.0.0
>     Reporter: A B
>     Assignee: A B
>     Priority: Minor

>
> I wanted to file this as subtask to DERBY-805, but since DERBY-805 is itself a subtask to DERBY-649, I was not able to do so.  So I'm creating this issue as a(nother) subtaks for DERBY-649.
> [ Based on derby-dev thread found here: http://article.gmane.org/gmane.comp.apache.db.derby.devel/16007 ]
> If we have a query such as:
> select <...> from
>    (select t1.i, t2.j from t1, t2 where <...>) X1,
>     T3
> where <...>
> then we would have one "outer" query and one "subquery".  The outer query would be "select <...> from X1, T3", the subquery would be "select t1.i, t2.j from t1, t2".
> In this case the Derby optimizer will create two instances of OptimizerImpl: one for the outer query (call it OI_OQ) and one for the subquery (call it OI_SQ).  Each OptimizerImpl has its own timeout "clock" that it initializes at creation time--but never resets.  If timeout occurs, the OptimizerImpl will stop searching for "the" best plan and will just take the best plan found so far.
> That said, for every permutation of the outer query a call will be made to optimize the subquery.  To simplify things, let's assume there are only two permutations of the outer query: one with join order {X1, T3} and another with join order {T3, X1}.
> Now let's say we're looking at the first permutation {X1, T3}.  OI_OQ will make a call to optimize the subquery represented by OI_SQ.  Let's further say that the subquery tries some permutation {T1, T2} and then times out.  It then returns the plan information for {T1, T2} to the outer query.  The outer query, which has *not* yet timed out, then decides to try its second permutation {T3, X1}.  So it again makes a call to optimize the subquery.  In this case, the subquery--which has already timed out--will *immediately* return without trying to optimize anything.  The outer query will then make a decision about its second permutation based on the un-optimized subquery's plan results.
> This hasn't really been an issue to date because the "best plan" chosen by the subquery is typically independent of the outer query's current permutation--with the exception of "outerCost", which is passed in from the outer query and is factored into the subquery's cost estimates.  Because of this relative independence, the plan chosen by the subquery would rarely (if ever?) change with different permutations of the outer query, so if the subquery timed out once there was no point in trying to re-optimize it again later.
> With DERBY-805, though, Derby has acquired the ability to push predicates from outer queries down into subqueries--which means that the outer join order can have a very significant impact on the plan chosen by the subquery.  But because the timeout mechanism is never reset, we could end up skipping the second optimization phase of the subquery, which means we never get a chance to see how much the outer predicates can help, and thus we could end up skipping over some plans that have the potential to give us significant performance improvement.
> So resolution of this issue would involve resetting the timeout state for subqueries to allow the Derby optimizer to consider plans that rely on pushed predicates.

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators:
   http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see:
   http://www.atlassian.com/software/jira


[jira] Updated: (DERBY-1073) Reset optimizer timeout for subqueries on a per-round basis to allow consideration of plans that use pushed predicates.

Posted by "Satheesh Bandaram (JIRA)" <de...@db.apache.org>.
     [ http://issues.apache.org/jira/browse/DERBY-1073?page=all ]

Satheesh Bandaram updated DERBY-1073:
-------------------------------------

    Other Info:   (was: [Patch available])

> Reset optimizer timeout for subqueries on a per-round basis to allow consideration of plans that use pushed predicates.
> -----------------------------------------------------------------------------------------------------------------------
>
>          Key: DERBY-1073
>          URL: http://issues.apache.org/jira/browse/DERBY-1073
>      Project: Derby
>         Type: Sub-task
>   Components: Performance
>     Versions: 10.2.0.0
>     Reporter: A B
>     Assignee: A B
>     Priority: Minor
>  Attachments: d1073_v1.patch, d1073_v2.patch
>
> I wanted to file this as subtask to DERBY-805, but since DERBY-805 is itself a subtask to DERBY-649, I was not able to do so.  So I'm creating this issue as a(nother) subtaks for DERBY-649.
> [ Based on derby-dev thread found here: http://article.gmane.org/gmane.comp.apache.db.derby.devel/16007 ]
> If we have a query such as:
> select <...> from
>    (select t1.i, t2.j from t1, t2 where <...>) X1,
>     T3
> where <...>
> then we would have one "outer" query and one "subquery".  The outer query would be "select <...> from X1, T3", the subquery would be "select t1.i, t2.j from t1, t2".
> In this case the Derby optimizer will create two instances of OptimizerImpl: one for the outer query (call it OI_OQ) and one for the subquery (call it OI_SQ).  Each OptimizerImpl has its own timeout "clock" that it initializes at creation time--but never resets.  If timeout occurs, the OptimizerImpl will stop searching for "the" best plan and will just take the best plan found so far.
> That said, for every permutation of the outer query a call will be made to optimize the subquery.  To simplify things, let's assume there are only two permutations of the outer query: one with join order {X1, T3} and another with join order {T3, X1}.
> Now let's say we're looking at the first permutation {X1, T3}.  OI_OQ will make a call to optimize the subquery represented by OI_SQ.  Let's further say that the subquery tries some permutation {T1, T2} and then times out.  It then returns the plan information for {T1, T2} to the outer query.  The outer query, which has *not* yet timed out, then decides to try its second permutation {T3, X1}.  So it again makes a call to optimize the subquery.  In this case, the subquery--which has already timed out--will *immediately* return without trying to optimize anything.  The outer query will then make a decision about its second permutation based on the un-optimized subquery's plan results.
> This hasn't really been an issue to date because the "best plan" chosen by the subquery is typically independent of the outer query's current permutation--with the exception of "outerCost", which is passed in from the outer query and is factored into the subquery's cost estimates.  Because of this relative independence, the plan chosen by the subquery would rarely (if ever?) change with different permutations of the outer query, so if the subquery timed out once there was no point in trying to re-optimize it again later.
> With DERBY-805, though, Derby has acquired the ability to push predicates from outer queries down into subqueries--which means that the outer join order can have a very significant impact on the plan chosen by the subquery.  But because the timeout mechanism is never reset, we could end up skipping the second optimization phase of the subquery, which means we never get a chance to see how much the outer predicates can help, and thus we could end up skipping over some plans that have the potential to give us significant performance improvement.
> So resolution of this issue would involve resetting the timeout state for subqueries to allow the Derby optimizer to consider plans that rely on pushed predicates.

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators:
   http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see:
   http://www.atlassian.com/software/jira


[jira] Updated: (DERBY-1073) Reset optimizer timeout for subqueries on a per-round basis to allow consideration of plans that use pushed predicates.

Posted by "A B (JIRA)" <de...@db.apache.org>.
     [ http://issues.apache.org/jira/browse/DERBY-1073?page=all ]

A B updated DERBY-1073:
-----------------------

    Attachment: d1073_v2.patch

Posting an updated patch based on the recent checkins for DERBY-805 and DERBY-1007.  This d1073_v2.patch does the exact same thing that d1073_v1.patch does; it's just been updated to apply cleanly with the latest trunk.

> Reset optimizer timeout for subqueries on a per-round basis to allow consideration of plans that use pushed predicates.
> -----------------------------------------------------------------------------------------------------------------------
>
>          Key: DERBY-1073
>          URL: http://issues.apache.org/jira/browse/DERBY-1073
>      Project: Derby
>         Type: Sub-task
>   Components: Performance
>     Versions: 10.2.0.0
>     Reporter: A B
>     Assignee: A B
>     Priority: Minor
>  Attachments: d1073_v1.patch, d1073_v2.patch
>
> I wanted to file this as subtask to DERBY-805, but since DERBY-805 is itself a subtask to DERBY-649, I was not able to do so.  So I'm creating this issue as a(nother) subtaks for DERBY-649.
> [ Based on derby-dev thread found here: http://article.gmane.org/gmane.comp.apache.db.derby.devel/16007 ]
> If we have a query such as:
> select <...> from
>    (select t1.i, t2.j from t1, t2 where <...>) X1,
>     T3
> where <...>
> then we would have one "outer" query and one "subquery".  The outer query would be "select <...> from X1, T3", the subquery would be "select t1.i, t2.j from t1, t2".
> In this case the Derby optimizer will create two instances of OptimizerImpl: one for the outer query (call it OI_OQ) and one for the subquery (call it OI_SQ).  Each OptimizerImpl has its own timeout "clock" that it initializes at creation time--but never resets.  If timeout occurs, the OptimizerImpl will stop searching for "the" best plan and will just take the best plan found so far.
> That said, for every permutation of the outer query a call will be made to optimize the subquery.  To simplify things, let's assume there are only two permutations of the outer query: one with join order {X1, T3} and another with join order {T3, X1}.
> Now let's say we're looking at the first permutation {X1, T3}.  OI_OQ will make a call to optimize the subquery represented by OI_SQ.  Let's further say that the subquery tries some permutation {T1, T2} and then times out.  It then returns the plan information for {T1, T2} to the outer query.  The outer query, which has *not* yet timed out, then decides to try its second permutation {T3, X1}.  So it again makes a call to optimize the subquery.  In this case, the subquery--which has already timed out--will *immediately* return without trying to optimize anything.  The outer query will then make a decision about its second permutation based on the un-optimized subquery's plan results.
> This hasn't really been an issue to date because the "best plan" chosen by the subquery is typically independent of the outer query's current permutation--with the exception of "outerCost", which is passed in from the outer query and is factored into the subquery's cost estimates.  Because of this relative independence, the plan chosen by the subquery would rarely (if ever?) change with different permutations of the outer query, so if the subquery timed out once there was no point in trying to re-optimize it again later.
> With DERBY-805, though, Derby has acquired the ability to push predicates from outer queries down into subqueries--which means that the outer join order can have a very significant impact on the plan chosen by the subquery.  But because the timeout mechanism is never reset, we could end up skipping the second optimization phase of the subquery, which means we never get a chance to see how much the outer predicates can help, and thus we could end up skipping over some plans that have the potential to give us significant performance improvement.
> So resolution of this issue would involve resetting the timeout state for subqueries to allow the Derby optimizer to consider plans that rely on pushed predicates.

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators:
   http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see:
   http://www.atlassian.com/software/jira


[jira] Updated: (DERBY-1073) Reset optimizer timeout for subqueries on a per-round basis to allow consideration of plans that use pushed predicates.

Posted by "A B (JIRA)" <de...@db.apache.org>.
     [ http://issues.apache.org/jira/browse/DERBY-1073?page=all ]

A B updated DERBY-1073:
-----------------------

    Other Info: [Patch available]

> Reset optimizer timeout for subqueries on a per-round basis to allow consideration of plans that use pushed predicates.
> -----------------------------------------------------------------------------------------------------------------------
>
>          Key: DERBY-1073
>          URL: http://issues.apache.org/jira/browse/DERBY-1073
>      Project: Derby
>         Type: Sub-task
>   Components: Performance
>     Versions: 10.2.0.0
>     Reporter: A B
>     Assignee: A B
>     Priority: Minor
>  Attachments: d1073_v1.patch
>
> I wanted to file this as subtask to DERBY-805, but since DERBY-805 is itself a subtask to DERBY-649, I was not able to do so.  So I'm creating this issue as a(nother) subtaks for DERBY-649.
> [ Based on derby-dev thread found here: http://article.gmane.org/gmane.comp.apache.db.derby.devel/16007 ]
> If we have a query such as:
> select <...> from
>    (select t1.i, t2.j from t1, t2 where <...>) X1,
>     T3
> where <...>
> then we would have one "outer" query and one "subquery".  The outer query would be "select <...> from X1, T3", the subquery would be "select t1.i, t2.j from t1, t2".
> In this case the Derby optimizer will create two instances of OptimizerImpl: one for the outer query (call it OI_OQ) and one for the subquery (call it OI_SQ).  Each OptimizerImpl has its own timeout "clock" that it initializes at creation time--but never resets.  If timeout occurs, the OptimizerImpl will stop searching for "the" best plan and will just take the best plan found so far.
> That said, for every permutation of the outer query a call will be made to optimize the subquery.  To simplify things, let's assume there are only two permutations of the outer query: one with join order {X1, T3} and another with join order {T3, X1}.
> Now let's say we're looking at the first permutation {X1, T3}.  OI_OQ will make a call to optimize the subquery represented by OI_SQ.  Let's further say that the subquery tries some permutation {T1, T2} and then times out.  It then returns the plan information for {T1, T2} to the outer query.  The outer query, which has *not* yet timed out, then decides to try its second permutation {T3, X1}.  So it again makes a call to optimize the subquery.  In this case, the subquery--which has already timed out--will *immediately* return without trying to optimize anything.  The outer query will then make a decision about its second permutation based on the un-optimized subquery's plan results.
> This hasn't really been an issue to date because the "best plan" chosen by the subquery is typically independent of the outer query's current permutation--with the exception of "outerCost", which is passed in from the outer query and is factored into the subquery's cost estimates.  Because of this relative independence, the plan chosen by the subquery would rarely (if ever?) change with different permutations of the outer query, so if the subquery timed out once there was no point in trying to re-optimize it again later.
> With DERBY-805, though, Derby has acquired the ability to push predicates from outer queries down into subqueries--which means that the outer join order can have a very significant impact on the plan chosen by the subquery.  But because the timeout mechanism is never reset, we could end up skipping the second optimization phase of the subquery, which means we never get a chance to see how much the outer predicates can help, and thus we could end up skipping over some plans that have the potential to give us significant performance improvement.
> So resolution of this issue would involve resetting the timeout state for subqueries to allow the Derby optimizer to consider plans that rely on pushed predicates.

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators:
   http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see:
   http://www.atlassian.com/software/jira


[jira] Resolved: (DERBY-1073) Reset optimizer timeout for subqueries on a per-round basis to allow consideration of plans that use pushed predicates.

Posted by "A B (JIRA)" <de...@db.apache.org>.
     [ http://issues.apache.org/jira/browse/DERBY-1073?page=all ]
     
A B resolved DERBY-1073:
------------------------

    Fix Version: 10.2.0.0
                 10.1.2.5
                 10.1.2.4
     Resolution: Fixed

Timeout fix was checked into 10.1 with svn 397682, so I'm marking this issue as resolved.  Thanks for the commits, Satheesh.

> Reset optimizer timeout for subqueries on a per-round basis to allow consideration of plans that use pushed predicates.
> -----------------------------------------------------------------------------------------------------------------------
>
>          Key: DERBY-1073
>          URL: http://issues.apache.org/jira/browse/DERBY-1073
>      Project: Derby
>         Type: Sub-task

>   Components: Performance
>     Versions: 10.2.0.0
>     Reporter: A B
>     Assignee: A B
>     Priority: Minor
>      Fix For: 10.2.0.0, 10.1.2.4, 10.1.2.5
>  Attachments: d1073_timeoutFix_v1.patch, d1073_v1.patch, d1073_v2.patch
>
> I wanted to file this as subtask to DERBY-805, but since DERBY-805 is itself a subtask to DERBY-649, I was not able to do so.  So I'm creating this issue as a(nother) subtaks for DERBY-649.
> [ Based on derby-dev thread found here: http://article.gmane.org/gmane.comp.apache.db.derby.devel/16007 ]
> If we have a query such as:
> select <...> from
>    (select t1.i, t2.j from t1, t2 where <...>) X1,
>     T3
> where <...>
> then we would have one "outer" query and one "subquery".  The outer query would be "select <...> from X1, T3", the subquery would be "select t1.i, t2.j from t1, t2".
> In this case the Derby optimizer will create two instances of OptimizerImpl: one for the outer query (call it OI_OQ) and one for the subquery (call it OI_SQ).  Each OptimizerImpl has its own timeout "clock" that it initializes at creation time--but never resets.  If timeout occurs, the OptimizerImpl will stop searching for "the" best plan and will just take the best plan found so far.
> That said, for every permutation of the outer query a call will be made to optimize the subquery.  To simplify things, let's assume there are only two permutations of the outer query: one with join order {X1, T3} and another with join order {T3, X1}.
> Now let's say we're looking at the first permutation {X1, T3}.  OI_OQ will make a call to optimize the subquery represented by OI_SQ.  Let's further say that the subquery tries some permutation {T1, T2} and then times out.  It then returns the plan information for {T1, T2} to the outer query.  The outer query, which has *not* yet timed out, then decides to try its second permutation {T3, X1}.  So it again makes a call to optimize the subquery.  In this case, the subquery--which has already timed out--will *immediately* return without trying to optimize anything.  The outer query will then make a decision about its second permutation based on the un-optimized subquery's plan results.
> This hasn't really been an issue to date because the "best plan" chosen by the subquery is typically independent of the outer query's current permutation--with the exception of "outerCost", which is passed in from the outer query and is factored into the subquery's cost estimates.  Because of this relative independence, the plan chosen by the subquery would rarely (if ever?) change with different permutations of the outer query, so if the subquery timed out once there was no point in trying to re-optimize it again later.
> With DERBY-805, though, Derby has acquired the ability to push predicates from outer queries down into subqueries--which means that the outer join order can have a very significant impact on the plan chosen by the subquery.  But because the timeout mechanism is never reset, we could end up skipping the second optimization phase of the subquery, which means we never get a chance to see how much the outer predicates can help, and thus we could end up skipping over some plans that have the potential to give us significant performance improvement.
> So resolution of this issue would involve resetting the timeout state for subqueries to allow the Derby optimizer to consider plans that rely on pushed predicates.

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators:
   http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see:
   http://www.atlassian.com/software/jira


[jira] Closed: (DERBY-1073) Reset optimizer timeout for subqueries on a per-round basis to allow consideration of plans that use pushed predicates.

Posted by "A B (JIRA)" <de...@db.apache.org>.
     [ http://issues.apache.org/jira/browse/DERBY-1073?page=all ]
     
A B closed DERBY-1073:
----------------------


All changes committed to 10.1 and 10.2 codelines and no further issues reported, so closing.

> Reset optimizer timeout for subqueries on a per-round basis to allow consideration of plans that use pushed predicates.
> -----------------------------------------------------------------------------------------------------------------------
>
>          Key: DERBY-1073
>          URL: http://issues.apache.org/jira/browse/DERBY-1073
>      Project: Derby
>         Type: Sub-task

>   Components: Performance
>     Versions: 10.2.0.0
>     Reporter: A B
>     Assignee: A B
>     Priority: Minor
>      Fix For: 10.2.0.0, 10.1.2.4, 10.1.2.5
>  Attachments: d1073_timeoutFix_v1.patch, d1073_v1.patch, d1073_v2.patch
>
> I wanted to file this as subtask to DERBY-805, but since DERBY-805 is itself a subtask to DERBY-649, I was not able to do so.  So I'm creating this issue as a(nother) subtaks for DERBY-649.
> [ Based on derby-dev thread found here: http://article.gmane.org/gmane.comp.apache.db.derby.devel/16007 ]
> If we have a query such as:
> select <...> from
>    (select t1.i, t2.j from t1, t2 where <...>) X1,
>     T3
> where <...>
> then we would have one "outer" query and one "subquery".  The outer query would be "select <...> from X1, T3", the subquery would be "select t1.i, t2.j from t1, t2".
> In this case the Derby optimizer will create two instances of OptimizerImpl: one for the outer query (call it OI_OQ) and one for the subquery (call it OI_SQ).  Each OptimizerImpl has its own timeout "clock" that it initializes at creation time--but never resets.  If timeout occurs, the OptimizerImpl will stop searching for "the" best plan and will just take the best plan found so far.
> That said, for every permutation of the outer query a call will be made to optimize the subquery.  To simplify things, let's assume there are only two permutations of the outer query: one with join order {X1, T3} and another with join order {T3, X1}.
> Now let's say we're looking at the first permutation {X1, T3}.  OI_OQ will make a call to optimize the subquery represented by OI_SQ.  Let's further say that the subquery tries some permutation {T1, T2} and then times out.  It then returns the plan information for {T1, T2} to the outer query.  The outer query, which has *not* yet timed out, then decides to try its second permutation {T3, X1}.  So it again makes a call to optimize the subquery.  In this case, the subquery--which has already timed out--will *immediately* return without trying to optimize anything.  The outer query will then make a decision about its second permutation based on the un-optimized subquery's plan results.
> This hasn't really been an issue to date because the "best plan" chosen by the subquery is typically independent of the outer query's current permutation--with the exception of "outerCost", which is passed in from the outer query and is factored into the subquery's cost estimates.  Because of this relative independence, the plan chosen by the subquery would rarely (if ever?) change with different permutations of the outer query, so if the subquery timed out once there was no point in trying to re-optimize it again later.
> With DERBY-805, though, Derby has acquired the ability to push predicates from outer queries down into subqueries--which means that the outer join order can have a very significant impact on the plan chosen by the subquery.  But because the timeout mechanism is never reset, we could end up skipping the second optimization phase of the subquery, which means we never get a chance to see how much the outer predicates can help, and thus we could end up skipping over some plans that have the potential to give us significant performance improvement.
> So resolution of this issue would involve resetting the timeout state for subqueries to allow the Derby optimizer to consider plans that rely on pushed predicates.

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators:
   http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see:
   http://www.atlassian.com/software/jira


[jira] Commented: (DERBY-1073) Reset optimizer timeout for subqueries on a per-round basis to allow consideration of plans that use pushed predicates.

Posted by "Satheesh Bandaram (JIRA)" <de...@db.apache.org>.
    [ http://issues.apache.org/jira/browse/DERBY-1073?page=comments#action_12369234 ] 

Satheesh Bandaram commented on DERBY-1073:
------------------------------------------

Patch submitted to trunk.

> Reset optimizer timeout for subqueries on a per-round basis to allow consideration of plans that use pushed predicates.
> -----------------------------------------------------------------------------------------------------------------------
>
>          Key: DERBY-1073
>          URL: http://issues.apache.org/jira/browse/DERBY-1073
>      Project: Derby
>         Type: Sub-task
>   Components: Performance
>     Versions: 10.2.0.0
>     Reporter: A B
>     Assignee: A B
>     Priority: Minor
>  Attachments: d1073_v1.patch, d1073_v2.patch
>
> I wanted to file this as subtask to DERBY-805, but since DERBY-805 is itself a subtask to DERBY-649, I was not able to do so.  So I'm creating this issue as a(nother) subtaks for DERBY-649.
> [ Based on derby-dev thread found here: http://article.gmane.org/gmane.comp.apache.db.derby.devel/16007 ]
> If we have a query such as:
> select <...> from
>    (select t1.i, t2.j from t1, t2 where <...>) X1,
>     T3
> where <...>
> then we would have one "outer" query and one "subquery".  The outer query would be "select <...> from X1, T3", the subquery would be "select t1.i, t2.j from t1, t2".
> In this case the Derby optimizer will create two instances of OptimizerImpl: one for the outer query (call it OI_OQ) and one for the subquery (call it OI_SQ).  Each OptimizerImpl has its own timeout "clock" that it initializes at creation time--but never resets.  If timeout occurs, the OptimizerImpl will stop searching for "the" best plan and will just take the best plan found so far.
> That said, for every permutation of the outer query a call will be made to optimize the subquery.  To simplify things, let's assume there are only two permutations of the outer query: one with join order {X1, T3} and another with join order {T3, X1}.
> Now let's say we're looking at the first permutation {X1, T3}.  OI_OQ will make a call to optimize the subquery represented by OI_SQ.  Let's further say that the subquery tries some permutation {T1, T2} and then times out.  It then returns the plan information for {T1, T2} to the outer query.  The outer query, which has *not* yet timed out, then decides to try its second permutation {T3, X1}.  So it again makes a call to optimize the subquery.  In this case, the subquery--which has already timed out--will *immediately* return without trying to optimize anything.  The outer query will then make a decision about its second permutation based on the un-optimized subquery's plan results.
> This hasn't really been an issue to date because the "best plan" chosen by the subquery is typically independent of the outer query's current permutation--with the exception of "outerCost", which is passed in from the outer query and is factored into the subquery's cost estimates.  Because of this relative independence, the plan chosen by the subquery would rarely (if ever?) change with different permutations of the outer query, so if the subquery timed out once there was no point in trying to re-optimize it again later.
> With DERBY-805, though, Derby has acquired the ability to push predicates from outer queries down into subqueries--which means that the outer join order can have a very significant impact on the plan chosen by the subquery.  But because the timeout mechanism is never reset, we could end up skipping the second optimization phase of the subquery, which means we never get a chance to see how much the outer predicates can help, and thus we could end up skipping over some plans that have the potential to give us significant performance improvement.
> So resolution of this issue would involve resetting the timeout state for subqueries to allow the Derby optimizer to consider plans that rely on pushed predicates.

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators:
   http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see:
   http://www.atlassian.com/software/jira


[jira] Updated: (DERBY-1073) Reset optimizer timeout for subqueries on a per-round basis to allow consideration of plans that use pushed predicates.

Posted by "Dag H. Wanvik (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/DERBY-1073?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dag H. Wanvik updated DERBY-1073:
---------------------------------

    Derby Categories: [Performance]

> Reset optimizer timeout for subqueries on a per-round basis to allow consideration of plans that use pushed predicates.
> -----------------------------------------------------------------------------------------------------------------------
>
>                 Key: DERBY-1073
>                 URL: https://issues.apache.org/jira/browse/DERBY-1073
>             Project: Derby
>          Issue Type: Sub-task
>    Affects Versions: 10.2.1.6
>            Reporter: A B
>            Assignee: A B
>            Priority: Minor
>             Fix For: 10.1.3.1, 10.2.1.6
>
>         Attachments: d1073_timeoutFix_v1.patch, d1073_v1.patch, d1073_v2.patch
>
>
> I wanted to file this as subtask to DERBY-805, but since DERBY-805 is itself a subtask to DERBY-649, I was not able to do so.  So I'm creating this issue as a(nother) subtaks for DERBY-649.
> [ Based on derby-dev thread found here: http://article.gmane.org/gmane.comp.apache.db.derby.devel/16007 ]
> If we have a query such as:
> select <...> from
>    (select t1.i, t2.j from t1, t2 where <...>) X1,
>     T3
> where <...>
> then we would have one "outer" query and one "subquery".  The outer query would be "select <...> from X1, T3", the subquery would be "select t1.i, t2.j from t1, t2".
> In this case the Derby optimizer will create two instances of OptimizerImpl: one for the outer query (call it OI_OQ) and one for the subquery (call it OI_SQ).  Each OptimizerImpl has its own timeout "clock" that it initializes at creation time--but never resets.  If timeout occurs, the OptimizerImpl will stop searching for "the" best plan and will just take the best plan found so far.
> That said, for every permutation of the outer query a call will be made to optimize the subquery.  To simplify things, let's assume there are only two permutations of the outer query: one with join order {X1, T3} and another with join order {T3, X1}.
> Now let's say we're looking at the first permutation {X1, T3}.  OI_OQ will make a call to optimize the subquery represented by OI_SQ.  Let's further say that the subquery tries some permutation {T1, T2} and then times out.  It then returns the plan information for {T1, T2} to the outer query.  The outer query, which has *not* yet timed out, then decides to try its second permutation {T3, X1}.  So it again makes a call to optimize the subquery.  In this case, the subquery--which has already timed out--will *immediately* return without trying to optimize anything.  The outer query will then make a decision about its second permutation based on the un-optimized subquery's plan results.
> This hasn't really been an issue to date because the "best plan" chosen by the subquery is typically independent of the outer query's current permutation--with the exception of "outerCost", which is passed in from the outer query and is factored into the subquery's cost estimates.  Because of this relative independence, the plan chosen by the subquery would rarely (if ever?) change with different permutations of the outer query, so if the subquery timed out once there was no point in trying to re-optimize it again later.
> With DERBY-805, though, Derby has acquired the ability to push predicates from outer queries down into subqueries--which means that the outer join order can have a very significant impact on the plan chosen by the subquery.  But because the timeout mechanism is never reset, we could end up skipping the second optimization phase of the subquery, which means we never get a chance to see how much the outer predicates can help, and thus we could end up skipping over some plans that have the potential to give us significant performance improvement.
> So resolution of this issue would involve resetting the timeout state for subqueries to allow the Derby optimizer to consider plans that rely on pushed predicates.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (DERBY-1073) Reset optimizer timeout for subqueries on a per-round basis to allow consideration of plans that use pushed predicates.

Posted by "A B (JIRA)" <de...@db.apache.org>.
     [ http://issues.apache.org/jira/browse/DERBY-1073?page=all ]

A B updated DERBY-1073:
-----------------------

    Attachment: d1073_timeoutFix_v1.patch

As described in the derby-dev thread here:

http://article.gmane.org/gmane.comp.apache.db.derby.devel/16723

the original patch for DERBY-1073 was incomplete.  For details see the above thread.  In short, the 'default' behavior for DERBY-1073 (i.e. the case where there are no pushed predicates) combined with the changes for DERBY-1007 can result in significantly greater compilation time for some queries--as evidenced by the 4 to 7 time slow-down in the regression tests for nist/dml132.sql.

The resolution to this problem is to take the changes for DERBY-1073 one step further by recognizing when a subquery's optimizer has timed out and, for each round thereafter, allowing the optimizer to continue until it finds it's first complete (and valid) plan for that round.  Before d1073_timeoutFix_v1.patch the optimizer for a subquery would return immediately (i.e. without doing any optimization) for every round after the initial timeout, which led to incorrect (and exceedingly high) cost estimates.

More simply put, the changes in d1073_timeoutFix_v1.patch delay subquery timeout until the optimizer for the subquery has found a legitimate cost to return.

In order to have a legitimate cost, the optimizer must find at least one complete join order.  Instead of just using the first possible join order, the changes in d1073_timeoutFix_v1.patch try to guess at what the best join order will be by "jumping" to the join order that was most recently deemed "best" for the subquery prior to timeout.  It then stops optimizing (times out) and returns the cost of that join order.

Note that with d1073_timeoutFix_v1.patch, the nist/dml132.sql test will still run more slowly than it did prior to DERBY-1007 and DERBY-1073.  But a) the slow-down will be far less than the 4 to 7-time slow-down seen in the current trunk, and b) the slow-down is actually the result of more correct behavior.  The reason this slow-down is more correct is because it comes from the fact that optimizer timeout value is now correct (roughly 35ish seconds) whereas prior to DERBY-1007 and DERBY-1073 the timeout value, which was only 14-ish seconds, was incorrect because the subquery that timed out was returning an illegitimate cost estimate that in turn led to an incorrect timeout value.

I ran derbyall with these changes against sane jars on Red Hat with IBM 1.4.2 and saw no new failures.

I would appreciate any review comments, if anyone has the time...Thanks.

> Reset optimizer timeout for subqueries on a per-round basis to allow consideration of plans that use pushed predicates.
> -----------------------------------------------------------------------------------------------------------------------
>
>          Key: DERBY-1073
>          URL: http://issues.apache.org/jira/browse/DERBY-1073
>      Project: Derby
>         Type: Sub-task
>   Components: Performance
>     Versions: 10.2.0.0
>     Reporter: A B
>     Assignee: A B
>     Priority: Minor
>  Attachments: d1073_timeoutFix_v1.patch, d1073_v1.patch, d1073_v2.patch
>
> I wanted to file this as subtask to DERBY-805, but since DERBY-805 is itself a subtask to DERBY-649, I was not able to do so.  So I'm creating this issue as a(nother) subtaks for DERBY-649.
> [ Based on derby-dev thread found here: http://article.gmane.org/gmane.comp.apache.db.derby.devel/16007 ]
> If we have a query such as:
> select <...> from
>    (select t1.i, t2.j from t1, t2 where <...>) X1,
>     T3
> where <...>
> then we would have one "outer" query and one "subquery".  The outer query would be "select <...> from X1, T3", the subquery would be "select t1.i, t2.j from t1, t2".
> In this case the Derby optimizer will create two instances of OptimizerImpl: one for the outer query (call it OI_OQ) and one for the subquery (call it OI_SQ).  Each OptimizerImpl has its own timeout "clock" that it initializes at creation time--but never resets.  If timeout occurs, the OptimizerImpl will stop searching for "the" best plan and will just take the best plan found so far.
> That said, for every permutation of the outer query a call will be made to optimize the subquery.  To simplify things, let's assume there are only two permutations of the outer query: one with join order {X1, T3} and another with join order {T3, X1}.
> Now let's say we're looking at the first permutation {X1, T3}.  OI_OQ will make a call to optimize the subquery represented by OI_SQ.  Let's further say that the subquery tries some permutation {T1, T2} and then times out.  It then returns the plan information for {T1, T2} to the outer query.  The outer query, which has *not* yet timed out, then decides to try its second permutation {T3, X1}.  So it again makes a call to optimize the subquery.  In this case, the subquery--which has already timed out--will *immediately* return without trying to optimize anything.  The outer query will then make a decision about its second permutation based on the un-optimized subquery's plan results.
> This hasn't really been an issue to date because the "best plan" chosen by the subquery is typically independent of the outer query's current permutation--with the exception of "outerCost", which is passed in from the outer query and is factored into the subquery's cost estimates.  Because of this relative independence, the plan chosen by the subquery would rarely (if ever?) change with different permutations of the outer query, so if the subquery timed out once there was no point in trying to re-optimize it again later.
> With DERBY-805, though, Derby has acquired the ability to push predicates from outer queries down into subqueries--which means that the outer join order can have a very significant impact on the plan chosen by the subquery.  But because the timeout mechanism is never reset, we could end up skipping the second optimization phase of the subquery, which means we never get a chance to see how much the outer predicates can help, and thus we could end up skipping over some plans that have the potential to give us significant performance improvement.
> So resolution of this issue would involve resetting the timeout state for subqueries to allow the Derby optimizer to consider plans that rely on pushed predicates.

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators:
   http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see:
   http://www.atlassian.com/software/jira


[jira] Commented: (DERBY-1073) Reset optimizer timeout for subqueries on a per-round basis to allow consideration of plans that use pushed predicates.

Posted by "A B (JIRA)" <de...@db.apache.org>.
    [ http://issues.apache.org/jira/browse/DERBY-1073?page=comments#action_12376834 ] 

A B commented on DERBY-1073:
----------------------------

Timeout fix was committed with svn revision 393608.  Patch to port these changes to 10.1 is attached to DERBY-805 and also posted here:

http://article.gmane.org/gmane.comp.apache.db.derby.devel/19330.

> Reset optimizer timeout for subqueries on a per-round basis to allow consideration of plans that use pushed predicates.
> -----------------------------------------------------------------------------------------------------------------------
>
>          Key: DERBY-1073
>          URL: http://issues.apache.org/jira/browse/DERBY-1073
>      Project: Derby
>         Type: Sub-task

>   Components: Performance
>     Versions: 10.2.0.0
>     Reporter: A B
>     Assignee: A B
>     Priority: Minor
>  Attachments: d1073_timeoutFix_v1.patch, d1073_v1.patch, d1073_v2.patch
>
> I wanted to file this as subtask to DERBY-805, but since DERBY-805 is itself a subtask to DERBY-649, I was not able to do so.  So I'm creating this issue as a(nother) subtaks for DERBY-649.
> [ Based on derby-dev thread found here: http://article.gmane.org/gmane.comp.apache.db.derby.devel/16007 ]
> If we have a query such as:
> select <...> from
>    (select t1.i, t2.j from t1, t2 where <...>) X1,
>     T3
> where <...>
> then we would have one "outer" query and one "subquery".  The outer query would be "select <...> from X1, T3", the subquery would be "select t1.i, t2.j from t1, t2".
> In this case the Derby optimizer will create two instances of OptimizerImpl: one for the outer query (call it OI_OQ) and one for the subquery (call it OI_SQ).  Each OptimizerImpl has its own timeout "clock" that it initializes at creation time--but never resets.  If timeout occurs, the OptimizerImpl will stop searching for "the" best plan and will just take the best plan found so far.
> That said, for every permutation of the outer query a call will be made to optimize the subquery.  To simplify things, let's assume there are only two permutations of the outer query: one with join order {X1, T3} and another with join order {T3, X1}.
> Now let's say we're looking at the first permutation {X1, T3}.  OI_OQ will make a call to optimize the subquery represented by OI_SQ.  Let's further say that the subquery tries some permutation {T1, T2} and then times out.  It then returns the plan information for {T1, T2} to the outer query.  The outer query, which has *not* yet timed out, then decides to try its second permutation {T3, X1}.  So it again makes a call to optimize the subquery.  In this case, the subquery--which has already timed out--will *immediately* return without trying to optimize anything.  The outer query will then make a decision about its second permutation based on the un-optimized subquery's plan results.
> This hasn't really been an issue to date because the "best plan" chosen by the subquery is typically independent of the outer query's current permutation--with the exception of "outerCost", which is passed in from the outer query and is factored into the subquery's cost estimates.  Because of this relative independence, the plan chosen by the subquery would rarely (if ever?) change with different permutations of the outer query, so if the subquery timed out once there was no point in trying to re-optimize it again later.
> With DERBY-805, though, Derby has acquired the ability to push predicates from outer queries down into subqueries--which means that the outer join order can have a very significant impact on the plan chosen by the subquery.  But because the timeout mechanism is never reset, we could end up skipping the second optimization phase of the subquery, which means we never get a chance to see how much the outer predicates can help, and thus we could end up skipping over some plans that have the potential to give us significant performance improvement.
> So resolution of this issue would involve resetting the timeout state for subqueries to allow the Derby optimizer to consider plans that rely on pushed predicates.

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators:
   http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see:
   http://www.atlassian.com/software/jira