You are viewing a plain text version of this content. The canonical link for it is here.

Posted to derby-dev@db.apache.org by "A B (JIRA)" <de...@db.apache.org> on 2006/04/08 02:24:24 UTC

[jira] Updated: (DERBY-805) Push join predicates into union and other set operations. DERBY-649 implemented scalar (single table) predicate pushdown. Adding join predicate push down could improve performance significantly.

     [ http://issues.apache.org/jira/browse/DERBY-805?page=all ]

A B updated DERBY-805:
----------------------

    Attachment: DERBY-805_v4.html
                d805_phase4_v1.patch
                d805_phase4_v1.stat

Attaching two things to this issue:

1 - DERBY-805_v4.html: an updated document that includes a description of my proposed changes for Phase 4.  See "Section VII".

2 - d805_phase4_v1.patch: a patch that implements the Phase 4 changes described in Section VII of the _v4.html document.

The phase4_v1 patch also fixes a small error in BaseTableNumberVisitor.java and updates the lang/predicatePushdown.sql test to account for the new Phase 4 behavior.  As part of the changes for predicatePushdown.sql I have added a lot more data to the tables, which means the test queries take longer to run.  So where this test used to run in under a minute on my machine, it now takes over 4 minutes to complete.  That's expected given my changes, but I thought I should state that explicitly in case anyone notices.

I ran derbyall on Linux Redhat using sane jars with IBM 1.4.2 and saw no new failures.  I also ran the updated lang/predicatePushdown.sql test against IBM wsdd5.6 and Sun 1.5 and the test passed without problems.

If anyone has time to review and commit, that would be great. Note that while the patch is around 5500 lines, only about 600 lines of it are actual code changes; the rest is the result of the updated predicatePushdown.sql test and the corresponding output (a lot of query plans are printed in this test, so the output is quite large).

> Push join predicates into union and other set operations. DERBY-649 implemented scalar (single table) predicate pushdown. Adding join predicate push down could improve performance significantly.
> --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
>
>          Key: DERBY-805
>          URL: http://issues.apache.org/jira/browse/DERBY-805
>      Project: Derby
>         Type: Sub-task

>   Components: SQL
>     Versions: 10.1.2.0, 10.2.0.0
>  Environment: generic
>     Reporter: Satheesh Bandaram
>     Assignee: A B
>      Fix For: 10.2.0.0
>  Attachments: DERBY-805.html, DERBY-805_v2.html, DERBY-805_v3.html, DERBY-805_v4.html, d805_phase1_v1.patch, d805_phase1_v1.stat, d805_phase1_v2.patch, d805_phase1_v2.stat, d805_phase1_v3.patch, d805_phase1_v3.stat, d805_phase2_v1.patch, d805_phase2_v1.stat, d805_phase3_v1.patch, d805_phase3_v1.stat, d805_phase4_v1.patch, d805_phase4_v1.stat, phase2_javadocFix.patch, predPushdown_testFix.patch
>
> Fix for DERBY-649 implemented scalar (single table) predicate push down into UNIONs. While this improves performance for one set of queries, ability to push join-predicates further improves Derby performance by enabling use of indices where possible.
> For example,
> create view V1 as select i, j from T1 union all select i,j from T2; 
> create view V2 as select a,b from T3 union all select a,b from T4; 
> insert into T1 values (1,1), (2,2), (3,3), (4,4), (5,5); 
> For a query like
> select * from V1, V2 where V1.j = V2.b and V1.i =1;
> If the join order choosen is V1,V2, V1 can use index on V1.i (if present) following fix for DERBY-649. But if there is a index on V2.b also, Derby currently can't use that index. By pushing join predicate, Derby would be able to use the index and improve performance. Some of the queries I have seen (not the one shown here...) could improve from 70-120 seconds to about one second.
> Note there is a good comment by Jeff Lichtman about join-predicate push down. I am copying parts of it here for completeness of this report: (Modified)
> If predicate push down is done during optimization, it would be possible to push joins into the union as long as it's in the right place in the join order.
> For example:
> create view v as select * from t1 union all select * from t2;
> select * from v, t3 where v.c1 = t3.c2;
> In this select, if t3 is the outer table then the qualification could be pushed into the union and optimized there, but if t3 is the inner table the qualification can't be pushed into the union.
> If the pushing is done at preprocess time (i.e. before optimization) it is impossible to know whether a join qualification like this can be safely pushed.
> There's a comment in UnionNode.optimizeIt() saying:
> /* RESOLVE - don't try to push predicated through for now */
> This is where I'd expect to see something for pushing predicates into the union during optimization.
> BTW, the business of pushing and pulling predicates during optimization can be hard to understand and debug, so maybe it's best to only handle the simple cases and do it during preprocessing.

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators:
   http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see:
   http://www.atlassian.com/software/jira

Re: [jira] Updated: (DERBY-805) Push join predicates into union and other set operations. DERBY-649 implemented scalar (single table) predicate pushdown. Adding join predicate push down could improve performance significantly.

Posted by Army <qo...@gmail.com>.

Andrew McIntyre wrote:
> 
> The only time the statistics for an index are guaranteed to be up to
> date is when the index is created. See:
> 
> https://issues.apache.org/jira/browse/DERBY-737
> https://issues.apache.org/jira/browse/DERBY-269
> 
> Creating the index before loading the table with data means that the
> statistics for the table will correspond to an empty table. The table
> needs to be loaded with data, and then the index created afterward.

Thanks for asking the question about indexes, Bryan.  And thanks to Andrew for 
answering the question and for providing Jira entries.  I've included the above 
text as a footnote in the latest html document for DERBY-805 (_v5.html).

And many thanks to both of you for taking the time to read this rather daunting 
document; I do appreciate the feedback...

Army

Re: [jira] Updated: (DERBY-805) Push join predicates into union and other set operations. DERBY-649 implemented scalar (single table) predicate pushdown. Adding join predicate push down could improve performance significantly.

Posted by Andrew McIntyre <mc...@gmail.com>.

On 4/7/06, Bryan Pendleton <bp...@amberpoint.com> wrote:
>
> I love reading your documents! Even if 85% of it is too
> optimizer-y for me to understand, I learn a lot each time I
> read through your work. So thanks!

I also find Army's work on the optimizer very interesting. The
documents attached to DERBY-805 have helped shed some light on what is
not a very well understood component of Derby.

> One particular sentence from your most recent writeup stuck out
> for me, as I had missed this in previous discussions about DERBY-805:
>
>     The second thing was the fact that I created indexes on T3 and
>     T4 _before_ inserting the data, which (it turns out) leads to
>     outdated (if not altogether incorrect) statistics for the indexes.
>
> Are the statistics always incorrect/outdated after inserting lots
> of data into a table which has one or more indexes?
>
> Or was there something specific about the way in which you inserted
> your data which caused the indexes to have outdated/incorrect statistics?

The only time the statistics for an index are guaranteed to be up to
date is when the index is created. See:

https://issues.apache.org/jira/browse/DERBY-737
https://issues.apache.org/jira/browse/DERBY-269

Creating the index before loading the table with data means that the
statistics for the table will correspond to an empty table. The table
needs to be loaded with data, and then the index created afterward.

andrew

Re: [jira] Updated: (DERBY-805) Push join predicates into union and other set operations. DERBY-649 implemented scalar (single table) predicate pushdown. Adding join predicate push down could improve performance significantly.

Posted by Bryan Pendleton <bp...@amberpoint.com>.

Hi Army,

I love reading your documents! Even if 85% of it is too
optimizer-y for me to understand, I learn a lot each time I
read through your work. So thanks!

One particular sentence from your most recent writeup stuck out
for me, as I had missed this in previous discussions about DERBY-805:

    The second thing was the fact that I created indexes on T3 and
    T4 _before_ inserting the data, which (it turns out) leads to
    outdated (if not altogether incorrect) statistics for the indexes.

Are the statistics always incorrect/outdated after inserting lots
of data into a table which has one or more indexes?

Or was there something specific about the way in which you inserted
your data which caused the indexes to have outdated/incorrect statistics?

thanks,

bryan