You are viewing a plain text version of this content. The canonical link for it is here.
Posted to derby-dev@db.apache.org by "Satheesh Bandaram (JIRA)" <de...@db.apache.org> on 2005/12/23 04:17:32 UTC

[jira] Created: (DERBY-781) Materialize union subqueries in select list where possible to avoid creating invariant resultsets many times.

Materialize union subqueries in select list where possible to avoid creating invariant resultsets many times.
-------------------------------------------------------------------------------------------------------------

         Key: DERBY-781
         URL: http://issues.apache.org/jira/browse/DERBY-781
     Project: Derby
        Type: Improvement
  Components: SQL  
    Versions: 10.1.1.0, 10.2.0.0    
 Environment: generic
    Reporter: Satheesh Bandaram


Derby's handling of union subqueries in from list can be improved by materializing invariant resultsets once, rather than creating them many times.

For example:

create view V1 as select i, j from T1 union all select i,j from T2;
create view V2 as select a,b from T3 union all select a,b from T4;
insert into T1 values (1,1), (2,2), (3,3), (4,4), (5,5);

For a query like select * from V1, V2 where i in (1,2,3,4,5), it is possible the resultset for V2 is created 5 times. (assuming V2 is choosen as the the inner table) This can be very costly if the underlying selects can take long time and also may perform union many times.

Enhance materialization logic in setOperatorNode.java. It currently returns FALSE always.

public boolean performMaterialization(JBitSet outerTables)
		throws StandardException
{
	// RESOLVE - just say no to materialization right now - should be a cost based decision
	return false;

	/* Actual materialization, if appropriate, will be placed by our parent PRN.
	 * This is because PRN might have a join condition to apply.  (Materialization
	 * can only occur before that.
	 */
	//return true;
} 

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators:
   http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see:
   http://www.atlassian.com/software/jira


[jira] Updated: (DERBY-781) Materialize union subqueries in select list where possible to avoid creating invariant resultsets many times.

Posted by "Kathey Marsden (JIRA)" <de...@db.apache.org>.
     [ http://issues.apache.org/jira/browse/DERBY-781?page=all ]

Kathey Marsden updated DERBY-781:
---------------------------------

    Derby Info: [Patch Available, Existing Application Impact, Release Note Needed]  (was: [Patch Available, Release Note Needed])

There may be  existing application impact as applications may see increased compilation times but should see improved execution time performance.  Applications may want to adjust to use PreparedStatements instead of Statements  (always a good practice with Derby).

> Materialize union subqueries in select list where possible to avoid creating invariant resultsets many times.
> -------------------------------------------------------------------------------------------------------------
>
>                 Key: DERBY-781
>                 URL: http://issues.apache.org/jira/browse/DERBY-781
>             Project: Derby
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 10.1.1.0, 10.2.0.0
>         Environment: generic
>            Reporter: Satheesh Bandaram
>         Assigned To: A B
>         Attachments: d781_v1.patch, d781_v1.stat, d781_v2.patch, DERBY-781_v1.html
>
>
> Derby's handling of union subqueries in from list can be improved by materializing invariant resultsets once, rather than creating them many times.
> For example:
> create view V1 as select i, j from T1 union select i,j from T2;
> create view V2 as select a,b from T3 union select a,b from T4;
> insert into T1 values (1,1), (2,2), (3,3), (4,4), (5,5);
> For a query like select * from V1, V2 where V1.j = V2.b and V1.i in (1,2,3,4,5), it is possible the resultset for V2 is created 5 times. (assuming V2 is choosen as the the inner table) This can be very costly if the underlying selects can take long time and also may perform union many times.
> Enhance materialization logic in setOperatorNode.java. It currently returns FALSE always.
> public boolean performMaterialization(JBitSet outerTables)
> 		throws StandardException
> {
> 	// RESOLVE - just say no to materialization right now - should be a cost based decision
> 	return false;
> 	/* Actual materialization, if appropriate, will be placed by our parent PRN.
> 	 * This is because PRN might have a join condition to apply.  (Materialization
> 	 * can only occur before that.
> 	 */
> 	//return true;
> } 

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] Commented: (DERBY-781) Materialize union subqueries in select list where possible to avoid creating invariant resultsets many times.

Posted by "Bryan Pendleton (JIRA)" <de...@db.apache.org>.
    [ http://issues.apache.org/jira/browse/DERBY-781?page=comments#action_12421314 ] 
            
Bryan Pendleton commented on DERBY-781:
---------------------------------------

Hi Army,

Thanks again for the great writeup, and for putting the energy into clear comments and careful changes to the code. It really makes a huge difference when trying to read the code.

I've read through the writeup carefully, and checked it against the patch, and I have no comments or suggestions to make. The patch applied cleanly for me, built without problems, and lang/subquery.sql and lang/predicatesIntoviews.sql both passed in my environment. 

It looks like an excellent patch to me; I am +1 for commit.


> Materialize union subqueries in select list where possible to avoid creating invariant resultsets many times.
> -------------------------------------------------------------------------------------------------------------
>
>                 Key: DERBY-781
>                 URL: http://issues.apache.org/jira/browse/DERBY-781
>             Project: Derby
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 10.1.1.0, 10.2.0.0
>         Environment: generic
>            Reporter: Satheesh Bandaram
>         Assigned To: A B
>         Attachments: d781_v1.patch, d781_v1.stat, d781_v2.patch, DERBY-781_v1.html
>
>
> Derby's handling of union subqueries in from list can be improved by materializing invariant resultsets once, rather than creating them many times.
> For example:
> create view V1 as select i, j from T1 union select i,j from T2;
> create view V2 as select a,b from T3 union select a,b from T4;
> insert into T1 values (1,1), (2,2), (3,3), (4,4), (5,5);
> For a query like select * from V1, V2 where V1.j = V2.b and V1.i in (1,2,3,4,5), it is possible the resultset for V2 is created 5 times. (assuming V2 is choosen as the the inner table) This can be very costly if the underlying selects can take long time and also may perform union many times.
> Enhance materialization logic in setOperatorNode.java. It currently returns FALSE always.
> public boolean performMaterialization(JBitSet outerTables)
> 		throws StandardException
> {
> 	// RESOLVE - just say no to materialization right now - should be a cost based decision
> 	return false;
> 	/* Actual materialization, if appropriate, will be placed by our parent PRN.
> 	 * This is because PRN might have a join condition to apply.  (Materialization
> 	 * can only occur before that.
> 	 */
> 	//return true;
> } 

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] Updated: (DERBY-781) Materialize union subqueries in select list where possible to avoid creating invariant resultsets many times.

Posted by "A B (JIRA)" <de...@db.apache.org>.
     [ http://issues.apache.org/jira/browse/DERBY-781?page=all ]

A B updated DERBY-781:
----------------------

    Derby Info: [Patch Available, Release Note Needed]  (was: [Patch Available])

Possible RELEASE NOTE for this fix is as follows, based on suggestions from Bryan in the above-referenced thread:

<begin_release_note>

DERBY-781: Materialize subqueries in select list where possible to avoid creating invariant resultsets many times.

The Derby optimizer has been enhanced so that it now considers the cost of performing a hash join with subqueries when it is safe to do so.  If the cost of the hash join is better than a nested loop join, Derby will choose to do the hash join and will thereby materialize the subquery.

WHAT CHANGED

When optimizing a query that has one or more non-flattenable subqueries in the FROM clause, Derby will now check to see if it is possible to perform a hash join with that subquery as the inner table.  Prior to Derby 10.2, the optimizer would never consider a hash join with a subquery; it only did nested loop joins.

SYMPTOM

Execution performance of queries containing non-flattenable subqueries may change.  The expectation is that the new (10.2) query plans will show improved performance over the old ones.

Another potential symptom is that the compilation time for such queries may increase.  If this happens, the increase should only occur at compilation time; execution time should either improve or, at the very least, remain the same as in earlier versions of Derby.

CAUSE

If the optimizer chooses to do a hash join with a subquery, Derby only has to execute the subquery a single time per statement, after which Derby can just perform the desired join against the materialized result set.  Depending on how many rows are in the outer table of the join, this once-per-statement execution of the subquery can lead to major performance improvements over the once-per-outer-row execution employed by earlier versions of Derby.

As for the extra compilation time, this is due to the simple fact that the optimizer is now doing more work--i.e. in addition to considering nested loop joins with subqueries, it is now _also_ considering hash joins with those subqueries, and that means that it could potentially take longer for the optimizer to finish its work.  Note again that, if it occurs, the increased time should only occur at compilation time; execution time should either improve or, at the very least, remain the same as in earlier versions of Derby. 

SOLUTION 

This was an intentional change to improve the execution plans chosen by the optimizer for queries having large and/or complex subqueries.  The expectation is that the new behavior--and the subsequent query plans--will lead to improved performance over the old ones, so no further solution is required.

WORKAROUND

There is no way to disable/workaround this new behavior since the symptom as described above is a good one for Derby.

That said, any user who notices a negative performance change after moving to Derby 10.2, and who believes that the difference in performance is related to this optimizer enhancement, is encouraged to visit the following "performance diagnosis" page and to follow up with his/her findings on the Derby mailing lists:

	http://wiki.apache.org/db-derby/PerformanceDiagnosisTips

<end_release_note>

> Materialize union subqueries in select list where possible to avoid creating invariant resultsets many times.
> -------------------------------------------------------------------------------------------------------------
>
>                 Key: DERBY-781
>                 URL: http://issues.apache.org/jira/browse/DERBY-781
>             Project: Derby
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 10.1.1.0, 10.2.0.0
>         Environment: generic
>            Reporter: Satheesh Bandaram
>         Assigned To: A B
>         Attachments: d781_v1.patch, d781_v1.stat, d781_v2.patch, DERBY-781_v1.html
>
>
> Derby's handling of union subqueries in from list can be improved by materializing invariant resultsets once, rather than creating them many times.
> For example:
> create view V1 as select i, j from T1 union select i,j from T2;
> create view V2 as select a,b from T3 union select a,b from T4;
> insert into T1 values (1,1), (2,2), (3,3), (4,4), (5,5);
> For a query like select * from V1, V2 where V1.j = V2.b and V1.i in (1,2,3,4,5), it is possible the resultset for V2 is created 5 times. (assuming V2 is choosen as the the inner table) This can be very costly if the underlying selects can take long time and also may perform union many times.
> Enhance materialization logic in setOperatorNode.java. It currently returns FALSE always.
> public boolean performMaterialization(JBitSet outerTables)
> 		throws StandardException
> {
> 	// RESOLVE - just say no to materialization right now - should be a cost based decision
> 	return false;
> 	/* Actual materialization, if appropriate, will be placed by our parent PRN.
> 	 * This is because PRN might have a join condition to apply.  (Materialization
> 	 * can only occur before that.
> 	 */
> 	//return true;
> } 

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] Updated: (DERBY-781) Materialize union subqueries in select list where possible to avoid creating invariant resultsets many times.

Posted by "A B (JIRA)" <de...@db.apache.org>.
     [ http://issues.apache.org/jira/browse/DERBY-781?page=all ]

A B updated DERBY-781:
----------------------

    Derby Info: [Patch Available]

> Materialize union subqueries in select list where possible to avoid creating invariant resultsets many times.
> -------------------------------------------------------------------------------------------------------------
>
>          Key: DERBY-781
>          URL: http://issues.apache.org/jira/browse/DERBY-781
>      Project: Derby
>         Type: Improvement

>   Components: SQL
>     Versions: 10.1.1.0, 10.2.0.0
>  Environment: generic
>     Reporter: Satheesh Bandaram
>     Assignee: A B
>  Attachments: DERBY-781_v1.html, d781_v1.patch, d781_v1.stat
>
> Derby's handling of union subqueries in from list can be improved by materializing invariant resultsets once, rather than creating them many times.
> For example:
> create view V1 as select i, j from T1 union select i,j from T2;
> create view V2 as select a,b from T3 union select a,b from T4;
> insert into T1 values (1,1), (2,2), (3,3), (4,4), (5,5);
> For a query like select * from V1, V2 where V1.j = V2.b and V1.i in (1,2,3,4,5), it is possible the resultset for V2 is created 5 times. (assuming V2 is choosen as the the inner table) This can be very costly if the underlying selects can take long time and also may perform union many times.
> Enhance materialization logic in setOperatorNode.java. It currently returns FALSE always.
> public boolean performMaterialization(JBitSet outerTables)
> 		throws StandardException
> {
> 	// RESOLVE - just say no to materialization right now - should be a cost based decision
> 	return false;
> 	/* Actual materialization, if appropriate, will be placed by our parent PRN.
> 	 * This is because PRN might have a join condition to apply.  (Materialization
> 	 * can only occur before that.
> 	 */
> 	//return true;
> } 

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators:
   http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see:
   http://www.atlassian.com/software/jira


[jira] Resolved: (DERBY-781) Materialize subqueries in select list where possible to avoid creating invariant resultsets many times.

Posted by "A B (JIRA)" <de...@db.apache.org>.
     [ http://issues.apache.org/jira/browse/DERBY-781?page=all ]

A B resolved DERBY-781.
-----------------------

    Fix Version/s: 10.2.1.0
       Resolution: Fixed
       Derby Info: [Existing Application Impact, Release Note Needed]  (was: [Release Note Needed, Existing Application Impact])

> Materialize subqueries in select list where possible to avoid creating invariant resultsets many times.
> -------------------------------------------------------------------------------------------------------
>
>                 Key: DERBY-781
>                 URL: http://issues.apache.org/jira/browse/DERBY-781
>             Project: Derby
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 10.1.1.0, 10.2.1.0
>         Environment: generic
>            Reporter: Satheesh Bandaram
>         Assigned To: A B
>             Fix For: 10.2.1.0
>
>         Attachments: d781_v1.patch, d781_v1.stat, d781_v2.patch, DERBY-781_v1.html
>
>
> Derby's handling of union subqueries in from list can be improved by materializing invariant resultsets once, rather than creating them many times.
> For example:
> create view V1 as select i, j from T1 union select i,j from T2;
> create view V2 as select a,b from T3 union select a,b from T4;
> insert into T1 values (1,1), (2,2), (3,3), (4,4), (5,5);
> For a query like select * from V1, V2 where V1.j = V2.b and V1.i in (1,2,3,4,5), it is possible the resultset for V2 is created 5 times. (assuming V2 is choosen as the the inner table) This can be very costly if the underlying selects can take long time and also may perform union many times.
> Enhance materialization logic in setOperatorNode.java. It currently returns FALSE always.
> public boolean performMaterialization(JBitSet outerTables)
> 		throws StandardException
> {
> 	// RESOLVE - just say no to materialization right now - should be a cost based decision
> 	return false;
> 	/* Actual materialization, if appropriate, will be placed by our parent PRN.
> 	 * This is because PRN might have a join condition to apply.  (Materialization
> 	 * can only occur before that.
> 	 */
> 	//return true;
> } 

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] Resolved: (DERBY-781) Materialize union subqueries in select list where possible to avoid creating invariant resultsets many times.

Posted by "A B (JIRA)" <de...@db.apache.org>.
     [ http://issues.apache.org/jira/browse/DERBY-781?page=all ]

A B resolved DERBY-781.
-----------------------

    Resolution: Fixed
    Derby Info: [Existing Application Impact, Release Note Needed]  (was: [Existing Application Impact, Patch Available, Release Note Needed])

Resolving issue as the patch was committed by Satheesh with svn #423989. I'll wait a couple of days to see if anything comes up and then will close this next week if all is well.

Thanks again to Bryan for the review and to Satheesh for the commit.

> Materialize union subqueries in select list where possible to avoid creating invariant resultsets many times.
> -------------------------------------------------------------------------------------------------------------
>
>                 Key: DERBY-781
>                 URL: http://issues.apache.org/jira/browse/DERBY-781
>             Project: Derby
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 10.1.1.0, 10.2.0.0
>         Environment: generic
>            Reporter: Satheesh Bandaram
>         Assigned To: A B
>         Attachments: d781_v1.patch, d781_v1.stat, d781_v2.patch, DERBY-781_v1.html
>
>
> Derby's handling of union subqueries in from list can be improved by materializing invariant resultsets once, rather than creating them many times.
> For example:
> create view V1 as select i, j from T1 union select i,j from T2;
> create view V2 as select a,b from T3 union select a,b from T4;
> insert into T1 values (1,1), (2,2), (3,3), (4,4), (5,5);
> For a query like select * from V1, V2 where V1.j = V2.b and V1.i in (1,2,3,4,5), it is possible the resultset for V2 is created 5 times. (assuming V2 is choosen as the the inner table) This can be very costly if the underlying selects can take long time and also may perform union many times.
> Enhance materialization logic in setOperatorNode.java. It currently returns FALSE always.
> public boolean performMaterialization(JBitSet outerTables)
> 		throws StandardException
> {
> 	// RESOLVE - just say no to materialization right now - should be a cost based decision
> 	return false;
> 	/* Actual materialization, if appropriate, will be placed by our parent PRN.
> 	 * This is because PRN might have a join condition to apply.  (Materialization
> 	 * can only occur before that.
> 	 */
> 	//return true;
> } 

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] Closed: (DERBY-781) Materialize subqueries in select list where possible to avoid creating invariant resultsets many times.

Posted by "A B (JIRA)" <de...@db.apache.org>.
     [ http://issues.apache.org/jira/browse/DERBY-781?page=all ]

A B closed DERBY-781.
---------------------


> Materialize subqueries in select list where possible to avoid creating invariant resultsets many times.
> -------------------------------------------------------------------------------------------------------
>
>                 Key: DERBY-781
>                 URL: http://issues.apache.org/jira/browse/DERBY-781
>             Project: Derby
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 10.1.1.0, 10.2.1.0
>         Environment: generic
>            Reporter: Satheesh Bandaram
>         Assigned To: A B
>         Attachments: d781_v1.patch, d781_v1.stat, d781_v2.patch, DERBY-781_v1.html
>
>
> Derby's handling of union subqueries in from list can be improved by materializing invariant resultsets once, rather than creating them many times.
> For example:
> create view V1 as select i, j from T1 union select i,j from T2;
> create view V2 as select a,b from T3 union select a,b from T4;
> insert into T1 values (1,1), (2,2), (3,3), (4,4), (5,5);
> For a query like select * from V1, V2 where V1.j = V2.b and V1.i in (1,2,3,4,5), it is possible the resultset for V2 is created 5 times. (assuming V2 is choosen as the the inner table) This can be very costly if the underlying selects can take long time and also may perform union many times.
> Enhance materialization logic in setOperatorNode.java. It currently returns FALSE always.
> public boolean performMaterialization(JBitSet outerTables)
> 		throws StandardException
> {
> 	// RESOLVE - just say no to materialization right now - should be a cost based decision
> 	return false;
> 	/* Actual materialization, if appropriate, will be placed by our parent PRN.
> 	 * This is because PRN might have a join condition to apply.  (Materialization
> 	 * can only occur before that.
> 	 */
> 	//return true;
> } 

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] Commented: (DERBY-781) Materialize union subqueries in select list where possible to avoid creating invariant resultsets many times.

Posted by "Satheesh Bandaram (JIRA)" <de...@db.apache.org>.
    [ http://issues.apache.org/jira/browse/DERBY-781?page=comments#action_12361784 ] 

Satheesh Bandaram commented on DERBY-781:
-----------------------------------------

Thanks Jeff for your analysis. I was just getting ready to file another improvement request to make this optimization more generic. (not specific to unions) I have seen huge improvements in two different customer situations. For the situation I filed the defect, each of the views (V1 and V2) had 36 tables each and by materializing the inner view into a temp. table, I noticed speed up from 70-150 seconds to under 3 seconds. (including the cost of creating temp. table)

I also saw another situation later without unions where materializing some table subqueries improved performance by couple of orders of magnitude. So you are right... this optimization can be applied to other cases too.

I think materialization with or without hash joins should be useful. In both situations, creating temp. table that materialized derived tables improved so much.

   

> Materialize union subqueries in select list where possible to avoid creating invariant resultsets many times.
> -------------------------------------------------------------------------------------------------------------
>
>          Key: DERBY-781
>          URL: http://issues.apache.org/jira/browse/DERBY-781
>      Project: Derby
>         Type: Improvement
>   Components: SQL
>     Versions: 10.1.1.0, 10.2.0.0
>  Environment: generic
>     Reporter: Satheesh Bandaram

>
> Derby's handling of union subqueries in from list can be improved by materializing invariant resultsets once, rather than creating them many times.
> For example:
> create view V1 as select i, j from T1 union select i,j from T2;
> create view V2 as select a,b from T3 union select a,b from T4;
> insert into T1 values (1,1), (2,2), (3,3), (4,4), (5,5);
> For a query like select * from V1, V2 where V1.j = V2.b and V1.i in (1,2,3,4,5), it is possible the resultset for V2 is created 5 times. (assuming V2 is choosen as the the inner table) This can be very costly if the underlying selects can take long time and also may perform union many times.
> Enhance materialization logic in setOperatorNode.java. It currently returns FALSE always.
> public boolean performMaterialization(JBitSet outerTables)
> 		throws StandardException
> {
> 	// RESOLVE - just say no to materialization right now - should be a cost based decision
> 	return false;
> 	/* Actual materialization, if appropriate, will be placed by our parent PRN.
> 	 * This is because PRN might have a join condition to apply.  (Materialization
> 	 * can only occur before that.
> 	 */
> 	//return true;
> } 

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators:
   http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see:
   http://www.atlassian.com/software/jira


[jira] Updated: (DERBY-781) Materialize union subqueries in select list where possible to avoid creating invariant resultsets many times.

Posted by "Satheesh Bandaram (JIRA)" <de...@db.apache.org>.
     [ http://issues.apache.org/jira/browse/DERBY-781?page=all ]

Satheesh Bandaram updated DERBY-781:
------------------------------------

    Description: 
Derby's handling of union subqueries in from list can be improved by materializing invariant resultsets once, rather than creating them many times.

For example:

create view V1 as select i, j from T1 union select i,j from T2;
create view V2 as select a,b from T3 union select a,b from T4;
insert into T1 values (1,1), (2,2), (3,3), (4,4), (5,5);

For a query like select * from V1, V2 where i in (1,2,3,4,5), it is possible the resultset for V2 is created 5 times. (assuming V2 is choosen as the the inner table) This can be very costly if the underlying selects can take long time and also may perform union many times.

Enhance materialization logic in setOperatorNode.java. It currently returns FALSE always.

public boolean performMaterialization(JBitSet outerTables)
		throws StandardException
{
	// RESOLVE - just say no to materialization right now - should be a cost based decision
	return false;

	/* Actual materialization, if appropriate, will be placed by our parent PRN.
	 * This is because PRN might have a join condition to apply.  (Materialization
	 * can only occur before that.
	 */
	//return true;
} 

  was:
Derby's handling of union subqueries in from list can be improved by materializing invariant resultsets once, rather than creating them many times.

For example:

create view V1 as select i, j from T1 union all select i,j from T2;
create view V2 as select a,b from T3 union all select a,b from T4;
insert into T1 values (1,1), (2,2), (3,3), (4,4), (5,5);

For a query like select * from V1, V2 where i in (1,2,3,4,5), it is possible the resultset for V2 is created 5 times. (assuming V2 is choosen as the the inner table) This can be very costly if the underlying selects can take long time and also may perform union many times.

Enhance materialization logic in setOperatorNode.java. It currently returns FALSE always.

public boolean performMaterialization(JBitSet outerTables)
		throws StandardException
{
	// RESOLVE - just say no to materialization right now - should be a cost based decision
	return false;

	/* Actual materialization, if appropriate, will be placed by our parent PRN.
	 * This is because PRN might have a join condition to apply.  (Materialization
	 * can only occur before that.
	 */
	//return true;
} 


> Materialize union subqueries in select list where possible to avoid creating invariant resultsets many times.
> -------------------------------------------------------------------------------------------------------------
>
>          Key: DERBY-781
>          URL: http://issues.apache.org/jira/browse/DERBY-781
>      Project: Derby
>         Type: Improvement
>   Components: SQL
>     Versions: 10.1.1.0, 10.2.0.0
>  Environment: generic
>     Reporter: Satheesh Bandaram

>
> Derby's handling of union subqueries in from list can be improved by materializing invariant resultsets once, rather than creating them many times.
> For example:
> create view V1 as select i, j from T1 union select i,j from T2;
> create view V2 as select a,b from T3 union select a,b from T4;
> insert into T1 values (1,1), (2,2), (3,3), (4,4), (5,5);
> For a query like select * from V1, V2 where i in (1,2,3,4,5), it is possible the resultset for V2 is created 5 times. (assuming V2 is choosen as the the inner table) This can be very costly if the underlying selects can take long time and also may perform union many times.
> Enhance materialization logic in setOperatorNode.java. It currently returns FALSE always.
> public boolean performMaterialization(JBitSet outerTables)
> 		throws StandardException
> {
> 	// RESOLVE - just say no to materialization right now - should be a cost based decision
> 	return false;
> 	/* Actual materialization, if appropriate, will be placed by our parent PRN.
> 	 * This is because PRN might have a join condition to apply.  (Materialization
> 	 * can only occur before that.
> 	 */
> 	//return true;
> } 

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators:
   http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see:
   http://www.atlassian.com/software/jira


[jira] Assigned: (DERBY-781) Materialize union subqueries in select list where possible to avoid creating invariant resultsets many times.

Posted by "A B (JIRA)" <de...@db.apache.org>.
     [ http://issues.apache.org/jira/browse/DERBY-781?page=all ]

A B reassigned DERBY-781:
-------------------------

    Assign To: A B

> Materialize union subqueries in select list where possible to avoid creating invariant resultsets many times.
> -------------------------------------------------------------------------------------------------------------
>
>          Key: DERBY-781
>          URL: http://issues.apache.org/jira/browse/DERBY-781
>      Project: Derby
>         Type: Improvement

>   Components: SQL
>     Versions: 10.1.1.0, 10.2.0.0
>  Environment: generic
>     Reporter: Satheesh Bandaram
>     Assignee: A B

>
> Derby's handling of union subqueries in from list can be improved by materializing invariant resultsets once, rather than creating them many times.
> For example:
> create view V1 as select i, j from T1 union select i,j from T2;
> create view V2 as select a,b from T3 union select a,b from T4;
> insert into T1 values (1,1), (2,2), (3,3), (4,4), (5,5);
> For a query like select * from V1, V2 where V1.j = V2.b and V1.i in (1,2,3,4,5), it is possible the resultset for V2 is created 5 times. (assuming V2 is choosen as the the inner table) This can be very costly if the underlying selects can take long time and also may perform union many times.
> Enhance materialization logic in setOperatorNode.java. It currently returns FALSE always.
> public boolean performMaterialization(JBitSet outerTables)
> 		throws StandardException
> {
> 	// RESOLVE - just say no to materialization right now - should be a cost based decision
> 	return false;
> 	/* Actual materialization, if appropriate, will be placed by our parent PRN.
> 	 * This is because PRN might have a join condition to apply.  (Materialization
> 	 * can only occur before that.
> 	 */
> 	//return true;
> } 

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators:
   http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see:
   http://www.atlassian.com/software/jira


[jira] Updated: (DERBY-781) Materialize union subqueries in select list where possible to avoid creating invariant resultsets many times.

Posted by "A B (JIRA)" <de...@db.apache.org>.
     [ http://issues.apache.org/jira/browse/DERBY-781?page=all ]

A B updated DERBY-781:
----------------------

    Attachment: d781_v1.patch
                d781_v1.stat
                DERBY-781_v1.html

Attaching a patch (d781_v1.patch) to address this issue by allowing the optimizer to consider and choose hash joins with subqueries, which is a more general case of the specific union example mentioned in the description for this issue.  In brief, the patch does this by following up on the suggestions given by Jeff Lichtman in comments above and also in the following thread:

http://article.gmane.org/gmane.comp.apache.db.derby.devel/12208

Since result set materialization comes for "free" with hash joins, that fact we now allow hash joins with subqueries (as of this patch) means that we implicitly have a way to materialize the subquery result sets.

The details of the patch are included as DERBY-781_v1.html.  I added a simple test to lang/subquery.sql to demonstrate that the optimizer can and will choose to do hash joins for subqueries, and I updated one other master file--predicatesIntoViews--for which the optimizer is now choosing a hash join instead of a nested loop.  Testing of "unsafe" hash joins (see section VII of the document) and generation of correct plans is done through existing tests, esp. the lang/lojreorder.sql test, which was very useful in helping to verify the correctness of the changes.

Note that I did not add the sample union query shown in the description for this issue to the tests because when I run it against the current codeline, the optimizer will already choose to do materialization of the UnionNode (via hash join) even without the patch for this issue, and thus it didn't seem like that particular test case was useful.  The new test in subqery.sql is more relevant because the optimizer will choose to do a nested loop join with the subquery before my changes and will do a hash join after my changes, which seems to more accurately reflect what this issue is about.

I ran derbyall using sane jars on Red Hat Linux with ibm142 and saw no new failures, and the overall execution time does not change despite the extra work the optimizer is doing.

I would greatly appreciate any review/feedback people might have on these changes.  Thanks.

> Materialize union subqueries in select list where possible to avoid creating invariant resultsets many times.
> -------------------------------------------------------------------------------------------------------------
>
>          Key: DERBY-781
>          URL: http://issues.apache.org/jira/browse/DERBY-781
>      Project: Derby
>         Type: Improvement

>   Components: SQL
>     Versions: 10.1.1.0, 10.2.0.0
>  Environment: generic
>     Reporter: Satheesh Bandaram
>     Assignee: A B
>  Attachments: DERBY-781_v1.html, d781_v1.patch, d781_v1.stat
>
> Derby's handling of union subqueries in from list can be improved by materializing invariant resultsets once, rather than creating them many times.
> For example:
> create view V1 as select i, j from T1 union select i,j from T2;
> create view V2 as select a,b from T3 union select a,b from T4;
> insert into T1 values (1,1), (2,2), (3,3), (4,4), (5,5);
> For a query like select * from V1, V2 where V1.j = V2.b and V1.i in (1,2,3,4,5), it is possible the resultset for V2 is created 5 times. (assuming V2 is choosen as the the inner table) This can be very costly if the underlying selects can take long time and also may perform union many times.
> Enhance materialization logic in setOperatorNode.java. It currently returns FALSE always.
> public boolean performMaterialization(JBitSet outerTables)
> 		throws StandardException
> {
> 	// RESOLVE - just say no to materialization right now - should be a cost based decision
> 	return false;
> 	/* Actual materialization, if appropriate, will be placed by our parent PRN.
> 	 * This is because PRN might have a join condition to apply.  (Materialization
> 	 * can only occur before that.
> 	 */
> 	//return true;
> } 

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators:
   http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see:
   http://www.atlassian.com/software/jira


[jira] Commented: (DERBY-781) Materialize union subqueries in select list where possible to avoid creating invariant resultsets many times.

Posted by "Satheesh Bandaram (JIRA)" <de...@db.apache.org>.
    [ http://issues.apache.org/jira/browse/DERBY-781?page=comments#action_12422753 ] 
            
Satheesh Bandaram commented on DERBY-781:
-----------------------------------------

I think it would be good to modify this improvement description, as it will likely be picked up by release notes and/or other documentation. The fix is more generic than 'UNION' subqueries as the original description says.

Also the example in the description doesn't apply anymore, I think.  When the entry was made, join-predicate push down work wasn't completed, so the example in the description would have shown the problem, I think. But now, (post join-predicate pushdown work) the example may not apply.



> Materialize union subqueries in select list where possible to avoid creating invariant resultsets many times.
> -------------------------------------------------------------------------------------------------------------
>
>                 Key: DERBY-781
>                 URL: http://issues.apache.org/jira/browse/DERBY-781
>             Project: Derby
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 10.1.1.0, 10.2.0.0
>         Environment: generic
>            Reporter: Satheesh Bandaram
>         Assigned To: A B
>         Attachments: d781_v1.patch, d781_v1.stat, d781_v2.patch, DERBY-781_v1.html
>
>
> Derby's handling of union subqueries in from list can be improved by materializing invariant resultsets once, rather than creating them many times.
> For example:
> create view V1 as select i, j from T1 union select i,j from T2;
> create view V2 as select a,b from T3 union select a,b from T4;
> insert into T1 values (1,1), (2,2), (3,3), (4,4), (5,5);
> For a query like select * from V1, V2 where V1.j = V2.b and V1.i in (1,2,3,4,5), it is possible the resultset for V2 is created 5 times. (assuming V2 is choosen as the the inner table) This can be very costly if the underlying selects can take long time and also may perform union many times.
> Enhance materialization logic in setOperatorNode.java. It currently returns FALSE always.
> public boolean performMaterialization(JBitSet outerTables)
> 		throws StandardException
> {
> 	// RESOLVE - just say no to materialization right now - should be a cost based decision
> 	return false;
> 	/* Actual materialization, if appropriate, will be placed by our parent PRN.
> 	 * This is because PRN might have a join condition to apply.  (Materialization
> 	 * can only occur before that.
> 	 */
> 	//return true;
> } 

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] Commented: (DERBY-781) Materialize union subqueries in select list where possible to avoid creating invariant resultsets many times.

Posted by "A B (JIRA)" <de...@db.apache.org>.
    [ http://issues.apache.org/jira/browse/DERBY-781?page=comments#action_12421652 ] 
            
A B commented on DERBY-781:
---------------------------

Thank you so much for volunteering to do this review, Bryan--and for taking the time to read the write-up in its rather wordy entirety.  I really appreciate your time and effort here.

I'll work on putting together a release note as you described on derby-dev:

http://article.gmane.org/gmane.comp.apache.db.derby.devel/23875

and will post that to this issue and/or to DERBY-1357.

Thanks again for all of your time, Bryan!

> Materialize union subqueries in select list where possible to avoid creating invariant resultsets many times.
> -------------------------------------------------------------------------------------------------------------
>
>                 Key: DERBY-781
>                 URL: http://issues.apache.org/jira/browse/DERBY-781
>             Project: Derby
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 10.1.1.0, 10.2.0.0
>         Environment: generic
>            Reporter: Satheesh Bandaram
>         Assigned To: A B
>         Attachments: d781_v1.patch, d781_v1.stat, d781_v2.patch, DERBY-781_v1.html
>
>
> Derby's handling of union subqueries in from list can be improved by materializing invariant resultsets once, rather than creating them many times.
> For example:
> create view V1 as select i, j from T1 union select i,j from T2;
> create view V2 as select a,b from T3 union select a,b from T4;
> insert into T1 values (1,1), (2,2), (3,3), (4,4), (5,5);
> For a query like select * from V1, V2 where V1.j = V2.b and V1.i in (1,2,3,4,5), it is possible the resultset for V2 is created 5 times. (assuming V2 is choosen as the the inner table) This can be very costly if the underlying selects can take long time and also may perform union many times.
> Enhance materialization logic in setOperatorNode.java. It currently returns FALSE always.
> public boolean performMaterialization(JBitSet outerTables)
> 		throws StandardException
> {
> 	// RESOLVE - just say no to materialization right now - should be a cost based decision
> 	return false;
> 	/* Actual materialization, if appropriate, will be placed by our parent PRN.
> 	 * This is because PRN might have a join condition to apply.  (Materialization
> 	 * can only occur before that.
> 	 */
> 	//return true;
> } 

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] Commented: (DERBY-781) Materialize union subqueries in select list where possible to avoid creating invariant resultsets many times.

Posted by "A B (JIRA)" <de...@db.apache.org>.
    [ http://issues.apache.org/jira/browse/DERBY-781?page=comments#action_12422768 ] 
            
A B commented on DERBY-781:
---------------------------

Oh wait, check that last comment.  That was only for the summary; I see now that you were talking about the actual description.  Sorry.

> Materialize union subqueries in select list where possible to avoid creating invariant resultsets many times.
> -------------------------------------------------------------------------------------------------------------
>
>                 Key: DERBY-781
>                 URL: http://issues.apache.org/jira/browse/DERBY-781
>             Project: Derby
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 10.1.1.0, 10.2.0.0
>         Environment: generic
>            Reporter: Satheesh Bandaram
>         Assigned To: A B
>         Attachments: d781_v1.patch, d781_v1.stat, d781_v2.patch, DERBY-781_v1.html
>
>
> Derby's handling of union subqueries in from list can be improved by materializing invariant resultsets once, rather than creating them many times.
> For example:
> create view V1 as select i, j from T1 union select i,j from T2;
> create view V2 as select a,b from T3 union select a,b from T4;
> insert into T1 values (1,1), (2,2), (3,3), (4,4), (5,5);
> For a query like select * from V1, V2 where V1.j = V2.b and V1.i in (1,2,3,4,5), it is possible the resultset for V2 is created 5 times. (assuming V2 is choosen as the the inner table) This can be very costly if the underlying selects can take long time and also may perform union many times.
> Enhance materialization logic in setOperatorNode.java. It currently returns FALSE always.
> public boolean performMaterialization(JBitSet outerTables)
> 		throws StandardException
> {
> 	// RESOLVE - just say no to materialization right now - should be a cost based decision
> 	return false;
> 	/* Actual materialization, if appropriate, will be placed by our parent PRN.
> 	 * This is because PRN might have a join condition to apply.  (Materialization
> 	 * can only occur before that.
> 	 */
> 	//return true;
> } 

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] Commented: (DERBY-781) Materialize union subqueries in select list where possible to avoid creating invariant resultsets many times.

Posted by "Jeff Lichtman (JIRA)" <de...@db.apache.org>.
    [ http://issues.apache.org/jira/browse/DERBY-781?page=comments#action_12361709 ] 

Jeff Lichtman commented on DERBY-781:
-------------------------------------

I've been thinking about this enhancement request ever since it was reported. Something didn't seem quite right to me, but it wasn't until now that I was able to put my finger on it.

The report identifies a real problem (performance with a union as the inner table of a join) and proposes a solution that would work (materialization). I think, though, that the proposed solution focuses in the wrong place. The materialization should happen as a result of a join strategy, not as part of the logic associated with unions.

There are cases where materializing a union would cause a query to run slower. Materialization requires the creation of a temporary conglomerate and the inserting of rows into the conglomerate, so it should be done only if the savings are greater than the costs. Since materialization can make things either faster or slower depending on circumstances, the decision as to whether to materialize should be done in the optimizer.

Also, there are other types of  result sets that could benefit from materialization - for example, INTERSECT, joins, aggregates, etc. Any of these could end up on the right side of a join through the use of  table subqueries (i.e. SELECT statements in the FROM list of the outer query). I don't think we want to re-implement the materialization logic in all of these cases. I suppose the logic could be pushed into a parent class, but I think even that would be putting it in the wrong place.

If you think about it, we already have a join strategy that materializes the inner result  set, i.e. hash join.  I would expect the optimizer to at least consider this strategy for the example given in this enhancement request. We should check whether the optimizer is making the correct decision about hash join in this case before implementing materialization logic specific to unions.


> Materialize union subqueries in select list where possible to avoid creating invariant resultsets many times.
> -------------------------------------------------------------------------------------------------------------
>
>          Key: DERBY-781
>          URL: http://issues.apache.org/jira/browse/DERBY-781
>      Project: Derby
>         Type: Improvement
>   Components: SQL
>     Versions: 10.1.1.0, 10.2.0.0
>  Environment: generic
>     Reporter: Satheesh Bandaram

>
> Derby's handling of union subqueries in from list can be improved by materializing invariant resultsets once, rather than creating them many times.
> For example:
> create view V1 as select i, j from T1 union select i,j from T2;
> create view V2 as select a,b from T3 union select a,b from T4;
> insert into T1 values (1,1), (2,2), (3,3), (4,4), (5,5);
> For a query like select * from V1, V2 where V1.j = V2.b and V1.i in (1,2,3,4,5), it is possible the resultset for V2 is created 5 times. (assuming V2 is choosen as the the inner table) This can be very costly if the underlying selects can take long time and also may perform union many times.
> Enhance materialization logic in setOperatorNode.java. It currently returns FALSE always.
> public boolean performMaterialization(JBitSet outerTables)
> 		throws StandardException
> {
> 	// RESOLVE - just say no to materialization right now - should be a cost based decision
> 	return false;
> 	/* Actual materialization, if appropriate, will be placed by our parent PRN.
> 	 * This is because PRN might have a join condition to apply.  (Materialization
> 	 * can only occur before that.
> 	 */
> 	//return true;
> } 

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators:
   http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see:
   http://www.atlassian.com/software/jira


[jira] Reopened: (DERBY-781) Materialize subqueries in select list where possible to avoid creating invariant resultsets many times.

Posted by "A B (JIRA)" <de...@db.apache.org>.
     [ http://issues.apache.org/jira/browse/DERBY-781?page=all ]

A B reopened DERBY-781:
-----------------------

             
Reopening to set Fix-in version.

> Materialize subqueries in select list where possible to avoid creating invariant resultsets many times.
> -------------------------------------------------------------------------------------------------------
>
>                 Key: DERBY-781
>                 URL: http://issues.apache.org/jira/browse/DERBY-781
>             Project: Derby
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 10.1.1.0, 10.2.1.0
>         Environment: generic
>            Reporter: Satheesh Bandaram
>         Assigned To: A B
>             Fix For: 10.2.1.0
>
>         Attachments: d781_v1.patch, d781_v1.stat, d781_v2.patch, DERBY-781_v1.html
>
>
> Derby's handling of union subqueries in from list can be improved by materializing invariant resultsets once, rather than creating them many times.
> For example:
> create view V1 as select i, j from T1 union select i,j from T2;
> create view V2 as select a,b from T3 union select a,b from T4;
> insert into T1 values (1,1), (2,2), (3,3), (4,4), (5,5);
> For a query like select * from V1, V2 where V1.j = V2.b and V1.i in (1,2,3,4,5), it is possible the resultset for V2 is created 5 times. (assuming V2 is choosen as the the inner table) This can be very costly if the underlying selects can take long time and also may perform union many times.
> Enhance materialization logic in setOperatorNode.java. It currently returns FALSE always.
> public boolean performMaterialization(JBitSet outerTables)
> 		throws StandardException
> {
> 	// RESOLVE - just say no to materialization right now - should be a cost based decision
> 	return false;
> 	/* Actual materialization, if appropriate, will be placed by our parent PRN.
> 	 * This is because PRN might have a join condition to apply.  (Materialization
> 	 * can only occur before that.
> 	 */
> 	//return true;
> } 

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] Commented: (DERBY-781) Materialize union subqueries in select list where possible to avoid creating invariant resultsets many times.

Posted by "A B (JIRA)" <de...@db.apache.org>.
    [ http://issues.apache.org/jira/browse/DERBY-781?page=comments#action_12422767 ] 
            
A B commented on DERBY-781:
---------------------------

>  I think it would be good to modify this improvement description

Seems like this could be a simple as removing the word "union" from the description--does that sound reasonable to you?  Or do you want an entirely new description?  Something like "Subquery materialization via hash join" or "Support subquery materialization by allowing the optimizer to cost and generate hash joins with subqueries".

Any preference?

> Materialize union subqueries in select list where possible to avoid creating invariant resultsets many times.
> -------------------------------------------------------------------------------------------------------------
>
>                 Key: DERBY-781
>                 URL: http://issues.apache.org/jira/browse/DERBY-781
>             Project: Derby
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 10.1.1.0, 10.2.0.0
>         Environment: generic
>            Reporter: Satheesh Bandaram
>         Assigned To: A B
>         Attachments: d781_v1.patch, d781_v1.stat, d781_v2.patch, DERBY-781_v1.html
>
>
> Derby's handling of union subqueries in from list can be improved by materializing invariant resultsets once, rather than creating them many times.
> For example:
> create view V1 as select i, j from T1 union select i,j from T2;
> create view V2 as select a,b from T3 union select a,b from T4;
> insert into T1 values (1,1), (2,2), (3,3), (4,4), (5,5);
> For a query like select * from V1, V2 where V1.j = V2.b and V1.i in (1,2,3,4,5), it is possible the resultset for V2 is created 5 times. (assuming V2 is choosen as the the inner table) This can be very costly if the underlying selects can take long time and also may perform union many times.
> Enhance materialization logic in setOperatorNode.java. It currently returns FALSE always.
> public boolean performMaterialization(JBitSet outerTables)
> 		throws StandardException
> {
> 	// RESOLVE - just say no to materialization right now - should be a cost based decision
> 	return false;
> 	/* Actual materialization, if appropriate, will be placed by our parent PRN.
> 	 * This is because PRN might have a join condition to apply.  (Materialization
> 	 * can only occur before that.
> 	 */
> 	//return true;
> } 

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] Updated: (DERBY-781) Materialize union subqueries in select list where possible to avoid creating invariant resultsets many times.

Posted by "A B (JIRA)" <de...@db.apache.org>.
     [ http://issues.apache.org/jira/browse/DERBY-781?page=all ]

A B updated DERBY-781:
----------------------

    Attachment: d781_v2.patch

Attaching an updated patch, d781_v2.patch, that is synced with the latest codeline and that also has a small fix to the lang/subquery test (there was a typo in the first patch).  Other than the minor test fix, this patch is identical to the _v1 patch.

Still awaiting review, if anyone has the time...

> Materialize union subqueries in select list where possible to avoid creating invariant resultsets many times.
> -------------------------------------------------------------------------------------------------------------
>
>          Key: DERBY-781
>          URL: http://issues.apache.org/jira/browse/DERBY-781
>      Project: Derby
>         Type: Improvement

>   Components: SQL
>     Versions: 10.1.1.0, 10.2.0.0
>  Environment: generic
>     Reporter: Satheesh Bandaram
>     Assignee: A B
>  Attachments: DERBY-781_v1.html, d781_v1.patch, d781_v1.stat, d781_v2.patch
>
> Derby's handling of union subqueries in from list can be improved by materializing invariant resultsets once, rather than creating them many times.
> For example:
> create view V1 as select i, j from T1 union select i,j from T2;
> create view V2 as select a,b from T3 union select a,b from T4;
> insert into T1 values (1,1), (2,2), (3,3), (4,4), (5,5);
> For a query like select * from V1, V2 where V1.j = V2.b and V1.i in (1,2,3,4,5), it is possible the resultset for V2 is created 5 times. (assuming V2 is choosen as the the inner table) This can be very costly if the underlying selects can take long time and also may perform union many times.
> Enhance materialization logic in setOperatorNode.java. It currently returns FALSE always.
> public boolean performMaterialization(JBitSet outerTables)
> 		throws StandardException
> {
> 	// RESOLVE - just say no to materialization right now - should be a cost based decision
> 	return false;
> 	/* Actual materialization, if appropriate, will be placed by our parent PRN.
> 	 * This is because PRN might have a join condition to apply.  (Materialization
> 	 * can only occur before that.
> 	 */
> 	//return true;
> } 

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators:
   http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see:
   http://www.atlassian.com/software/jira


[jira] Updated: (DERBY-781) Materialize union subqueries in select list where possible to avoid creating invariant resultsets many times.

Posted by "Satheesh Bandaram (JIRA)" <de...@db.apache.org>.
     [ http://issues.apache.org/jira/browse/DERBY-781?page=all ]

Satheesh Bandaram updated DERBY-781:
------------------------------------

    Comment: was deleted

> Materialize union subqueries in select list where possible to avoid creating invariant resultsets many times.
> -------------------------------------------------------------------------------------------------------------
>
>          Key: DERBY-781
>          URL: http://issues.apache.org/jira/browse/DERBY-781
>      Project: Derby
>         Type: Improvement
>   Components: SQL
>     Versions: 10.1.1.0, 10.2.0.0
>  Environment: generic
>     Reporter: Satheesh Bandaram

>
> Derby's handling of union subqueries in from list can be improved by materializing invariant resultsets once, rather than creating them many times.
> For example:
> create view V1 as select i, j from T1 union select i,j from T2;
> create view V2 as select a,b from T3 union select a,b from T4;
> insert into T1 values (1,1), (2,2), (3,3), (4,4), (5,5);
> For a query like select * from V1, V2 where V1.j = V2.b and V1.i in (1,2,3,4,5), it is possible the resultset for V2 is created 5 times. (assuming V2 is choosen as the the inner table) This can be very costly if the underlying selects can take long time and also may perform union many times.
> Enhance materialization logic in setOperatorNode.java. It currently returns FALSE always.
> public boolean performMaterialization(JBitSet outerTables)
> 		throws StandardException
> {
> 	// RESOLVE - just say no to materialization right now - should be a cost based decision
> 	return false;
> 	/* Actual materialization, if appropriate, will be placed by our parent PRN.
> 	 * This is because PRN might have a join condition to apply.  (Materialization
> 	 * can only occur before that.
> 	 */
> 	//return true;
> } 

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators:
   http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see:
   http://www.atlassian.com/software/jira


[jira] Updated: (DERBY-781) Materialize subqueries in select list where possible to avoid creating invariant resultsets many times.

Posted by "A B (JIRA)" <de...@db.apache.org>.
     [ http://issues.apache.org/jira/browse/DERBY-781?page=all ]

A B updated DERBY-781:
----------------------

       Summary: Materialize subqueries in select list where possible to avoid creating invariant resultsets many times.  (was: Materialize union subqueries in select list where possible to avoid creating invariant resultsets many times.)
    Derby Info: [Existing Application Impact, Release Note Needed]  (was: [Release Note Needed, Existing Application Impact])

Removed the word "union" from the summary since the changes affect subqueries in general, not just UNION subqueries (as Satheesh pointed out).  I decided to leave the actual description as it is, though, since I think it's useful as background to the follow-up comments which eventually led to the final changes.  Also, if we change the description many of the comments will no longer make sense.  For the sake of clarity I prefer to leave the description as it is.  The more generic "summary" will be what's picked up in release notes, so I think that's good enough...

Of course, people should feel free to speak up if they disagree.

> Materialize subqueries in select list where possible to avoid creating invariant resultsets many times.
> -------------------------------------------------------------------------------------------------------
>
>                 Key: DERBY-781
>                 URL: http://issues.apache.org/jira/browse/DERBY-781
>             Project: Derby
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 10.1.1.0, 10.2.1.0
>         Environment: generic
>            Reporter: Satheesh Bandaram
>         Assigned To: A B
>         Attachments: d781_v1.patch, d781_v1.stat, d781_v2.patch, DERBY-781_v1.html
>
>
> Derby's handling of union subqueries in from list can be improved by materializing invariant resultsets once, rather than creating them many times.
> For example:
> create view V1 as select i, j from T1 union select i,j from T2;
> create view V2 as select a,b from T3 union select a,b from T4;
> insert into T1 values (1,1), (2,2), (3,3), (4,4), (5,5);
> For a query like select * from V1, V2 where V1.j = V2.b and V1.i in (1,2,3,4,5), it is possible the resultset for V2 is created 5 times. (assuming V2 is choosen as the the inner table) This can be very costly if the underlying selects can take long time and also may perform union many times.
> Enhance materialization logic in setOperatorNode.java. It currently returns FALSE always.
> public boolean performMaterialization(JBitSet outerTables)
> 		throws StandardException
> {
> 	// RESOLVE - just say no to materialization right now - should be a cost based decision
> 	return false;
> 	/* Actual materialization, if appropriate, will be placed by our parent PRN.
> 	 * This is because PRN might have a join condition to apply.  (Materialization
> 	 * can only occur before that.
> 	 */
> 	//return true;
> } 

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] Commented: (DERBY-781) Materialize union subqueries in select list where possible to avoid creating invariant resultsets many times.

Posted by "Satheesh Bandaram (JIRA)" <de...@db.apache.org>.
    [ http://issues.apache.org/jira/browse/DERBY-781?page=comments#action_12361785 ] 

Satheesh Bandaram commented on DERBY-781:
-----------------------------------------

Thanks Jeff for your analysis. I was just getting ready to file another improvement request to make this optimization more generic. (not specific to unions) I have seen huge improvements in two different customer situations. For the situation I filed the defect, each of the views (V1 and V2) had 36 tables each and by materializing the inner view into a temp. table, I noticed speed up from 70-150 seconds to under 3 seconds. (including the cost of creating temp. table) 

I also saw another situation later without unions where materializing some table subqueries improved performance by couple of orders of magnitude. So you are right... this optimization can be applied to other cases too. 

I think materialization with or without hash joins should be useful. In both situations, creating temp. table that materialized derived tables improved so much. You are right that the optimization should be done inside the optimizer.


> Materialize union subqueries in select list where possible to avoid creating invariant resultsets many times.
> -------------------------------------------------------------------------------------------------------------
>
>          Key: DERBY-781
>          URL: http://issues.apache.org/jira/browse/DERBY-781
>      Project: Derby
>         Type: Improvement
>   Components: SQL
>     Versions: 10.1.1.0, 10.2.0.0
>  Environment: generic
>     Reporter: Satheesh Bandaram

>
> Derby's handling of union subqueries in from list can be improved by materializing invariant resultsets once, rather than creating them many times.
> For example:
> create view V1 as select i, j from T1 union select i,j from T2;
> create view V2 as select a,b from T3 union select a,b from T4;
> insert into T1 values (1,1), (2,2), (3,3), (4,4), (5,5);
> For a query like select * from V1, V2 where V1.j = V2.b and V1.i in (1,2,3,4,5), it is possible the resultset for V2 is created 5 times. (assuming V2 is choosen as the the inner table) This can be very costly if the underlying selects can take long time and also may perform union many times.
> Enhance materialization logic in setOperatorNode.java. It currently returns FALSE always.
> public boolean performMaterialization(JBitSet outerTables)
> 		throws StandardException
> {
> 	// RESOLVE - just say no to materialization right now - should be a cost based decision
> 	return false;
> 	/* Actual materialization, if appropriate, will be placed by our parent PRN.
> 	 * This is because PRN might have a join condition to apply.  (Materialization
> 	 * can only occur before that.
> 	 */
> 	//return true;
> } 

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators:
   http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see:
   http://www.atlassian.com/software/jira


[jira] Closed: (DERBY-781) Materialize subqueries in select list where possible to avoid creating invariant resultsets many times.

Posted by "A B (JIRA)" <de...@db.apache.org>.
     [ http://issues.apache.org/jira/browse/DERBY-781?page=all ]

A B closed DERBY-781.
---------------------


> Materialize subqueries in select list where possible to avoid creating invariant resultsets many times.
> -------------------------------------------------------------------------------------------------------
>
>                 Key: DERBY-781
>                 URL: http://issues.apache.org/jira/browse/DERBY-781
>             Project: Derby
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 10.1.1.0, 10.2.1.0
>         Environment: generic
>            Reporter: Satheesh Bandaram
>         Assigned To: A B
>             Fix For: 10.2.1.0
>
>         Attachments: d781_v1.patch, d781_v1.stat, d781_v2.patch, DERBY-781_v1.html
>
>
> Derby's handling of union subqueries in from list can be improved by materializing invariant resultsets once, rather than creating them many times.
> For example:
> create view V1 as select i, j from T1 union select i,j from T2;
> create view V2 as select a,b from T3 union select a,b from T4;
> insert into T1 values (1,1), (2,2), (3,3), (4,4), (5,5);
> For a query like select * from V1, V2 where V1.j = V2.b and V1.i in (1,2,3,4,5), it is possible the resultset for V2 is created 5 times. (assuming V2 is choosen as the the inner table) This can be very costly if the underlying selects can take long time and also may perform union many times.
> Enhance materialization logic in setOperatorNode.java. It currently returns FALSE always.
> public boolean performMaterialization(JBitSet outerTables)
> 		throws StandardException
> {
> 	// RESOLVE - just say no to materialization right now - should be a cost based decision
> 	return false;
> 	/* Actual materialization, if appropriate, will be placed by our parent PRN.
> 	 * This is because PRN might have a join condition to apply.  (Materialization
> 	 * can only occur before that.
> 	 */
> 	//return true;
> } 

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] Updated: (DERBY-781) Materialize union subqueries in select list where possible to avoid creating invariant resultsets many times.

Posted by "Satheesh Bandaram (JIRA)" <de...@db.apache.org>.
     [ http://issues.apache.org/jira/browse/DERBY-781?page=all ]

Satheesh Bandaram updated DERBY-781:
------------------------------------

    Description: 
Derby's handling of union subqueries in from list can be improved by materializing invariant resultsets once, rather than creating them many times.

For example:

create view V1 as select i, j from T1 union select i,j from T2;
create view V2 as select a,b from T3 union select a,b from T4;
insert into T1 values (1,1), (2,2), (3,3), (4,4), (5,5);

For a query like select * from V1, V2 where V1.j = V2.b and V1.i in (1,2,3,4,5), it is possible the resultset for V2 is created 5 times. (assuming V2 is choosen as the the inner table) This can be very costly if the underlying selects can take long time and also may perform union many times.

Enhance materialization logic in setOperatorNode.java. It currently returns FALSE always.

public boolean performMaterialization(JBitSet outerTables)
		throws StandardException
{
	// RESOLVE - just say no to materialization right now - should be a cost based decision
	return false;

	/* Actual materialization, if appropriate, will be placed by our parent PRN.
	 * This is because PRN might have a join condition to apply.  (Materialization
	 * can only occur before that.
	 */
	//return true;
} 

  was:
Derby's handling of union subqueries in from list can be improved by materializing invariant resultsets once, rather than creating them many times.

For example:

create view V1 as select i, j from T1 union select i,j from T2;
create view V2 as select a,b from T3 union select a,b from T4;
insert into T1 values (1,1), (2,2), (3,3), (4,4), (5,5);

For a query like select * from V1, V2 where i in (1,2,3,4,5), it is possible the resultset for V2 is created 5 times. (assuming V2 is choosen as the the inner table) This can be very costly if the underlying selects can take long time and also may perform union many times.

Enhance materialization logic in setOperatorNode.java. It currently returns FALSE always.

public boolean performMaterialization(JBitSet outerTables)
		throws StandardException
{
	// RESOLVE - just say no to materialization right now - should be a cost based decision
	return false;

	/* Actual materialization, if appropriate, will be placed by our parent PRN.
	 * This is because PRN might have a join condition to apply.  (Materialization
	 * can only occur before that.
	 */
	//return true;
} 


> Materialize union subqueries in select list where possible to avoid creating invariant resultsets many times.
> -------------------------------------------------------------------------------------------------------------
>
>          Key: DERBY-781
>          URL: http://issues.apache.org/jira/browse/DERBY-781
>      Project: Derby
>         Type: Improvement
>   Components: SQL
>     Versions: 10.1.1.0, 10.2.0.0
>  Environment: generic
>     Reporter: Satheesh Bandaram

>
> Derby's handling of union subqueries in from list can be improved by materializing invariant resultsets once, rather than creating them many times.
> For example:
> create view V1 as select i, j from T1 union select i,j from T2;
> create view V2 as select a,b from T3 union select a,b from T4;
> insert into T1 values (1,1), (2,2), (3,3), (4,4), (5,5);
> For a query like select * from V1, V2 where V1.j = V2.b and V1.i in (1,2,3,4,5), it is possible the resultset for V2 is created 5 times. (assuming V2 is choosen as the the inner table) This can be very costly if the underlying selects can take long time and also may perform union many times.
> Enhance materialization logic in setOperatorNode.java. It currently returns FALSE always.
> public boolean performMaterialization(JBitSet outerTables)
> 		throws StandardException
> {
> 	// RESOLVE - just say no to materialization right now - should be a cost based decision
> 	return false;
> 	/* Actual materialization, if appropriate, will be placed by our parent PRN.
> 	 * This is because PRN might have a join condition to apply.  (Materialization
> 	 * can only occur before that.
> 	 */
> 	//return true;
> } 

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators:
   http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see:
   http://www.atlassian.com/software/jira