You are viewing a plain text version of this content. The canonical link for it is here.
Posted to derby-dev@db.apache.org by "Knut Anders Hatlen (JIRA)" <ji...@apache.org> on 2009/10/12 13:37:31 UTC

[jira] Created: (DERBY-4405) Transformation to inner join not performed for certain three-way joins

Transformation to inner join not performed for certain three-way joins
----------------------------------------------------------------------

                 Key: DERBY-4405
                 URL: https://issues.apache.org/jira/browse/DERBY-4405
             Project: Derby
          Issue Type: Improvement
          Components: SQL
    Affects Versions: 10.6.0.0
            Reporter: Knut Anders Hatlen
            Priority: Minor


In the CROSS JOIN section in the reference manual (http://db.apache.org/derby/docs/dev/ref/rrefsqljcrossjoin.html) there are three examples that are supposed to be equivalent. However, the performance differs significantly between the different queries.

The queries use the tours db and look like this:

(1)
SELECT * FROM CITIES LEFT OUTER JOIN
    (FLIGHTS CROSS JOIN COUNTRIES)
        ON CITIES.AIRPORT = FLIGHTS.ORIG_AIRPORT
            WHERE COUNTRIES.COUNTRY_ISO_CODE = 'US'

(2)
SELECT * FROM CITIES LEFT OUTER JOIN
    FLIGHTS INNER JOIN COUNTRIES ON 1=1
        ON CITIES.AIRPORT = FLIGHTS.ORIG_AIRPORT
            WHERE COUNTRIES.COUNTRY_ISO_CODE = 'US'

(3)
SELECT * FROM CITIES LEFT OUTER JOIN
    (SELECT * FROM FLIGHTS, COUNTRIES) S
        ON CITIES.AIRPORT = S.ORIG_AIRPORT
            WHERE S.COUNTRY_ISO_CODE = 'US'

When executed in ij, (1) and (2) need 6 seconds to complete, whereas (3) completes in 50 ms.

The query plans for (1) and (2) use nested loop joins and table scans. (3) uses a combination of hash join and nested loop join, and index scans as well as table scans.

It looks like (3) has been rewritten from a left outer join to an inner join internally. This is fine because all rows that have the right-side columns filled with NULLs will be filtered out by the predicate S.COUNTRY_ISO_CODE='US', so the extra rows generated by the outer join will not be returned.

This optimization should also be possible for (1) and (2). We should improve the logic so that those joins are transformed too. The transformation happens in HalfOuterJoinNode.transformOuterJoins().

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Resolved: (DERBY-4405) Transformation to inner join not performed for certain three-way joins

Posted by "Dag H. Wanvik (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/DERBY-4405?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dag H. Wanvik resolved DERBY-4405.
----------------------------------

          Resolution: Fixed
       Fix Version/s: 10.6.0.0
    Issue & fix info:   (was: [Patch Available])

> Transformation to inner join not performed for certain three-way joins
> ----------------------------------------------------------------------
>
>                 Key: DERBY-4405
>                 URL: https://issues.apache.org/jira/browse/DERBY-4405
>             Project: Derby
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 10.6.0.0
>            Reporter: Knut Anders Hatlen
>            Assignee: Dag H. Wanvik
>            Priority: Minor
>             Fix For: 10.6.0.0
>
>         Attachments: derby-4405-2.diff, derby-4405-2.stat, derby-4405.diff, derby-4405.stat
>
>
> In the CROSS JOIN section in the reference manual (http://db.apache.org/derby/docs/dev/ref/rrefsqljcrossjoin.html) there are three examples that are supposed to be equivalent. However, the performance differs significantly between the different queries.
> The queries use the tours db and look like this:
> (1)
> SELECT * FROM CITIES LEFT OUTER JOIN
>     (FLIGHTS CROSS JOIN COUNTRIES)
>         ON CITIES.AIRPORT = FLIGHTS.ORIG_AIRPORT
>             WHERE COUNTRIES.COUNTRY_ISO_CODE = 'US'
> (2)
> SELECT * FROM CITIES LEFT OUTER JOIN
>     FLIGHTS INNER JOIN COUNTRIES ON 1=1
>         ON CITIES.AIRPORT = FLIGHTS.ORIG_AIRPORT
>             WHERE COUNTRIES.COUNTRY_ISO_CODE = 'US'
> (3)
> SELECT * FROM CITIES LEFT OUTER JOIN
>     (SELECT * FROM FLIGHTS, COUNTRIES) S
>         ON CITIES.AIRPORT = S.ORIG_AIRPORT
>             WHERE S.COUNTRY_ISO_CODE = 'US'
> When executed in ij, (1) and (2) need 6 seconds to complete, whereas (3) completes in 50 ms.
> The query plans for (1) and (2) use nested loop joins and table scans. (3) uses a combination of hash join and nested loop join, and index scans as well as table scans.
> It looks like (3) has been rewritten from a left outer join to an inner join internally. This is fine because all rows that have the right-side columns filled with NULLs will be filtered out by the predicate S.COUNTRY_ISO_CODE='US', so the extra rows generated by the outer join will not be returned.
> This optimization should also be possible for (1) and (2). We should improve the logic so that those joins are transformed too. The transformation happens in HalfOuterJoinNode.transformOuterJoins().

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Assigned: (DERBY-4405) Transformation to inner join not performed for certain three-way joins

Posted by "Dag H. Wanvik (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/DERBY-4405?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dag H. Wanvik reassigned DERBY-4405:
------------------------------------

    Assignee: Dag H. Wanvik

> Transformation to inner join not performed for certain three-way joins
> ----------------------------------------------------------------------
>
>                 Key: DERBY-4405
>                 URL: https://issues.apache.org/jira/browse/DERBY-4405
>             Project: Derby
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 10.6.0.0
>            Reporter: Knut Anders Hatlen
>            Assignee: Dag H. Wanvik
>            Priority: Minor
>         Attachments: derby-4405.diff, derby-4405.stat
>
>
> In the CROSS JOIN section in the reference manual (http://db.apache.org/derby/docs/dev/ref/rrefsqljcrossjoin.html) there are three examples that are supposed to be equivalent. However, the performance differs significantly between the different queries.
> The queries use the tours db and look like this:
> (1)
> SELECT * FROM CITIES LEFT OUTER JOIN
>     (FLIGHTS CROSS JOIN COUNTRIES)
>         ON CITIES.AIRPORT = FLIGHTS.ORIG_AIRPORT
>             WHERE COUNTRIES.COUNTRY_ISO_CODE = 'US'
> (2)
> SELECT * FROM CITIES LEFT OUTER JOIN
>     FLIGHTS INNER JOIN COUNTRIES ON 1=1
>         ON CITIES.AIRPORT = FLIGHTS.ORIG_AIRPORT
>             WHERE COUNTRIES.COUNTRY_ISO_CODE = 'US'
> (3)
> SELECT * FROM CITIES LEFT OUTER JOIN
>     (SELECT * FROM FLIGHTS, COUNTRIES) S
>         ON CITIES.AIRPORT = S.ORIG_AIRPORT
>             WHERE S.COUNTRY_ISO_CODE = 'US'
> When executed in ij, (1) and (2) need 6 seconds to complete, whereas (3) completes in 50 ms.
> The query plans for (1) and (2) use nested loop joins and table scans. (3) uses a combination of hash join and nested loop join, and index scans as well as table scans.
> It looks like (3) has been rewritten from a left outer join to an inner join internally. This is fine because all rows that have the right-side columns filled with NULLs will be filtered out by the predicate S.COUNTRY_ISO_CODE='US', so the extra rows generated by the outer join will not be returned.
> This optimization should also be possible for (1) and (2). We should improve the logic so that those joins are transformed too. The transformation happens in HalfOuterJoinNode.transformOuterJoins().

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (DERBY-4405) Transformation to inner join not performed for certain three-way joins

Posted by "Dag H. Wanvik (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/DERBY-4405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12783457#action_12783457 ] 

Dag H. Wanvik commented on DERBY-4405:
--------------------------------------

Thanks for looking at the patch, Bryan. I introduced the boolean so as not the change the behavior of the
call to fillInReferencedTableMap from FromBaseTable#LOJgetReferencedTables, since that call was not involved
in solving this issue. I will investigate some more of that usage could benefit from the closure as well. LOJgetReferencedTables is used by the LOJ linearization logic, and I didn't study it yet.

> Transformation to inner join not performed for certain three-way joins
> ----------------------------------------------------------------------
>
>                 Key: DERBY-4405
>                 URL: https://issues.apache.org/jira/browse/DERBY-4405
>             Project: Derby
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 10.6.0.0
>            Reporter: Knut Anders Hatlen
>            Priority: Minor
>         Attachments: derby-4405.diff, derby-4405.stat
>
>
> In the CROSS JOIN section in the reference manual (http://db.apache.org/derby/docs/dev/ref/rrefsqljcrossjoin.html) there are three examples that are supposed to be equivalent. However, the performance differs significantly between the different queries.
> The queries use the tours db and look like this:
> (1)
> SELECT * FROM CITIES LEFT OUTER JOIN
>     (FLIGHTS CROSS JOIN COUNTRIES)
>         ON CITIES.AIRPORT = FLIGHTS.ORIG_AIRPORT
>             WHERE COUNTRIES.COUNTRY_ISO_CODE = 'US'
> (2)
> SELECT * FROM CITIES LEFT OUTER JOIN
>     FLIGHTS INNER JOIN COUNTRIES ON 1=1
>         ON CITIES.AIRPORT = FLIGHTS.ORIG_AIRPORT
>             WHERE COUNTRIES.COUNTRY_ISO_CODE = 'US'
> (3)
> SELECT * FROM CITIES LEFT OUTER JOIN
>     (SELECT * FROM FLIGHTS, COUNTRIES) S
>         ON CITIES.AIRPORT = S.ORIG_AIRPORT
>             WHERE S.COUNTRY_ISO_CODE = 'US'
> When executed in ij, (1) and (2) need 6 seconds to complete, whereas (3) completes in 50 ms.
> The query plans for (1) and (2) use nested loop joins and table scans. (3) uses a combination of hash join and nested loop join, and index scans as well as table scans.
> It looks like (3) has been rewritten from a left outer join to an inner join internally. This is fine because all rows that have the right-side columns filled with NULLs will be filtered out by the predicate S.COUNTRY_ISO_CODE='US', so the extra rows generated by the outer join will not be returned.
> This optimization should also be possible for (1) and (2). We should improve the logic so that those joins are transformed too. The transformation happens in HalfOuterJoinNode.transformOuterJoins().

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (DERBY-4405) Transformation to inner join not performed for certain three-way joins

Posted by "Dag H. Wanvik (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/DERBY-4405?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dag H. Wanvik updated DERBY-4405:
---------------------------------

    Issue & fix info: [Patch Available]

> Transformation to inner join not performed for certain three-way joins
> ----------------------------------------------------------------------
>
>                 Key: DERBY-4405
>                 URL: https://issues.apache.org/jira/browse/DERBY-4405
>             Project: Derby
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 10.6.0.0
>            Reporter: Knut Anders Hatlen
>            Priority: Minor
>         Attachments: derby-4405.diff, derby-4405.stat
>
>
> In the CROSS JOIN section in the reference manual (http://db.apache.org/derby/docs/dev/ref/rrefsqljcrossjoin.html) there are three examples that are supposed to be equivalent. However, the performance differs significantly between the different queries.
> The queries use the tours db and look like this:
> (1)
> SELECT * FROM CITIES LEFT OUTER JOIN
>     (FLIGHTS CROSS JOIN COUNTRIES)
>         ON CITIES.AIRPORT = FLIGHTS.ORIG_AIRPORT
>             WHERE COUNTRIES.COUNTRY_ISO_CODE = 'US'
> (2)
> SELECT * FROM CITIES LEFT OUTER JOIN
>     FLIGHTS INNER JOIN COUNTRIES ON 1=1
>         ON CITIES.AIRPORT = FLIGHTS.ORIG_AIRPORT
>             WHERE COUNTRIES.COUNTRY_ISO_CODE = 'US'
> (3)
> SELECT * FROM CITIES LEFT OUTER JOIN
>     (SELECT * FROM FLIGHTS, COUNTRIES) S
>         ON CITIES.AIRPORT = S.ORIG_AIRPORT
>             WHERE S.COUNTRY_ISO_CODE = 'US'
> When executed in ij, (1) and (2) need 6 seconds to complete, whereas (3) completes in 50 ms.
> The query plans for (1) and (2) use nested loop joins and table scans. (3) uses a combination of hash join and nested loop join, and index scans as well as table scans.
> It looks like (3) has been rewritten from a left outer join to an inner join internally. This is fine because all rows that have the right-side columns filled with NULLs will be filtered out by the predicate S.COUNTRY_ISO_CODE='US', so the extra rows generated by the outer join will not be returned.
> This optimization should also be possible for (1) and (2). We should improve the logic so that those joins are transformed too. The transformation happens in HalfOuterJoinNode.transformOuterJoins().

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (DERBY-4405) Transformation to inner join not performed for certain three-way joins

Posted by "Dag H. Wanvik (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/DERBY-4405?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dag H. Wanvik updated DERBY-4405:
---------------------------------

    Attachment: derby-4405-2.stat
                derby-4405-2.diff

Uploading a simplified patch along the lines I suggested, plus added
test cases for right outer join too, and made more explicit tests that outer join is gone after rewrite (RuntimeStatisticsParser#usedNLLeftOuterJoin).
Rerunning regressions.

> Transformation to inner join not performed for certain three-way joins
> ----------------------------------------------------------------------
>
>                 Key: DERBY-4405
>                 URL: https://issues.apache.org/jira/browse/DERBY-4405
>             Project: Derby
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 10.6.0.0
>            Reporter: Knut Anders Hatlen
>            Assignee: Dag H. Wanvik
>            Priority: Minor
>         Attachments: derby-4405-2.diff, derby-4405-2.stat, derby-4405.diff, derby-4405.stat
>
>
> In the CROSS JOIN section in the reference manual (http://db.apache.org/derby/docs/dev/ref/rrefsqljcrossjoin.html) there are three examples that are supposed to be equivalent. However, the performance differs significantly between the different queries.
> The queries use the tours db and look like this:
> (1)
> SELECT * FROM CITIES LEFT OUTER JOIN
>     (FLIGHTS CROSS JOIN COUNTRIES)
>         ON CITIES.AIRPORT = FLIGHTS.ORIG_AIRPORT
>             WHERE COUNTRIES.COUNTRY_ISO_CODE = 'US'
> (2)
> SELECT * FROM CITIES LEFT OUTER JOIN
>     FLIGHTS INNER JOIN COUNTRIES ON 1=1
>         ON CITIES.AIRPORT = FLIGHTS.ORIG_AIRPORT
>             WHERE COUNTRIES.COUNTRY_ISO_CODE = 'US'
> (3)
> SELECT * FROM CITIES LEFT OUTER JOIN
>     (SELECT * FROM FLIGHTS, COUNTRIES) S
>         ON CITIES.AIRPORT = S.ORIG_AIRPORT
>             WHERE S.COUNTRY_ISO_CODE = 'US'
> When executed in ij, (1) and (2) need 6 seconds to complete, whereas (3) completes in 50 ms.
> The query plans for (1) and (2) use nested loop joins and table scans. (3) uses a combination of hash join and nested loop join, and index scans as well as table scans.
> It looks like (3) has been rewritten from a left outer join to an inner join internally. This is fine because all rows that have the right-side columns filled with NULLs will be filtered out by the predicate S.COUNTRY_ISO_CODE='US', so the extra rows generated by the outer join will not be returned.
> This optimization should also be possible for (1) and (2). We should improve the logic so that those joins are transformed too. The transformation happens in HalfOuterJoinNode.transformOuterJoins().

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (DERBY-4405) Transformation to inner join not performed for certain three-way joins

Posted by "Dag H. Wanvik (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/DERBY-4405?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dag H. Wanvik updated DERBY-4405:
---------------------------------

    Attachment: derby-4405.stat
                derby-4405.diff

Uploading a patch which lets the two variant outer join statements be rewritten to
inner join, by looking not just at the top level inner table for match when analyzing
the null intolerant predicates, but checking against tables in nested inner join (inner join in the inner position of an outer join) as well.
Added a repro test case to JoinNode which checks for the presence of a hash join to
indicate the rewrite did happen (not used before the rewrite).

Running regressions.


> Transformation to inner join not performed for certain three-way joins
> ----------------------------------------------------------------------
>
>                 Key: DERBY-4405
>                 URL: https://issues.apache.org/jira/browse/DERBY-4405
>             Project: Derby
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 10.6.0.0
>            Reporter: Knut Anders Hatlen
>            Priority: Minor
>         Attachments: derby-4405.diff, derby-4405.stat
>
>
> In the CROSS JOIN section in the reference manual (http://db.apache.org/derby/docs/dev/ref/rrefsqljcrossjoin.html) there are three examples that are supposed to be equivalent. However, the performance differs significantly between the different queries.
> The queries use the tours db and look like this:
> (1)
> SELECT * FROM CITIES LEFT OUTER JOIN
>     (FLIGHTS CROSS JOIN COUNTRIES)
>         ON CITIES.AIRPORT = FLIGHTS.ORIG_AIRPORT
>             WHERE COUNTRIES.COUNTRY_ISO_CODE = 'US'
> (2)
> SELECT * FROM CITIES LEFT OUTER JOIN
>     FLIGHTS INNER JOIN COUNTRIES ON 1=1
>         ON CITIES.AIRPORT = FLIGHTS.ORIG_AIRPORT
>             WHERE COUNTRIES.COUNTRY_ISO_CODE = 'US'
> (3)
> SELECT * FROM CITIES LEFT OUTER JOIN
>     (SELECT * FROM FLIGHTS, COUNTRIES) S
>         ON CITIES.AIRPORT = S.ORIG_AIRPORT
>             WHERE S.COUNTRY_ISO_CODE = 'US'
> When executed in ij, (1) and (2) need 6 seconds to complete, whereas (3) completes in 50 ms.
> The query plans for (1) and (2) use nested loop joins and table scans. (3) uses a combination of hash join and nested loop join, and index scans as well as table scans.
> It looks like (3) has been rewritten from a left outer join to an inner join internally. This is fine because all rows that have the right-side columns filled with NULLs will be filtered out by the predicate S.COUNTRY_ISO_CODE='US', so the extra rows generated by the outer join will not be returned.
> This optimization should also be possible for (1) and (2). We should improve the logic so that those joins are transformed too. The transformation happens in HalfOuterJoinNode.transformOuterJoins().

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (DERBY-4405) Transformation to inner join not performed for certain three-way joins

Posted by "Bryan Pendleton (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/DERBY-4405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12783421#action_12783421 ] 

Bryan Pendleton commented on DERBY-4405:
----------------------------------------

Hi Dag, you should assign this issue to yourself. Thanks for investigating this!

Why did you need to add a parameter to the fillInReferencedTableMap() method?
Would it be possible to just make the new behavior be unconditional, and have
JoinNode always fill in the table map positions from its children?


> Transformation to inner join not performed for certain three-way joins
> ----------------------------------------------------------------------
>
>                 Key: DERBY-4405
>                 URL: https://issues.apache.org/jira/browse/DERBY-4405
>             Project: Derby
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 10.6.0.0
>            Reporter: Knut Anders Hatlen
>            Priority: Minor
>         Attachments: derby-4405.diff, derby-4405.stat
>
>
> In the CROSS JOIN section in the reference manual (http://db.apache.org/derby/docs/dev/ref/rrefsqljcrossjoin.html) there are three examples that are supposed to be equivalent. However, the performance differs significantly between the different queries.
> The queries use the tours db and look like this:
> (1)
> SELECT * FROM CITIES LEFT OUTER JOIN
>     (FLIGHTS CROSS JOIN COUNTRIES)
>         ON CITIES.AIRPORT = FLIGHTS.ORIG_AIRPORT
>             WHERE COUNTRIES.COUNTRY_ISO_CODE = 'US'
> (2)
> SELECT * FROM CITIES LEFT OUTER JOIN
>     FLIGHTS INNER JOIN COUNTRIES ON 1=1
>         ON CITIES.AIRPORT = FLIGHTS.ORIG_AIRPORT
>             WHERE COUNTRIES.COUNTRY_ISO_CODE = 'US'
> (3)
> SELECT * FROM CITIES LEFT OUTER JOIN
>     (SELECT * FROM FLIGHTS, COUNTRIES) S
>         ON CITIES.AIRPORT = S.ORIG_AIRPORT
>             WHERE S.COUNTRY_ISO_CODE = 'US'
> When executed in ij, (1) and (2) need 6 seconds to complete, whereas (3) completes in 50 ms.
> The query plans for (1) and (2) use nested loop joins and table scans. (3) uses a combination of hash join and nested loop join, and index scans as well as table scans.
> It looks like (3) has been rewritten from a left outer join to an inner join internally. This is fine because all rows that have the right-side columns filled with NULLs will be filtered out by the predicate S.COUNTRY_ISO_CODE='US', so the extra rows generated by the outer join will not be returned.
> This optimization should also be possible for (1) and (2). We should improve the logic so that those joins are transformed too. The transformation happens in HalfOuterJoinNode.transformOuterJoins().

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (DERBY-4405) Transformation to inner join not performed for certain three-way joins

Posted by "Dag H. Wanvik (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/DERBY-4405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12789632#action_12789632 ] 

Dag H. Wanvik commented on DERBY-4405:
--------------------------------------

Looking at the reordering done by Derby in
HalfOuterJoinNode#LOJ_reorderable, it seems to be buggy; filed
DERBY-4471 for that.

Instead of modifying fillInReferencedTableMap when used from
transformOuterJoins, I see that I possibly could use the method
ResultSetNode#LOJgetReferencedTables instead, which does look inside
join (inner and outer join) nodes in the way we want. I will make a
simplified patch which uses that method.

Btw, the simplication of outer join to inner join is defined by
identity #3 in [1] referred to above. 


> Transformation to inner join not performed for certain three-way joins
> ----------------------------------------------------------------------
>
>                 Key: DERBY-4405
>                 URL: https://issues.apache.org/jira/browse/DERBY-4405
>             Project: Derby
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 10.6.0.0
>            Reporter: Knut Anders Hatlen
>            Assignee: Dag H. Wanvik
>            Priority: Minor
>         Attachments: derby-4405.diff, derby-4405.stat
>
>
> In the CROSS JOIN section in the reference manual (http://db.apache.org/derby/docs/dev/ref/rrefsqljcrossjoin.html) there are three examples that are supposed to be equivalent. However, the performance differs significantly between the different queries.
> The queries use the tours db and look like this:
> (1)
> SELECT * FROM CITIES LEFT OUTER JOIN
>     (FLIGHTS CROSS JOIN COUNTRIES)
>         ON CITIES.AIRPORT = FLIGHTS.ORIG_AIRPORT
>             WHERE COUNTRIES.COUNTRY_ISO_CODE = 'US'
> (2)
> SELECT * FROM CITIES LEFT OUTER JOIN
>     FLIGHTS INNER JOIN COUNTRIES ON 1=1
>         ON CITIES.AIRPORT = FLIGHTS.ORIG_AIRPORT
>             WHERE COUNTRIES.COUNTRY_ISO_CODE = 'US'
> (3)
> SELECT * FROM CITIES LEFT OUTER JOIN
>     (SELECT * FROM FLIGHTS, COUNTRIES) S
>         ON CITIES.AIRPORT = S.ORIG_AIRPORT
>             WHERE S.COUNTRY_ISO_CODE = 'US'
> When executed in ij, (1) and (2) need 6 seconds to complete, whereas (3) completes in 50 ms.
> The query plans for (1) and (2) use nested loop joins and table scans. (3) uses a combination of hash join and nested loop join, and index scans as well as table scans.
> It looks like (3) has been rewritten from a left outer join to an inner join internally. This is fine because all rows that have the right-side columns filled with NULLs will be filtered out by the predicate S.COUNTRY_ISO_CODE='US', so the extra rows generated by the outer join will not be returned.
> This optimization should also be possible for (1) and (2). We should improve the logic so that those joins are transformed too. The transformation happens in HalfOuterJoinNode.transformOuterJoins().

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (DERBY-4405) Transformation to inner join not performed for certain three-way joins

Posted by "Dag H. Wanvik (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/DERBY-4405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12783474#action_12783474 ] 

Dag H. Wanvik commented on DERBY-4405:
--------------------------------------

Regressions passed.


> Transformation to inner join not performed for certain three-way joins
> ----------------------------------------------------------------------
>
>                 Key: DERBY-4405
>                 URL: https://issues.apache.org/jira/browse/DERBY-4405
>             Project: Derby
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 10.6.0.0
>            Reporter: Knut Anders Hatlen
>            Assignee: Dag H. Wanvik
>            Priority: Minor
>         Attachments: derby-4405.diff, derby-4405.stat
>
>
> In the CROSS JOIN section in the reference manual (http://db.apache.org/derby/docs/dev/ref/rrefsqljcrossjoin.html) there are three examples that are supposed to be equivalent. However, the performance differs significantly between the different queries.
> The queries use the tours db and look like this:
> (1)
> SELECT * FROM CITIES LEFT OUTER JOIN
>     (FLIGHTS CROSS JOIN COUNTRIES)
>         ON CITIES.AIRPORT = FLIGHTS.ORIG_AIRPORT
>             WHERE COUNTRIES.COUNTRY_ISO_CODE = 'US'
> (2)
> SELECT * FROM CITIES LEFT OUTER JOIN
>     FLIGHTS INNER JOIN COUNTRIES ON 1=1
>         ON CITIES.AIRPORT = FLIGHTS.ORIG_AIRPORT
>             WHERE COUNTRIES.COUNTRY_ISO_CODE = 'US'
> (3)
> SELECT * FROM CITIES LEFT OUTER JOIN
>     (SELECT * FROM FLIGHTS, COUNTRIES) S
>         ON CITIES.AIRPORT = S.ORIG_AIRPORT
>             WHERE S.COUNTRY_ISO_CODE = 'US'
> When executed in ij, (1) and (2) need 6 seconds to complete, whereas (3) completes in 50 ms.
> The query plans for (1) and (2) use nested loop joins and table scans. (3) uses a combination of hash join and nested loop join, and index scans as well as table scans.
> It looks like (3) has been rewritten from a left outer join to an inner join internally. This is fine because all rows that have the right-side columns filled with NULLs will be filtered out by the predicate S.COUNTRY_ISO_CODE='US', so the extra rows generated by the outer join will not be returned.
> This optimization should also be possible for (1) and (2). We should improve the logic so that those joins are transformed too. The transformation happens in HalfOuterJoinNode.transformOuterJoins().

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (DERBY-4405) Transformation to inner join not performed for certain three-way joins

Posted by "Dag H. Wanvik (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/DERBY-4405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12790975#action_12790975 ] 

Dag H. Wanvik commented on DERBY-4405:
--------------------------------------

Regressions passed, committed as svn 891015.

> Transformation to inner join not performed for certain three-way joins
> ----------------------------------------------------------------------
>
>                 Key: DERBY-4405
>                 URL: https://issues.apache.org/jira/browse/DERBY-4405
>             Project: Derby
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 10.6.0.0
>            Reporter: Knut Anders Hatlen
>            Assignee: Dag H. Wanvik
>            Priority: Minor
>         Attachments: derby-4405-2.diff, derby-4405-2.stat, derby-4405.diff, derby-4405.stat
>
>
> In the CROSS JOIN section in the reference manual (http://db.apache.org/derby/docs/dev/ref/rrefsqljcrossjoin.html) there are three examples that are supposed to be equivalent. However, the performance differs significantly between the different queries.
> The queries use the tours db and look like this:
> (1)
> SELECT * FROM CITIES LEFT OUTER JOIN
>     (FLIGHTS CROSS JOIN COUNTRIES)
>         ON CITIES.AIRPORT = FLIGHTS.ORIG_AIRPORT
>             WHERE COUNTRIES.COUNTRY_ISO_CODE = 'US'
> (2)
> SELECT * FROM CITIES LEFT OUTER JOIN
>     FLIGHTS INNER JOIN COUNTRIES ON 1=1
>         ON CITIES.AIRPORT = FLIGHTS.ORIG_AIRPORT
>             WHERE COUNTRIES.COUNTRY_ISO_CODE = 'US'
> (3)
> SELECT * FROM CITIES LEFT OUTER JOIN
>     (SELECT * FROM FLIGHTS, COUNTRIES) S
>         ON CITIES.AIRPORT = S.ORIG_AIRPORT
>             WHERE S.COUNTRY_ISO_CODE = 'US'
> When executed in ij, (1) and (2) need 6 seconds to complete, whereas (3) completes in 50 ms.
> The query plans for (1) and (2) use nested loop joins and table scans. (3) uses a combination of hash join and nested loop join, and index scans as well as table scans.
> It looks like (3) has been rewritten from a left outer join to an inner join internally. This is fine because all rows that have the right-side columns filled with NULLs will be filtered out by the predicate S.COUNTRY_ISO_CODE='US', so the extra rows generated by the outer join will not be returned.
> This optimization should also be possible for (1) and (2). We should improve the logic so that those joins are transformed too. The transformation happens in HalfOuterJoinNode.transformOuterJoins().

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Closed: (DERBY-4405) Transformation to inner join not performed for certain three-way joins

Posted by "Knut Anders Hatlen (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/DERBY-4405?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Knut Anders Hatlen closed DERBY-4405.
-------------------------------------


Verified that the three statements had similar performance on trunk. Closing.

> Transformation to inner join not performed for certain three-way joins
> ----------------------------------------------------------------------
>
>                 Key: DERBY-4405
>                 URL: https://issues.apache.org/jira/browse/DERBY-4405
>             Project: Derby
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 10.6.0.0
>            Reporter: Knut Anders Hatlen
>            Assignee: Dag H. Wanvik
>            Priority: Minor
>             Fix For: 10.6.0.0
>
>         Attachments: derby-4405-2.diff, derby-4405-2.stat, derby-4405.diff, derby-4405.stat
>
>
> In the CROSS JOIN section in the reference manual (http://db.apache.org/derby/docs/dev/ref/rrefsqljcrossjoin.html) there are three examples that are supposed to be equivalent. However, the performance differs significantly between the different queries.
> The queries use the tours db and look like this:
> (1)
> SELECT * FROM CITIES LEFT OUTER JOIN
>     (FLIGHTS CROSS JOIN COUNTRIES)
>         ON CITIES.AIRPORT = FLIGHTS.ORIG_AIRPORT
>             WHERE COUNTRIES.COUNTRY_ISO_CODE = 'US'
> (2)
> SELECT * FROM CITIES LEFT OUTER JOIN
>     FLIGHTS INNER JOIN COUNTRIES ON 1=1
>         ON CITIES.AIRPORT = FLIGHTS.ORIG_AIRPORT
>             WHERE COUNTRIES.COUNTRY_ISO_CODE = 'US'
> (3)
> SELECT * FROM CITIES LEFT OUTER JOIN
>     (SELECT * FROM FLIGHTS, COUNTRIES) S
>         ON CITIES.AIRPORT = S.ORIG_AIRPORT
>             WHERE S.COUNTRY_ISO_CODE = 'US'
> When executed in ij, (1) and (2) need 6 seconds to complete, whereas (3) completes in 50 ms.
> The query plans for (1) and (2) use nested loop joins and table scans. (3) uses a combination of hash join and nested loop join, and index scans as well as table scans.
> It looks like (3) has been rewritten from a left outer join to an inner join internally. This is fine because all rows that have the right-side columns filled with NULLs will be filtered out by the predicate S.COUNTRY_ISO_CODE='US', so the extra rows generated by the outer join will not be returned.
> This optimization should also be possible for (1) and (2). We should improve the logic so that those joins are transformed too. The transformation happens in HalfOuterJoinNode.transformOuterJoins().

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (DERBY-4405) Transformation to inner join not performed for certain three-way joins

Posted by "Dag H. Wanvik (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/DERBY-4405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12786263#action_12786263 ] 

Dag H. Wanvik commented on DERBY-4405:
--------------------------------------

Trying to understand the code for the outer join reordering a.k.a. linearization that Derby performs, cf.
HalfOuterJoinNode#LOJ_reorderable.

Is anyone familiar with what the rationale/intuition for this transformation is? I did not find it
described in the Tuning guide (http://db.apache.org/derby/docs/10.5/tuning/ctuntransform55045.html).
It does mention the simplification to inner joins, though.

This fairly recent paper which does address it [1] in much detail, but Derby's logic is fairly limited..

[1] Galindo-Legaria, C. & Rosenthal, A.: "Outerjoin simplification and reordering for query optimization", ACM Transactions on Database Systems, Vol 22, No 1, March 1997.


> Transformation to inner join not performed for certain three-way joins
> ----------------------------------------------------------------------
>
>                 Key: DERBY-4405
>                 URL: https://issues.apache.org/jira/browse/DERBY-4405
>             Project: Derby
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 10.6.0.0
>            Reporter: Knut Anders Hatlen
>            Assignee: Dag H. Wanvik
>            Priority: Minor
>         Attachments: derby-4405.diff, derby-4405.stat
>
>
> In the CROSS JOIN section in the reference manual (http://db.apache.org/derby/docs/dev/ref/rrefsqljcrossjoin.html) there are three examples that are supposed to be equivalent. However, the performance differs significantly between the different queries.
> The queries use the tours db and look like this:
> (1)
> SELECT * FROM CITIES LEFT OUTER JOIN
>     (FLIGHTS CROSS JOIN COUNTRIES)
>         ON CITIES.AIRPORT = FLIGHTS.ORIG_AIRPORT
>             WHERE COUNTRIES.COUNTRY_ISO_CODE = 'US'
> (2)
> SELECT * FROM CITIES LEFT OUTER JOIN
>     FLIGHTS INNER JOIN COUNTRIES ON 1=1
>         ON CITIES.AIRPORT = FLIGHTS.ORIG_AIRPORT
>             WHERE COUNTRIES.COUNTRY_ISO_CODE = 'US'
> (3)
> SELECT * FROM CITIES LEFT OUTER JOIN
>     (SELECT * FROM FLIGHTS, COUNTRIES) S
>         ON CITIES.AIRPORT = S.ORIG_AIRPORT
>             WHERE S.COUNTRY_ISO_CODE = 'US'
> When executed in ij, (1) and (2) need 6 seconds to complete, whereas (3) completes in 50 ms.
> The query plans for (1) and (2) use nested loop joins and table scans. (3) uses a combination of hash join and nested loop join, and index scans as well as table scans.
> It looks like (3) has been rewritten from a left outer join to an inner join internally. This is fine because all rows that have the right-side columns filled with NULLs will be filtered out by the predicate S.COUNTRY_ISO_CODE='US', so the extra rows generated by the outer join will not be returned.
> This optimization should also be possible for (1) and (2). We should improve the logic so that those joins are transformed too. The transformation happens in HalfOuterJoinNode.transformOuterJoins().

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.