You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@hive.apache.org by "Amareshwari Sriramadasu (JIRA)" <ji...@apache.org> on 2010/08/12 08:05:16 UTC

[jira] Created: (HIVE-1534) predicate pushdown does not work correctly with outer joins

predicate pushdown does not work correctly with outer joins
-----------------------------------------------------------

                 Key: HIVE-1534
                 URL: https://issues.apache.org/jira/browse/HIVE-1534
             Project: Hadoop Hive
          Issue Type: Bug
          Components: Query Processor
            Reporter: Amareshwari Sriramadasu


The hive documentation for predicate pushdown says:
Left outer join: predicates on the left side aliases are pushed
Right outer join: predicates on the right side aliases are pushed

But, this pushdown should not happen for AND predicates in join queries:
ex: SELECT * FROM T1 JOIN T2 ON (T1.c1=T2.c2 AND T1.c1 < 10)


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (HIVE-1534) Join filters do not work correctly with outer joins

Posted by "Amareshwari Sriramadasu (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HIVE-1534?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Amareshwari Sriramadasu updated HIVE-1534:
------------------------------------------

    Status: Open  (was: Patch Available)

With the uploaded patch, there is bug with cartesian products and right outer joins, when there are filters. Looking into it.

> Join filters do not work correctly with outer joins
> ---------------------------------------------------
>
>                 Key: HIVE-1534
>                 URL: https://issues.apache.org/jira/browse/HIVE-1534
>             Project: Hadoop Hive
>          Issue Type: Bug
>          Components: Query Processor
>            Reporter: Amareshwari Sriramadasu
>            Assignee: Amareshwari Sriramadasu
>         Attachments: patch-1534.txt
>
>
>  SELECT * FROM T1 LEFT OUTER JOIN T2 ON (T1.c1=T2.c2 AND T1.c1 < 10)
> and  SELECT * FROM T1 RIGHT OUTER JOIN T2 ON (T1.c1=T2.c2 AND T2.c1 < 10)
> do not give correct results.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (HIVE-1534) Join filters do not work correctly with outer joins

Posted by "Amareshwari Sriramadasu (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HIVE-1534?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Amareshwari Sriramadasu updated HIVE-1534:
------------------------------------------

    Status: Patch Available  (was: Open)

> Join filters do not work correctly with outer joins
> ---------------------------------------------------
>
>                 Key: HIVE-1534
>                 URL: https://issues.apache.org/jira/browse/HIVE-1534
>             Project: Hadoop Hive
>          Issue Type: Bug
>          Components: Query Processor
>            Reporter: Amareshwari Sriramadasu
>            Assignee: Amareshwari Sriramadasu
>         Attachments: patch-1534-1.txt, patch-1534.txt
>
>
>  SELECT * FROM T1 LEFT OUTER JOIN T2 ON (T1.c1=T2.c2 AND T1.c1 < 10)
> and  SELECT * FROM T1 RIGHT OUTER JOIN T2 ON (T1.c1=T2.c2 AND T2.c1 < 10)
> do not give correct results.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HIVE-1534) Join filters do not work correctly with outer joins

Posted by "John Sichi (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-1534?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12910723#action_12910723 ] 

John Sichi commented on HIVE-1534:
----------------------------------

+1 to what Namit said about partitioned columns...although I have seen this mistake come up a lot, it's better to have consistent semantics and just educate users.


> Join filters do not work correctly with outer joins
> ---------------------------------------------------
>
>                 Key: HIVE-1534
>                 URL: https://issues.apache.org/jira/browse/HIVE-1534
>             Project: Hadoop Hive
>          Issue Type: Bug
>          Components: Query Processor
>            Reporter: Amareshwari Sriramadasu
>            Assignee: Amareshwari Sriramadasu
>         Attachments: patch-1534-1.txt, patch-1534-2.txt, patch-1534.txt
>
>
>  SELECT * FROM T1 LEFT OUTER JOIN T2 ON (T1.c1=T2.c2 AND T1.c1 < 10)
> and  SELECT * FROM T1 RIGHT OUTER JOIN T2 ON (T1.c1=T2.c2 AND T2.c1 < 10)
> do not give correct results.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (HIVE-1534) Join filters do not work correctly with outer joins

Posted by "Amareshwari Sriramadasu (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HIVE-1534?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Amareshwari Sriramadasu updated HIVE-1534:
------------------------------------------

    Attachment: patch-1534-4.txt

Added hive.outerjoin.supports.filter configuration with above semantics. Updated testcase to repeat the filter queries in join_filters.q with configuration turned off.

> Join filters do not work correctly with outer joins
> ---------------------------------------------------
>
>                 Key: HIVE-1534
>                 URL: https://issues.apache.org/jira/browse/HIVE-1534
>             Project: Hadoop Hive
>          Issue Type: Bug
>          Components: Query Processor
>            Reporter: Amareshwari Sriramadasu
>            Assignee: Amareshwari Sriramadasu
>         Attachments: patch-1534-1.txt, patch-1534-2.txt, patch-1534-3.txt, patch-1534-4.txt, patch-1534.txt
>
>
>  SELECT * FROM T1 LEFT OUTER JOIN T2 ON (T1.c1=T2.c2 AND T1.c1 < 10)
> and  SELECT * FROM T1 RIGHT OUTER JOIN T2 ON (T1.c1=T2.c2 AND T2.c1 < 10)
> do not give correct results.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (HIVE-1534) Join filters do not work correctly with outer joins

Posted by "Amareshwari Sriramadasu (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HIVE-1534?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Amareshwari Sriramadasu updated HIVE-1534:
------------------------------------------

        Summary: Join filters do not work correctly with outer joins  (was: Join filters does not work correctly with outer joins)
    Description: 
 SELECT * FROM T1 LEFT OUTER JOIN T2 ON (T1.c1=T2.c2 AND T1.c1 < 10)
and  SELECT * FROM T1 RIGHT OUTER JOIN T2 ON (T1.c1=T2.c2 AND T2.c1 < 10)
do not give correct results.

  was:
 SELECT * FROM T1 LEFT OUTER JOIN T2 ON (T1.c1=T2.c2 AND T1.c1 < 10)
and  SELECT * FROM T1 RIGHT OUTER JOIN T2 ON (T1.c1=T2.c2 AND T2.c1 < 10)
does not give correct results.


> Join filters do not work correctly with outer joins
> ---------------------------------------------------
>
>                 Key: HIVE-1534
>                 URL: https://issues.apache.org/jira/browse/HIVE-1534
>             Project: Hadoop Hive
>          Issue Type: Bug
>          Components: Query Processor
>            Reporter: Amareshwari Sriramadasu
>            Assignee: Amareshwari Sriramadasu
>
>  SELECT * FROM T1 LEFT OUTER JOIN T2 ON (T1.c1=T2.c2 AND T1.c1 < 10)
> and  SELECT * FROM T1 RIGHT OUTER JOIN T2 ON (T1.c1=T2.c2 AND T2.c1 < 10)
> do not give correct results.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HIVE-1534) predicate pushdown does not work correctly with outer joins

Posted by "Amareshwari Sriramadasu (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-1534?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12906673#action_12906673 ] 

Amareshwari Sriramadasu commented on HIVE-1534:
-----------------------------------------------

bq. Definitely a bug. It happens regardless of the setting of hive.optimize.ppd, so it probably has something to do with the way the join condition is decomposed rather than the predicate pushdown optimization.
Yes. This has nothing to do with predicate pushdown. After going through Join implementation, I see that after Join-tree is generated, join filters are pushed and then join operator is generated for the join condition. But for outer joins, filters should be part of the join condition i.e. if the condition is not obeyed (or the filter is not obeyed), the values should be joined with nulls.

> predicate pushdown does not work correctly with outer joins
> -----------------------------------------------------------
>
>                 Key: HIVE-1534
>                 URL: https://issues.apache.org/jira/browse/HIVE-1534
>             Project: Hadoop Hive
>          Issue Type: Bug
>          Components: Query Processor
>            Reporter: Amareshwari Sriramadasu
>            Assignee: Amareshwari Sriramadasu
>
> The hive documentation for predicate pushdown says:
> Left outer join: predicates on the left side aliases are pushed
> Right outer join: predicates on the right side aliases are pushed
> But, this pushdown should not happen for AND predicates in join queries:
> ex: SELECT * FROM T1 JOIN T2 ON (T1.c1=T2.c2 AND T1.c1 < 10)

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HIVE-1534) Join filters do not work correctly with outer joins

Posted by "Namit Jain (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-1534?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12912741#action_12912741 ] 

Namit Jain commented on HIVE-1534:
----------------------------------

The patch looks good - however, we have a deployment issue.

This is a incompatible change, and will change/break existing queries. I cant think of a great way of getting this in.
One option is to cover it via a configurable parameter (it is ON by default). For internal deployments (like Facebook),
we can turn it off and find all the bad queries slowly and convert them, and only then enable this.


> Join filters do not work correctly with outer joins
> ---------------------------------------------------
>
>                 Key: HIVE-1534
>                 URL: https://issues.apache.org/jira/browse/HIVE-1534
>             Project: Hadoop Hive
>          Issue Type: Bug
>          Components: Query Processor
>            Reporter: Amareshwari Sriramadasu
>            Assignee: Amareshwari Sriramadasu
>         Attachments: patch-1534-1.txt, patch-1534-2.txt, patch-1534-3.txt, patch-1534.txt
>
>
>  SELECT * FROM T1 LEFT OUTER JOIN T2 ON (T1.c1=T2.c2 AND T1.c1 < 10)
> and  SELECT * FROM T1 RIGHT OUTER JOIN T2 ON (T1.c1=T2.c2 AND T2.c1 < 10)
> do not give correct results.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HIVE-1534) Join filters do not work correctly with outer joins

Posted by "Amareshwari Sriramadasu (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-1534?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12907094#action_12907094 ] 

Amareshwari Sriramadasu commented on HIVE-1534:
-----------------------------------------------

bq. I would suggest not supporting filters in the join condition for outer joins, since we are returning wrong results.
Created HIVE-1621 for this. Leaving this jira to do the correct fix.

> Join filters do not work correctly with outer joins
> ---------------------------------------------------
>
>                 Key: HIVE-1534
>                 URL: https://issues.apache.org/jira/browse/HIVE-1534
>             Project: Hadoop Hive
>          Issue Type: Bug
>          Components: Query Processor
>            Reporter: Amareshwari Sriramadasu
>            Assignee: Amareshwari Sriramadasu
>
>  SELECT * FROM T1 LEFT OUTER JOIN T2 ON (T1.c1=T2.c2 AND T1.c1 < 10)
> and  SELECT * FROM T1 RIGHT OUTER JOIN T2 ON (T1.c1=T2.c2 AND T2.c1 < 10)
> do not give correct results.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HIVE-1534) predicate pushdown does not work correctly with outer joins

Posted by "John Sichi (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-1534?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12897909#action_12897909 ] 

John Sichi commented on HIVE-1534:
----------------------------------

Definitely a bug.  It happens regardless of the setting of hive.optimize.ppd, so it probably has something to do with the way the join condition is decomposed rather than the predicate pushdown optimization.

> predicate pushdown does not work correctly with outer joins
> -----------------------------------------------------------
>
>                 Key: HIVE-1534
>                 URL: https://issues.apache.org/jira/browse/HIVE-1534
>             Project: Hadoop Hive
>          Issue Type: Bug
>          Components: Query Processor
>            Reporter: Amareshwari Sriramadasu
>
> The hive documentation for predicate pushdown says:
> Left outer join: predicates on the left side aliases are pushed
> Right outer join: predicates on the right side aliases are pushed
> But, this pushdown should not happen for AND predicates in join queries:
> ex: SELECT * FROM T1 JOIN T2 ON (T1.c1=T2.c2 AND T1.c1 < 10)

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (HIVE-1534) Join filters do not work correctly with outer joins

Posted by "John Sichi (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HIVE-1534?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

John Sichi updated HIVE-1534:
-----------------------------

    Status: Open  (was: Patch Available)

> Join filters do not work correctly with outer joins
> ---------------------------------------------------
>
>                 Key: HIVE-1534
>                 URL: https://issues.apache.org/jira/browse/HIVE-1534
>             Project: Hadoop Hive
>          Issue Type: Bug
>          Components: Query Processor
>            Reporter: Amareshwari Sriramadasu
>            Assignee: Amareshwari Sriramadasu
>         Attachments: patch-1534-1.txt, patch-1534.txt
>
>
>  SELECT * FROM T1 LEFT OUTER JOIN T2 ON (T1.c1=T2.c2 AND T1.c1 < 10)
> and  SELECT * FROM T1 RIGHT OUTER JOIN T2 ON (T1.c1=T2.c2 AND T2.c1 < 10)
> do not give correct results.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (HIVE-1534) Join filters do not work correctly with outer joins

Posted by "Namit Jain (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HIVE-1534?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Namit Jain updated HIVE-1534:
-----------------------------

          Status: Resolved  (was: Patch Available)
    Hadoop Flags: [Reviewed]
      Resolution: Fixed

Committed. Thanks Amareshwari

> Join filters do not work correctly with outer joins
> ---------------------------------------------------
>
>                 Key: HIVE-1534
>                 URL: https://issues.apache.org/jira/browse/HIVE-1534
>             Project: Hadoop Hive
>          Issue Type: Bug
>          Components: Query Processor
>            Reporter: Amareshwari Sriramadasu
>            Assignee: Amareshwari Sriramadasu
>         Attachments: patch-1534-1.txt, patch-1534-2.txt, patch-1534-3.txt, patch-1534-4.txt, patch-1534.txt
>
>
>  SELECT * FROM T1 LEFT OUTER JOIN T2 ON (T1.c1=T2.c2 AND T1.c1 < 10)
> and  SELECT * FROM T1 RIGHT OUTER JOIN T2 ON (T1.c1=T2.c2 AND T2.c1 < 10)
> do not give correct results.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (HIVE-1534) Join filters do not work correctly with outer joins

Posted by "Amareshwari Sriramadasu (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HIVE-1534?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Amareshwari Sriramadasu updated HIVE-1534:
------------------------------------------

    Status: Patch Available  (was: Open)

> Join filters do not work correctly with outer joins
> ---------------------------------------------------
>
>                 Key: HIVE-1534
>                 URL: https://issues.apache.org/jira/browse/HIVE-1534
>             Project: Hadoop Hive
>          Issue Type: Bug
>          Components: Query Processor
>            Reporter: Amareshwari Sriramadasu
>            Assignee: Amareshwari Sriramadasu
>         Attachments: patch-1534-1.txt, patch-1534-2.txt, patch-1534.txt
>
>
>  SELECT * FROM T1 LEFT OUTER JOIN T2 ON (T1.c1=T2.c2 AND T1.c1 < 10)
> and  SELECT * FROM T1 RIGHT OUTER JOIN T2 ON (T1.c1=T2.c2 AND T2.c1 < 10)
> do not give correct results.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (HIVE-1534) Join filters do not work correctly with outer joins

Posted by "Amareshwari Sriramadasu (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HIVE-1534?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Amareshwari Sriramadasu updated HIVE-1534:
------------------------------------------

    Status: Patch Available  (was: Open)

All the tests passed with uploaded patch.

> Join filters do not work correctly with outer joins
> ---------------------------------------------------
>
>                 Key: HIVE-1534
>                 URL: https://issues.apache.org/jira/browse/HIVE-1534
>             Project: Hadoop Hive
>          Issue Type: Bug
>          Components: Query Processor
>            Reporter: Amareshwari Sriramadasu
>            Assignee: Amareshwari Sriramadasu
>         Attachments: patch-1534-1.txt, patch-1534-2.txt, patch-1534-3.txt, patch-1534.txt
>
>
>  SELECT * FROM T1 LEFT OUTER JOIN T2 ON (T1.c1=T2.c2 AND T1.c1 < 10)
> and  SELECT * FROM T1 RIGHT OUTER JOIN T2 ON (T1.c1=T2.c2 AND T2.c1 < 10)
> do not give correct results.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HIVE-1534) Join filters do not work correctly with outer joins

Posted by "Namit Jain (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-1534?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12907001#action_12907001 ] 

Namit Jain commented on HIVE-1534:
----------------------------------

Definitely a bug, but not related to https://issues.apache.org/jira/browse/HIVE-1538.

For outer joins, the filters should not be pushed above the join.


For the query,

SELECT * FROM input3 a left outer JOIN input3 b ON (a.key=b.key AND a.key < 100);" 


the row: 100 100

is being pruned even before it reaches the join.

As you suggested above,  the correct solution is to have the filter as part of the join, which we dont support currently.

For now, I would suggest not supporting filters in the join condition for outer joins, since we are returning wrong results,
and the correct fix will involve a big change

> Join filters do not work correctly with outer joins
> ---------------------------------------------------
>
>                 Key: HIVE-1534
>                 URL: https://issues.apache.org/jira/browse/HIVE-1534
>             Project: Hadoop Hive
>          Issue Type: Bug
>          Components: Query Processor
>            Reporter: Amareshwari Sriramadasu
>            Assignee: Amareshwari Sriramadasu
>
>  SELECT * FROM T1 LEFT OUTER JOIN T2 ON (T1.c1=T2.c2 AND T1.c1 < 10)
> and  SELECT * FROM T1 RIGHT OUTER JOIN T2 ON (T1.c1=T2.c2 AND T2.c1 < 10)
> do not give correct results.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HIVE-1534) Join filters do not work correctly with outer joins

Posted by "Namit Jain (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-1534?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12908858#action_12908858 ] 

Namit Jain commented on HIVE-1534:
----------------------------------

The approach looks OK - I will look into the code for more detailed comments.

One general comment was that you also need to account for progress if the join filters filter all the rows.
The task tracker may be thought of an un-responsive. Look at the filter operator, we send a progress in
that case if there are 'n' consecutive rows filtered out.

> Join filters do not work correctly with outer joins
> ---------------------------------------------------
>
>                 Key: HIVE-1534
>                 URL: https://issues.apache.org/jira/browse/HIVE-1534
>             Project: Hadoop Hive
>          Issue Type: Bug
>          Components: Query Processor
>            Reporter: Amareshwari Sriramadasu
>            Assignee: Amareshwari Sriramadasu
>         Attachments: patch-1534.txt
>
>
>  SELECT * FROM T1 LEFT OUTER JOIN T2 ON (T1.c1=T2.c2 AND T1.c1 < 10)
> and  SELECT * FROM T1 RIGHT OUTER JOIN T2 ON (T1.c1=T2.c2 AND T2.c1 < 10)
> do not give correct results.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HIVE-1534) Join filters do not work correctly with outer joins

Posted by "Namit Jain (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-1534?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12910763#action_12910763 ] 

Namit Jain commented on HIVE-1534:
----------------------------------

You can cleanup the patch by not special-casing for partitioned columns. Otherwise, the patch looks good

> Join filters do not work correctly with outer joins
> ---------------------------------------------------
>
>                 Key: HIVE-1534
>                 URL: https://issues.apache.org/jira/browse/HIVE-1534
>             Project: Hadoop Hive
>          Issue Type: Bug
>          Components: Query Processor
>            Reporter: Amareshwari Sriramadasu
>            Assignee: Amareshwari Sriramadasu
>         Attachments: patch-1534-1.txt, patch-1534-2.txt, patch-1534.txt
>
>
>  SELECT * FROM T1 LEFT OUTER JOIN T2 ON (T1.c1=T2.c2 AND T1.c1 < 10)
> and  SELECT * FROM T1 RIGHT OUTER JOIN T2 ON (T1.c1=T2.c2 AND T2.c1 < 10)
> do not give correct results.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (HIVE-1534) Join filters do not work correctly with outer joins

Posted by "Namit Jain (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HIVE-1534?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Namit Jain updated HIVE-1534:
-----------------------------

    Status: Open  (was: Patch Available)

> Join filters do not work correctly with outer joins
> ---------------------------------------------------
>
>                 Key: HIVE-1534
>                 URL: https://issues.apache.org/jira/browse/HIVE-1534
>             Project: Hadoop Hive
>          Issue Type: Bug
>          Components: Query Processor
>            Reporter: Amareshwari Sriramadasu
>            Assignee: Amareshwari Sriramadasu
>         Attachments: patch-1534-1.txt, patch-1534-2.txt, patch-1534.txt
>
>
>  SELECT * FROM T1 LEFT OUTER JOIN T2 ON (T1.c1=T2.c2 AND T1.c1 < 10)
> and  SELECT * FROM T1 RIGHT OUTER JOIN T2 ON (T1.c1=T2.c2 AND T2.c1 < 10)
> do not give correct results.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (HIVE-1534) Join filters do not work correctly with outer joins

Posted by "Amareshwari Sriramadasu (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HIVE-1534?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Amareshwari Sriramadasu updated HIVE-1534:
------------------------------------------

    Status: Patch Available  (was: Open)

> Join filters do not work correctly with outer joins
> ---------------------------------------------------
>
>                 Key: HIVE-1534
>                 URL: https://issues.apache.org/jira/browse/HIVE-1534
>             Project: Hadoop Hive
>          Issue Type: Bug
>          Components: Query Processor
>            Reporter: Amareshwari Sriramadasu
>            Assignee: Amareshwari Sriramadasu
>         Attachments: patch-1534.txt
>
>
>  SELECT * FROM T1 LEFT OUTER JOIN T2 ON (T1.c1=T2.c2 AND T1.c1 < 10)
> and  SELECT * FROM T1 RIGHT OUTER JOIN T2 ON (T1.c1=T2.c2 AND T2.c1 < 10)
> do not give correct results.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HIVE-1534) predicate pushdown does not work correctly with outer joins

Posted by "John Sichi (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-1534?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12897912#action_12897912 ] 

John Sichi commented on HIVE-1534:
----------------------------------

Assigning this to you in case you want to take a look at it together with HIVE-741.


> predicate pushdown does not work correctly with outer joins
> -----------------------------------------------------------
>
>                 Key: HIVE-1534
>                 URL: https://issues.apache.org/jira/browse/HIVE-1534
>             Project: Hadoop Hive
>          Issue Type: Bug
>          Components: Query Processor
>            Reporter: Amareshwari Sriramadasu
>            Assignee: Amareshwari Sriramadasu
>
> The hive documentation for predicate pushdown says:
> Left outer join: predicates on the left side aliases are pushed
> Right outer join: predicates on the right side aliases are pushed
> But, this pushdown should not happen for AND predicates in join queries:
> ex: SELECT * FROM T1 JOIN T2 ON (T1.c1=T2.c2 AND T1.c1 < 10)

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (HIVE-1534) Join filters does not work correctly with outer joins

Posted by "Amareshwari Sriramadasu (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HIVE-1534?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Amareshwari Sriramadasu updated HIVE-1534:
------------------------------------------

        Summary: Join filters does not work correctly with outer joins  (was: predicate pushdown does not work correctly with outer joins)
    Description: 
 SELECT * FROM T1 LEFT OUTER JOIN T2 ON (T1.c1=T2.c2 AND T1.c1 < 10)
and  SELECT * FROM T1 RIGHT OUTER JOIN T2 ON (T1.c1=T2.c2 AND T2.c1 < 10)
does not give correct results.

  was:
The hive documentation for predicate pushdown says:
Left outer join: predicates on the left side aliases are pushed
Right outer join: predicates on the right side aliases are pushed

But, this pushdown should not happen for AND predicates in join queries:
ex: SELECT * FROM T1 JOIN T2 ON (T1.c1=T2.c2 AND T1.c1 < 10)



> Join filters does not work correctly with outer joins
> -----------------------------------------------------
>
>                 Key: HIVE-1534
>                 URL: https://issues.apache.org/jira/browse/HIVE-1534
>             Project: Hadoop Hive
>          Issue Type: Bug
>          Components: Query Processor
>            Reporter: Amareshwari Sriramadasu
>            Assignee: Amareshwari Sriramadasu
>
>  SELECT * FROM T1 LEFT OUTER JOIN T2 ON (T1.c1=T2.c2 AND T1.c1 < 10)
> and  SELECT * FROM T1 RIGHT OUTER JOIN T2 ON (T1.c1=T2.c2 AND T2.c1 < 10)
> does not give correct results.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HIVE-1534) predicate pushdown does not work correctly with outer joins

Posted by "Amareshwari Sriramadasu (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-1534?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12898124#action_12898124 ] 

Amareshwari Sriramadasu commented on HIVE-1534:
-----------------------------------------------

With ppd on or off, Mapper is filtering out table with alias a on the predicate a.key < 100 for the left outer join query. Similarly on alias b for right outer join query. This is mostly because of HIVE-1538.

> predicate pushdown does not work correctly with outer joins
> -----------------------------------------------------------
>
>                 Key: HIVE-1534
>                 URL: https://issues.apache.org/jira/browse/HIVE-1534
>             Project: Hadoop Hive
>          Issue Type: Bug
>          Components: Query Processor
>            Reporter: Amareshwari Sriramadasu
>            Assignee: Amareshwari Sriramadasu
>
> The hive documentation for predicate pushdown says:
> Left outer join: predicates on the left side aliases are pushed
> Right outer join: predicates on the right side aliases are pushed
> But, this pushdown should not happen for AND predicates in join queries:
> ex: SELECT * FROM T1 JOIN T2 ON (T1.c1=T2.c2 AND T1.c1 < 10)

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (HIVE-1534) Join filters do not work correctly with outer joins

Posted by "Amareshwari Sriramadasu (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HIVE-1534?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Amareshwari Sriramadasu updated HIVE-1534:
------------------------------------------

    Attachment: patch-1534.txt

I worked on separating filters for outer joins. 
Here is a patch which does not push filters for outer joins and filters are applied in the join implementation.

Patch would need some more work on SMBMapJoin. Please have look if the approach looks fine.

> Join filters do not work correctly with outer joins
> ---------------------------------------------------
>
>                 Key: HIVE-1534
>                 URL: https://issues.apache.org/jira/browse/HIVE-1534
>             Project: Hadoop Hive
>          Issue Type: Bug
>          Components: Query Processor
>            Reporter: Amareshwari Sriramadasu
>            Assignee: Amareshwari Sriramadasu
>         Attachments: patch-1534.txt
>
>
>  SELECT * FROM T1 LEFT OUTER JOIN T2 ON (T1.c1=T2.c2 AND T1.c1 < 10)
> and  SELECT * FROM T1 RIGHT OUTER JOIN T2 ON (T1.c1=T2.c2 AND T2.c1 < 10)
> do not give correct results.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (HIVE-1534) Join filters do not work correctly with outer joins

Posted by "Amareshwari Sriramadasu (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HIVE-1534?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Amareshwari Sriramadasu updated HIVE-1534:
------------------------------------------

    Attachment: patch-1534-2.txt

bq. What about semi joins ? 
I did not find anything wrong with semijoin queries and filters. They can be pushed as they are now. I don't think any change is required for semi joins. What do you think?

Uploading patch with following changes from earlier one:
* I think it makes sense to push the filters on partitioned columns and not output all the table for outer join. Patch pushes filters on partitioned columns, even for outer joins. Thoughts?
* Removed duplicate code in genExprNode() in SemanticAnalyzer.
* Fixed some minor bugs in SemanticAnalyzer and CommonJoinOperator, found in the test failures.
* Updated the test results for clientpositive queries in join20.q.out, join21.q.out and join40.q.out, which involve filters on outer joins. Also, updated test results for TestParse join queries.

All the tests passed with the uploaded patch.


> Join filters do not work correctly with outer joins
> ---------------------------------------------------
>
>                 Key: HIVE-1534
>                 URL: https://issues.apache.org/jira/browse/HIVE-1534
>             Project: Hadoop Hive
>          Issue Type: Bug
>          Components: Query Processor
>            Reporter: Amareshwari Sriramadasu
>            Assignee: Amareshwari Sriramadasu
>         Attachments: patch-1534-1.txt, patch-1534-2.txt, patch-1534.txt
>
>
>  SELECT * FROM T1 LEFT OUTER JOIN T2 ON (T1.c1=T2.c2 AND T1.c1 < 10)
> and  SELECT * FROM T1 RIGHT OUTER JOIN T2 ON (T1.c1=T2.c2 AND T2.c1 < 10)
> do not give correct results.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HIVE-1534) Join filters do not work correctly with outer joins

Posted by "Namit Jain (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-1534?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12913105#action_12913105 ] 

Namit Jain commented on HIVE-1534:
----------------------------------

+1

> Join filters do not work correctly with outer joins
> ---------------------------------------------------
>
>                 Key: HIVE-1534
>                 URL: https://issues.apache.org/jira/browse/HIVE-1534
>             Project: Hadoop Hive
>          Issue Type: Bug
>          Components: Query Processor
>            Reporter: Amareshwari Sriramadasu
>            Assignee: Amareshwari Sriramadasu
>         Attachments: patch-1534-1.txt, patch-1534-2.txt, patch-1534-3.txt, patch-1534-4.txt, patch-1534.txt
>
>
>  SELECT * FROM T1 LEFT OUTER JOIN T2 ON (T1.c1=T2.c2 AND T1.c1 < 10)
> and  SELECT * FROM T1 RIGHT OUTER JOIN T2 ON (T1.c1=T2.c2 AND T2.c1 < 10)
> do not give correct results.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HIVE-1534) Join filters do not work correctly with outer joins

Posted by "Amareshwari Sriramadasu (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-1534?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12909604#action_12909604 ] 

Amareshwari Sriramadasu commented on HIVE-1534:
-----------------------------------------------

bq. Did you run all the tests ? Some of the tests should break - minimally a change of explain plans.
I commented Explain annotation for filter predicates to skip test failures with explain plans for now; trying to figure out if filter predicates can be displayed only when they are present. Even then, all the tests did not pass. I'm looking into the failures.

bq. What about semi joins ? 
I did not do anything for semi join. I will go through semi join and its expected output with filters. Will update the patch if any changes are needed and add tests.

bq. Why did you add a genExprNode() ? Cant you re-use the one from SemanticAnalyzer ?
genExprNode() in SemanticAnalyzer is not a static method. It uses unparseTranslator while generating. Will change genExprNode() in SemanticAnalyzer to use the new static method introduced and remove duplicate code.

> Join filters do not work correctly with outer joins
> ---------------------------------------------------
>
>                 Key: HIVE-1534
>                 URL: https://issues.apache.org/jira/browse/HIVE-1534
>             Project: Hadoop Hive
>          Issue Type: Bug
>          Components: Query Processor
>            Reporter: Amareshwari Sriramadasu
>            Assignee: Amareshwari Sriramadasu
>         Attachments: patch-1534-1.txt, patch-1534.txt
>
>
>  SELECT * FROM T1 LEFT OUTER JOIN T2 ON (T1.c1=T2.c2 AND T1.c1 < 10)
> and  SELECT * FROM T1 RIGHT OUTER JOIN T2 ON (T1.c1=T2.c2 AND T2.c1 < 10)
> do not give correct results.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HIVE-1534) Join filters do not work correctly with outer joins

Posted by "Namit Jain (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-1534?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12912590#action_12912590 ] 

Namit Jain commented on HIVE-1534:
----------------------------------

I will take a look again

> Join filters do not work correctly with outer joins
> ---------------------------------------------------
>
>                 Key: HIVE-1534
>                 URL: https://issues.apache.org/jira/browse/HIVE-1534
>             Project: Hadoop Hive
>          Issue Type: Bug
>          Components: Query Processor
>            Reporter: Amareshwari Sriramadasu
>            Assignee: Amareshwari Sriramadasu
>         Attachments: patch-1534-1.txt, patch-1534-2.txt, patch-1534-3.txt, patch-1534.txt
>
>
>  SELECT * FROM T1 LEFT OUTER JOIN T2 ON (T1.c1=T2.c2 AND T1.c1 < 10)
> and  SELECT * FROM T1 RIGHT OUTER JOIN T2 ON (T1.c1=T2.c2 AND T2.c1 < 10)
> do not give correct results.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HIVE-1534) Join filters do not work correctly with outer joins

Posted by "Namit Jain (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-1534?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12912845#action_12912845 ] 

Namit Jain commented on HIVE-1534:
----------------------------------

What I meant to say was the following:


People are running queries in the warehouse with the expected wrong semantics - if we suddenly fix this, the queries will break.
We need to give some time to everyone to change their queries to use a sub-query if they want the filter to be pushed up.

Adding the above config, parameter seems like the only choice - we can try to remove this parameter before 0.7 goes out 
(if everyone agrees), but we need it right now for deployment

> Join filters do not work correctly with outer joins
> ---------------------------------------------------
>
>                 Key: HIVE-1534
>                 URL: https://issues.apache.org/jira/browse/HIVE-1534
>             Project: Hadoop Hive
>          Issue Type: Bug
>          Components: Query Processor
>            Reporter: Amareshwari Sriramadasu
>            Assignee: Amareshwari Sriramadasu
>         Attachments: patch-1534-1.txt, patch-1534-2.txt, patch-1534-3.txt, patch-1534.txt
>
>
>  SELECT * FROM T1 LEFT OUTER JOIN T2 ON (T1.c1=T2.c2 AND T1.c1 < 10)
> and  SELECT * FROM T1 RIGHT OUTER JOIN T2 ON (T1.c1=T2.c2 AND T2.c1 < 10)
> do not give correct results.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HIVE-1534) Join filters do not work correctly with outer joins

Posted by "Amareshwari Sriramadasu (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-1534?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12912831#action_12912831 ] 

Amareshwari Sriramadasu commented on HIVE-1534:
-----------------------------------------------

bq. This is a incompatible change, and will change/break existing queries.
The patch changes the query results to be different from what it was earlier. It does not give out any error, unlike HIVE-1621.  The effect is similar to HIVE-741.  I don't think this is incompatible change, because the current query results are wrong and the issue fixes the correctness.

If you still think it is incompatible change, we can add a configuration like hive.outerjoin.supports.filters with following semantics:
# when it is off,  print a warning saying "the results could be wrong. please turn on  hive.outerjoin.supports.filters" and add it to filtersForPushing (same as old behavior).
# When it is on, the filters are added to join filters.


> Join filters do not work correctly with outer joins
> ---------------------------------------------------
>
>                 Key: HIVE-1534
>                 URL: https://issues.apache.org/jira/browse/HIVE-1534
>             Project: Hadoop Hive
>          Issue Type: Bug
>          Components: Query Processor
>            Reporter: Amareshwari Sriramadasu
>            Assignee: Amareshwari Sriramadasu
>         Attachments: patch-1534-1.txt, patch-1534-2.txt, patch-1534-3.txt, patch-1534.txt
>
>
>  SELECT * FROM T1 LEFT OUTER JOIN T2 ON (T1.c1=T2.c2 AND T1.c1 < 10)
> and  SELECT * FROM T1 RIGHT OUTER JOIN T2 ON (T1.c1=T2.c2 AND T2.c1 < 10)
> do not give correct results.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HIVE-1534) predicate pushdown does not work correctly with outer joins

Posted by "Amareshwari Sriramadasu (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-1534?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12897587#action_12897587 ] 

Amareshwari Sriramadasu commented on HIVE-1534:
-----------------------------------------------

For a table input3 with following data:
||key || value ||
|NULL |   35 |
|12 |     NULL|
|10 |     1000 |
|10 |     100|
|100|    100 |

The queries
{code}
 SELECT * FROM input3 a left outer JOIN input3 b ON (a.key=b.key AND a.key < 100);
and
 SELECT * FROM input3 a right outer JOIN input3 b ON (a.key=b.key AND b.key < 100);
{code} 
produce the output as 
{noformat}
10      1000    10      100
10      1000    10      1000
10      100     10      100
10      100     10      1000
12      NULL    12      NULL
{noformat}

Where as the expected output for 
"SELECT * FROM input3 a left outer JOIN input3 b ON (a.key=b.key AND a.key < 100);" is
| NULL |   35 | NULL | NULL |
|   10 | 1000 |   10 | 1000 |
|   10 | 1000 |   10 |  100 |
|  100 |  100 | NULL | NULL |
|   12 | NULL |   12 | NULL |
|   10 |  100 |   10 | 1000 |
|   10 |  100 |   10 |  100 |

 and expected output for "SELECT * FROM input3 a right outer JOIN input3 b ON (a.key=b.key AND b.key < 100);" is
| NULL | NULL | NULL |   35 |
|   10 | 1000 |   10 | 1000 |
|   10 |  100 |   10 | 1000 |
| NULL | NULL |  100 |  100 |
|   12 | NULL |   12 | NULL |
|   10 | 1000 |   10 |  100 |
|   10 |  100 |   10 |  100 |


> predicate pushdown does not work correctly with outer joins
> -----------------------------------------------------------
>
>                 Key: HIVE-1534
>                 URL: https://issues.apache.org/jira/browse/HIVE-1534
>             Project: Hadoop Hive
>          Issue Type: Bug
>          Components: Query Processor
>            Reporter: Amareshwari Sriramadasu
>
> The hive documentation for predicate pushdown says:
> Left outer join: predicates on the left side aliases are pushed
> Right outer join: predicates on the right side aliases are pushed
> But, this pushdown should not happen for AND predicates in join queries:
> ex: SELECT * FROM T1 JOIN T2 ON (T1.c1=T2.c2 AND T1.c1 < 10)

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (HIVE-1534) Join filters do not work correctly with outer joins

Posted by "Amareshwari Sriramadasu (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HIVE-1534?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Amareshwari Sriramadasu updated HIVE-1534:
------------------------------------------

    Attachment: patch-1534-3.txt

Patch updated with following changes:
* removed the special casing for partitioned columns from SemanticAnalyzer
* Updated ColumnPrunerProcFactory.pruneJoinOperator() to add columns from join-filters.
* Updated test results for  queries: union22.q, louter_join_ppr.q, outer_join_ppr.q and router_join_ppr.q, which involve filters on partition predicates.


> Join filters do not work correctly with outer joins
> ---------------------------------------------------
>
>                 Key: HIVE-1534
>                 URL: https://issues.apache.org/jira/browse/HIVE-1534
>             Project: Hadoop Hive
>          Issue Type: Bug
>          Components: Query Processor
>            Reporter: Amareshwari Sriramadasu
>            Assignee: Amareshwari Sriramadasu
>         Attachments: patch-1534-1.txt, patch-1534-2.txt, patch-1534-3.txt, patch-1534.txt
>
>
>  SELECT * FROM T1 LEFT OUTER JOIN T2 ON (T1.c1=T2.c2 AND T1.c1 < 10)
> and  SELECT * FROM T1 RIGHT OUTER JOIN T2 ON (T1.c1=T2.c2 AND T2.c1 < 10)
> do not give correct results.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HIVE-1534) Join filters do not work correctly with outer joins

Posted by "Amareshwari Sriramadasu (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-1534?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12909182#action_12909182 ] 

Amareshwari Sriramadasu commented on HIVE-1534:
-----------------------------------------------

Thanks Namit for the feedback.

bq. One general comment was that you also need to account for progress if the join filters filter all the rows.
This shouldn't be a problem because, even if the row is filtered, the join will emit output by joining it with nulls. Am I sounding right?

I'm almost done with the next patch which fixes the bugs with right outer join, full outer join and nested joins in the earlier patch. Will upload it once all the tests pass.

> Join filters do not work correctly with outer joins
> ---------------------------------------------------
>
>                 Key: HIVE-1534
>                 URL: https://issues.apache.org/jira/browse/HIVE-1534
>             Project: Hadoop Hive
>          Issue Type: Bug
>          Components: Query Processor
>            Reporter: Amareshwari Sriramadasu
>            Assignee: Amareshwari Sriramadasu
>         Attachments: patch-1534.txt
>
>
>  SELECT * FROM T1 LEFT OUTER JOIN T2 ON (T1.c1=T2.c2 AND T1.c1 < 10)
> and  SELECT * FROM T1 RIGHT OUTER JOIN T2 ON (T1.c1=T2.c2 AND T2.c1 < 10)
> do not give correct results.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HIVE-1534) Join filters do not work correctly with outer joins

Posted by "Namit Jain (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-1534?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12909539#action_12909539 ] 

Namit Jain commented on HIVE-1534:
----------------------------------

Did you run all the tests ? Some of the tests should break - minimally a change of explain plans.
What about semi joins ?

Why did you add a genExprNode() ? Cant you re-use the one from SemanticAnalyzer ?

> Join filters do not work correctly with outer joins
> ---------------------------------------------------
>
>                 Key: HIVE-1534
>                 URL: https://issues.apache.org/jira/browse/HIVE-1534
>             Project: Hadoop Hive
>          Issue Type: Bug
>          Components: Query Processor
>            Reporter: Amareshwari Sriramadasu
>            Assignee: Amareshwari Sriramadasu
>         Attachments: patch-1534-1.txt, patch-1534.txt
>
>
>  SELECT * FROM T1 LEFT OUTER JOIN T2 ON (T1.c1=T2.c2 AND T1.c1 < 10)
> and  SELECT * FROM T1 RIGHT OUTER JOIN T2 ON (T1.c1=T2.c2 AND T2.c1 < 10)
> do not give correct results.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HIVE-1534) Join filters do not work correctly with outer joins

Posted by "Namit Jain (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-1534?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12909388#action_12909388 ] 

Namit Jain commented on HIVE-1534:
----------------------------------

Since we are still pushing filters for non-outer joins, the assumption that we will always output a row by the filters is true, and therefore 
we dont need a progress.

Cool, I will take a look at the patch again

> Join filters do not work correctly with outer joins
> ---------------------------------------------------
>
>                 Key: HIVE-1534
>                 URL: https://issues.apache.org/jira/browse/HIVE-1534
>             Project: Hadoop Hive
>          Issue Type: Bug
>          Components: Query Processor
>            Reporter: Amareshwari Sriramadasu
>            Assignee: Amareshwari Sriramadasu
>         Attachments: patch-1534-1.txt, patch-1534.txt
>
>
>  SELECT * FROM T1 LEFT OUTER JOIN T2 ON (T1.c1=T2.c2 AND T1.c1 < 10)
> and  SELECT * FROM T1 RIGHT OUTER JOIN T2 ON (T1.c1=T2.c2 AND T2.c1 < 10)
> do not give correct results.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HIVE-1534) Join filters do not work correctly with outer joins

Posted by "Namit Jain (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-1534?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12910669#action_12910669 ] 

Namit Jain commented on HIVE-1534:
----------------------------------

bq. I think it makes sense to push the filters on partitioned columns and not output all the table for outer join. Patch pushes filters on partitioned columns, even for outer joins. Thoughts?

I dont think it is a good idea to special case partitioned columns - can you treat them like any other column



> Join filters do not work correctly with outer joins
> ---------------------------------------------------
>
>                 Key: HIVE-1534
>                 URL: https://issues.apache.org/jira/browse/HIVE-1534
>             Project: Hadoop Hive
>          Issue Type: Bug
>          Components: Query Processor
>            Reporter: Amareshwari Sriramadasu
>            Assignee: Amareshwari Sriramadasu
>         Attachments: patch-1534-1.txt, patch-1534-2.txt, patch-1534.txt
>
>
>  SELECT * FROM T1 LEFT OUTER JOIN T2 ON (T1.c1=T2.c2 AND T1.c1 < 10)
> and  SELECT * FROM T1 RIGHT OUTER JOIN T2 ON (T1.c1=T2.c2 AND T2.c1 < 10)
> do not give correct results.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Assigned: (HIVE-1534) predicate pushdown does not work correctly with outer joins

Posted by "John Sichi (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HIVE-1534?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

John Sichi reassigned HIVE-1534:
--------------------------------

    Assignee: Amareshwari Sriramadasu

> predicate pushdown does not work correctly with outer joins
> -----------------------------------------------------------
>
>                 Key: HIVE-1534
>                 URL: https://issues.apache.org/jira/browse/HIVE-1534
>             Project: Hadoop Hive
>          Issue Type: Bug
>          Components: Query Processor
>            Reporter: Amareshwari Sriramadasu
>            Assignee: Amareshwari Sriramadasu
>
> The hive documentation for predicate pushdown says:
> Left outer join: predicates on the left side aliases are pushed
> Right outer join: predicates on the right side aliases are pushed
> But, this pushdown should not happen for AND predicates in join queries:
> ex: SELECT * FROM T1 JOIN T2 ON (T1.c1=T2.c2 AND T1.c1 < 10)

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (HIVE-1534) Join filters do not work correctly with outer joins

Posted by "Amareshwari Sriramadasu (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HIVE-1534?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Amareshwari Sriramadasu updated HIVE-1534:
------------------------------------------

    Attachment: patch-1534-1.txt

Patch with bug fixes. Also takes care of SMBMapJoin.

Patch does the following:
* Moved the code which decides whether to push the filters or not from pushJoinFilters() to creation of filters (parseJoinCondition). Because, for nested joins, mergeJoinTree changes the join-tree structure. 
** Join filters are not pushed if
*** join is full outer join or
*** join is left outer join and filter is on left alias
*** join is right outer join and filter is on right alias
* Join impl changes:
** When join values are computed from the row object, the row is evaluated to be filtered or not.
** Depending on the row is filtered or not, joinObjects*OuterJoin methods put null arrays
* Fixed a minor bug in JoinDesc with noOuterJoin variable. The variable is not passed down properly and isNoOuterJoin looks for the join type and evaluates. Fixed it so that noOuterJoin variable contains correct value




> Join filters do not work correctly with outer joins
> ---------------------------------------------------
>
>                 Key: HIVE-1534
>                 URL: https://issues.apache.org/jira/browse/HIVE-1534
>             Project: Hadoop Hive
>          Issue Type: Bug
>          Components: Query Processor
>            Reporter: Amareshwari Sriramadasu
>            Assignee: Amareshwari Sriramadasu
>         Attachments: patch-1534-1.txt, patch-1534.txt
>
>
>  SELECT * FROM T1 LEFT OUTER JOIN T2 ON (T1.c1=T2.c2 AND T1.c1 < 10)
> and  SELECT * FROM T1 RIGHT OUTER JOIN T2 ON (T1.c1=T2.c2 AND T2.c1 < 10)
> do not give correct results.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.