Posted to dev@hive.apache.org by "Namit Jain (JIRA)" <ji...@apache.org> on 2009/01/30 03:08:59 UTC

[jira] Created: (HIVE-262) outer join gets some duplicate rows in some scenarios

outer join gets some duplicate rows in some scenarios
-----------------------------------------------------

                 Key: HIVE-262
                 URL: https://issues.apache.org/jira/browse/HIVE-262
             Project: Hadoop Hive
          Issue Type: Bug
          Components: Query Processor
            Reporter: Namit Jain
            Assignee: Namit Jain


SELECT * FROM src src1 JOIN src src2 ON (src1.key = src2.key AND src1.key < 10) RIGHT OUTER JOIN src src3 ON (src1.key = src3.key AND src3.key < 20);


returns duplicate rows for outer join
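
For comparison, the rows this query is expected to return can be worked out by evaluating the two joins one after the other. The following is only a minimal sketch in plain Python, not Hive code: `SRC` is a made-up stand-in for the `src` test table (integer keys are an assumption), and `expected_rows` is a hypothetical helper name.

```python
# Stand-in for the Hive test table `src`; the rows and integer keys are assumptions.
SRC = [(5, "val_5"), (15, "val_15"), (25, "val_25")]


def expected_rows(src):
    """Evaluate the reported query step by step; returns (src1, src2, src3) rows."""
    # src1 JOIN src2 ON (src1.key = src2.key AND src1.key < 10)
    inner = [(r1, r2) for r1 in src for r2 in src
             if r1[0] == r2[0] and r1[0] < 10]
    out = []
    # ... RIGHT OUTER JOIN src3 ON (src1.key = src3.key AND src3.key < 20)
    for r3 in src:
        matches = [(r1, r2, r3) for (r1, r2) in inner
                   if r1[0] == r3[0] and r3[0] < 20]
        # A src3 row without a match is preserved exactly once, padded with NULLs.
        out.extend(matches if matches else [(None, None, r3)])
    return out


rows = expected_rows(SRC)
```

Under these assumptions every `src3` row that finds no match appears exactly once, so any repeated NULL-padded row in an engine's output for the same data would be a duplicate of the kind reported here.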

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (HIVE-262) outer join gets some duplicate rows in some scenarios

Posted by "Namit Jain (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HIVE-262?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Namit Jain updated HIVE-262:
----------------------------

    Status: Open  (was: Patch Available)

> outer join gets some duplicate rows in some scenarios
> -----------------------------------------------------
>
>                 Key: HIVE-262
>                 URL: https://issues.apache.org/jira/browse/HIVE-262
>             Project: Hadoop Hive
>          Issue Type: Bug
>          Components: Query Processor
>    Affects Versions: 0.2.0
>            Reporter: Namit Jain
>            Assignee: Namit Jain
>             Fix For: 0.2.0
>
>         Attachments: patch.262.1.txt, patch262.2.txt
>
>
> SELECT * FROM src src1 JOIN src src2 ON (src1.key = src2.key AND src1.key < 10) RIGHT OUTER JOIN src src3 ON (src1.key = src3.key AND src3.key < 20);
> returns duplicate rows for outer join



[jira] Updated: (HIVE-262) outer join gets some duplicate rows in some scenarios

Posted by "Namit Jain (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HIVE-262?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Namit Jain updated HIVE-262:
----------------------------

    Attachment: patch.262.1.txt

> outer join gets some duplicate rows in some scenarios
> -----------------------------------------------------
>
>                 Key: HIVE-262
>                 URL: https://issues.apache.org/jira/browse/HIVE-262
>             Project: Hadoop Hive
>          Issue Type: Bug
>          Components: Query Processor
>    Affects Versions: 0.2.0
>            Reporter: Namit Jain
>            Assignee: Namit Jain
>             Fix For: 0.2.0
>
>         Attachments: patch.262.1.txt
>
>
> SELECT * FROM src src1 JOIN src src2 ON (src1.key = src2.key AND src1.key < 10) RIGHT OUTER JOIN src src3 ON (src1.key = src3.key AND src3.key < 20);
> returns duplicate rows for outer join



[jira] Commented: (HIVE-262) outer join gets some duplicate rows in some scenarios

Posted by "Namit Jain (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-262?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12669044#action_12669044 ] 

Namit Jain commented on HIVE-262:
---------------------------------

Does this happen only with 2 consecutive joins on the same set of keys?  ----> YES

> outer join gets some duplicate rows in some scenarios
> -----------------------------------------------------
>
>                 Key: HIVE-262
>                 URL: https://issues.apache.org/jira/browse/HIVE-262
>             Project: Hadoop Hive
>          Issue Type: Bug
>          Components: Query Processor
>    Affects Versions: 0.2.0
>            Reporter: Namit Jain
>            Assignee: Namit Jain
>            Priority: Blocker
>             Fix For: 0.2.0
>
>         Attachments: patch.262.1.txt, patch262.2.txt
>
>
> SELECT * FROM src src1 JOIN src src2 ON (src1.key = src2.key AND src1.key < 10) RIGHT OUTER JOIN src src3 ON (src1.key = src3.key AND src3.key < 20);
> returns duplicate rows for outer join



[jira] Commented: (HIVE-262) outer join gets some duplicate rows in some scenarios

Posted by "Zheng Shao (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-262?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12668980#action_12668980 ] 

Zheng Shao commented on HIVE-262:
---------------------------------

Does this happen only with 2 consecutive joins on the same set of keys?


> outer join gets some duplicate rows in some scenarios
> -----------------------------------------------------
>
>                 Key: HIVE-262
>                 URL: https://issues.apache.org/jira/browse/HIVE-262
>             Project: Hadoop Hive
>          Issue Type: Bug
>          Components: Query Processor
>    Affects Versions: 0.2.0
>            Reporter: Namit Jain
>            Assignee: Namit Jain
>             Fix For: 0.2.0
>
>         Attachments: patch.262.1.txt, patch262.2.txt
>
>
> SELECT * FROM src src1 JOIN src src2 ON (src1.key = src2.key AND src1.key < 10) RIGHT OUTER JOIN src src3 ON (src1.key = src3.key AND src3.key < 20);
> returns duplicate rows for outer join



[jira] Updated: (HIVE-262) outer join gets some duplicate rows in some scenarios

Posted by "Namit Jain (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HIVE-262?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Namit Jain updated HIVE-262:
----------------------------

    Status: Open  (was: Patch Available)

need to add more tests

> outer join gets some duplicate rows in some scenarios
> -----------------------------------------------------
>
>                 Key: HIVE-262
>                 URL: https://issues.apache.org/jira/browse/HIVE-262
>             Project: Hadoop Hive
>          Issue Type: Bug
>          Components: Query Processor
>    Affects Versions: 0.2.0
>            Reporter: Namit Jain
>            Assignee: Namit Jain
>            Priority: Blocker
>             Fix For: 0.2.0
>
>         Attachments: patch.262.1.txt, patch262.2.txt
>
>
> SELECT * FROM src src1 JOIN src src2 ON (src1.key = src2.key AND src1.key < 10) RIGHT OUTER JOIN src src3 ON (src1.key = src3.key AND src3.key < 20);
> returns duplicate rows for outer join



[jira] Updated: (HIVE-262) outer join gets some duplicate rows in some scenarios

Posted by "Namit Jain (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HIVE-262?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Namit Jain updated HIVE-262:
----------------------------

    Attachment: patch262.2.txt

forgot to update parse result files

> outer join gets some duplicate rows in some scenarios
> -----------------------------------------------------
>
>                 Key: HIVE-262
>                 URL: https://issues.apache.org/jira/browse/HIVE-262
>             Project: Hadoop Hive
>          Issue Type: Bug
>          Components: Query Processor
>    Affects Versions: 0.2.0
>            Reporter: Namit Jain
>            Assignee: Namit Jain
>             Fix For: 0.2.0
>
>         Attachments: patch.262.1.txt, patch262.2.txt
>
>
> SELECT * FROM src src1 JOIN src src2 ON (src1.key = src2.key AND src1.key < 10) RIGHT OUTER JOIN src src3 ON (src1.key = src3.key AND src3.key < 20);
> returns duplicate rows for outer join



[jira] Updated: (HIVE-262) outer join gets some duplicate rows in some scenarios

Posted by "Namit Jain (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HIVE-262?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Namit Jain updated HIVE-262:
----------------------------

    Status: Patch Available  (was: Open)

> outer join gets some duplicate rows in some scenarios
> -----------------------------------------------------
>
>                 Key: HIVE-262
>                 URL: https://issues.apache.org/jira/browse/HIVE-262
>             Project: Hadoop Hive
>          Issue Type: Bug
>          Components: Query Processor
>    Affects Versions: 0.2.0
>            Reporter: Namit Jain
>            Assignee: Namit Jain
>             Fix For: 0.2.0
>
>         Attachments: patch.262.1.txt, patch262.2.txt
>
>
> SELECT * FROM src src1 JOIN src src2 ON (src1.key = src2.key AND src1.key < 10) RIGHT OUTER JOIN src src3 ON (src1.key = src3.key AND src3.key < 20);
> returns duplicate rows for outer join



[jira] Commented: (HIVE-262) outer join gets some duplicate rows in some scenarios

Posted by "Ashish Thusoo (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-262?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12670207#action_12670207 ] 

Ashish Thusoo commented on HIVE-262:
------------------------------------

+1

looks good to me.

I am running the tests right now and will commit once they pass.


> outer join gets some duplicate rows in some scenarios
> -----------------------------------------------------
>
>                 Key: HIVE-262
>                 URL: https://issues.apache.org/jira/browse/HIVE-262
>             Project: Hadoop Hive
>          Issue Type: Bug
>          Components: Query Processor
>    Affects Versions: 0.2.0
>            Reporter: Namit Jain
>            Assignee: Namit Jain
>            Priority: Blocker
>             Fix For: 0.2.0
>
>         Attachments: patch-262.3.txt, patch.262.1.txt, patch.262.4.txt, patch262.2.txt
>
>
> SELECT * FROM src src1 JOIN src src2 ON (src1.key = src2.key AND src1.key < 10) RIGHT OUTER JOIN src src3 ON (src1.key = src3.key AND src3.key < 20);
> returns duplicate rows for outer join



[jira] Updated: (HIVE-262) outer join gets some duplicate rows in some scenarios

Posted by "Namit Jain (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HIVE-262?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Namit Jain updated HIVE-262:
----------------------------

    Status: Open  (was: Patch Available)

> outer join gets some duplicate rows in some scenarios
> -----------------------------------------------------
>
>                 Key: HIVE-262
>                 URL: https://issues.apache.org/jira/browse/HIVE-262
>             Project: Hadoop Hive
>          Issue Type: Bug
>          Components: Query Processor
>    Affects Versions: 0.2.0
>            Reporter: Namit Jain
>            Assignee: Namit Jain
>            Priority: Blocker
>             Fix For: 0.2.0
>
>         Attachments: patch-262.3.txt, patch.262.1.txt, patch262.2.txt
>
>
> SELECT * FROM src src1 JOIN src src2 ON (src1.key = src2.key AND src1.key < 10) RIGHT OUTER JOIN src src3 ON (src1.key = src3.key AND src3.key < 20);
> returns duplicate rows for outer join



[jira] Updated: (HIVE-262) outer join gets some duplicate rows in some scenarios

Posted by "Namit Jain (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HIVE-262?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Namit Jain updated HIVE-262:
----------------------------

    Status: Patch Available  (was: Open)

> outer join gets some duplicate rows in some scenarios
> -----------------------------------------------------
>
>                 Key: HIVE-262
>                 URL: https://issues.apache.org/jira/browse/HIVE-262
>             Project: Hadoop Hive
>          Issue Type: Bug
>          Components: Query Processor
>    Affects Versions: 0.2.0
>            Reporter: Namit Jain
>            Assignee: Namit Jain
>            Priority: Blocker
>             Fix For: 0.2.0
>
>         Attachments: patch-262.3.txt, patch.262.1.txt, patch.262.4.txt, patch262.2.txt
>
>
> SELECT * FROM src src1 JOIN src src2 ON (src1.key = src2.key AND src1.key < 10) RIGHT OUTER JOIN src src3 ON (src1.key = src3.key AND src3.key < 20);
> returns duplicate rows for outer join



[jira] Commented: (HIVE-262) outer join gets some duplicate rows in some scenarios

Posted by "Namit Jain (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-262?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12670185#action_12670185 ] 

Namit Jain commented on HIVE-262:
---------------------------------

Added explain plans to the tests.

> outer join gets some duplicate rows in some scenarios
> -----------------------------------------------------
>
>                 Key: HIVE-262
>                 URL: https://issues.apache.org/jira/browse/HIVE-262
>             Project: Hadoop Hive
>          Issue Type: Bug
>          Components: Query Processor
>    Affects Versions: 0.2.0
>            Reporter: Namit Jain
>            Assignee: Namit Jain
>            Priority: Blocker
>             Fix For: 0.2.0
>
>         Attachments: patch-262.3.txt, patch.262.1.txt, patch.262.4.txt, patch262.2.txt
>
>
> SELECT * FROM src src1 JOIN src src2 ON (src1.key = src2.key AND src1.key < 10) RIGHT OUTER JOIN src src3 ON (src1.key = src3.key AND src3.key < 20);
> returns duplicate rows for outer join



[jira] Commented: (HIVE-262) outer join gets some duplicate rows in some scenarios

Posted by "Ashish Thusoo (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-262?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12670085#action_12670085 ] 

Ashish Thusoo commented on HIVE-262:
------------------------------------

Can you add explain plans for the new test cases? Otherwise this looks good.

> outer join gets some duplicate rows in some scenarios
> -----------------------------------------------------
>
>                 Key: HIVE-262
>                 URL: https://issues.apache.org/jira/browse/HIVE-262
>             Project: Hadoop Hive
>          Issue Type: Bug
>          Components: Query Processor
>    Affects Versions: 0.2.0
>            Reporter: Namit Jain
>            Assignee: Namit Jain
>            Priority: Blocker
>             Fix For: 0.2.0
>
>         Attachments: patch-262.3.txt, patch.262.1.txt, patch262.2.txt
>
>
> SELECT * FROM src src1 JOIN src src2 ON (src1.key = src2.key AND src1.key < 10) RIGHT OUTER JOIN src src3 ON (src1.key = src3.key AND src3.key < 20);
> returns duplicate rows for outer join



[jira] Updated: (HIVE-262) outer join gets some duplicate rows in some scenarios

Posted by "Namit Jain (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HIVE-262?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Namit Jain updated HIVE-262:
----------------------------

        Fix Version/s: 0.2.0
    Affects Version/s: 0.2.0
               Status: Patch Available  (was: Open)

> outer join gets some duplicate rows in some scenarios
> -----------------------------------------------------
>
>                 Key: HIVE-262
>                 URL: https://issues.apache.org/jira/browse/HIVE-262
>             Project: Hadoop Hive
>          Issue Type: Bug
>          Components: Query Processor
>    Affects Versions: 0.2.0
>            Reporter: Namit Jain
>            Assignee: Namit Jain
>             Fix For: 0.2.0
>
>         Attachments: patch.262.1.txt
>
>
> SELECT * FROM src src1 JOIN src src2 ON (src1.key = src2.key AND src1.key < 10) RIGHT OUTER JOIN src src3 ON (src1.key = src3.key AND src3.key < 20);
> returns duplicate rows for outer join



[jira] Updated: (HIVE-262) outer join gets some duplicate rows in some scenarios

Posted by "dhruba borthakur (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HIVE-262?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

dhruba borthakur updated HIVE-262:
----------------------------------

    Priority: Blocker  (was: Major)

I marked this as a Blocker to indicate that this should definitely go into the 0.20 branch.

> outer join gets some duplicate rows in some scenarios
> -----------------------------------------------------
>
>                 Key: HIVE-262
>                 URL: https://issues.apache.org/jira/browse/HIVE-262
>             Project: Hadoop Hive
>          Issue Type: Bug
>          Components: Query Processor
>    Affects Versions: 0.2.0
>            Reporter: Namit Jain
>            Assignee: Namit Jain
>            Priority: Blocker
>             Fix For: 0.2.0
>
>         Attachments: patch.262.1.txt, patch262.2.txt
>
>
> SELECT * FROM src src1 JOIN src src2 ON (src1.key = src2.key AND src1.key < 10) RIGHT OUTER JOIN src src3 ON (src1.key = src3.key AND src3.key < 20);
> returns duplicate rows for outer join



[jira] Updated: (HIVE-262) outer join gets some duplicate rows in some scenarios

Posted by "Ashish Thusoo (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HIVE-262?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ashish Thusoo updated HIVE-262:
-------------------------------

      Resolution: Fixed
    Hadoop Flags: [Reviewed]
          Status: Resolved  (was: Patch Available)

committed. Thanks Namit!!

> outer join gets some duplicate rows in some scenarios
> -----------------------------------------------------
>
>                 Key: HIVE-262
>                 URL: https://issues.apache.org/jira/browse/HIVE-262
>             Project: Hadoop Hive
>          Issue Type: Bug
>          Components: Query Processor
>    Affects Versions: 0.2.0
>            Reporter: Namit Jain
>            Assignee: Namit Jain
>            Priority: Blocker
>             Fix For: 0.2.0
>
>         Attachments: patch-262.3.txt, patch.262.1.txt, patch.262.4.txt, patch262.2.txt
>
>
> SELECT * FROM src src1 JOIN src src2 ON (src1.key = src2.key AND src1.key < 10) RIGHT OUTER JOIN src src3 ON (src1.key = src3.key AND src3.key < 20);
> returns duplicate rows for outer join



[jira] Updated: (HIVE-262) outer join gets some duplicate rows in some scenarios

Posted by "Namit Jain (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HIVE-262?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Namit Jain updated HIVE-262:
----------------------------

    Attachment: patch.262.4.txt

> outer join gets some duplicate rows in some scenarios
> -----------------------------------------------------
>
>                 Key: HIVE-262
>                 URL: https://issues.apache.org/jira/browse/HIVE-262
>             Project: Hadoop Hive
>          Issue Type: Bug
>          Components: Query Processor
>    Affects Versions: 0.2.0
>            Reporter: Namit Jain
>            Assignee: Namit Jain
>            Priority: Blocker
>             Fix For: 0.2.0
>
>         Attachments: patch-262.3.txt, patch.262.1.txt, patch.262.4.txt, patch262.2.txt
>
>
> SELECT * FROM src src1 JOIN src src2 ON (src1.key = src2.key AND src1.key < 10) RIGHT OUTER JOIN src src3 ON (src1.key = src3.key AND src3.key < 20);
> returns duplicate rows for outer join



[jira] Updated: (HIVE-262) outer join gets some duplicate rows in some scenarios

Posted by "Namit Jain (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HIVE-262?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Namit Jain updated HIVE-262:
----------------------------

    Status: Patch Available  (was: Open)

> outer join gets some duplicate rows in some scenarios
> -----------------------------------------------------
>
>                 Key: HIVE-262
>                 URL: https://issues.apache.org/jira/browse/HIVE-262
>             Project: Hadoop Hive
>          Issue Type: Bug
>          Components: Query Processor
>    Affects Versions: 0.2.0
>            Reporter: Namit Jain
>            Assignee: Namit Jain
>            Priority: Blocker
>             Fix For: 0.2.0
>
>         Attachments: patch-262.3.txt, patch.262.1.txt, patch262.2.txt
>
>
> SELECT * FROM src src1 JOIN src src2 ON (src1.key = src2.key AND src1.key < 10) RIGHT OUTER JOIN src src3 ON (src1.key = src3.key AND src3.key < 20);
> returns duplicate rows for outer join



[jira] Commented: (HIVE-262) outer join gets some duplicate rows in some scenarios

Posted by "Namit Jain (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-262?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12669286#action_12669286 ] 

Namit Jain commented on HIVE-262:
---------------------------------

Ashish suggested the following approach:

Based on the join conditions, build the set of tables that are connected by inner joins (no outer joins). If any table in that set has no row for a given key value, treat every table in the set as having no row for it.

For example,

A join B on A.c1=B.c1 join C on A.c1=C.c1 right outer join D on A.c1=D.c1

A,B,C belong to the same group (since A joins with B and A joins with C).

So, for a given key (c1), if any of A, B, or C has no row, assume that
none of them has a row for that key.

That works for the example above, and the approach is different from the patch
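
The grouping idea above can be sketched roughly as follows. This is only an illustration in plain Python and not the code from any of the attached patches; `inner_join_groups`, `prune`, and the `(left, right, kind)` join description are hypothetical names invented for this sketch.

```python
def inner_join_groups(tables, joins):
    """Group tables that are connected by inner joins.

    joins is a list of (left, right, kind) tuples, where kind is
    "inner", "left", "right" or "full" (a made-up representation).
    """
    group = {t: {t} for t in tables}
    for left, right, kind in joins:
        if kind == "inner" and group[left] is not group[right]:
            merged = group[left] | group[right]
            for t in merged:
                group[t] = merged
    unique = []
    for g in group.values():
        if g not in unique:
            unique.append(g)
    return unique


def prune(rows_by_table, groups):
    """For one key value: if any table in a group has no row, empty the whole group."""
    pruned = dict(rows_by_table)
    for g in groups:
        if any(not rows_by_table[t] for t in g):
            for t in g:
                pruned[t] = []
    return pruned


# A join B on A.c1=B.c1 join C on A.c1=C.c1 right outer join D on A.c1=D.c1
groups = inner_join_groups(
    ["A", "B", "C", "D"],
    [("A", "B", "inner"), ("A", "C", "inner"), ("A", "D", "right")],
)
# Rows seen for one value of c1; B has none, so A and C are emptied as well.
pruned = prune({"A": ["a1"], "B": [], "C": ["c1"], "D": ["d1"]}, groups)
```

Here A, B and C end up in one group and D stays on its own, so the row of D survives while A and C are dropped together with B.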

> outer join gets some duplicate rows in some scenarios
> -----------------------------------------------------
>
>                 Key: HIVE-262
>                 URL: https://issues.apache.org/jira/browse/HIVE-262
>             Project: Hadoop Hive
>          Issue Type: Bug
>          Components: Query Processor
>    Affects Versions: 0.2.0
>            Reporter: Namit Jain
>            Assignee: Namit Jain
>            Priority: Blocker
>             Fix For: 0.2.0
>
>         Attachments: patch.262.1.txt, patch262.2.txt
>
>
> SELECT * FROM src src1 JOIN src src2 ON (src1.key = src2.key AND src1.key < 10) RIGHT OUTER JOIN src src3 ON (src1.key = src3.key AND src3.key < 20);
> returns duplicate rows for outer join



[jira] Commented: (HIVE-262) outer join gets some duplicate rows in some scenarios

Posted by "Namit Jain (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-262?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12669288#action_12669288 ] 

Namit Jain commented on HIVE-262:
---------------------------------

However, the above does not work for the following:


A left outer join B on A.c1=B.c1 right outer join C on B.c1=C.c1

Consider the following rows for a given value of c1:

A -> a1 a2
B -> null
C -> c1 c2

Since there is no inner join, no pruning happens, and the following output is produced:

null null c1
null null c1
null null c2
null null c2

whereas the correct output is:

null null c1
null null c2

Note that 2 extra rows will be produced.

So, I think the patch's approach should be better
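
The row counts above can be reproduced with a small simulation. This is a sketch under stated assumptions, not Hive's implementation: `correct_rows` evaluates the two joins one after the other for a single value of c1, while `naive_rows` is a hypothetical model of a single-pass evaluation that pads missing inputs with one NULL row and takes the cross product.

```python
from itertools import product

# Rows for one value of c1, as in the example above.
A = ["a1", "a2"]
B = []
C = ["c1", "c2"]


def correct_rows(a_rows, b_rows, c_rows):
    # Step 1: A left outer join B on A.c1 = B.c1 (all rows share one key value).
    if b_rows:
        ab = [(a, b) for a in a_rows for b in b_rows]
    else:
        ab = [(a, None) for a in a_rows]
    # Step 2: ... right outer join C on B.c1 = C.c1.
    # A NULL B never satisfies B.c1 = C.c1, so NULL-padded rows cannot match.
    out = []
    for c in c_rows:
        matches = [(a, b, c) for (a, b) in ab if b is not None]
        out.extend(matches if matches else [(None, None, c)])
    return out


def naive_rows(a_rows, b_rows, c_rows):
    # Hypothetical single-pass model (an assumption, not Hive's code):
    # pad each empty input with one NULL row, take the cross product,
    # and blank out A whenever B is NULL.
    padded = [rows if rows else [None] for rows in (a_rows, b_rows, c_rows)]
    out = []
    for a, b, c in product(*padded):
        if b is None:
            a = None
        out.append((a, b, c))
    return out


good = correct_rows(A, B, C)
bad = sorted(naive_rows(A, B, C), key=lambda row: row[2])
```

In this model the two rows of A are what multiply the output: each row of C is emitted once per row of A even though A contributes only NULLs.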

> outer join gets some duplicate rows in some scenarios
> -----------------------------------------------------
>
>                 Key: HIVE-262
>                 URL: https://issues.apache.org/jira/browse/HIVE-262
>             Project: Hadoop Hive
>          Issue Type: Bug
>          Components: Query Processor
>    Affects Versions: 0.2.0
>            Reporter: Namit Jain
>            Assignee: Namit Jain
>            Priority: Blocker
>             Fix For: 0.2.0
>
>         Attachments: patch.262.1.txt, patch262.2.txt
>
>
> SELECT * FROM src src1 JOIN src src2 ON (src1.key = src2.key AND src1.key < 10) RIGHT OUTER JOIN src src3 ON (src1.key = src3.key AND src3.key < 20);
> returns duplicate rows for outer join



[jira] Updated: (HIVE-262) outer join gets some duplicate rows in some scenarios

Posted by "Carl Steinbach (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HIVE-262?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Carl Steinbach updated HIVE-262:
--------------------------------

        Fix Version/s: 0.3.0
                           (was: 0.6.0)
    Affects Version/s:     (was: 0.6.0)

> outer join gets some duplicate rows in some scenarios
> -----------------------------------------------------
>
>                 Key: HIVE-262
>                 URL: https://issues.apache.org/jira/browse/HIVE-262
>             Project: Hadoop Hive
>          Issue Type: Bug
>          Components: Query Processor
>            Reporter: Namit Jain
>            Assignee: Namit Jain
>            Priority: Blocker
>             Fix For: 0.3.0
>
>         Attachments: patch-262.3.txt, patch.262.1.txt, patch.262.4.txt, patch262.2.txt
>
>
> SELECT * FROM src src1 JOIN src src2 ON (src1.key = src2.key AND src1.key < 10) RIGHT OUTER JOIN src src3 ON (src1.key = src3.key AND src3.key < 20);
> returns duplicate rows for outer join



[jira] Updated: (HIVE-262) outer join gets some duplicate rows in some scenarios

Posted by "Namit Jain (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HIVE-262?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Namit Jain updated HIVE-262:
----------------------------

    Attachment: patch-262.3.txt

> outer join gets some duplicate rows in some scenarios
> -----------------------------------------------------
>
>                 Key: HIVE-262
>                 URL: https://issues.apache.org/jira/browse/HIVE-262
>             Project: Hadoop Hive
>          Issue Type: Bug
>          Components: Query Processor
>    Affects Versions: 0.2.0
>            Reporter: Namit Jain
>            Assignee: Namit Jain
>            Priority: Blocker
>             Fix For: 0.2.0
>
>         Attachments: patch-262.3.txt, patch.262.1.txt, patch262.2.txt
>
>
> SELECT * FROM src src1 JOIN src src2 ON (src1.key = src2.key AND src1.key < 10) RIGHT OUTER JOIN src src3 ON (src1.key = src3.key AND src3.key < 20);
> returns duplicate rows for outer join
