You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@hive.apache.org by "ASF GitHub Bot (Jira)" <ji...@apache.org> on 2020/05/28 17:17:00 UTC

[jira] [Work logged] (HIVE-23435) Full outer join result is missing rows

     [ https://issues.apache.org/jira/browse/HIVE-23435?focusedWorklogId=438358&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-438358 ]

ASF GitHub Bot logged work on HIVE-23435:
-----------------------------------------

                Author: ASF GitHub Bot
            Created on: 28/May/20 17:16
            Start Date: 28/May/20 17:16
    Worklog Time Spent: 10m 
      Work Description: mustafaiman opened a new pull request #1039:
URL: https://github.com/apache/hive/pull/1039


   Full outer join keeps filtered array for all items seen for each table as it reads rows. This is later used to determine if a right table row has at least one matching row from left table. If not, then "NULL-rightrow" is emitted. The same goes for "leftrow-NULL" output too. These filtered arrays are per group in reducer. However, rows are read until the first item of the next group is read. This leads to mixing filtered information of the next group to current group causing wrong results for full outer join.
   
   Change-Id: I916bfeb354a0d3d5c2eca151887adab3934569f6


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Issue Time Tracking
-------------------

            Worklog Id:     (was: 438358)
    Remaining Estimate: 0h
            Time Spent: 10m

> Full outer join result is missing rows 
> ---------------------------------------
>
>                 Key: HIVE-23435
>                 URL: https://issues.apache.org/jira/browse/HIVE-23435
>             Project: Hive
>          Issue Type: Bug
>          Components: HiveServer2
>    Affects Versions: 3.1.0
>            Reporter: Naveen Gangam
>            Assignee: Mustafa Iman
>            Priority: Major
>         Attachments: HIVE-23435.patch, HIVE-23435.patch, HIVE-23435.patch
>
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> Full Outer join result has missing rows. Appears to be a bug with the full outer join logic. Expected output is receiving when we do a left and right outer join.
> Reproducible steps are mentioned below.
> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> SUPPORT ANALYSIS
> Steps to Reproduce:
> 1. Create a table and insert data:
> create table x (z char(5), x int, y int);
> insert into x values ('one', 1, 50),
>  ('two', 2, 30),
>  ('three', 3, 30),
>  ('four', 4, 60),
>  ('five', 5, 70),
>  ('six', 6, 80);
> 2. Try full outer with the below command. The result is incomplete, it is missing the row:
> NULL NULL NULL three 3 30.0
>  Full Outer Join:
> select x1.`z`, x1.`x`, x1.`y`, x2.`z`,
>  x2.`x`, x2.`y`
>  from `x` x1 full outer join
>  `x` x2 on (x1.`x` > 3) and (x2.`x` < 4) and (x1.`x` =
>  x2.`x`);
> Result:
> ----------------------------------+
> x1.z x1.x x1.y x2.z x2.x x2.y
>  ----------------------------------+
> one 1 50 NULL NULL NULL
>  NULL NULL NULL one 1 50
>  two 2 30 NULL NULL NULL
>  NULL NULL NULL two 2 30
>  three 3 30 NULL NULL NULL
>  four 4 60 NULL NULL NULL
>  NULL NULL NULL four 4 60
>  five 5 70 NULL NULL NULL
>  NULL NULL NULL five 5 70
>  six 6 80 NULL NULL NULL
>  NULL NULL NULL six 6 80
>  ----------------------------------+
> 3. Expected output is coming when we use left/right join + union:
> select x1.`z`, x1.`x`, x1.`y`, x2.`z`,
>  x2.`x`, x2.`y`
>  from `x` x1 left outer join
>  `x` x2 on (x1.`x` > 3) and (x2.`x` < 4) and (x1.`x` =
>  x2.`x`)
>  union
>  select x1.`z`, x1.`x`, x1.`y`, x2.`z`,
>  x2.`x`, x2.`y`
>  from `x` x1 right outer join
>  `x` x2 on (x1.`x` > 3) and (x2.`x` < 4) and (x1.`x` =
>  x2.`x`);
> Result:
> ------------------------------------+
> z x y _col3 _col4 _col5
>  ------------------------------------+
> NULL NULL NULL five 5 70
>  NULL NULL NULL four 4 60
>  NULL NULL NULL one 1 50
>  four 4 60 NULL NULL NULL
>  one 1 50 NULL NULL NULL
>  six 6 80 NULL NULL NULL
>  three 3 30 NULL NULL NULL
>  two 2 30 NULL NULL NULL
>  NULL NULL NULL six 6 80
>  NULL NULL NULL three 3 30
>  NULL NULL NULL two 2 30
>  five 5 70 NULL NULL NULL
>  ------------------------------------+
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)