You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@hive.apache.org by "Ashutosh Chauhan (JIRA)" <ji...@apache.org> on 2014/12/17 18:11:13 UTC
[jira] [Commented] (HIVE-9146) Query with left joins produces wrong
result when join condition is written in different order
[ https://issues.apache.org/jira/browse/HIVE-9146?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14250136#comment-14250136 ]
Ashutosh Chauhan commented on HIVE-9146:
----------------------------------------
you might be hitting into HIVE-8298 can you test your queries on Hive 0.14 and post your findings here.
> Query with left joins produces wrong result when join condition is written in different order
> ---------------------------------------------------------------------------------------------
>
> Key: HIVE-9146
> URL: https://issues.apache.org/jira/browse/HIVE-9146
> Project: Hive
> Issue Type: Bug
> Affects Versions: 0.13.1
> Reporter: Kamil Gorlo
>
> I have two queries which should be equal (I only swap two join conditions) but they are not. They are simplest queries I could produce to reproduce bug.
> I have two simple tables:
> desc kgorlo_comm;
> | col_name | data_type | comment |
> | id | bigint | |
> | dest_id | bigint | |
> desc kgorlo_log;
> | col_name | data_type | comment |
> | id | bigint | |
> | dest_id | bigint | |
> | tstamp | bigint | |
> With data:
> select * from kgorlo_comm;
> | kgorlo_comm.id | kgorlo_comm.dest_id |
> | 1 | 2 |
> | 2 | 1 |
> | 1 | 3 |
> | 2 | 3 |
> | 3 | 5 |
> | 4 | 5 |
> select * from kgorlo_log;
> | kgorlo_log.id | kgorlo_log.dest_id | kgorlo_log.tstamp |
> | 1 | 2 | 0 |
> | 1 | 3 | 0 |
> | 1 | 5 | 0 |
> | 3 | 1 | 0 |
> And when I run this query (query no. 1):
> {quote}
> select log.id, log.dest_id, com1.msgs, com2.msgs from kgorlo_log log
> left outer join (select id, dest_id, count( * ) as msgs from kgorlo_comm group by id, dest_id)com1 on com1.id=log.id and com1.dest_id=log.dest_id
> left outer join (select id, dest_id, count( * ) as msgs from kgorlo_comm group by id, dest_id)com2 on com2.dest_id=log.id and com2.id=log.dest_id;
> {quote}
> I get result (which is correct):
> | log.id | log.dest_id | com1.msgs | com2.msgs |
> | 1 | 2 | 1 | 1 |
> | 1 | 3 | 1 | NULL |
> | 1 | 5 | NULL | NULL |
> | 3 | 1 | NULL | 1 |
> But when I run second query (query no. 2):
> {quote}
> select log.id, log.dest_id, com1.msgs, com2.msgs from kgorlo_log log
> left outer join (select id, dest_id, count( * ) as msgs from kgorlo_comm group by id, dest_id)com1 on com1.id=log.id and com1.dest_id=log.dest_id
> left outer join (select id, dest_id, count( * ) as msgs from kgorlo_comm group by id, dest_id)com2 on com2.id=log.dest_id and com2.dest_id=log.id;
> {quote}
> I get different (and bad, in my opinion) result:
> |log.id | log.dest_id | com1.msgs | com2.msgs|
> |1|2|1|1|
> |1|3|1|1|
> |1|5|NULL|NULL|
> |3|1|NULL|NULL|
> Query no. 1 and query no. 2 are different in only one place, it is second join condition:
> bf. com2.dest_id=log.id and com2.id=log.dest_id
> vs
> bf. com2.id=log.dest_id and com2.dest_id=log.id
> which in my opinion are equal.
> Explains for both queries are of course slightly different (columns are swapped) and they are here:
> https://gist.github.com/kgs/399ad7ca2c481bd2c018 (query no. 1, good)
> https://gist.github.com/kgs/bfb3216f0f1fbc28037e (query no. 2, bad)
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)