Posted to dev@hive.apache.org by "Yin Huai (JIRA)" <ji...@apache.org> on 2013/10/30 18:21:31 UTC

[jira] [Created] (HIVE-5697) Correlation Optimizer may generate wrong plans for cases involving outer join

Yin Huai created HIVE-5697:
------------------------------

             Summary: Correlation Optimizer may generate wrong plans for cases involving outer join
                 Key: HIVE-5697
                 URL: https://issues.apache.org/jira/browse/HIVE-5697
             Project: Hive
          Issue Type: Bug
    Affects Versions: 0.12.0, 0.13.0
            Reporter: Yin Huai
            Assignee: Yin Huai


For example,
{code:sql}
select x.key, y.value, count(*) from src x right outer join src1 y on (x.key=y.key and x.value=y.value) group by x.key, y.value; 
{code}
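The Correlation Optimizer determines that a single MR job is enough for this query. However, the group-by keys come from both the left and right tables of the right outer join: x.key is from the left table and y.value is from the right table. Because x.key is NULL for every row of y that has no match, grouping on (x.key, y.value) is not equivalent to grouping on the join keys.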

The query then returns a wrong result such as
{code}
NULL		4
NULL	val_165	1
NULL	val_193	1
NULL	val_265	1
NULL	val_27	1
NULL	val_409	1
NULL	val_484	1
NULL		1
146	val_146	2
150	val_150	1
213	val_213	2
NULL		1
238	val_238	2
255	val_255	2
273	val_273	3
278	val_278	2
311	val_311	3
NULL		1
401	val_401	5
406	val_406	4
66	val_66	1
98	val_98	2
{code}
Rows in which both x.key and y.value are null may not be grouped together, so the same group can appear in more than one output row.
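
To check whether a given query is affected, one option is to compare its plan and result with the Correlation Optimizer turned on and off. The sketch below assumes that hive.optimize.correlation is the setting that controls the optimizer in the affected versions, and that the src and src1 tables from the example above are available. It is meant as a diagnostic, not as the fix for this issue.
{code:sql}
-- Assumption: hive.optimize.correlation toggles the Correlation Optimizer.
set hive.optimize.correlation=true;
-- Inspect the plan chosen with the optimizer enabled.
explain
select x.key, y.value, count(*) from src x right outer join src1 y on (x.key=y.key and x.value=y.value) group by x.key, y.value;

-- Disable the optimizer and run the same query for comparison.
set hive.optimize.correlation=false;
explain
select x.key, y.value, count(*) from src x right outer join src1 y on (x.key=y.key and x.value=y.value) group by x.key, y.value;
select x.key, y.value, count(*) from src x right outer join src1 y on (x.key=y.key and x.value=y.value) group by x.key, y.value;
{code}
If the two settings produce different numbers of MR stages or different results, that suggests the query is affected by this bug.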


