Posted to dev@hive.apache.org by "Yin Huai (JIRA)" <ji...@apache.org> on 2013/10/30 18:23:26 UTC

[jira] [Updated] (HIVE-5697) Correlation Optimizer may generate wrong plans for cases involving outer join

     [ https://issues.apache.org/jira/browse/HIVE-5697?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yin Huai updated HIVE-5697:
---------------------------

    Issue Type: Sub-task  (was: Bug)
        Parent: HIVE-3667

> Correlation Optimizer may generate wrong plans for cases involving outer join
> -----------------------------------------------------------------------------
>
>                 Key: HIVE-5697
>                 URL: https://issues.apache.org/jira/browse/HIVE-5697
>             Project: Hive
>          Issue Type: Sub-task
>    Affects Versions: 0.12.0, 0.13.0
>            Reporter: Yin Huai
>            Assignee: Yin Huai
>
> For example,
> {code:sql}
> select x.key, y.value, count(*) from src x right outer join src1 y on (x.key=y.key and x.value=y.value) group by x.key, y.value; 
> {code}
> The Correlation Optimizer determines that a single MR job is enough for this query. However, the group-by keys come from both the left and right tables of the right outer join.
> This produces a wrong result such as:
> {code}
> NULL		4
> NULL	val_165	1
> NULL	val_193	1
> NULL	val_265	1
> NULL	val_27	1
> NULL	val_409	1
> NULL	val_484	1
> NULL		1
> 146	val_146	2
> 150	val_150	1
> 213	val_213	2
> NULL		1
> 238	val_238	2
> 255	val_255	2
> 273	val_273	3
> 278	val_278	2
> 311	val_311	3
> NULL		1
> 401	val_401	5
> 406	val_406	4
> 66	val_66	1
> 98	val_98	2
> {code}
> Rows in which both x.key and y.value are null may not be grouped together.
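
One way to picture the symptom above is the toy Python simulation below. It is only a sketch and does not use Hive code. It assumes that the single-job plan counts groups over rows arriving in join-key order, so a group is closed whenever the adjacent group-by key changes. The tables, the helper names (right_outer_join, streaming_group_count, hash_group_count) and the sample rows are all hypothetical.

```python
from itertools import groupby

# Hypothetical toy tables of (key, value) rows; not the real src/src1 data.
left = [("146", "val_146"), ("150", "val_150")]
right = [("100", ""), ("146", "val_146"), ("148", ""),
         ("150", "val_150"), ("160", "")]

def right_outer_join(left_rows, right_rows):
    """Return (x.key, y.value) pairs in join-key order (assumed reducer input order)."""
    out = []
    for y_key, y_value in sorted(right_rows):
        matches = [x for x in left_rows if x == (y_key, y_value)]
        if matches:
            out.extend((x_key, y_value) for x_key, _ in matches)
        else:
            # Unmatched right row: x.key becomes NULL (None here).
            out.append((None, y_value))
    return out

def streaming_group_count(rows):
    """Count runs of equal adjacent keys; correct only if rows are sorted by the group key."""
    return [(key, len(list(run))) for key, run in groupby(rows)]

def hash_group_count(rows):
    """Count every distinct key regardless of input order (the expected answer)."""
    counts = {}
    for row in rows:
        counts[row] = counts.get(row, 0) + 1
    return counts

joined = right_outer_join(left, right)
streamed = streaming_group_count(joined)  # the (None, "") group is split into three runs
hashed = hash_group_count(joined)         # the (None, "") group is counted once, with 3 rows

print(streamed)
print(hashed)
```

In this sketch the rows are ordered by y.key, but the group-by key (x.key, y.value) collapses to (NULL, '') for every unmatched right row, so that group shows up several times in the streaming result, much like the repeated NULL rows in the output quoted above.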


