Posted to dev@hive.apache.org by "Yin Huai (JIRA)" <ji...@apache.org> on 2013/10/30 18:23:26 UTC
[jira] [Updated] (HIVE-5697) Correlation Optimizer may generate wrong plans for cases involving outer join
[ https://issues.apache.org/jira/browse/HIVE-5697?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Yin Huai updated HIVE-5697:
---------------------------
Issue Type: Sub-task (was: Bug)
Parent: HIVE-3667
> Correlation Optimizer may generate wrong plans for cases involving outer join
> -----------------------------------------------------------------------------
>
> Key: HIVE-5697
> URL: https://issues.apache.org/jira/browse/HIVE-5697
> Project: Hive
> Issue Type: Sub-task
> Affects Versions: 0.12.0, 0.13.0
> Reporter: Yin Huai
> Assignee: Yin Huai
>
> For example,
> {code:sql}
> select x.key, y.value, count(*) from src x right outer join src1 y on (x.key=y.key and x.value=y.value) group by x.key, y.value;
> {code}
> The Correlation Optimizer determines that a single MR job is enough for this query. However, the group-by keys come from both the left and the right tables of the right outer join.
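A minimal Python sketch (not Hive code) of why it matters that the group-by keys come from both sides: a right outer join on (key, value) keeps every row of y, and x.key is NULL (None below) whenever a y row has no match. Rows with different join keys can therefore end up with the same group-by key (x.key, y.value). The toy tables and the helpers `right_outer_join` and `join_keys_per_group_key` are hypothetical, made up for illustration.

```python
from typing import Dict, List, Optional, Set, Tuple

Row = Tuple[str, str]  # (key, value)
Joined = Tuple[Optional[str], Optional[str], str, str]  # (x.key, x.value, y.key, y.value)


def right_outer_join(x: List[Row], y: List[Row]) -> List[Joined]:
    """Join on (key, value), keeping every y row; x columns are None when unmatched."""
    out: List[Joined] = []
    for yk, yv in y:
        matches = [(xk, xv) for xk, xv in x if xk == yk and xv == yv]
        if matches:
            out.extend((xk, xv, yk, yv) for xk, xv in matches)
        else:
            out.append((None, None, yk, yv))
    return out


def join_keys_per_group_key(joined: List[Joined]) -> Dict[Tuple[Optional[str], str], Set[Row]]:
    """Map each group-by key (x.key, y.value) to the join keys (y.key, y.value) behind it."""
    groups: Dict[Tuple[Optional[str], str], Set[Row]] = {}
    for xk, _xv, yk, yv in joined:
        groups.setdefault((xk, yv), set()).add((yk, yv))
    return groups


if __name__ == "__main__":
    # Hypothetical toy rows, not the real src/src1 data.
    x = [("1", "a"), ("2", "b")]
    y = [("1", "a"), ("3", ""), ("5", ""), ("2", "b")]
    for group_key, join_keys in join_keys_per_group_key(right_outer_join(x, y)).items():
        print(group_key, sorted(join_keys))
```

In this toy data the group-by key (None, '') is produced by two distinct join keys, ('3', '') and ('5', ''), while every matched row maps one join key to one group-by key.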
> The query then returns a wrong result like the following:
> {code}
> NULL 4
> NULL val_165 1
> NULL val_193 1
> NULL val_265 1
> NULL val_27 1
> NULL val_409 1
> NULL val_484 1
> NULL 1
> 146 val_146 2
> 150 val_150 1
> 213 val_213 2
> NULL 1
> 238 val_238 2
> 255 val_255 2
> 273 val_273 3
> 278 val_278 2
> 311 val_311 3
> NULL 1
> 401 val_401 5
> 406 val_406 4
> 66 val_66 1
> 98 val_98 2
> {code}
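> Rows in which both x.key and y.value are NULL may not be grouped together, which is why the output above contains several separate NULL rows instead of a single one.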
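One plausible reading of the wrong output above, sketched in Python under a simplifying assumption: the merged job shuffles and sorts rows by the join key, and the reduce-side group-by only merges adjacent rows with equal group-by keys, as a sort-based aggregation does. Rows that share the group-by key (NULL, y.value) but come from different join keys are then not adjacent and are counted separately. The rows and the functions `streaming_count` and `hash_count` are hypothetical; this models the symptom, not Hive's actual operators.

```python
from collections import Counter
from itertools import groupby
from typing import List, Optional, Tuple

GroupKey = Tuple[Optional[str], str]  # (x.key, y.value)
JoinedRow = Tuple[Tuple[str, str], GroupKey]  # (join key, group-by key)


def streaming_count(rows: List[JoinedRow]) -> List[Tuple[GroupKey, int]]:
    """Count rows per run of adjacent equal group-by keys; correct only if rows are sorted by that key."""
    return [(gk, sum(1 for _ in run)) for gk, run in groupby(rows, key=lambda r: r[1])]


def hash_count(rows: List[JoinedRow]) -> Counter:
    """Count rows per group-by key regardless of input order."""
    return Counter(gk for _jk, gk in rows)


if __name__ == "__main__":
    # Hypothetical joined rows, ordered by join key as the shuffle would deliver them.
    rows: List[JoinedRow] = sorted(
        [
            (("5", ""), (None, "")),
            (("1", "a"), ("1", "a")),
            (("3", ""), (None, "")),
            (("4", "d"), ("4", "d")),
        ],
        key=lambda r: r[0],
    )
    print(streaming_count(rows))  # (None, '') appears as two separate groups
    print(hash_count(rows)[(None, "")])  # 2
```

Here `streaming_count` reports the (None, '') group twice with a count of 1 each, mirroring the repeated NULL rows in the output above, while `hash_count` gives a single count of 2 because it does not depend on input order.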
--
This message was sent by Atlassian JIRA
(v6.1#6144)