Posted to dev@hive.apache.org by "Yin Huai (JIRA)" <ji...@apache.org> on 2013/10/30 18:21:31 UTC
[jira] [Created] (HIVE-5697) Correlation Optimizer may generate
wrong plans for cases involving outer join
Yin Huai created HIVE-5697:
------------------------------
Summary: Correlation Optimizer may generate wrong plans for cases involving outer join
Key: HIVE-5697
URL: https://issues.apache.org/jira/browse/HIVE-5697
Project: Hive
Issue Type: Bug
Affects Versions: 0.12.0, 0.13.0
Reporter: Yin Huai
Assignee: Yin Huai
For example,
{code:sql}
select x.key, y.value, count(*) from src x right outer join src1 y on (x.key=y.key and x.value=y.value) group by x.key, y.value;
{code}
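The Correlation Optimizer determines that a single MR job is enough for this query. However, the group by keys come from both the left and the right tables of the right outer join, so rows that share a group by key (for example, x.key is NULL for every unmatched row of y) can have different join keys.
We get a wrong result like: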
{code}
NULL 4
NULL val_165 1
NULL val_193 1
NULL val_265 1
NULL val_27 1
NULL val_409 1
NULL val_484 1
NULL 1
146 val_146 2
150 val_150 1
213 val_213 2
NULL 1
238 val_238 2
255 val_255 2
273 val_273 3
278 val_278 2
311 val_311 3
NULL 1
401 val_401 5
406 val_406 4
66 val_66 1
98 val_98 2
{code}
Rows in which both x.key and y.value are null may not be grouped together.
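One way to cross-check the result is to rerun the query with the optimizer disabled and compare both the plan and the output. This is only a sketch of a workaround, not a fix for the issue, and it assumes the Correlation Optimizer was enabled through the hive.optimize.correlation setting:
{code:sql}
-- Assumption: the Correlation Optimizer was turned on via hive.optimize.correlation.
set hive.optimize.correlation=false;

-- Inspect the plan; the join and the group by are expected to run in separate MR jobs.
explain
select x.key, y.value, count(*) from src x right outer join src1 y on (x.key=y.key and x.value=y.value) group by x.key, y.value;

-- Rerun the query; each (x.key, y.value) group is expected to appear only once.
select x.key, y.value, count(*) from src x right outer join src1 y on (x.key=y.key and x.value=y.value) group by x.key, y.value;
{code}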