Posted to dev@hive.apache.org by "Jesus Camacho Rodriguez (JIRA)" <ji...@apache.org> on 2017/07/05 09:20:00 UTC
[jira] [Created] (HIVE-17037) Extend join algorithm selection to
avoid unnecessary input data shuffle
Jesus Camacho Rodriguez created HIVE-17037:
----------------------------------------------
Summary: Extend join algorithm selection to avoid unnecessary input data shuffle
Key: HIVE-17037
URL: https://issues.apache.org/jira/browse/HIVE-17037
Project: Hive
Issue Type: Improvement
Components: Physical Optimizer
Affects Versions: 3.0.0
Reporter: Jesus Camacho Rodriguez
Assignee: Jesus Camacho Rodriguez
As an example, consider the following query:
{code:sql}
SELECT *
FROM (
  SELECT a.value
  FROM src1 a
  JOIN src1 b
  ON (a.value = b.value)
  GROUP BY a.value
) a
JOIN src
ON (a.value = src.value);
{code}
Currently, the plan generated for Tez contains an unnecessary shuffle operation between the subquery and the join: the records produced by the subquery are already sorted by value, which is the join key.
This issue proposes extending join algorithm selection so that only some of the inputs of a given join are shuffled, avoiding unnecessary shuffle operations.
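The selection rule can be illustrated with a small Python sketch. It is not Hive's optimizer code: the `JoinInput` class, its fields, and both functions are hypothetical names introduced here. It also assumes that the subquery output is already partitioned on the join key as well as sorted on it, which the GROUP BY would normally guarantee but which the description above does not state explicitly.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class JoinInput:
    """Hypothetical description of one join input (not a Hive class)."""
    name: str
    join_keys: Tuple[str, ...]            # columns this input is joined on
    partitioned_on: Tuple[str, ...] = ()  # columns rows are already partitioned on
    sorted_on: Tuple[str, ...] = ()       # columns rows are already sorted on


def needs_shuffle(inp: JoinInput) -> bool:
    """An input can skip the shuffle only if it is already partitioned on
    the join keys and sorted with the join keys as a prefix."""
    if not inp.join_keys:
        return True
    already_partitioned = inp.partitioned_on == inp.join_keys
    already_sorted = inp.sorted_on[:len(inp.join_keys)] == inp.join_keys
    return not (already_partitioned and already_sorted)


def select_inputs_to_shuffle(inputs: List[JoinInput]) -> List[str]:
    """Return the names of the inputs that still have to be shuffled."""
    return [inp.name for inp in inputs if needs_shuffle(inp)]


# The two join inputs from the example query. The properties of the
# subquery output are an assumption made for this illustration.
subquery = JoinInput("subquery a", join_keys=("value",),
                     partitioned_on=("value",), sorted_on=("value",))
src = JoinInput("src", join_keys=("value",))

print(select_inputs_to_shuffle([subquery, src]))
```

In this sketch only `src` is selected for shuffling. A real implementation would additionally have to make sure that the shuffled inputs use the same partitioning function and number of partitions as the input that is left in place.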