Posted to dev@hive.apache.org by "Jesus Camacho Rodriguez (JIRA)" <ji...@apache.org> on 2017/07/05 09:20:00 UTC

[jira] [Created] (HIVE-17037) Extend join algorithm selection to avoid unnecessary input data shuffle

Jesus Camacho Rodriguez created HIVE-17037:
----------------------------------------------

             Summary: Extend join algorithm selection to avoid unnecessary input data shuffle
                 Key: HIVE-17037
                 URL: https://issues.apache.org/jira/browse/HIVE-17037
             Project: Hive
          Issue Type: Improvement
          Components: Physical Optimizer
    Affects Versions: 3.0.0
            Reporter: Jesus Camacho Rodriguez
            Assignee: Jesus Camacho Rodriguez


As an example, consider the following query:

{code:sql}
SELECT *
FROM (
  SELECT a.value
  FROM src1 a
  JOIN src1 b
  ON (a.value = b.value)
  GROUP BY a.value
) a
JOIN src
ON (a.value = src.value);
{code}

Currently, the plan generated for Tez contains an unnecessary shuffle operation between the subquery and the join: the records produced by the subquery are already sorted by value, which is also the join key.

This issue proposes extending join algorithm selection so that it can shuffle only some of the inputs of a given join, avoiding unnecessary shuffle operations.
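
To make the idea concrete, here is a minimal sketch of a per-input shuffle decision. It is not Hive code: the JoinInput class, its fields, and the inputs_to_shuffle function are hypothetical names chosen for illustration, and the check is deliberately simplified to comparing an input's existing sort columns with its join keys. An actual optimizer would presumably also have to confirm that partitioning and parallelism are compatible across the inputs.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class JoinInput:
    """Hypothetical description of one join input (not a Hive class)."""
    name: str
    join_keys: Tuple[str, ...]       # columns this input is joined on
    sorted_on: Tuple[str, ...] = ()  # columns its rows are already sorted on, if any


def inputs_to_shuffle(inputs: List[JoinInput]) -> List[str]:
    """Return the names of inputs whose existing order does not match their join keys."""
    return [i.name for i in inputs if i.sorted_on != i.join_keys]


# The query from this issue: the GROUP BY subquery already emits rows
# sorted on value, while src is a plain table scan with no useful order.
subquery = JoinInput("subquery a", join_keys=("value",), sorted_on=("value",))
src = JoinInput("src", join_keys=("value",))

print(inputs_to_shuffle([subquery, src]))  # ['src']
```

Under these assumptions only src would be shuffled, while the subquery output would feed the join directly.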



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)