Posted to dev@hive.apache.org by "Gopal V (JIRA)" <ji...@apache.org> on 2014/10/04 08:11:33 UTC

[jira] [Created] (HIVE-8349) DISTRIBUTE BY should work with tez auto-parallelism enabled

Gopal V created HIVE-8349:
-----------------------------

             Summary: DISTRIBUTE BY should work with tez auto-parallelism enabled
                 Key: HIVE-8349
                 URL: https://issues.apache.org/jira/browse/HIVE-8349
             Project: Hive
          Issue Type: Bug
            Reporter: Gopal V


The current implementation of DISTRIBUTE BY does not work when Tez auto-parallelism is turned on, because of hashCode distribution issues.

In the DISTRIBUTE BY case, the key is actually zero bytes, and partitioning is driven only by the hashCode. This adversely affects the uniform hashing implementation.
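
A minimal sketch of why an empty key could defeat a hash computed over key bytes. It is not Hive or Tez source code. It assumes the uniform hashing path derives the partition from the serialized key bytes alone, and every name in it (ZeroByteKeyDemo, hashOfKeyBytes, toPartition) is hypothetical. Arrays.hashCode stands in for whatever byte hash the real implementation uses.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical illustration only; not taken from Hive or Tez.
class ZeroByteKeyDemo {

    // Stand-in for a "uniform" hash computed purely from the serialized key bytes.
    static int hashOfKeyBytes(byte[] keyBytes) {
        return Arrays.hashCode(keyBytes);
    }

    // Map a hash to a partition index in [0, numPartitions).
    static int toPartition(int hash, int numPartitions) {
        return Math.floorMod(hash, numPartitions);
    }

    // Every row carries a zero-byte key, so the byte hash is identical for all rows.
    static int partitionsUsedByKeyBytes(int rows, int numPartitions) {
        Set<Integer> used = new HashSet<>();
        byte[] emptyKey = new byte[0];
        for (int row = 0; row < rows; row++) {
            used.add(toPartition(hashOfKeyBytes(emptyKey), numPartitions));
        }
        return used.size();
    }

    // The partition comes from a hashCode computed over the DISTRIBUTE BY value instead.
    static int partitionsUsedByHashCode(int rows, int numPartitions) {
        Set<Integer> used = new HashSet<>();
        for (int row = 0; row < rows; row++) {
            int distributeByValue = row; // assumed sample data: one distinct value per row
            used.add(toPartition(Integer.hashCode(distributeByValue), numPartitions));
        }
        return used.size();
    }

    public static void main(String[] args) {
        System.out.println("by key bytes: " + partitionsUsedByKeyBytes(1000, 8));
        System.out.println("by hashCode:  " + partitionsUsedByHashCode(1000, 8));
    }
}
```

Under these assumptions, all rows hashed by key bytes land in a single partition, while rows partitioned by the hashCode of the DISTRIBUTE BY value spread across all eight.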

In an ideal scenario, the edge should go from the ordered KV input to the unordered partitioned edge, which would speed up processing massively.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)