Posted to dev@hive.apache.org by "Gopal V (JIRA)" <ji...@apache.org> on 2014/05/16 13:15:51 UTC
[jira] [Created] (HIVE-7074) The reducer parallelism should be a
prime number for better stride protection
Gopal V created HIVE-7074:
-----------------------------
Summary: The reducer parallelism should be a prime number for better stride protection
Key: HIVE-7074
URL: https://issues.apache.org/jira/browse/HIVE-7074
Project: Hive
Issue Type: Improvement
Components: Statistics
Reporter: Gopal V
Assignee: Gopal V
Attachments: HIVE-7074.1.patch
The current Hive reducer parallelism estimate can produce stride problems in key distribution.
For example, a JOIN whose keys are all even numbers gets strided onto only a subset of the reducers when the reducer count is also even.
The likelihood of this skew depends on the common factors shared by the key hashcodes and the number of buckets: keys with stride s spread over n reducers reach only n / gcd(s, n) of them.
Using a prime number in the reducer estimation reduces that likelihood significantly, since only strides that are a multiple of that prime still collide.
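
The stride effect described above can be illustrated with a small simulation. This is only a sketch, not the code in the attached patch: it assumes keys are integers that hash to themselves and that a key is routed to reducer `hash % num_reducers`. The names `reducers_hit` and `next_prime` are hypothetical helpers introduced for illustration.

```python
from math import gcd


def reducers_hit(keys, num_reducers):
    """Return the set of reducer indices that receive at least one key.

    Assumption: the partitioner is plain `hash % num_reducers` and an
    integer key hashes to its own value.
    """
    return {k % num_reducers for k in keys}


def next_prime(n):
    """Smallest prime >= n (hypothetical helper; trial division is
    adequate for reducer-sized counts)."""
    def is_prime(m):
        if m < 2:
            return False
        i = 2
        while i * i <= m:
            if m % i == 0:
                return False
            i += 1
        return True

    while not is_prime(n):
        n += 1
    return n


# A JOIN that only produces even keys, i.e. keys with stride 2.
even_keys = range(0, 10_000, 2)

estimated = 16                      # composite reducer count
adjusted = next_prime(estimated)    # 17

used_composite = reducers_hit(even_keys, estimated)
used_prime = reducers_hit(even_keys, adjusted)

# With stride s and n reducers, n / gcd(s, n) reducers receive data.
expected_composite = estimated // gcd(2, estimated)

print(len(used_composite), "of", estimated, "reducers used")
print(len(used_prime), "of", adjusted, "reducers used")
```

With 16 reducers the even keys land on only half of them, while rounding the estimate to 17 spreads the same keys across every reducer. Whether the estimate should round up or down to a prime is a design choice this sketch does not settle.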
--
This message was sent by Atlassian JIRA
(v6.2#6252)