Posted to dev@pig.apache.org by "Tom White (JIRA)" <ji...@apache.org> on 2009/07/03 14:32:47 UTC

[jira] Commented: (PIG-871) Improve distribution of keys in reduce phase

    [ https://issues.apache.org/jira/browse/PIG-871?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12726942#action_12726942 ] 

Tom White commented on PIG-871:
-------------------------------

MurmurHash (and other hashing schemes) can be found in the org.apache.hadoop.util.hash package of Hadoop Common.

> Improve distribution of keys in reduce phase
> --------------------------------------------
>
>                 Key: PIG-871
>                 URL: https://issues.apache.org/jira/browse/PIG-871
>             Project: Pig
>          Issue Type: Improvement
>    Affects Versions: 0.3.0
>            Reporter: Ankur
>
> The default hashing scheme used to distribute keys in the reduce phase sometimes produces an uneven distribution, leaving 5-10% of reducers overloaded with data. This bottleneck makes Pig jobs very slow and gives users a bad impression.
> While there is no bulletproof solution to the problem in general, the hashing can certainly be improved to give a better distribution. The proposal is to evaluate and incorporate other hashing schemes that offer high avalanche and a more even distribution. We can start by evaluating MurmurHash, which is Apache 2.0 licensed and freely available at http://www.getopt.org/murmur/MurmurHash.java
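
For illustration only, the kind of evaluation proposed above could be sketched roughly as follows. This is a minimal, self-contained sketch, not Pig's or Hadoop's actual code: the class name, the method names, and the "user-N" key format are hypothetical, and the hash is a MurmurHash2-style function that is not guaranteed to match the output of the Hadoop class bit for bit. The mask-and-modulo step follows the arithmetic used by Hadoop's HashPartitioner, applied here to String.hashCode() as a stand-in for the default scheme.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class MurmurPartitionSketch {

    // MurmurHash2-style 32-bit hash over a byte array (little-endian block reads).
    // Assumption: a sketch for comparison purposes, not a drop-in for Hadoop's MurmurHash.
    static int murmur2(byte[] data, int seed) {
        final int m = 0x5bd1e995;
        final int r = 24;
        int length = data.length;
        int h = seed ^ length;
        int i = 0;
        // Mix four bytes at a time into the hash.
        while (length - i >= 4) {
            int k = (data[i] & 0xff)
                  | ((data[i + 1] & 0xff) << 8)
                  | ((data[i + 2] & 0xff) << 16)
                  | ((data[i + 3] & 0xff) << 24);
            k *= m;
            k ^= k >>> r;
            k *= m;
            h *= m;
            h ^= k;
            i += 4;
        }
        // Mix the remaining 1 to 3 bytes, if any.
        switch (length - i) {
            case 3: h ^= (data[i + 2] & 0xff) << 16; // fall through
            case 2: h ^= (data[i + 1] & 0xff) << 8;  // fall through
            case 1: h ^= (data[i] & 0xff);
                    h *= m;
        }
        // Final avalanche so that the last bytes affect all output bits.
        h ^= h >>> 13;
        h *= m;
        h ^= h >>> 15;
        return h;
    }

    // Partition by the Murmur-style hash of the key's UTF-8 bytes.
    static int murmurPartition(String key, int numReducers) {
        int h = murmur2(key.getBytes(StandardCharsets.UTF_8), 0);
        return (h & Integer.MAX_VALUE) % numReducers;
    }

    // Partition by hashCode(), using the same mask-and-modulo arithmetic.
    static int defaultPartition(String key, int numReducers) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReducers;
    }

    // Count how many of the hypothetical keys "user-0" .. "user-(numKeys-1)"
    // land on each reducer under the chosen scheme.
    static int[] histogram(int numKeys, int numReducers, boolean useMurmur) {
        int[] counts = new int[numReducers];
        for (int i = 0; i < numKeys; i++) {
            String key = "user-" + i; // hypothetical key format
            int p = useMurmur
                    ? murmurPartition(key, numReducers)
                    : defaultPartition(key, numReducers);
            counts[p]++;
        }
        return counts;
    }

    public static void main(String[] args) {
        System.out.println("hashCode: " + Arrays.toString(histogram(100000, 16, false)));
        System.out.println("murmur:   " + Arrays.toString(histogram(100000, 16, true)));
    }
}
```

Comparing the two printed histograms on a sample of real job keys, rather than the synthetic keys used here, would give a first indication of whether a different hash actually evens out the reducer load for a given workload.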
