Posted to dev@mahout.apache.org by "Suneel Marthi (JIRA)" <ji...@apache.org> on 2013/12/04 21:06:35 UTC
[jira] [Resolved] (MAHOUT-1242) No key redistribution function for associative maps
[ https://issues.apache.org/jira/browse/MAHOUT-1242?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Suneel Marthi resolved MAHOUT-1242.
-----------------------------------
Resolution: Fixed
Patch committed to trunk.
> No key redistribution function for associative maps
> ---------------------------------------------------
>
> Key: MAHOUT-1242
> URL: https://issues.apache.org/jira/browse/MAHOUT-1242
> Project: Mahout
> Issue Type: Improvement
> Components: collections, Math
> Affects Versions: 0.7, 0.8
> Reporter: Dawid Weiss
> Assignee: Suneel Marthi
> Fix For: 0.9
>
> Attachments: MAHOUT-1242.patch
>
>
> All integer-based maps currently use HashFunctions.hash(int), which just returns the key value:
> {code}
> /**
>  * Returns a hashcode for the specified value.
>  *
>  * @return a hash code value for the specified value.
>  */
> public static int hash(int value) {
>   return value;
>   //return value * 0x278DDE6D; // see org.apache.mahout.math.jet.random.engine.DRand
>   /*
>   value &= 0x7FFFFFFF; // make it >=0
>   int hashCode = 0;
>   do hashCode = 31*hashCode + value%10;
>   while ((value /= 10) > 0);
>   return 28629151*hashCode; // spread even further; h*31^5
>   */
> }
> {code}
> This easily leads to degenerate behavior (long collision chains) on keys whose lower bits are constant. A simple yet strong hash function, such as the final step of murmurhash3, goes a long way toward making the key distribution more uniform regardless of the input distribution.
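For illustration, here is a minimal sketch of the kind of key redistribution the description refers to. It uses the 32-bit finalization step of MurmurHash3 (often called fmix32). The class name HashMixSketch, the bucketsUsed helper, and the power-of-two mask used to pick a bucket are assumptions made for this example only; they are not taken from the attached patch and may not match how Mahout's maps actually index their tables.

```java
import java.util.BitSet;

// Hypothetical sketch class; not part of Mahout or of MAHOUT-1242.patch.
final class HashMixSketch {

    // Finalization step ("fmix32") of the 32-bit MurmurHash3 variant.
    // Each xor-shift/multiply round spreads high bits into low bits and back.
    static int mix(int h) {
        h ^= h >>> 16;
        h *= 0x85ebca6b;
        h ^= h >>> 13;
        h *= 0xc2b2ae35;
        h ^= h >>> 16;
        return h;
    }

    // Counts how many distinct buckets are hit by the keys (i << shift).
    // Assumption: tableSize is a power of two and the bucket is chosen by
    // masking the low bits of the hash, purely for illustration.
    static int bucketsUsed(int keyCount, int shift, int tableSize, boolean useMix) {
        BitSet used = new BitSet(tableSize);
        for (int i = 0; i < keyCount; i++) {
            int key = i << shift;                 // lower `shift` bits are always zero
            int hash = useMix ? mix(key) : key;   // identity hash vs. mixed hash
            used.set(hash & (tableSize - 1));
        }
        return used.cardinality();
    }

    public static void main(String[] args) {
        // Keys are multiples of 1024, so their low 10 bits are constant.
        System.out.println("identity hash buckets used: " + bucketsUsed(1024, 10, 1024, false));
        System.out.println("mixed hash buckets used:    " + bucketsUsed(1024, 10, 1024, true));
    }
}
```

Under these assumptions, the identity hash sends every key to the same bucket, while the mixed hash spreads the same keys over a large share of the table. The mixing step is invertible, so distinct int keys still produce distinct hash values.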
--
This message was sent by Atlassian JIRA
(v6.1#6144)