Posted to dev@mahout.apache.org by "Dan Filimon (JIRA)" <ji...@apache.org> on 2013/06/13 17:09:22 UTC
[jira] [Created] (MAHOUT-1261) TasteHadoopUtils.idToIndex can return an int that has size Integer.MAX_VALUE
Dan Filimon created MAHOUT-1261:
-----------------------------------
Summary: TasteHadoopUtils.idToIndex can return an int that has size Integer.MAX_VALUE
Key: MAHOUT-1261
URL: https://issues.apache.org/jira/browse/MAHOUT-1261
Project: Mahout
Issue Type: Bug
Components: Collaborative Filtering
Affects Versions: 0.8
Reporter: Dan Filimon
Assignee: Sean Owen
Priority: Minor
I'm running ItemSimilarityJob on a very large (~600M by 4B) matrix that's very sparse (total set of associations is 630MB).
The job fails because of an IndexException in ToUserVectorsReducer.
TasteHadoopUtils.idToIndex(long id) hashes a long with:
0x7fffffff & Longs.hashCode(id) (line o.a.m.cf.taste.hadoop.TasteHadoopUtils:57).
For some id (I don't know what value), the result returned is Integer.MAX_VALUE.
This index cannot be set in the userVector because the vector's cardinality is also Integer.MAX_VALUE, so the set throws an exception.
So, the issue is that idToIndex returns values from 0 to Integer.MAX_VALUE inclusive, but the vector only accepts indices from 0 to Integer.MAX_VALUE - 1.
It's a nasty little off-by-one bug.
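To make the off-by-one concrete, here is a minimal, self-contained sketch. It is not the Mahout source: it assumes Guava's Longs.hashCode(long) computes the same value as the JDK's Long.hashCode(long), i.e. (int) (id ^ (id >>> 32)), and the class name IdToIndexDemo is made up for illustration. Under that assumption, the id 2147483647 is one input that hashes to Integer.MAX_VALUE.

```java
public class IdToIndexDemo {

    // Mirrors the formula quoted in the report. Long.hashCode is used here
    // as an assumed stand-in for Guava's Longs.hashCode.
    static int idToIndex(long id) {
        return 0x7FFFFFFF & Long.hashCode(id);
    }

    public static void main(String[] args) {
        long id = Integer.MAX_VALUE;         // upper 32 bits are zero
        int index = idToIndex(id);
        int cardinality = Integer.MAX_VALUE; // valid indices: 0 .. cardinality - 1

        System.out.println("index = " + index);
        System.out.println("index is in range: " + (index < cardinality));
    }
}
```

Any id whose folded hash has its low 31 bits all set behaves the same way, for example 4294967295 (0xFFFFFFFF).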
I'm thinking of just taking the index % size when setting the entry.
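A rough sketch of what the modulo idea could look like, assuming the vector size is passed in as a cardinality parameter. The class name BoundedIdToIndex and the two-argument signature are hypothetical and not part of TasteHadoopUtils, and Long.hashCode stands in for Guava's Longs.hashCode on the assumption that both compute the same value.

```java
public class BoundedIdToIndex {

    // Hypothetical variant: fold the non-negative hash into [0, cardinality).
    // The masked hash is never negative, so % cannot return a negative index.
    static int idToIndex(long id, int cardinality) {
        return (0x7FFFFFFF & Long.hashCode(id)) % cardinality;
    }

    public static void main(String[] args) {
        int cardinality = Integer.MAX_VALUE;
        // The id that previously mapped to Integer.MAX_VALUE now wraps to 0.
        System.out.println(idToIndex(2147483647L, cardinality));
        // Ids that were already in range keep their index.
        System.out.println(idToIndex(42L, cardinality));
    }
}
```

One trade-off: ids that used to hash to Integer.MAX_VALUE now share index 0 with ids that hash to 0, which adds a few collisions on top of the ones the 31-bit hash already has.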
[~ssc] & everyone else, thoughts? :)