Posted to user@mahout.apache.org by 冯伟 <wh...@gmail.com> on 2012/05/08 10:13:38 UTC

How to index by long ID in RandomAccessSparseVector

I have been reading the item-based recommendation code in version 0.6,
starting from "org.apache.mahout.cf.taste.hadoop.item.RecommenderJob". I
found that there is a long-to-int mapping provided by the function
"int TasteHadoopUtils.idToIndex(long)". The mapping is applied to both
userId and itemId. Is it possible for two longs to be mapped to the same
int? If so, we would end up merging vectors from different item IDs or
user IDs, right? This is quite confusing.

Would it be better to provide a RandomAccessSparseVector backed by
OpenLongDoubleHashMap instead of OpenIntDoubleHashMap? Thanks in advance.

----------------------
Wei Feng

Re: How to index by long ID in RandomAccessSparseVector

Posted by Sean Owen <sr...@gmail.com>.

That's right. It ought to be uncommon but can happen. For recommenders, it
"only" means that you start to treat two users or two items as the same
thing. That doesn't do much harm though. Maybe one user's recs are a little
funny.
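
To make the collision concrete, here is a minimal sketch in Java. It does
not call Mahout. The class IdFoldSketch and both of its methods are
hypothetical names made up for illustration, and foldToIndex only assumes
that the mapping folds the 64-bit ID down to a non-negative 32-bit index by
XOR-ing the two halves and clearing the sign bit; check the actual
idToIndex source before relying on the details. The estimate of how often
collisions occur further assumes the IDs spread uniformly over the 2^31
possible indices, which real ID distributions may not do.

```java
public class IdFoldSketch {

    // Hypothetical stand-in for a long-to-int ID mapping (not Mahout code):
    // XOR the high and low 32 bits, then clear the sign bit so the result
    // can be used as a non-negative vector index.
    public static int foldToIndex(long id) {
        return 0x7FFFFFFF & (int) (id ^ (id >>> 32));
    }

    // Expected number of colliding ID pairs, assuming n distinct IDs are
    // spread uniformly over the 2^31 possible indices (an idealization).
    public static double expectedCollidingPairs(long n) {
        double pairs = (double) n * (n - 1) / 2.0;
        return pairs / 2147483648.0; // 2^31 possible indices
    }

    public static void main(String[] args) {
        long a = 1L;        // only the low half is set
        long b = 1L << 32;  // only the high half is set

        // Two distinct long IDs that fold to the same int index.
        System.out.println(foldToIndex(a) == foldToIndex(b));

        // Rough expected number of colliding pairs among one million IDs.
        System.out.println(expectedCollidingPairs(1_000_000L));
    }
}
```

Under these assumptions, a data set with around a million distinct IDs
would be expected to contain on the order of a couple hundred colliding
pairs, so collisions are rare per ID but not rare per data set.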

I do think it would have been useful to index by long, but that would have
significantly increased memory requirements too.

(In developing Myrrix I have switched to use a data structure indexed by
long though, because it becomes more necessary to avoid the mapping.)
