Posted to user@mahout.apache.org by Lance Norskog <go...@gmail.com> on 2011/06/12 04:47:00 UTC

RecommenderJob uses indirection for ItemIDs

The RecommenderJob makes a "side" file which maps a fabricated integer
index to a long ItemID. Why is this needed? Couldn't the
RecommenderJob propagate the long ItemID directly? Note that this
forces all instances of AggregateAndReduceRecommender to load the
entire map. One of the Map/Reduce rules is 'nothing needs to know
everything'.

Is this a sparse/dense optimization? If so, have the distributed
algorithms advanced enough to make this indirection unnecessary?

-- 
Lance Norskog
goksron@gmail.com

Re: RecommenderJob uses indirection for ItemIDs

Posted by Sean Owen <sr...@gmail.com>.
No, all vectors here use int to express dimension. It has nothing to do
with sparseness.
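Sean's point can be illustrated with a few lines of plain Java (no Mahout classes; the class name below is just for the sketch): vector dimensions are addressed by int, so a 64-bit item ID can't serve as an index directly.

```java
// Sketch: vector dimensions are addressed by int, so a 64-bit item ID
// cannot serve as an index directly without a mapping step.
public class IntDimensionDemo {
    public static void main(String[] args) {
        long itemId = 9_876_543_210L;  // a plausible 64-bit item ID

        // The ID is larger than any valid int vector index:
        System.out.println(itemId > Integer.MAX_VALUE);  // true

        // Casting to int silently drops the high bits and corrupts the ID:
        int truncated = (int) itemId;
        System.out.println(truncated == itemId);  // false
    }
}
```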
On Jun 13, 2011 12:26 AM, "Lance Norskog" <go...@gmail.com> wrote:
> Ah! So if it was a sparse vector it could be indexed directly. Or the
> mapping could be with a hash-indexed representation as used with
> Lucene vectors.

Re: RecommenderJob uses indirection for ItemIDs

Posted by Lance Norskog <go...@gmail.com>.
Ah! So if it were a sparse vector, it could be indexed directly. Or the
mapping could use a hash-indexed representation, as with Lucene
vectors.

On Sun, Jun 12, 2011 at 3:43 AM, Sean Owen <sr...@gmail.com> wrote:
> The keys have to be hashed to be used as int offsets into a vector. While
> loading the mapping isn't ideal it does only scale as the number of items
> and users.



-- 
Lance Norskog
goksron@gmail.com

Re: RecommenderJob uses indirection for ItemIDs

Posted by Sean Owen <sr...@gmail.com>.
The keys have to be hashed to be used as int offsets into a vector. While
loading the mapping isn't ideal, it only scales with the number of items
and users.
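A minimal sketch of the scheme described above, in plain Java: fold a long ID down to a non-negative int usable as a vector offset, and keep a side map from index back to the original ID for translating results at the end. The class and method names are illustrative stand-ins, not Mahout's actual API.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch: map long IDs to non-negative int vector offsets by hashing,
// keeping a side map to translate indexes back to real IDs.
public class IdIndexMapping {

    // Fold the two 32-bit halves together, then clear the sign bit so the
    // result is always a valid non-negative vector offset.
    static int idToIndex(long id) {
        return 0x7FFFFFFF & (int) (id ^ (id >>> 32));
    }

    public static void main(String[] args) {
        long[] itemIds = {3L, 9_876_543_210L, Long.MAX_VALUE};

        // The "side" map: index -> original long ID. It grows with the
        // number of distinct items, which is why each reducer that must
        // translate indexes back has to load the whole thing.
        Map<Integer, Long> indexToId = new HashMap<>();
        for (long id : itemIds) {
            indexToId.put(idToIndex(id), id);
        }

        // Recommendations are computed over int indexes; the side map
        // translates them back to item IDs when writing output.
        for (Map.Entry<Integer, Long> e : indexToId.entrySet()) {
            System.out.println(e.getKey() + " -> " + e.getValue());
        }
    }
}
```

A hash like this can collide, so in practice the mapping file, not the hash, is the source of truth for translating indexes back to IDs.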