Posted to user@mahout.apache.org by Stanley Ipkiss <sa...@gmail.com> on 2010/08/26 00:01:44 UTC

1st MapReduce job in RecommenderJob.java (org.apache.mahout.cf.taste.hadoop.item)

In RecommenderJob.java (org.apache.mahout.cf.taste.hadoop.item), what is the
primary purpose of the first map reduce job? This is the one that I am
talking about -

    if (shouldRunNextPhase(parsedArgs, currentPhase)) {
      Job itemIDIndex = prepareJob(
          inputPath, itemIDIndexPath, TextInputFormat.class,
          ItemIDIndexMapper.class, VarIntWritable.class, VarLongWritable.class,
          ItemIDIndexReducer.class, VarIntWritable.class, VarLongWritable.class,
          SequenceFileOutputFormat.class);
      itemIDIndex.setCombinerClass(ItemIDIndexReducer.class);
      itemIDIndex.waitForCompletion(true);
    }

It seems to me that the mapper just outputs int-based keys for the long
item/user IDs, and the reducer just finds the smallest user/item ID within
each index. Do we really just want to find the lowest ID in the complete
dataset, and spin up an entire MapReduce job for that?
-- 
View this message in context: http://lucene.472066.n3.nabble.com/1st-MapReduce-job-in-RecommenderJob-java-org-apache-mahout-cf-taste-hadoop-item-tp1342081p1342081.html
Sent from the Mahout User List mailing list archive at Nabble.com.

Re: 1st MapReduce job in RecommenderJob.java (org.apache.mahout.cf.taste.hadoop.item)

Posted by Stanley Ipkiss <sa...@gmail.com>.
Thanks guys! That clears my doubt.

Re: 1st MapReduce job in RecommenderJob.java (org.apache.mahout.cf.taste.hadoop.item)

Posted by Sean Owen <sr...@gmail.com>.
The issue is that user and item IDs may be longs in the input, but they
are used as indexes into vectors, and vector indexes are ints. This job
hashes each long ID down to an int and stores that mapping so it can be
reversed at the end. For the reverse mapping, in the case of a collision,
the lowest long key wins. The hash also has the nice property of being
the identity mapping for non-negative values <= Integer.MAX_VALUE.

Sean
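
A minimal sketch of the hashing Sean describes (the exact function Mahout
uses may differ; `idToIndex` here is illustrative, chosen only to show the
two properties he mentions):

```java
// Illustrative sketch of the long-ID -> int-index hashing described above.
// idToIndex is a hypothetical stand-in, not necessarily Mahout's function.
// It demonstrates the two stated properties: identity for IDs that already
// fit in a non-negative int, and possible collisions for larger IDs.
public class IdIndexSketch {

  // Fold a 64-bit ID into a non-negative int index.
  static int idToIndex(long id) {
    return 0x7FFFFFFF & ((int) id ^ (int) (id >>> 32));
  }

  public static void main(String[] args) {
    // Identity for values that fit in a non-negative int:
    System.out.println(idToIndex(42L));        // 42
    // A larger ID folds down and collides with a small one:
    System.out.println(idToIndex(1L << 33));   // 2, same as idToIndex(2L)
  }
}
```

When two longs land on the same int index, the reducer's "lowest long key
wins" rule makes the reverse mapping deterministic.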

On Wed, Aug 25, 2010 at 11:01 PM, Stanley Ipkiss
<sa...@gmail.com> wrote:

Re: 1st MapReduce job in RecommenderJob.java (org.apache.mahout.cf.taste.hadoop.item)

Posted by Sebastian Schelter <ss...@apache.org>.
Hi Stanley,

The IDs for items are expected to be longs in the input data. When we
convert the input to vectors, these longs have to be mapped to ints. The
first MapReduce job you're talking about stores that mapping so that the
ints can be mapped back to the original long IDs later, in the final
recommendation step.

--sebastian
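
Sebastian's description of the mapping table can be sketched in plain Java
(no Hadoop; class, method, and hash function names here are illustrative,
not Mahout's actual ones):

```java
import java.util.HashMap;
import java.util.Map;

// Plain-Java sketch of what the itemIDIndex job computes: conceptually the
// mapper emits (hash(itemID) -> itemID) pairs and the reducer/combiner keeps
// the smallest long ID per int index, yielding the int -> long table used to
// translate vector indexes back to item IDs in the final recommendation step.
public class ItemIndexTableSketch {

  // Hypothetical stand-in for the job's real long -> int hash.
  static int idToIndex(long id) {
    return 0x7FFFFFFF & ((int) id ^ (int) (id >>> 32));
  }

  // Map and reduce collapsed into one pass: the minimum long ID wins per index.
  static Map<Integer, Long> buildIndexTable(long[] itemIDs) {
    Map<Integer, Long> table = new HashMap<>();
    for (long id : itemIDs) {
      table.merge(idToIndex(id), id, Math::min);
    }
    return table;
  }

  public static void main(String[] args) {
    // The last ID hashes to the same index as 7, so the lower key (7) wins.
    long[] ids = {7L, 42L, (1L << 32) | 6};
    Map<Integer, Long> table = buildIndexTable(ids);
    System.out.println(table.get(42));  // 42
    System.out.println(table.get(7));   // 7
  }
}
```

In the real job the reduction is distributed, and reusing the reducer as a
combiner is safe because taking the minimum is associative and commutative.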


2010/8/26 Stanley Ipkiss <sa...@gmail.com>
