Posted to dev@mahout.apache.org by "Sean Owen (JIRA)" <ji...@apache.org> on 2010/04/23 13:32:49 UTC

[jira] Commented: (MAHOUT-305) Combine both cooccurrence-based CF M/R jobs

    [ https://issues.apache.org/jira/browse/MAHOUT-305?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12860221#action_12860221 ] 

Sean Owen commented on MAHOUT-305:
----------------------------------

First copying and pasting my comment from the mailing list:

Ankur effectively raised issues about the performance of
org.apache.mahout.cf.taste.hadoop.item by adding
org.apache.mahout.cf.taste.hadoop.cooccurrence, which is a similar
recommender job (item cooccurrence-based) but with a different
implementation. ".item" ultimately does not distribute the matrix-user
vector multiply, while ".cooccurrence" distributes it heavily.

.item performed the multiply by side-loading the co-occurrence matrix
into a reducer, reading it from disk as MapFiles. Accessing columns
this way proved to be very slow.

After much experimentation, I've completely overhauled .item by
grafting in ideas from .cooccurrence. It is a sort of
best-of-both-worlds hybrid of the two. It borrows a clever way to join
two kinds of input into one MapReduce, in order to join the
co-occurrence matrix columns and individual elements of each user
vector. The product is output and recombined later. This hybrid
retains features of .item like accommodating user ratings.
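
As a rough illustration of the join-then-recombine flow described
above, here is a minimal in-memory sketch in Python. It is not the
Mahout code and uses no Hadoop APIs; the function names and the
dict-based data layout are assumptions made up for this example. It
groups each co-occurrence matrix column with the user vector elements
for the same item, emits one partial product per (column, element)
pair, and sums the partial products per user afterwards.

```python
from collections import defaultdict


def join_columns_with_prefs(cooccurrence_columns, user_prefs):
    """Group each matrix column with the user-vector elements for the same item.

    This stands in for the single MapReduce pass that takes two kinds of
    input and joins them on item ID (hypothetical layout, not Mahout's).
    """
    joined = defaultdict(lambda: {"column": {}, "prefs": []})
    for item, column in cooccurrence_columns.items():
        joined[item]["column"] = column
    for user, prefs in user_prefs.items():
        for item, value in prefs.items():
            joined[item]["prefs"].append((user, value))
    return joined


def partial_products(joined):
    """Emit (user, column * preference) for every joined pair."""
    for group in joined.values():
        for user, value in group["prefs"]:
            scaled = {other: count * value
                      for other, count in group["column"].items()}
            yield user, scaled


def recombine(partials):
    """Sum the partial product vectors per user (the later recombination)."""
    totals = defaultdict(lambda: defaultdict(float))
    for user, partial in partials:
        for item, score in partial.items():
            totals[user][item] += score
    return {user: dict(scores) for user, scores in totals.items()}


def recommend_scores(cooccurrence_columns, user_prefs):
    """Column-wise matrix * user-vector multiply, built from the steps above."""
    joined = join_columns_with_prefs(cooccurrence_columns, user_prefs)
    return recombine(partial_products(joined))
```

Because each partial product depends on only one matrix column and one
user vector element, the work can be spread across many tasks, and the
per-user sum is the kind of associative step a Combiner can
pre-aggregate.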

Letting Hadoop manage the data flow (even though it takes a bit more
copying), avoiding random-access reads from MapFiles, using features
like the Combiner, and being smarter about Writables has sped this up
for me by at least a factor of 10. Most of that gain comes from
avoiding MapFiles.

> Combine both cooccurrence-based CF M/R jobs
> -------------------------------------------
>
>                 Key: MAHOUT-305
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-305
>             Project: Mahout
>          Issue Type: Improvement
>          Components: Collaborative Filtering
>    Affects Versions: 0.2
>            Reporter: Sean Owen
>            Assignee: Ankur
>            Priority: Minor
>
> We have two different but essentially identical MapReduce jobs to make recommendations based on item co-occurrence: org.apache.mahout.cf.taste.hadoop.{item,cooccurrence}. They ought to be merged. Not sure exactly how to approach that but noting this in JIRA, per Ankur.
