Posted to user@mahout.apache.org by 胡仲义 <su...@gmail.com> on 2012/09/30 14:07:11 UTC

The performance of mahout's recommender.

Hi, I am a Mahout user and I am confused by the performance of Mahout's recommender.

I have a preference data set from an e-commerce platform; each line of the data file represents a single preference in the form userID,itemID,rating. The input is a 7.8 GB text file containing 370,250,381 user-item-preference associations, from 132,598,906 users to 35,920,654 distinct items. I use Mahout to recommend 10 items for each user with org.apache.mahout.cf.taste.hadoop.item.RecommenderJob on a Hadoop cluster of 250 Linux servers. The command is as follows:
$ ./mahout org.apache.mahout.cf.taste.hadoop.item.RecommenderJob \
    -i input/input.txt -o output \
    -s SIMILARITY_LOGLIKELIHOOD \
    --usersFile input/users.txt \
    --numRecommendations 10 \
    --tempDir temp
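For anyone wanting to reproduce the setup, here is a quick sketch of how one input line parses; the IDs and rating below are made-up illustrative values, not taken from the real data set:

```python
# One line of the RecommenderJob input format: userID,itemID,rating
line = "1325989,35920654,4.0"  # illustrative values only
user_id, item_id, rating = line.split(",")
print(int(user_id), int(item_id), float(rating))
```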
However, the performance let me down: it took 23 hours to get the result. I want to know whether this is normal, or whether there are ways to improve the performance.

Thanks.

--Hu Zhy

Re: The performance of mahout's recommender.

Posted by Sebastian Schelter <ss...@apache.org>.
Hello Hu,

the performance here depends very much on the distribution of ratings over items. Furthermore, you have an extremely high number of items, which makes an item-based approach hard to apply.

With a block size of 128 MB, 7.8 GB corresponds to 63 blocks, so are you sure you really leverage all 250 machines? How long did each of the MapReduce steps of the job take, and when did you kill it?
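The block arithmetic above can be checked quickly (assuming the default 128 MB HDFS block size):

```python
import math

# 7.8 GB of input split into 128 MB HDFS blocks gives the number of
# input splits, i.e. the maximum number of map tasks in the first stage.
input_mb = 7.8 * 1024
block_mb = 128
blocks = math.ceil(input_mb / block_mb)
print(blocks)  # 63
```

So at most 63 of the 250 machines can run map tasks over the raw input at once.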

The parameter maxPrefsPerUserInItemSimilarity (with a default value of
1000) determines how many observations to take into account per user,
setting this to a smaller value drastically increases performance. This
should be the first thing to play with.
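As a concrete sketch, the original command could be re-run with this flag added; the value 500 here is only an illustration of lowering the cap below the default 1000, not a tuned recommendation:

```shell
./mahout org.apache.mahout.cf.taste.hadoop.item.RecommenderJob \
  -i input/input.txt -o output \
  -s SIMILARITY_LOGLIKELIHOOD \
  --usersFile input/users.txt \
  --numRecommendations 10 \
  --maxPrefsPerUserInItemSimilarity 500 \
  --tempDir temp
```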

Best,
Sebastian


