Posted to user@mahout.apache.org by Emilio Suarez <Em...@intela.com> on 2012/05/11 19:18:48 UTC

Recommender with ratings takes a long time to process

Hi there,

The usual setting for the Mahout recommendation input file is:
user, item, rating

Now, for the purposes of my application, what I really wanted was a recommendation of users for a specific item, so my input files are:
item, user, rating

My input CSV file contains the following stats:

model file: 560,901 records
item "24441": 31,585 records
rating contains one of 3 values: 1, 2 or 3

When I ask for a recommendation of users for item "24441", these are the results:

total recommended "users": 50,162
Elapsed time: 3h 13m

As you can see, this is a very long processing time, and it all started when I added "ratings" to the input files.
Before that, I was using GenericBooleanPrefItemBasedRecommender, and the process ran in minutes.
Now with the ratings, I am using the following:

        LogLikelihoodSimilarity similarity = new LogLikelihoodSimilarity(fileDataModel);
        AllSimilarItemsCandidateItemsStrategy candidateStrategy = new AllSimilarItemsCandidateItemsStrategy(similarity);
        recommender = new GenericItemBasedRecommender(fileDataModel, similarity, candidateStrategy, candidateStrategy);

I have another input file with the following stats:

model file: 276,543 records
item "11205": 5,968 records
rating contains one of 3 values: 1, 2 or 3

and when I ask for a recommendation of users for item "11205", these are the results:

total recommended "users": 26,083
Elapsed time: 23m

As you can see, the difference in size is just 2x, but the time difference is 8x!

Is this the expected behavior for the recommender to take this long?
Is there anything I can do to speed up the process?

Thanks

-emilio

Re: Recommender with ratings takes a long time to process

Posted by Emilio Suarez <Em...@intela.com>.
Great! Trying that now. Thanks again, Sean!

-emilio

Re: Recommender with ratings takes a long time to process

Posted by Sean Owen <sr...@gmail.com>.
Yes, you want the sampling one so you can reduce the number of
neighbors you consider.
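
To illustrate what sampling buys here, the following is a minimal, self-contained sketch of capping a candidate list by uniform random sampling. It is plain Java and does not use Mahout; the class name CandidateSamplingSketch, the method sample, and the cap of 1,000 are hypothetical choices for illustration, and this is not how SamplingCandidateItemsStrategy is actually implemented.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

public class CandidateSamplingSketch {

    /**
     * Returns at most maxCandidates IDs chosen uniformly at random from
     * candidates. The input list is left unmodified.
     */
    static List<Long> sample(List<Long> candidates, int maxCandidates, Random random) {
        if (candidates.size() <= maxCandidates) {
            // Already small enough: nothing to cut.
            return new ArrayList<>(candidates);
        }
        List<Long> shuffled = new ArrayList<>(candidates);
        Collections.shuffle(shuffled, random);
        // Keep only the first maxCandidates entries of the shuffled copy.
        return new ArrayList<>(shuffled.subList(0, maxCandidates));
    }

    public static void main(String[] args) {
        // Hypothetical candidate IDs; a real model would supply these.
        List<Long> all = new ArrayList<>();
        for (long id = 0; id < 50_000; id++) {
            all.add(id);
        }
        List<Long> sampled = sample(all, 1_000, new Random(42));
        System.out.println(all.size() + " candidates reduced to " + sampled.size());
    }
}
```

The trade-off is that anything left out of the sample can never be recommended, so the cap exchanges some recall for a bounded amount of work per request.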

Re: Recommender with ratings takes a long time to process

Posted by Emilio Suarez <Em...@intela.com>.
Thanks Sean,

So, do you suggest something like this?

        LogLikelihoodSimilarity similarity = new LogLikelihoodSimilarity(fileDataModel);
        PreferredItemsNeighborhoodCandidateItemsStrategy candidateStrategy = new PreferredItemsNeighborhoodCandidateItemsStrategy();
        recommender = new GenericItemBasedRecommender(fileDataModel, similarity, candidateStrategy, candidateStrategy);

or this?

        LogLikelihoodSimilarity similarity = new LogLikelihoodSimilarity(fileDataModel);
        SamplingCandidateItemsStrategy candidateStrategy = new SamplingCandidateItemsStrategy();
        recommender = new GenericItemBasedRecommender(fileDataModel, similarity, candidateStrategy, candidateStrategy);


-emilio

Re: Recommender with ratings takes a long time to process

Posted by Sean Owen <sr...@gmail.com>.
You need to apply a CandidateItemsStrategy to reduce the number of
elements you consider, or else it will take a very long time because
almost the entire model is a candidate for recommendation.
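
As a rough illustration of why an unrestricted candidate set is so expensive, the sketch below assumes that an item-based recommender performs about one similarity lookup per (candidate, existing preference) pair. That cost model is an assumption, as are the class and method names. The figures are the ones from the original post, with the number of recommended "users" standing in as a lower bound for the number of candidates scored. Because the input columns were swapped, item "24441" plays the role of the Mahout user here.

```java
public class CandidateCostSketch {

    /**
     * Rough count of item-item similarity lookups, assuming one lookup per
     * (candidate, existing preference) pair. This is a simplified cost model,
     * not a measurement.
     */
    static long similarityLookups(long candidates, long preferences) {
        return candidates * preferences;
    }

    public static void main(String[] args) {
        // Larger file: 50,162 recommended "users", 31,585 records for item "24441".
        long large = similarityLookups(50_162, 31_585);
        // Smaller file: 26,083 recommended "users", 5,968 records for item "11205".
        long small = similarityLookups(26_083, 5_968);

        System.out.println("larger run lookups:  " + large);
        System.out.println("smaller run lookups: " + small);
        System.out.println("ratio: " + (double) large / small);
    }
}
```

Under that assumption the two runs come out at roughly 1.58 billion and 156 million lookups, about a 10x gap, which is in the same range as the 8x difference in elapsed time reported in the original post. Cutting the number of candidates shrinks the first factor directly.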
