Posted to user@mahout.apache.org by James Li <ja...@gmail.com> on 2011/11/21 17:26:44 UTC

issue about large number of items to recommend

Hi,

I was wondering if anybody has dealt with the situation where a recommender
system has a really large number of candidate items, say 10 million. It
would be impractical for the recommender to predict a rating for every
single item before ranking them all. Can anybody point me to any papers or
links for a solution?

This issue also causes some problems for performance tests if we adopt a
rank-based measure such as Precision@5. If I want to use Precision@n to
test a recommender system with a large number of items to recommend, the
likelihood of an item the user actually consumed getting into the top-n
list should be really low. Any suggestions as to how to handle this case?

Thanks,

James

Re: issue about large number of items to recommend

Posted by Sean Owen <sr...@gmail.com>.
Do you have relatively few users? A user-user-similarity-based algorithm
would be a lot faster in that case.
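
Just to illustrate, a minimal user-based setup with the Taste API looks
roughly like this (the file name, user ID and neighborhood size are
placeholders, not suggestions):

import java.io.File;
import java.util.List;

import org.apache.mahout.cf.taste.impl.model.file.FileDataModel;
import org.apache.mahout.cf.taste.impl.neighborhood.NearestNUserNeighborhood;
import org.apache.mahout.cf.taste.impl.recommender.GenericUserBasedRecommender;
import org.apache.mahout.cf.taste.impl.similarity.PearsonCorrelationSimilarity;
import org.apache.mahout.cf.taste.model.DataModel;
import org.apache.mahout.cf.taste.neighborhood.UserNeighborhood;
import org.apache.mahout.cf.taste.recommender.RecommendedItem;
import org.apache.mahout.cf.taste.recommender.Recommender;
import org.apache.mahout.cf.taste.similarity.UserSimilarity;

public class UserBasedExample {
  public static void main(String[] args) throws Exception {
    // userID,itemID,rating per line; the file name is a placeholder
    DataModel model = new FileDataModel(new File("ratings.csv"));
    UserSimilarity similarity = new PearsonCorrelationSimilarity(model);
    // Candidate items come only from the 50 nearest users, so the whole
    // catalog is never scored exhaustively
    UserNeighborhood neighborhood =
        new NearestNUserNeighborhood(50, similarity, model);
    Recommender recommender =
        new GenericUserBasedRecommender(model, neighborhood, similarity);
    List<RecommendedItem> recs = recommender.recommend(1L, 5);
    for (RecommendedItem rec : recs) {
      System.out.println(rec.getItemID() + " : " + rec.getValue());
    }
  }
}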

I'm guessing that the number of items is unusually large relative to the
number of actual user-item interactions you might otherwise expect -- that
it's very sparse? Matrix-factorization techniques will probably do well
here, since they'll squeeze out a lot of the problems of accuracy and scale
that come with very sparse data.
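
For example, with the ALS-based factorizer that ships with Mahout (the
feature count, lambda and iteration count below are illustrative values,
not tuned ones):

import java.io.File;

import org.apache.mahout.cf.taste.impl.model.file.FileDataModel;
import org.apache.mahout.cf.taste.impl.recommender.svd.ALSWRFactorizer;
import org.apache.mahout.cf.taste.impl.recommender.svd.SVDRecommender;
import org.apache.mahout.cf.taste.model.DataModel;
import org.apache.mahout.cf.taste.recommender.Recommender;

public class FactorizationExample {
  public static void main(String[] args) throws Exception {
    DataModel model = new FileDataModel(new File("ratings.csv")); // placeholder
    // 20 latent features, lambda 0.065, 15 iterations -- illustrative only
    ALSWRFactorizer factorizer = new ALSWRFactorizer(model, 20, 0.065, 15);
    Recommender recommender = new SVDRecommender(model, factorizer);
    // Scoring any candidate item is now a 20-dimensional dot product,
    // regardless of how sparse the original interaction data is
    System.out.println(recommender.recommend(1L, 5));
  }
}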

Yes, a precision test has the problem you describe, though that's a
general problem and not specific to this situation. It's just very hard to
define a "relevant" vs. "non-relevant" item. Most items will be counted as
non-relevant by default even though that's not true.
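
If you still want the number, Taste's IR evaluator will compute
precision and recall at n for you; a rough sketch (the data file and the
choice of an item-based recommender are just placeholders):

import java.io.File;

import org.apache.mahout.cf.taste.common.TasteException;
import org.apache.mahout.cf.taste.eval.IRStatistics;
import org.apache.mahout.cf.taste.eval.RecommenderBuilder;
import org.apache.mahout.cf.taste.impl.eval.GenericRecommenderIRStatsEvaluator;
import org.apache.mahout.cf.taste.impl.model.file.FileDataModel;
import org.apache.mahout.cf.taste.impl.recommender.GenericItemBasedRecommender;
import org.apache.mahout.cf.taste.impl.similarity.LogLikelihoodSimilarity;
import org.apache.mahout.cf.taste.model.DataModel;
import org.apache.mahout.cf.taste.recommender.Recommender;

public class PrecisionAtN {
  public static void main(String[] args) throws Exception {
    DataModel model = new FileDataModel(new File("ratings.csv")); // placeholder
    RecommenderBuilder builder = new RecommenderBuilder() {
      @Override
      public Recommender buildRecommender(DataModel dataModel) throws TasteException {
        return new GenericItemBasedRecommender(
            dataModel, new LogLikelihoodSimilarity(dataModel));
      }
    };
    // Precision/recall at 5, letting the evaluator pick the relevance
    // threshold, over 100% of the users
    IRStatistics stats = new GenericRecommenderIRStatsEvaluator().evaluate(
        builder, null, model, null, 5,
        GenericRecommenderIRStatsEvaluator.CHOOSE_THRESHOLD, 1.0);
    System.out.println("Precision@5: " + stats.getPrecision());
    System.out.println("Recall@5: " + stats.getRecall());
  }
}

Just keep the caveat above in mind when reading the absolute numbers; with
10 million items they will look very low even for a good recommender.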


Re: issue about large number of items to recommend

Posted by Ted Dunning <te...@gmail.com>.
You may also want to move more toward content-based recommendations.
Essentially, that means you recommend characteristics of items, and then
do a search with the recommended characteristics as a query to find the
recommended items.

As a bonus, you can also learn the degree of association between
characteristics and items, which helps the system downgrade spammers.
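
A very rough sketch of that flow against a Lucene 3.x index (the
"characteristics" field, the index location and the example query are all
made up for illustration):

import java.io.File;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.queryParser.QueryParser;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.util.Version;

public class ContentBasedSearch {
  public static void main(String[] args) throws Exception {
    // Items are indexed with their characteristics (tags, categories, ...)
    // in a "characteristics" field
    IndexSearcher searcher = new IndexSearcher(
        IndexReader.open(FSDirectory.open(new File("item-index"))));
    // Suppose the recommender scored these characteristics highest for
    // the user; they become the query
    QueryParser parser = new QueryParser(Version.LUCENE_36,
        "characteristics", new StandardAnalyzer(Version.LUCENE_36));
    Query query = parser.parse("jazz saxophone 1960s");
    // The search engine, not the recommender, ranks over the full catalog
    TopDocs top = searcher.search(query, 10);
    for (ScoreDoc hit : top.scoreDocs) {
      System.out.println(searcher.doc(hit.doc).get("itemID") + " : " + hit.score);
    }
    searcher.close();
  }
}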

Re: issue about large number of items to recommend

Posted by Sebastian Schelter <ss...@apache.org>.
> It would be impractical for the recommender
> to predict a rating on every single items before ranking them.

In the standard item-based approach, only items similar to the ones the
user has interacted with need to be taken into account in the
recommendation phase. So you don't have to look at all 10 million items
with this approach.
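
For example (log-likelihood is just one choice of similarity here, and the
file name is a placeholder):

import java.io.File;
import java.util.List;

import org.apache.mahout.cf.taste.impl.model.file.FileDataModel;
import org.apache.mahout.cf.taste.impl.recommender.GenericItemBasedRecommender;
import org.apache.mahout.cf.taste.impl.similarity.LogLikelihoodSimilarity;
import org.apache.mahout.cf.taste.model.DataModel;
import org.apache.mahout.cf.taste.recommender.RecommendedItem;
import org.apache.mahout.cf.taste.similarity.ItemSimilarity;

public class ItemBasedExample {
  public static void main(String[] args) throws Exception {
    DataModel model = new FileDataModel(new File("ratings.csv")); // placeholder
    ItemSimilarity similarity = new LogLikelihoodSimilarity(model);
    // Candidates come only from items similar to the ones in the user's
    // history, so the full 10M catalog is never scored
    GenericItemBasedRecommender recommender =
        new GenericItemBasedRecommender(model, similarity);
    List<RecommendedItem> recs = recommender.recommend(1L, 5);
    System.out.println(recs);
  }
}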

--sebastian