Posted to user@mahout.apache.org by Juan José Ramos <jj...@gmail.com> on 2014/03/04 10:16:18 UTC

Recommend items not rated by any user

First of all, I know this requirement would not make sense in a CF
Recommender. In my case, I am trying to use Mahout to create something
closer to a Content-Based Recommender.

In particular, I am pre-computing a similarity matrix between all the
documents (items) of my catalogue and using that matrix as the
ItemSimilarity for my Item-Based Recommender.

So, when a user rates a document, how could I make the recommender output
documents similar to the ones the user has already rated, even if no other
user in the system has rated them yet? Is that even possible in the first
place?

Thanks a lot.

Re: Recommend items not rated by any user

Posted by Sebastian Schelter <ss...@apache.org>.
For SVD-based algorithms, you should use the AllUnknownItems
strategy then, that's correct.

In the majority of industry use cases that I have seen, people use
pre-computed item similarities (Mahout has lots of machinery for doing
this, btw), so AllSimilarItems totally makes sense there.

--sebastian
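
To make the difference between the two strategies concrete, the following is
a minimal sketch using plain Java collections, not Mahout classes. It assumes
the ratings and similarities quoted further down in this thread; the class
name CandidateSets, the helper threadSimilarities, the explicit catalogue set,
and item 6 (an item with no similarity entries) are hypothetical and only
there for illustration.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Toy model of the two candidate sets. Plain collections only; this is NOT Mahout code.
class CandidateSets {

    // "All unknown": every catalogue item the user has not rated yet.
    static Set<Long> allUnknown(Set<Long> catalogue, Set<Long> ratedByUser) {
        Set<Long> candidates = new TreeSet<>(catalogue);
        candidates.removeAll(ratedByUser);
        return candidates;
    }

    // "All similar": every unrated item with a known (non-NaN) similarity
    // to at least one item the user has rated.
    static Set<Long> allSimilar(Map<Long, Map<Long, Double>> sims, Set<Long> ratedByUser) {
        Set<Long> candidates = new TreeSet<>();
        for (Long rated : ratedByUser) {
            Map<Long, Double> neighbours = sims.getOrDefault(rated, Map.of());
            for (Map.Entry<Long, Double> e : neighbours.entrySet()) {
                if (!e.getValue().isNaN()) {
                    candidates.add(e.getKey());
                }
            }
        }
        candidates.removeAll(ratedByUser);
        return candidates;
    }

    // Stores one similarity value in both directions (the matrix is symmetric).
    static void put(Map<Long, Map<Long, Double>> sims, long a, long b, double value) {
        sims.computeIfAbsent(a, k -> new HashMap<>()).put(b, value);
        sims.computeIfAbsent(b, k -> new HashMap<>()).put(a, value);
    }

    // The item-item similarities quoted in this thread.
    static Map<Long, Map<Long, Double>> threadSimilarities() {
        Map<Long, Map<Long, Double>> sims = new HashMap<>();
        put(sims, 1, 2, 0.1);
        put(sims, 1, 3, 0.2);
        put(sims, 1, 4, 0.3);
        put(sims, 2, 3, 0.5);
        put(sims, 3, 4, 0.5);
        put(sims, 5, 1, 0.2);
        put(sims, 5, 2, 1.0);
        return sims;
    }

    public static void main(String[] args) {
        Set<Long> ratedByUser1 = Set.of(1L, 2L, 3L, 4L);
        // Hypothetical catalogue: item 6 exists but has no similarity entries.
        Set<Long> catalogue = Set.of(1L, 2L, 3L, 4L, 5L, 6L);
        System.out.println(allUnknown(catalogue, ratedByUser1));            // [5, 6]
        System.out.println(allSimilar(threadSimilarities(), ratedByUser1)); // [5]
    }
}
```

In this toy data both sets contain item 5, which nobody has rated. Only the
"all unknown" set also contains item 6, an item for which an item-based
recommender would have no similarities to build an estimate from.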

On 03/05/2014 06:01 PM, Tevfik Aytekin wrote:
> It can even make things worse in SVD-based algorithms for which
> preference estimation is very fast.
>
> On Wed, Mar 5, 2014 at 7:00 PM, Tevfik Aytekin <te...@gmail.com> wrote:
>> Hi Sebastian,
>> But in order not to select items that are not similar to at least one
>> of the items the user interacted with, you have to compute the
>> similarity with all user items (which is the main task for estimating
>> the preference of an item in the item-based method). So, it seems to me
>> that AllSimilarItemsStrategy does not bring much advantage over
>> AllUnknownItemsCandidateItemsStrategy.
>>
>> On Wed, Mar 5, 2014 at 6:46 PM, Sebastian Schelter <ss...@apache.org> wrote:
>>>> So both strategies seem to be effectively the same, I don't know what
>>>> the implementers had in mind when designing
>>>> AllSimilarItemsCandidateItemsStrategy.
>>>
>>> It can take a long time to estimate preferences for all items a user doesn't
>>> know, especially if you have a lot of items. Traditional item-based
>>> recommenders will not recommend any item that is not similar to at least one
>>> of the items the user interacted with, so AllSimilarItemsStrategy already
>>> selects the maximum set of items that could potentially be recommended to
>>> the user.
>>>
>>> --sebastian
>>>
>>>
>>>
>>>
>>> On 03/05/2014 05:38 PM, Tevfik Aytekin wrote:
>>>>
>>>> If the similarities between item 5 and two of the items user 1 preferred
>>>> are not NaN, then it will return it, that is what I'm saying. If the
>>>> similarities were all NaN, then it would not return it.
>>>>
>>>> But surely, you might wonder: if all similarities between an item and the
>>>> user's items are NaN, then AllUnknownItemsCandidateItemsStrategy probably
>>>> will not return it.
>>>>
>>>
>>>> On Wed, Mar 5, 2014 at 6:06 PM, Juan José Ramos <jj...@gmail.com> wrote:
>>>>>
>>>>> @Tevfik, running this recommender:
>>>>>
>>>>> GenericItemBasedRecommender itemRecommender = new
>>>>> GenericItemBasedRecommender(dataModel, itemSimilarity, new
>>>>> AllSimilarItemsCandidateItemsStrategy(itemSimilarity), new
>>>>> AllSimilarItemsCandidateItemsStrategy(itemSimilarity));
>>>>>
>>>>>
>>>>> With this dataModel:
>>>>> 1,1,1.0
>>>>> 1,2,2.0
>>>>> 1,3,1.0
>>>>> 1,4,2.0
>>>>> 2,1,1.0
>>>>> 2,2,4.0
>>>>>
>>>>>
>>>>> And these similarities
>>>>> 1,2,0.1
>>>>> 1,3,0.2
>>>>> 1,4,0.3
>>>>> 2,3,0.5
>>>>> 3,4,0.5
>>>>> 5,1,0.2
>>>>> 5,2,1.0
>>>>>
>>>>> Returns item 5 for User 1. So item 5 has not been preferred by user 1, and
>>>>> the similarities between item 5 and two of the items user 1 preferred are
>>>>> not NaN, but AllSimilarItemsCandidateItemsStrategy is returning that item.
>>>>> So, I'm truly sorry to insist on this, but I still really do not get the
>>>>> difference.
>>>>>
>>>>>
>>>>> On Wed, Mar 5, 2014 at 2:53 PM, Tevfik Aytekin <te...@gmail.com> wrote:
>>>>>
>>>>>> Juan,
>>>>>> You got me wrong,
>>>>>>
>>>>>> AllSimilarItemsCandidateItemsStrategy
>>>>>>
>>>>>> returns all items that have not been rated by the user and the
>>>>>> similarity metric returns a non-NaN similarity value with at
>>>>>> least one of the items preferred by the user.
>>>>>>
>>>>>> So, it does not simply return all items that have not been rated by
>>>>>> the user. For example, if there is an item X which has not been rated
>>>>>> by the user and if the similarity value between X and at least one of
>>>>>> the items rated (preferred) by the user is not NaN, then X will not
>>>>>> be returned by AllSimilarItemsCandidateItemsStrategy, but it will be
>>>>>> returned by AllUnknownItemsCandidateItemsStrategy.
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Wed, Mar 5, 2014 at 4:42 PM, Juan José Ramos <jj...@gmail.com> wrote:
>>>>>>>
>>>>>>> Hi Tevfik,
>>>>>>>
>>>>>>> Thanks for the response. I think what you say contradicts what Sebastian
>>>>>>> pointed out before. Also, if AllSimilarItemsCandidateItemsStrategy returns
>>>>>>> all items that have not been rated by the user, what would
>>>>>>> AllUnknownItemsCandidateItemsStrategy return?
>>>>>>>
>>>>>>>
>>>>>>> On Wed, Mar 5, 2014 at 1:40 PM, Tevfik Aytekin <tevfik.aytekin@gmail.com> wrote:
>>>>>>>
>>>>>>>> Sorry there was a typo in the previous paragraph.
>>>>>>>>
>>>>>>>> If I remember correctly, AllSimilarItemsCandidateItemsStrategy
>>>>>>>>
>>>>>>>> returns all items that have not been rated by the user and the
>>>>>>>> similarity metric returns a non-NaN similarity value with at
>>>>>>>> least one of the items preferred by the user.
>>>>>>>>
>>>>>>>> On Wed, Mar 5, 2014 at 3:38 PM, Tevfik Aytekin <tevfik.aytekin@gmail.com> wrote:
>>>>>>>>>
>>>>>>>>> Hi Juan,
>>>>>>>>>
>>>>>>>>> If I remember correctly, AllSimilarItemsCandidateItemsStrategy
>>>>>>>>>
>>>>>>>>> returns all items that have not been rated by the user and the
>>>>>>>>> similarity metric returns a non-NaN similarity value that is with at
>>>>>>>>> least one of the items preferred by the user.
>>>>>>>>>
>>>>>>>>> Tevfik
>>>>>>>>>
>>>>>>>>> On Wed, Mar 5, 2014 at 2:30 PM, Sebastian Schelter <ss...@apache.org> wrote:
>>>>>>>>>>
>>>>>>>>>> On 03/05/2014 01:23 PM, Juan José Ramos wrote:
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Thanks for the reply, Sebastian.
>>>>>>>>>>>
>>>>>>>>>>> I am not sure if that should be implemented in the abstract base class,
>>>>>>>>>>> though, because for instance
>>>>>>>>>>> PreferredItemsNeighborhoodCandidateItemsStrategy, by definition,
>>>>>>>>>>> returns the items not rated by the user and rated by somebody else.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Good point. So we seem to need special implementations.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Back to my last post, I have been playing around with
>>>>>>>>>>> AllSimilarItemsCandidateItemsStrategy and
>>>>>>>>>>> AllUnknownItemsCandidateItemsStrategy, and although they both do what I
>>>>>>>>>>> wanted (recommend items not previously rated by any user), I honestly
>>>>>>>>>>> can't tell the difference between the two strategies. In my tests the
>>>>>>>>>>> output was always the same. If the eventual output of the recommender
>>>>>>>>>>> will not include items already rated by the user, as pointed out here
>>>>>>>>>>> (http://mail-archives.apache.org/mod_mbox/mahout-user/201403.mbox/%3CCABHkCkuv35dbwF%2B9sK88FR3hg7MAcdv0MP10v-5QWEvwmNdY%2BA%40mail.gmail.com%3E),
>>>>>>>>>>> AllSimilarItemsCandidateItemsStrategy should be equivalent to
>>>>>>>>>>> AllUnknownItemsCandidateItemsStrategy, shouldn't it?
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> AllSimilarItems returns all items that are similar to any item that the
>>>>>>>>>> user already knows. AllUnknownItems simply returns all items that the
>>>>>>>>>> user has not interacted with yet.
>>>>>>>>>>
>>>>>>>>>> These are two different things, although they might overlap in some
>>>>>>>>>> scenarios.
>>>>>>>>>>
>>>>>>>>>> Best,
>>>>>>>>>> Sebastian
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Thanks.
>>>>>>>>>>>
>>>>>>>>>>> On Wed, Mar 5, 2014 at 10:23 AM, Sebastian Schelter <ssc@apache.org> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> Hi Juan,
>>>>>>>>>>>>
>>>>>>>>>>>> that is a good catch. CandidateItemsStrategy is the right place to
>>>>>>>>>>>> implement this. Maybe we should simply extend its interface to add a
>>>>>>>>>>>> parameter that says whether to keep or remove the current user's items?
>>>>>>>>>>>>
>>>>>>>>>>>> We could even do this in the abstract base class then.
>>>>>>>>>>>>
>>>>>>>>>>>> --sebastian
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> On 03/05/2014 10:42 AM, Juan José Ramos wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> In case somebody runs into the same situation, the key seems to be in
>>>>>>>>>>>>> the CandidateItemsStrategy being passed to the constructor of
>>>>>>>>>>>>> GenericItemBasedRecommender. Looking into the code, if no
>>>>>>>>>>>>> CandidateItemsStrategy is specified in the constructor,
>>>>>>>>>>>>> PreferredItemsNeighborhoodCandidateItemsStrategy is used and, as the
>>>>>>>>>>>>> documentation says, the doGetCandidateItems method "returns all items
>>>>>>>>>>>>> that have not been rated by the user and that were preferred by
>>>>>>>>>>>>> another user that has preferred at least one item that the current
>>>>>>>>>>>>> user has preferred too".
>>>>>>>>>>>>>
>>>>>>>>>>>>> So, a different CandidateItemsStrategy needs to be passed. For this
>>>>>>>>>>>>> problem, it seems to me that AllSimilarItemsCandidateItemsStrategy and
>>>>>>>>>>>>> AllUnknownItemsCandidateItemsStrategy are good candidates. Does anybody
>>>>>>>>>>>>> know where to find some documentation about the different
>>>>>>>>>>>>> CandidateItemsStrategy implementations? Based on the names I would say
>>>>>>>>>>>>> that:
>>>>>>>>>>>>> 1) AllSimilarItemsCandidateItemsStrategy returns all similar items
>>>>>>>>>>>>> regardless of whether they have been already rated by someone or not.
>>>>>>>>>>>>> 2) AllUnknownItemsCandidateItemsStrategy returns all similar items that
>>>>>>>>>>>>> have not been rated by anyone yet.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Does anybody know if it works like that?
>>>>>>>>>>>>> Thanks.


Re: Recommend items not rated by any user

Posted by Tevfik Aytekin <te...@gmail.com>.
It can even make things worse in SVD-based algorithms for which
preference estimation is very fast.


Re: Recommend items not rated by any user

Posted by Tevfik Aytekin <te...@gmail.com>.
Hi Sebastian,
But in order not to select items that are not similar to at least one
of the items the user interacted with, you have to compute the
similarity with all user items (which is the main task for estimating
the preference of an item in the item-based method). So, it seems to me
that AllSimilarItemsStrategy does not bring much advantage over
AllUnknownItemsCandidateItemsStrategy.
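
The cost argument above can be illustrated with a minimal sketch of the
textbook item-based estimate, a similarity-weighted average of the user's own
ratings. This is a simplified stand-in written with plain Java collections and
is not claimed to match Mahout's implementation; the class name
ItemBasedEstimate and the method estimate are hypothetical.

```java
import java.util.Map;

// Textbook item-based preference estimate. Simplified illustration, NOT Mahout code.
class ItemBasedEstimate {

    // userRatings:     item ID -> rating given by the user
    // simsToCandidate: rated item ID -> similarity with the candidate item;
    //                  a missing entry or NaN means "similarity unknown"
    static double estimate(Map<Long, Double> userRatings, Map<Long, Double> simsToCandidate) {
        double weightedSum = 0.0;
        double totalWeight = 0.0;
        // One similarity lookup per item the user has rated.
        for (Map.Entry<Long, Double> rating : userRatings.entrySet()) {
            Double sim = simsToCandidate.get(rating.getKey());
            if (sim == null || sim.isNaN()) {
                continue; // no usable similarity for this pair
            }
            weightedSum += sim * rating.getValue();
            totalWeight += sim;
        }
        // No usable similarity at all: the candidate cannot be scored.
        return totalWeight == 0.0 ? Double.NaN : weightedSum / totalWeight;
    }

    public static void main(String[] args) {
        // User 1's ratings and item 5's similarities, as quoted in this thread.
        Map<Long, Double> user1 = Map.of(1L, 1.0, 2L, 2.0, 3L, 1.0, 4L, 2.0);
        Map<Long, Double> simsToItem5 = Map.of(1L, 0.2, 2L, 1.0);
        System.out.println(estimate(user1, simsToItem5)); // roughly 1.83
        System.out.println(estimate(user1, Map.of()));    // NaN
    }
}
```

With the thread's data the estimate for item 5 and user 1 is
(0.2 * 1.0 + 1.0 * 2.0) / (0.2 + 1.0), roughly 1.83. In this sketch, deciding
whether a candidate has any usable similarity walks the same list of rated
items as the estimate itself, which is the observation made in the message
above.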

On Wed, Mar 5, 2014 at 6:46 PM, Sebastian Schelter <ss...@apache.org> wrote:
>> So both strategies seems to be effectively the same, I don't know what
>> the implementers had in mind when designing
>> AllSimilarItemsCandidateItemsStrategy.
>
> It can take a long time to estimate preferences for all items a user doesn't
> know. Especially if you have a lot of items. Traditional item-based
> recommenders will not recommend any item that is not similar to at least one
> of the items the user interacted with, so AllSimilarItemsStrategy already
> selects the maximum set of items that could be potentially recommended to
> the user.
>
> --sebastian
>
>
>
>
> On 03/05/2014 05:38 PM, Tevfik Aytekin wrote:
>>
>> If the similarity between item 5 and two of the items user 1 preferred are
>> not
>> NaN then it will return 1, that is what I'm saying. If the
>> similarities were all NaN then
>> it will not return it.
>>
>> But surely, you might wonder if all similarities between an item and
>> user's items are NaN, then
>> AllUnknownItemsCandidateItemsStrategy probably will not return it.
>>
>
>> On Wed, Mar 5, 2014 at 6:06 PM, Juan José Ramos <jj...@gmail.com> wrote:
>>>
>>> @Tevfik, running this recommender:
>>>
>>> GenericItemBasedRecommender itemRecommender = new
>>> GenericItemBasedRecommender(dataModel, itemSimilarity, new
>>> AllSimilarItemsCandidateItemsStrategy(itemSimilarity), new
>>> AllSimilarItemsCandidateItemsStrategy(itemSimilarity));
>>>
>>>
>>> With this dataModel:
>>> 1,1,1.0
>>> 1,2,2.0
>>> 1,3,1.0
>>> 1,4,2.0
>>> 2,1,1.0
>>> 2,2,4.0
>>>
>>>
>>> And these similarities
>>> 1,2,0.1
>>> 1,3,0.2
>>> 1,4,0.3
>>> 2,3,0.5
>>> 3,4,0.5
>>> 5,1,0.2
>>> 5,2,1.0
>>>
>>> Returns item 5 for User 1. So item 5 has not been preferred by user 1,
>>> and
>>> the similarity between item 5 and two of the items user 1 preferred are
>>> not
>>> NaN, but AllSimilarItemsCandidateItemsStrategy is returning that item.
>>> So,
>>> I'm truly sorry to insist on this, but I still really do not get the
>>> difference.
>>>
>>>
>>> On Wed, Mar 5, 2014 at 2:53 PM, Tevfik Aytekin
>>> <te...@gmail.com>wrote:
>>>
>>>> Juan,
>>>> You got me wrong,
>>>>
>>>> AllSimilarItemsCandidateItemsStrategy
>>>>
>>>> returns all items that have not been rated by the user and the
>>>> similarity metric returns a non-NaN similarity value with at
>>>> least one of the items preferred by the user.
>>>>
>>>> So, it does not simply return all items that have not been rated by
>>>> the user. For example, if there is an item X which has not been rated
>>>> by the user and if the similarity value between X and at least one of
>>>> the items rated (preferred) by the user is not NaN, then X will be not
>>>> be returned by AllSimilarItemsCandidateItemsStrategy, but it will be
>>>> returned by AllUnknownItemsCandidateItemsStrategy.
>>>>
>>>>
>>>>
>>>> On Wed, Mar 5, 2014 at 4:42 PM, Juan José Ramos <jj...@gmail.com>
>>>> wrote:
>>>>>
>>>>> Hi Tefik,
>>>>>
>>>>> Thanks for the response. I think what you says contradicts what
>>>>> Sebastian
>>>>> pointed out before. Also, if AllSimilarItemsCandidateItemsStrategy
>>>>
>>>> returns
>>>>>
>>>>> all items that have not been rated by the user, what would
>>>>> AllUnknownItemsCandidateItemsStrategy return?
>>>>>
>>>>>
>>>>> On Wed, Mar 5, 2014 at 1:40 PM, Tevfik Aytekin
>>>>> <tevfik.aytekin@gmail.com
>>>>> wrote:
>>>>>
>>>>>> Sorry there was a typo in the previous paragraph.
>>>>>>
>>>>>> If I remember correctly, AllSimilarItemsCandidateItemsStrategy
>>>>>>
>>>>>> returns all items that have not been rated by the user and the
>>>>>> similarity metric returns a non-NaN similarity value with at
>>>>>> least one of the items preferred by the user.
>>>>>>
>>>>>> On Wed, Mar 5, 2014 at 3:38 PM, Tevfik Aytekin <
>>>>
>>>> tevfik.aytekin@gmail.com>
>>>>>>
>>>>>> wrote:
>>>>>>>
>>>>>>> Hi Juan,
>>>>>>>
>>>>>>> If I remember correctly, AllSimilarItemsCandidateItemsStrategy
>>>>>>>
>>>>>>> returns all items that have not been rated by the user and the
>>>>>>> similarity metric returns a non-NaN similarity value that is with at
>>>>>>> least one of the items preferred by the user.
>>>>>>>
>>>>>>> Tevfik
>>>>>>>
>>>>>>> On Wed, Mar 5, 2014 at 2:30 PM, Sebastian Schelter <ss...@apache.org>
>>>>>>
>>>>>> wrote:
>>>>>>>>
>>>>>>>> On 03/05/2014 01:23 PM, Juan José Ramos wrote:
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Thanks for the reply, Sebastian.
>>>>>>>>>
>>>>>>>>> I am not sure if that should be implemented in the Abstract base
>>>>
>>>> class
>>>>>>>>>
>>>>>>>>> though because for
>>>>>>>>> instance PreferredItemsNeighborhoodCandidateItemsStrategy, by
>>>>>>
>>>>>> definition,
>>>>>>>>>
>>>>>>>>> it returns the item not rated by the user and rated by somebody
>>>>
>>>> else.
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> Good point. So we seem to need special implementations.
>>>>>>>>
>>>>>>>>
>>>>>>>>>
>>>>>>>>> Back to my last post, I have been playing around with
>>>>>>>>> AllSimilarItemsCandidateItemsStrategy
>>>>>>>>> and AllUnknownItemsCandidateItemsStrategy, and although they both
>>>>>>>>> do
>>>>>>
>>>>>> what
>>>>>>>>>
>>>>>>>>> I
>>>>>>>>> wanted (recommend items not previously rated by any user), I
>>>>
>>>> honestly
>>>>>>>>>
>>>>>>>>> can't
>>>>>>>>> tell the difference between the two strategies. In my tests the
>>>>
>>>> output
>>>>>>
>>>>>> was
>>>>>>>>>
>>>>>>>>> always the same. If the eventual output of the recommender will not
>>>>>>>>> include
>>>>>>>>> items already rated by the user as pointed out here (
>>>>>>>>>
>>>>>>>>>
>>>>>>
>>>>
>>>> http://mail-archives.apache.org/mod_mbox/mahout-user/201403.mbox/%3CCABHkCkuv35dbwF%2B9sK88FR3hg7MAcdv0MP10v-5QWEvwmNdY%2BA%40mail.gmail.com%3E
>>>>>>
>>>>>> ),
>>>>>>>>>
>>>>>>>>> AllSimilarItemsCandidateItemsStrategy should be equivalent to
>>>>>>>>> AllUnkownItemsCandidateItemsStrategy, shouldn't it?
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> AllSimilarItems returns all items that are similar to any item that
>>>>
>>>> the
>>>>>>
>>>>>> user
>>>>>>>>
>>>>>>>> already knows. AllUnknownItems simply returns all items that the
>>>>>>>> user
>>>>>>
>>>>>> has
>>>>>>>>
>>>>>>>> not interacted with yet.
>>>>>>>>
>>>>>>>> These are two different things, although they might overlap in some
>>>>>>>> scenarios.
>>>>>>>>
>>>>>>>> Best,
>>>>>>>> Sebastian
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>>
>>>>>>>>> Thanks.
>>>>>>>>>
>

Re: Recommend items not rated by any user

Posted by Pat Ferrel <pa...@occamsmachete.com>.
I agree. IMHO using the Mahout recommenders is wrong for this. The recommenders are the CF/cooccurrence type that expect usage or rating data on fairly long-lived items from a somewhat static catalog. Trying to make them work for content-based recommendations is needlessly difficult, especially since other tools are custom-made for this, like RowSimilarityJob and Solr. Each finds content-based similarity with no rating or CF data needed.

Profile creation is another subject and still does not use a Mahout recommender. You can keep the text of articles the user has rated, read, whatever. These will form the basis of your user profile. If there are not too many, you could use each of them as a Solr query, returning similar docs for each article in the profile. You could also lump them all together and use this as the query. You can also experiment with various ways to process profile data. If there are enough articles in the profile you might categorize them with clustering, then use the centroids of the clusters as the Solr queries.
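
The "lump them all together" idea can be sketched roughly as follows. This is only an illustration in plain Java, not Solr: the class ProfileQuerySketch and its methods are hypothetical names, and raw term-count cosine similarity stands in for whatever scoring a real search engine would apply.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for "use the profile text as a query": not Solr,
// just term-count vectors compared with cosine similarity.
public class ProfileQuerySketch {

    // Lower-cases the text and counts each word-like token.
    static Map<String, Integer> termCounts(String text) {
        Map<String, Integer> counts = new HashMap<>();
        for (String token : text.toLowerCase().split("\\W+")) {
            if (!token.isEmpty()) {
                counts.merge(token, 1, Integer::sum);
            }
        }
        return counts;
    }

    // Cosine similarity between two term-count vectors; 0.0 if either is empty.
    static double cosine(Map<String, Integer> a, Map<String, Integer> b) {
        double dot = 0.0, normA = 0.0, normB = 0.0;
        for (Map.Entry<String, Integer> e : a.entrySet()) {
            normA += e.getValue() * e.getValue();
            Integer other = b.get(e.getKey());
            if (other != null) {
                dot += e.getValue() * other;
            }
        }
        for (int v : b.values()) {
            normB += v * v;
        }
        if (normA == 0.0 || normB == 0.0) {
            return 0.0;
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    // Joins all profile articles into one query and returns the best-scoring candidate.
    static String mostSimilar(List<String> profileTexts, List<String> candidates) {
        Map<String, Integer> query = termCounts(String.join(" ", profileTexts));
        String best = null;
        double bestScore = -1.0;
        for (String candidate : candidates) {
            double score = cosine(query, termCounts(candidate));
            if (score > bestScore) {
                bestScore = score;
                best = candidate;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        List<String> profile = List.of("mahout recommender similarity", "item similarity matrix");
        List<String> news = List.of("football match results",
                                    "precomputed item similarity for a recommender");
        System.out.println(mostSimilar(profile, news));
    }
}
```

Note that nothing here needs ratings from other users; only the text of the profile articles and of the candidates is used.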

The same thing can be done in batch mode with Mahout’s RowSimilarityJob. Take the user's cluster centroids as synthetic items, add them to the item DRM of news articles you get out of the text pipeline and run RSJ on that. For each synthetic item (cluster centroid) you’ll get a list of articles that are most similar. 

Not sure clustering the user profile is the best idea though since it would require quite a few articles for the user in question. If you have some method of labeling your articles (categories, tags, or the like) you can build classifiers for each. Then see what categories your user reads from the most by classifying the articles in their profile based on the labeled training data. As new articles come in and are classified you can funnel them to the right users. 

You can do this with clustering too, but generally clustering is not as good as classifying since it is unsupervised learning. However, clustering all news will probably give better results than clustering the user’s profile articles. So you would cluster your news corpus, which will include the articles your user has read, then recommend other articles that the user’s profile articles were clustered with (from the same cluster). This is only slightly different from using the profile articles as Solr queries but may produce better results. However, the Solr queries will work even if the query (profile news article) is not in the index and will return results in real time, requiring no batch RSJ.

BTW I did just this as an experiment. I used my own browsing history as the profile, clustered the pages I read, then took the top terms from the centroids and did Google searches with them. Since the sources are so varied in Google I had to create a custom search engine to include only specific sites. It worked pretty well for discovering related pages.

On Mar 5, 2014, at 8:46 AM, Sebastian Schelter <ss...@apache.org> wrote:

> So both strategies seems to be effectively the same, I don't know what
> the implementers had in mind when designing
> AllSimilarItemsCandidateItemsStrategy.

It can take a long time to estimate preferences for all items a user doesn't know. Especially if you have a lot of items. Traditional item-based recommenders will not recommend any item that is not similar to at least one of the items the user interacted with, so AllSimilarItemsStrategy already selects the maximum set of items that could be potentially recommended to the user.

--sebastian



On 03/05/2014 05:38 PM, Tevfik Aytekin wrote:
> If the similarity between item 5 and two of the items user 1 preferred are not
> NaN then it will return 1, that is what I'm saying. If the
> similarities were all NaN then
> it will not return it.
> 
> But surely, you might wonder if all similarities between an item and
> user's items are NaN, then
> AllUnknownItemsCandidateItemsStrategy probably will not return it.
> 

> On Wed, Mar 5, 2014 at 6:06 PM, Juan José Ramos <jj...@gmail.com> wrote:
>> @Tevfik, running this recommender:
>> 
>> GenericItemBasedRecommender itemRecommender = new
>> GenericItemBasedRecommender(dataModel, itemSimilarity, new
>> AllSimilarItemsCandidateItemsStrategy(itemSimilarity), new
>> AllSimilarItemsCandidateItemsStrategy(itemSimilarity));
>> 
>> 
>> With this dataModel:
>> 1,1,1.0
>> 1,2,2.0
>> 1,3,1.0
>> 1,4,2.0
>> 2,1,1.0
>> 2,2,4.0
>> 
>> 
>> And these similarities
>> 1,2,0.1
>> 1,3,0.2
>> 1,4,0.3
>> 2,3,0.5
>> 3,4,0.5
>> 5,1,0.2
>> 5,2,1.0
>> 
>> Returns item 5 for User 1. So item 5 has not been preferred by user 1, and
>> the similarity between item 5 and two of the items user 1 preferred are not
>> NaN, but AllSimilarItemsCandidateItemsStrategy is returning that item. So,
>> I'm truly sorry to insist on this, but I still really do not get the
>> difference.
>> 
>> 
>> On Wed, Mar 5, 2014 at 2:53 PM, Tevfik Aytekin <te...@gmail.com> wrote:
>> 
>>> Juan,
>>> You got me wrong,
>>> 
>>> AllSimilarItemsCandidateItemsStrategy
>>> 
>>> returns all items that have not been rated by the user and the
>>> similarity metric returns a non-NaN similarity value with at
>>> least one of the items preferred by the user.
>>> 
>>> So, it does not simply return all items that have not been rated by
>>> the user. For example, if there is an item X which has not been rated
>>> by the user and if the similarity value between X and at least one of
>>> the items rated (preferred) by the user is not NaN, then X will be not
>>> be returned by AllSimilarItemsCandidateItemsStrategy, but it will be
>>> returned by AllUnknownItemsCandidateItemsStrategy.
>>> 
>>> 
>>> 
>>> On Wed, Mar 5, 2014 at 4:42 PM, Juan José Ramos <jj...@gmail.com> wrote:
>>>> Hi Tefik,
>>>> 
>>>> Thanks for the response. I think what you says contradicts what Sebastian
>>>> pointed out before. Also, if AllSimilarItemsCandidateItemsStrategy
>>> returns
>>>> all items that have not been rated by the user, what would
>>>> AllUnknownItemsCandidateItemsStrategy return?
>>>> 
>>>> 
>>>> On Wed, Mar 5, 2014 at 1:40 PM, Tevfik Aytekin <tevfik.aytekin@gmail.com
>>>> wrote:
>>>> 
>>>>> Sorry there was a typo in the previous paragraph.
>>>>> 
>>>>> If I remember correctly, AllSimilarItemsCandidateItemsStrategy
>>>>> 
>>>>> returns all items that have not been rated by the user and the
>>>>> similarity metric returns a non-NaN similarity value with at
>>>>> least one of the items preferred by the user.
>>>>> 
>>>>> On Wed, Mar 5, 2014 at 3:38 PM, Tevfik Aytekin <
>>> tevfik.aytekin@gmail.com>
>>>>> wrote:
>>>>>> Hi Juan,
>>>>>> 
>>>>>> If I remember correctly, AllSimilarItemsCandidateItemsStrategy
>>>>>> 
>>>>>> returns all items that have not been rated by the user and the
>>>>>> similarity metric returns a non-NaN similarity value that is with at
>>>>>> least one of the items preferred by the user.
>>>>>> 
>>>>>> Tevfik
>>>>>> 
>>>>>> On Wed, Mar 5, 2014 at 2:30 PM, Sebastian Schelter <ss...@apache.org>
>>>>> wrote:
>>>>>>> On 03/05/2014 01:23 PM, Juan José Ramos wrote:
>>>>>>>> 
>>>>>>>> Thanks for the reply, Sebastian.
>>>>>>>> 
>>>>>>>> I am not sure if that should be implemented in the Abstract base
>>> class
>>>>>>>> though because for
>>>>>>>> instance PreferredItemsNeighborhoodCandidateItemsStrategy, by
>>>>> definition,
>>>>>>>> it returns the item not rated by the user and rated by somebody
>>> else.
>>>>>>> 
>>>>>>> 
>>>>>>> Good point. So we seem to need special implementations.
>>>>>>> 
>>>>>>> 
>>>>>>>> 
>>>>>>>> Back to my last post, I have been playing around with
>>>>>>>> AllSimilarItemsCandidateItemsStrategy
>>>>>>>> and AllUnknownItemsCandidateItemsStrategy, and although they both do
>>>>> what
>>>>>>>> I
>>>>>>>> wanted (recommend items not previously rated by any user), I
>>> honestly
>>>>>>>> can't
>>>>>>>> tell the difference between the two strategies. In my tests the
>>> output
>>>>> was
>>>>>>>> always the same. If the eventual output of the recommender will not
>>>>>>>> include
>>>>>>>> items already rated by the user as pointed out here (
>>>>>>>> 
>>>>>>>> 
>>>>> 
>>> http://mail-archives.apache.org/mod_mbox/mahout-user/201403.mbox/%3CCABHkCkuv35dbwF%2B9sK88FR3hg7MAcdv0MP10v-5QWEvwmNdY%2BA%40mail.gmail.com%3E
>>>>> ),
>>>>>>>> AllSimilarItemsCandidateItemsStrategy should be equivalent to
>>>>>>>> AllUnkownItemsCandidateItemsStrategy, shouldn't it?
>>>>>>> 
>>>>>>> 
>>>>>>> AllSimilarItems returns all items that are similar to any item that
>>> the
>>>>> user
>>>>>>> already knows. AllUnknownItems simply returns all items that the user
>>>>> has
>>>>>>> not interacted with yet.
>>>>>>> 
>>>>>>> These are two different things, although they might overlap in some
>>>>>>> scenarios.
>>>>>>> 
>>>>>>> Best,
>>>>>>> Sebastian
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>>> 
>>>>>>>> Thanks.
>>>>>>>> 
>>>>>>>> On Wed, Mar 5, 2014 at 10:23 AM, Sebastian Schelter <ssc@apache.org
>>>> 
>>>>>>>> wrote:
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> Hi Juan,
>>>>>>>>> 
>>>>>>>>> that is a good catch. CandidateItemsStrategy is the right place to
>>>>>>>> 
>>>>>>>> implement this. Maybe we should simply extend its interface to add a
>>>>>>>> parameter that says whether to keep or remove the current users
>>> items?
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> We could even do this in the abstract base class then.
>>>>>>>>> 
>>>>>>>>> --sebastian
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> On 03/05/2014 10:42 AM, Juan José Ramos wrote:
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> In case somebody runs into the same situation, the key seems to
>>> be in
>>>>>>>>>> the
>>>>>>>>>> CandidateItemStrategy being passed to the constructor
>>>>>>>>>> of GenericItemBasedRecommender. Looking into the code, if no
>>>>>>>>>> CandidateItemStrategy is specified in the
>>>>>>>>>> constructor, PreferredItemsNeighborhoodCandidateItemsStrategy is
>>> used
>>>>>>>>>> and
>>>>>>>>>> as the documentation says, the doGetCandidateItems method:
>>> "returns
>>>>> all
>>>>>>>>>> items that have not been rated by the user and that were
>>> preferred by
>>>>>>>>>> another user that has preferred at least one item that the current
>>>>> user
>>>>>>>> 
>>>>>>>> has
>>>>>>>>>> 
>>>>>>>>>> preferred too".
>>>>>>>>>> 
>>>>>>>>>> So, a different CandidateItemStrategy needs to be passed. For this
>>>>>>>> 
>>>>>>>> problem,
>>>>>>>>>> 
>>>>>>>>>> it seems to me that AllSimilarItemsCandidateItemsStrategy,
>>>>>>>>>> AllUnknownItemsCandidateItemsStrategy are good candidates. Does
>>>>> anybody
>>>>>>>>>> know where to find some documentation about the different
>>>>>>>>>> CandidateItemStrategy? Based on the name I would say that:
>>>>>>>>>> 1) AllSimilarItemsCandidateItemsStrategy returns all similar items
>>>>>>>>>> regardless of whether they have been already rated by someone or
>>> not.
>>>>>>>>>> 2) AllUnknownItemsCandidateItemsStrategy returns all similar items
>>>>> that
>>>>>>>>>> have not been rated by anyone yet.
>>>>>>>>>> 
>>>>>>>>>> Does anybody know if it works like that?
>>>>>>>>>> Thanks.
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> On Tue, Mar 4, 2014 at 9:16 AM, Juan José Ramos <
>>> jjarmos@gmail.com>
>>>>>>>> 
>>>>>>>> wrote:
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>>> First thing is thatI know this requirement would not make sense
>>> in
>>>>> a CF
>>>>>>>>>>> Recommender. In my case, I am trying to use Mahout to create
>>>>> something
>>>>>>>>>>> closer to a Content-Based Recommender.
>>>>>>>>>>> 
>>>>>>>>>>> In particular, I am pre-computing a similarity matrix between all
>>>>> the
>>>>>>>>>>> documents (items) of my catalogue and using that matrix as the
>>>>>>>>>>> ItemSimilarity for my Item-Based Recommender.
>>>>>>>>>>> 
>>>>>>>>>>> So, when a user rates a document, how could I make the
>>> recommender
>>>>>>>> 
>>>>>>>> outputs
>>>>>>>>>>> 
>>>>>>>>>>> similar documents to that ones the user has already rated even
>>> if no
>>>>>>>> 
>>>>>>>> other
>>>>>>>>>>> 
>>>>>>>>>>> user in the system has rated them yet? Is that even possible in
>>> the
>>>>>>>> 
>>>>>>>> first
>>>>>>>>>>> 
>>>>>>>>>>> place?
>>>>>>>>>>> 
>>>>>>>>>>> Thanks a lot.
>>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>> 
>>>>>>>> 
>>>>>>> 
>>>>> 
>>> 



Re: Recommend items not rated by any user

Posted by Sebastian Schelter <ss...@apache.org>.
 > So both strategies seems to be effectively the same, I don't know what
 > the implementers had in mind when designing
 > AllSimilarItemsCandidateItemsStrategy.

It can take a long time to estimate preferences for all items a user 
doesn't know. Especially if you have a lot of items. Traditional 
item-based recommenders will not recommend any item that is not similar 
to at least one of the items the user interacted with, so 
AllSimilarItemsStrategy already selects the maximum set of items that 
could be potentially recommended to the user.
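
To make the argument above concrete, here is a simplified sketch of an item-based estimate as a similarity-weighted average of the user's own ratings. The class and method names are hypothetical, and Mahout's actual implementation has additional details that are not reproduced here. The point is only that a candidate with no non-NaN similarity to any of the user's items cannot get an estimate, so scoring it is wasted work.

```java
import java.util.HashMap;
import java.util.Map;

// Simplified, hypothetical sketch of item-based preference estimation.
public class ItemBasedEstimateSketch {

    // userRatings: itemID -> rating given by the user.
    // similarityToCandidate: itemID -> similarity between that item and the candidate;
    // a missing entry is treated like NaN (similarity unknown).
    static double estimate(Map<Long, Double> userRatings,
                           Map<Long, Double> similarityToCandidate) {
        double weightedSum = 0.0;
        double totalSimilarity = 0.0;
        for (Map.Entry<Long, Double> rating : userRatings.entrySet()) {
            Double sim = similarityToCandidate.get(rating.getKey());
            if (sim == null || sim.isNaN()) {
                continue; // this rated item says nothing about the candidate
            }
            weightedSum += sim * rating.getValue();
            totalSimilarity += sim;
        }
        // No usable similarity at all: the candidate cannot be scored.
        return totalSimilarity == 0.0 ? Double.NaN : weightedSum / totalSimilarity;
    }

    public static void main(String[] args) {
        // User 1's ratings and item 5's similarities, taken from the example in this thread.
        Map<Long, Double> user1 = new HashMap<>();
        user1.put(1L, 1.0);
        user1.put(2L, 2.0);
        user1.put(3L, 1.0);
        user1.put(4L, 2.0);
        Map<Long, Double> item5 = new HashMap<>();
        item5.put(1L, 0.2);
        item5.put(2L, 1.0);
        System.out.println(estimate(user1, item5));
        // A candidate with no similarities to the user's items yields NaN.
        System.out.println(estimate(user1, new HashMap<>()));
    }
}
```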

--sebastian



On 03/05/2014 05:38 PM, Tevfik Aytekin wrote:
> If the similarity between item 5 and two of the items user 1 preferred are not
> NaN then it will return 1, that is what I'm saying. If the
> similarities were all NaN then
> it will not return it.
>
> But surely, you might wonder if all similarities between an item and
> user's items are NaN, then
> AllUnknownItemsCandidateItemsStrategy probably will not return it.
>

> On Wed, Mar 5, 2014 at 6:06 PM, Juan José Ramos <jj...@gmail.com> wrote:
>> @Tevfik, running this recommender:
>>
>> GenericItemBasedRecommender itemRecommender = new
>> GenericItemBasedRecommender(dataModel, itemSimilarity, new
>> AllSimilarItemsCandidateItemsStrategy(itemSimilarity), new
>> AllSimilarItemsCandidateItemsStrategy(itemSimilarity));
>>
>>
>> With this dataModel:
>> 1,1,1.0
>> 1,2,2.0
>> 1,3,1.0
>> 1,4,2.0
>> 2,1,1.0
>> 2,2,4.0
>>
>>
>> And these similarities
>> 1,2,0.1
>> 1,3,0.2
>> 1,4,0.3
>> 2,3,0.5
>> 3,4,0.5
>> 5,1,0.2
>> 5,2,1.0
>>
>> Returns item 5 for User 1. So item 5 has not been preferred by user 1, and
>> the similarity between item 5 and two of the items user 1 preferred are not
>> NaN, but AllSimilarItemsCandidateItemsStrategy is returning that item. So,
>> I'm truly sorry to insist on this, but I still really do not get the
>> difference.
>>
>>
>> On Wed, Mar 5, 2014 at 2:53 PM, Tevfik Aytekin <te...@gmail.com> wrote:
>>
>>> Juan,
>>> You got me wrong,
>>>
>>> AllSimilarItemsCandidateItemsStrategy
>>>
>>> returns all items that have not been rated by the user and the
>>> similarity metric returns a non-NaN similarity value with at
>>> least one of the items preferred by the user.
>>>
>>> So, it does not simply return all items that have not been rated by
>>> the user. For example, if there is an item X which has not been rated
>>> by the user and if the similarity value between X and at least one of
>>> the items rated (preferred) by the user is not NaN, then X will be not
>>> be returned by AllSimilarItemsCandidateItemsStrategy, but it will be
>>> returned by AllUnknownItemsCandidateItemsStrategy.
>>>
>>>
>>>
>>> On Wed, Mar 5, 2014 at 4:42 PM, Juan José Ramos <jj...@gmail.com> wrote:
>>>> Hi Tefik,
>>>>
>>>> Thanks for the response. I think what you says contradicts what Sebastian
>>>> pointed out before. Also, if AllSimilarItemsCandidateItemsStrategy
>>> returns
>>>> all items that have not been rated by the user, what would
>>>> AllUnknownItemsCandidateItemsStrategy return?
>>>>
>>>>
>>>> On Wed, Mar 5, 2014 at 1:40 PM, Tevfik Aytekin <tevfik.aytekin@gmail.com
>>>> wrote:
>>>>
>>>>> Sorry there was a typo in the previous paragraph.
>>>>>
>>>>> If I remember correctly, AllSimilarItemsCandidateItemsStrategy
>>>>>
>>>>> returns all items that have not been rated by the user and the
>>>>> similarity metric returns a non-NaN similarity value with at
>>>>> least one of the items preferred by the user.
>>>>>
>>>>> On Wed, Mar 5, 2014 at 3:38 PM, Tevfik Aytekin <
>>> tevfik.aytekin@gmail.com>
>>>>> wrote:
>>>>>> Hi Juan,
>>>>>>
>>>>>> If I remember correctly, AllSimilarItemsCandidateItemsStrategy
>>>>>>
>>>>>> returns all items that have not been rated by the user and the
>>>>>> similarity metric returns a non-NaN similarity value that is with at
>>>>>> least one of the items preferred by the user.
>>>>>>
>>>>>> Tevfik
>>>>>>
>>>>>> On Wed, Mar 5, 2014 at 2:30 PM, Sebastian Schelter <ss...@apache.org>
>>>>> wrote:
>>>>>>> On 03/05/2014 01:23 PM, Juan José Ramos wrote:
>>>>>>>>
>>>>>>>> Thanks for the reply, Sebastian.
>>>>>>>>
>>>>>>>> I am not sure if that should be implemented in the Abstract base
>>> class
>>>>>>>> though because for
>>>>>>>> instance PreferredItemsNeighborhoodCandidateItemsStrategy, by
>>>>> definition,
>>>>>>>> it returns the item not rated by the user and rated by somebody
>>> else.
>>>>>>>
>>>>>>>
>>>>>>> Good point. So we seem to need special implementations.
>>>>>>>
>>>>>>>
>>>>>>>>
>>>>>>>> Back to my last post, I have been playing around with
>>>>>>>> AllSimilarItemsCandidateItemsStrategy
>>>>>>>> and AllUnknownItemsCandidateItemsStrategy, and although they both do
>>>>> what
>>>>>>>> I
>>>>>>>> wanted (recommend items not previously rated by any user), I
>>> honestly
>>>>>>>> can't
>>>>>>>> tell the difference between the two strategies. In my tests the
>>> output
>>>>> was
>>>>>>>> always the same. If the eventual output of the recommender will not
>>>>>>>> include
>>>>>>>> items already rated by the user as pointed out here (
>>>>>>>>
>>>>>>>>
>>>>>
>>> http://mail-archives.apache.org/mod_mbox/mahout-user/201403.mbox/%3CCABHkCkuv35dbwF%2B9sK88FR3hg7MAcdv0MP10v-5QWEvwmNdY%2BA%40mail.gmail.com%3E
>>>>> ),
>>>>>>>> AllSimilarItemsCandidateItemsStrategy should be equivalent to
>>>>>>>> AllUnkownItemsCandidateItemsStrategy, shouldn't it?
>>>>>>>
>>>>>>>
>>>>>>> AllSimilarItems returns all items that are similar to any item that
>>> the
>>>>> user
>>>>>>> already knows. AllUnknownItems simply returns all items that the user
>>>>> has
>>>>>>> not interacted with yet.
>>>>>>>
>>>>>>> These are two different things, although they might overlap in some
>>>>>>> scenarios.
>>>>>>>
>>>>>>> Best,
>>>>>>> Sebastian
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>>
>>>>>>>> Thanks.
>>>>>>>>
>>>>>>>> On Wed, Mar 5, 2014 at 10:23 AM, Sebastian Schelter <ssc@apache.org
>>>>
>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Hi Juan,
>>>>>>>>>
>>>>>>>>> that is a good catch. CandidateItemsStrategy is the right place to
>>>>>>>>
>>>>>>>> implement this. Maybe we should simply extend its interface to add a
>>>>>>>> parameter that says whether to keep or remove the current users
>>> items?
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> We could even do this in the abstract base class then.
>>>>>>>>>
>>>>>>>>> --sebastian
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On 03/05/2014 10:42 AM, Juan José Ramos wrote:
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> In case somebody runs into the same situation, the key seems to
>>> be in
>>>>>>>>>> the
>>>>>>>>>> CandidateItemStrategy being passed to the constructor
>>>>>>>>>> of GenericItemBasedRecommender. Looking into the code, if no
>>>>>>>>>> CandidateItemStrategy is specified in the
>>>>>>>>>> constructor, PreferredItemsNeighborhoodCandidateItemsStrategy is
>>> used
>>>>>>>>>> and
>>>>>>>>>> as the documentation says, the doGetCandidateItems method:
>>> "returns
>>>>> all
>>>>>>>>>> items that have not been rated by the user and that were
>>> preferred by
>>>>>>>>>> another user that has preferred at least one item that the current
>>>>> user
>>>>>>>>
>>>>>>>> has
>>>>>>>>>>
>>>>>>>>>> preferred too".
>>>>>>>>>>
>>>>>>>>>> So, a different CandidateItemStrategy needs to be passed. For this
>>>>>>>>
>>>>>>>> problem,
>>>>>>>>>>
>>>>>>>>>> it seems to me that AllSimilarItemsCandidateItemsStrategy,
>>>>>>>>>> AllUnknownItemsCandidateItemsStrategy are good candidates. Does
>>>>> anybody
>>>>>>>>>> know where to find some documentation about the different
>>>>>>>>>> CandidateItemStrategy? Based on the name I would say that:
>>>>>>>>>> 1) AllSimilarItemsCandidateItemsStrategy returns all similar items
>>>>>>>>>> regardless of whether they have been already rated by someone or
>>> not.
>>>>>>>>>> 2) AllUnknownItemsCandidateItemsStrategy returns all similar items
>>>>> that
>>>>>>>>>> have not been rated by anyone yet.
>>>>>>>>>>
>>>>>>>>>> Does anybody know if it works like that?
>>>>>>>>>> Thanks.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Tue, Mar 4, 2014 at 9:16 AM, Juan José Ramos <
>>> jjarmos@gmail.com>
>>>>>>>>
>>>>>>>> wrote:
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>> First thing is thatI know this requirement would not make sense
>>> in
>>>>> a CF
>>>>>>>>>>> Recommender. In my case, I am trying to use Mahout to create
>>>>> something
>>>>>>>>>>> closer to a Content-Based Recommender.
>>>>>>>>>>>
>>>>>>>>>>> In particular, I am pre-computing a similarity matrix between all
>>>>> the
>>>>>>>>>>> documents (items) of my catalogue and using that matrix as the
>>>>>>>>>>> ItemSimilarity for my Item-Based Recommender.
>>>>>>>>>>>
>>>>>>>>>>> So, when a user rates a document, how could I make the
>>> recommender
>>>>>>>>
>>>>>>>> outputs
>>>>>>>>>>>
>>>>>>>>>>> similar documents to that ones the user has already rated even
>>> if no
>>>>>>>>
>>>>>>>> other
>>>>>>>>>>>
>>>>>>>>>>> user in the system has rated them yet? Is that even possible in
>>> the
>>>>>>>>
>>>>>>>> first
>>>>>>>>>>>
>>>>>>>>>>> place?
>>>>>>>>>>>
>>>>>>>>>>> Thanks a lot.
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>
>>>


Re: Recommend items not rated by any user

Posted by Tevfik Aytekin <te...@gmail.com>.
If the similarity between item 5 and two of the items user 1 preferred are not
NaN then it will return 1, that is what I'm saying. If the
similarities were all NaN then
it will not return it.

But surely, you might wonder if all similarities between an item and
user's items are NaN, then
AllUnknownItemsCandidateItemsStrategy probably will not return it.

So both strategies seem to be effectively the same; I don't know what
the implementers had in mind when designing
AllSimilarItemsCandidateItemsStrategy.

On Wed, Mar 5, 2014 at 6:06 PM, Juan José Ramos <jj...@gmail.com> wrote:
> @Tevfik, running this recommender:
>
> GenericItemBasedRecommender itemRecommender = new
> GenericItemBasedRecommender(dataModel, itemSimilarity, new
> AllSimilarItemsCandidateItemsStrategy(itemSimilarity), new
> AllSimilarItemsCandidateItemsStrategy(itemSimilarity));
>
>
> With this dataModel:
> 1,1,1.0
> 1,2,2.0
> 1,3,1.0
> 1,4,2.0
> 2,1,1.0
> 2,2,4.0
>
>
> And these similarities
> 1,2,0.1
> 1,3,0.2
> 1,4,0.3
> 2,3,0.5
> 3,4,0.5
> 5,1,0.2
> 5,2,1.0
>
> Returns item 5 for User 1. So item 5 has not been preferred by user 1, and
> the similarity between item 5 and two of the items user 1 preferred are not
> NaN, but AllSimilarItemsCandidateItemsStrategy is returning that item. So,
> I'm truly sorry to insist on this, but I still really do not get the
> difference.
>
>
> On Wed, Mar 5, 2014 at 2:53 PM, Tevfik Aytekin <te...@gmail.com> wrote:
>
>> Juan,
>> You got me wrong,
>>
>> AllSimilarItemsCandidateItemsStrategy
>>
>> returns all items that have not been rated by the user and the
>> similarity metric returns a non-NaN similarity value with at
>> least one of the items preferred by the user.
>>
>> So, it does not simply return all items that have not been rated by
>> the user. For example, if there is an item X which has not been rated
>> by the user and if the similarity value between X and at least one of
>> the items rated (preferred) by the user is not NaN, then X will be not
>> be returned by AllSimilarItemsCandidateItemsStrategy, but it will be
>> returned by AllUnknownItemsCandidateItemsStrategy.
>>
>>
>>
>> On Wed, Mar 5, 2014 at 4:42 PM, Juan José Ramos <jj...@gmail.com> wrote:
>> > Hi Tefik,
>> >
>> > Thanks for the response. I think what you says contradicts what Sebastian
>> > pointed out before. Also, if AllSimilarItemsCandidateItemsStrategy
>> returns
>> > all items that have not been rated by the user, what would
>> > AllUnknownItemsCandidateItemsStrategy return?
>> >
>> >
>> > On Wed, Mar 5, 2014 at 1:40 PM, Tevfik Aytekin <tevfik.aytekin@gmail.com
>> >wrote:
>> >
>> >> Sorry there was a typo in the previous paragraph.
>> >>
>> >> If I remember correctly, AllSimilarItemsCandidateItemsStrategy
>> >>
>> >> returns all items that have not been rated by the user and the
>> >> similarity metric returns a non-NaN similarity value with at
>> >> least one of the items preferred by the user.
>> >>
>> >> On Wed, Mar 5, 2014 at 3:38 PM, Tevfik Aytekin <
>> tevfik.aytekin@gmail.com>
>> >> wrote:
>> >> > Hi Juan,
>> >> >
>> >> > If I remember correctly, AllSimilarItemsCandidateItemsStrategy
>> >> >
>> >> > returns all items that have not been rated by the user and the
>> >> > similarity metric returns a non-NaN similarity value that is with at
>> >> > least one of the items preferred by the user.
>> >> >
>> >> > Tevfik
>> >> >
>> >> > On Wed, Mar 5, 2014 at 2:30 PM, Sebastian Schelter <ss...@apache.org>
>> >> wrote:
>> >> >> On 03/05/2014 01:23 PM, Juan José Ramos wrote:
>> >> >>>
>> >> >>> Thanks for the reply, Sebastian.
>> >> >>>
>> >> >>> I am not sure if that should be implemented in the Abstract base
>> class
>> >> >>> though because for
>> >> >>> instance PreferredItemsNeighborhoodCandidateItemsStrategy, by
>> >> definition,
>> >> >>> it returns the item not rated by the user and rated by somebody
>> else.
>> >> >>
>> >> >>
>> >> >> Good point. So we seem to need special implementations.
>> >> >>
>> >> >>
>> >> >>>
>> >> >>> Back to my last post, I have been playing around with
>> >> >>> AllSimilarItemsCandidateItemsStrategy
>> >> >>> and AllUnknownItemsCandidateItemsStrategy, and although they both do
>> >> what
>> >> >>> I
>> >> >>> wanted (recommend items not previously rated by any user), I
>> honestly
>> >> >>> can't
>> >> >>> tell the difference between the two strategies. In my tests the
>> output
>> >> was
>> >> >>> always the same. If the eventual output of the recommender will not
>> >> >>> include
>> >> >>> items already rated by the user as pointed out here (
>> >> >>>
>> >> >>>
>> >>
>> http://mail-archives.apache.org/mod_mbox/mahout-user/201403.mbox/%3CCABHkCkuv35dbwF%2B9sK88FR3hg7MAcdv0MP10v-5QWEvwmNdY%2BA%40mail.gmail.com%3E
>> >> ),
>> >> >>> AllSimilarItemsCandidateItemsStrategy should be equivalent to
>> >> >>> AllUnkownItemsCandidateItemsStrategy, shouldn't it?
>> >> >>
>> >> >>
>> >> >> AllSimilarItems returns all items that are similar to any item that
>> the
>> >> user
>> >> >> already knows. AllUnknownItems simply returns all items that the user
>> >> has
>> >> >> not interacted with yet.
>> >> >>
>> >> >> These are two different things, although they might overlap in some
>> >> >> scenarios.
>> >> >>
>> >> >> Best,
>> >> >> Sebastian
>> >> >>
>> >> >>
>> >> >>
>> >> >>>
>> >> >>> Thanks.
>> >> >>>
>> >> >>> On Wed, Mar 5, 2014 at 10:23 AM, Sebastian Schelter <ssc@apache.org
>> >
>> >> >>> wrote:
>> >> >>>>
>> >> >>>>
>> >> >>>> Hi Juan,
>> >> >>>>
>> >> >>>> that is a good catch. CandidateItemsStrategy is the right place to
>> >> >>>
>> >> >>> implement this. Maybe we should simply extend its interface to add a
>> >> >>> parameter that says whether to keep or remove the current users
>> items?
>> >> >>>>
>> >> >>>>
>> >> >>>> We could even do this in the abstract base class then.
>> >> >>>>
>> >> >>>> --sebastian
>> >> >>>>
>> >> >>>>
>> >> >>>> On 03/05/2014 10:42 AM, Juan José Ramos wrote:
>> >> >>>>>
>> >> >>>>>

Re: Recommend items not rated by any user

Posted by Juan José Ramos <jj...@gmail.com>.
@Tevfik, running this recommender:

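    GenericItemBasedRecommender itemRecommender =
        new GenericItemBasedRecommender(dataModel, itemSimilarity,
            // strategy used to pick candidate items for recommend()
            new AllSimilarItemsCandidateItemsStrategy(itemSimilarity),
            // strategy used to pick candidate items for mostSimilarItems()
            new AllSimilarItemsCandidateItemsStrategy(itemSimilarity));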


With this dataModel (userID,itemID,preference):
1,1,1.0
1,2,2.0
1,3,1.0
1,4,2.0
2,1,1.0
2,2,4.0


And these similarities (itemID1,itemID2,similarity):
1,2,0.1
1,3,0.2
1,4,0.3
2,3,0.5
3,4,0.5
5,1,0.2
5,2,1.0

This returns item 5 for user 1. Item 5 has not been preferred by user 1, and
the similarities between item 5 and two of the items user 1 preferred are not
NaN, but AllSimilarItemsCandidateItemsStrategy is returning that item. So,
I'm truly sorry to insist on this, but I still really do not get the
difference.
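
For anyone who wants to check the numbers above by hand, here is a minimal
plain-Java sketch with no Mahout dependency. It assumes that a candidate is an
unrated item with a non-NaN similarity to at least one rated item, and that
the estimate is a similarity-weighted average of the user's ratings. The class
and method names (Item5Check, isSimilarCandidate, estimate) are made up for
this illustration and are not Mahout APIs; details of the real implementation
are not modeled.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class Item5Check {
    // User 1's preferences from the dataModel above: itemID -> rating.
    static final Map<Long, Double> USER1 = new LinkedHashMap<>();
    // Similarities from the list above, keyed as "smallerID:largerID".
    static final Map<String, Double> SIMS = new LinkedHashMap<>();

    static {
        USER1.put(1L, 1.0);
        USER1.put(2L, 2.0);
        USER1.put(3L, 1.0);
        USER1.put(4L, 2.0);
        putSim(1, 2, 0.1);
        putSim(1, 3, 0.2);
        putSim(1, 4, 0.3);
        putSim(2, 3, 0.5);
        putSim(3, 4, 0.5);
        putSim(5, 1, 0.2);
        putSim(5, 2, 1.0);
    }

    static void putSim(long a, long b, double value) {
        SIMS.put(Math.min(a, b) + ":" + Math.max(a, b), value);
    }

    // Symmetric lookup; NaN stands for "no similarity known".
    static double sim(long a, long b) {
        Double value = SIMS.get(Math.min(a, b) + ":" + Math.max(a, b));
        return value == null ? Double.NaN : value;
    }

    // Assumed rule: unrated by the user, and similar to at least one rated item.
    static boolean isSimilarCandidate(Map<Long, Double> prefs, long item) {
        if (prefs.containsKey(item)) {
            return false;
        }
        for (long rated : prefs.keySet()) {
            if (!Double.isNaN(sim(item, rated))) {
                return true;
            }
        }
        return false;
    }

    // Assumed estimate: similarity-weighted average over the rated items
    // that have a known similarity to the target item.
    static double estimate(Map<Long, Double> prefs, long item) {
        double weighted = 0.0;
        double total = 0.0;
        for (Map.Entry<Long, Double> pref : prefs.entrySet()) {
            double s = sim(item, pref.getKey());
            if (!Double.isNaN(s)) {
                weighted += s * pref.getValue();
                total += s;
            }
        }
        return total == 0.0 ? Double.NaN : weighted / total;
    }

    public static void main(String[] args) {
        System.out.println(isSimilarCandidate(USER1, 5L)); // true
        System.out.println(estimate(USER1, 5L));           // roughly 1.83
    }
}
```

Under these assumptions item 5 is a candidate for user 1 because it is similar
to items 1 and 2, and its estimate is (0.2 * 1.0 + 1.0 * 2.0) / (0.2 + 1.0).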


On Wed, Mar 5, 2014 at 2:53 PM, Tevfik Aytekin <te...@gmail.com> wrote:

> Juan,
> You got me wrong,
>
> AllSimilarItemsCandidateItemsStrategy
>
> returns all items that have not been rated by the user and for which the
> similarity metric returns a non-NaN similarity value with at
> least one of the items preferred by the user.
>
> So, it does not simply return all items that have not been rated by
> the user. For example, if there is an item X which has not been rated
> by the user and if the similarity value between X and at least one of
> the items rated (preferred) by the user is not NaN, then X will not
> be returned by AllSimilarItemsCandidateItemsStrategy, but it will be
> returned by AllUnknownItemsCandidateItemsStrategy.

Re: Recommend items not rated by any user

Posted by Tevfik Aytekin <te...@gmail.com>.
Juan,
You got me wrong,

AllSimilarItemsCandidateItemsStrategy

returns all items that have not been rated by the user and for which the
similarity metric returns a non-NaN similarity value with at
least one of the items preferred by the user.

So, it does not simply return all items that have not been rated by
the user. For example, if there is an item X which has not been rated
by the user and if the similarity value between X and at least one of
the items rated (preferred) by the user is not NaN, then X will not
be returned by AllSimilarItemsCandidateItemsStrategy, but it will be
returned by AllUnknownItemsCandidateItemsStrategy.



On Wed, Mar 5, 2014 at 4:42 PM, Juan José Ramos <jj...@gmail.com> wrote:
> Hi Tevfik,
>
> Thanks for the response. I think what you say contradicts what Sebastian
> pointed out before. Also, if AllSimilarItemsCandidateItemsStrategy returns
> all items that have not been rated by the user, what would
> AllUnknownItemsCandidateItemsStrategy return?

Re: Recommend items not rated by any user

Posted by Juan José Ramos <jj...@gmail.com>.
Hi Tevfik,

Thanks for the response. I think what you say contradicts what Sebastian
pointed out before. Also, if AllSimilarItemsCandidateItemsStrategy returns
all items that have not been rated by the user, what would
AllUnknownItemsCandidateItemsStrategy return?


On Wed, Mar 5, 2014 at 1:40 PM, Tevfik Aytekin <te...@gmail.com> wrote:

> Sorry there was a typo in the previous paragraph.
>
> If I remember correctly, AllSimilarItemsCandidateItemsStrategy
>
> returns all items that have not been rated by the user and for which the
> similarity metric returns a non-NaN similarity value with at
> least one of the items preferred by the user.

Re: Recommend items not rated by any user

Posted by Tevfik Aytekin <te...@gmail.com>.
Sorry there was a typo in the previous paragraph.

If I remember correctly, AllSimilarItemsCandidateItemsStrategy

returns all items that have not been rated by the user and for which the
similarity metric returns a non-NaN similarity value with at
least one of the items preferred by the user.

On Wed, Mar 5, 2014 at 3:38 PM, Tevfik Aytekin <te...@gmail.com> wrote:
> Hi Juan,
>
> If I remember correctly, AllSimilarItemsCandidateItemsStrategy
>
> returns all items that have not been rated by the user and the
> similarity metric returns a non-NaN similarity value that is with at
> least one of the items preferred by the user.
>
> Tevfik

Re: Recommend items not rated by any user

Posted by Tevfik Aytekin <te...@gmail.com>.
Hi Juan,

If I remember correctly, AllSimilarItemsCandidateItemsStrategy

returns all items that have not been rated by the user and the
similarity metric returns a non-NaN similarity value that is with at
least one of the items preferred by the user.

Tevfik

On Wed, Mar 5, 2014 at 2:30 PM, Sebastian Schelter <ss...@apache.org> wrote:
> On 03/05/2014 01:23 PM, Juan José Ramos wrote:
>>
>> Thanks for the reply, Sebastian.
>>
>> I am not sure if that should be implemented in the Abstract base class
>> though because for
>> instance PreferredItemsNeighborhoodCandidateItemsStrategy, by definition,
>> it returns the item not rated by the user and rated by somebody else.
>
>
> Good point. So we seem to need special implementations.
>
>
>>
>> Back to my last post, I have been playing around with
>> AllSimilarItemsCandidateItemsStrategy
>> and AllUnknownItemsCandidateItemsStrategy, and although they both do what I
>> wanted (recommend items not previously rated by any user), I honestly can't
>> tell the difference between the two strategies. In my tests the output was
>> always the same. If the eventual output of the recommender will not include
>> items already rated by the user as pointed out here (
>> http://mail-archives.apache.org/mod_mbox/mahout-user/201403.mbox/%3CCABHkCkuv35dbwF%2B9sK88FR3hg7MAcdv0MP10v-5QWEvwmNdY%2BA%40mail.gmail.com%3E),
>> AllSimilarItemsCandidateItemsStrategy should be equivalent to
>> AllUnknownItemsCandidateItemsStrategy, shouldn't it?
>
>
> AllSimilarItems returns all items that are similar to any item that the user
> already knows. AllUnknownItems simply returns all items that the user has
> not interacted with yet.
>
> These are two different things, although they might overlap in some
> scenarios.
>
> Best,
> Sebastian

Re: Recommend items not rated by any user

Posted by Sebastian Schelter <ss...@apache.org>.
On 03/05/2014 01:23 PM, Juan José Ramos wrote:
> Thanks for the reply, Sebastian.
>
> I am not sure if that should be implemented in the Abstract base class
> though because for
> instance PreferredItemsNeighborhoodCandidateItemsStrategy, by definition,
> it returns the item not rated by the user and rated by somebody else.

Good point. So we seem to need special implementations.
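
As a rough illustration of why the default strategy cannot surface an item
that nobody has rated, the sketch below re-implements the rule quoted from the
documentation ("not rated by the user, but preferred by another user who
shares at least one item with the user") in plain Java. It uses the two-user
data set posted elsewhere in this thread (user 1 rated items 1 to 4, user 2
rated items 1 and 2). NeighborhoodSketch and its methods are hypothetical
names for this example only; this is not Mahout's implementation.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

public class NeighborhoodSketch {
    // userID -> set of item IDs that user has rated.
    static Map<Long, Set<Long>> sampleRatings() {
        Map<Long, Set<Long>> itemsByUser = new LinkedHashMap<>();
        itemsByUser.put(1L, new TreeSet<>(Arrays.asList(1L, 2L, 3L, 4L)));
        itemsByUser.put(2L, new TreeSet<>(Arrays.asList(1L, 2L)));
        return itemsByUser;
    }

    // Items rated by any other user who shares at least one item with
    // userId, minus the items userId has already rated.
    static Set<Long> candidates(long userId, Map<Long, Set<Long>> itemsByUser) {
        Set<Long> own = itemsByUser.get(userId);
        Set<Long> result = new TreeSet<>();
        for (Map.Entry<Long, Set<Long>> other : itemsByUser.entrySet()) {
            if (other.getKey() == userId) {
                continue; // skip the user's own row
            }
            if (!Collections.disjoint(own, other.getValue())) {
                result.addAll(other.getValue());
            }
        }
        result.removeAll(own);
        return result;
    }

    public static void main(String[] args) {
        Map<Long, Set<Long>> ratings = sampleRatings();
        System.out.println(candidates(1L, ratings)); // []
        System.out.println(candidates(2L, ratings)); // [3, 4]
    }
}
```

Because candidates are drawn only from other users' rated items, an item such
as item 5, which has similarities but no ratings, can never appear in either
result under this rule.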

>
> Back to my last post, I have been playing around with
> AllSimilarItemsCandidateItemsStrategy
> and AllUnknownItemsCandidateItemsStrategy, and although they both do what I
> wanted (recommend items not previously rated by any user), I honestly can't
> tell the difference between the two strategies. In my tests the output was
> always the same. If the eventual output of the recommender will not include
> items already rated by the user as pointed out here (
> http://mail-archives.apache.org/mod_mbox/mahout-user/201403.mbox/%3CCABHkCkuv35dbwF%2B9sK88FR3hg7MAcdv0MP10v-5QWEvwmNdY%2BA%40mail.gmail.com%3E),
> AllSimilarItemsCandidateItemsStrategy should be equivalent to
> AllUnknownItemsCandidateItemsStrategy, shouldn't it?

AllSimilarItems returns all items that are similar to any item that the 
user already knows. AllUnknownItems simply returns all items that the 
user has not interacted with yet.

These are two different things, although they might overlap in some 
scenarios.
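
The distinction can be sketched in plain Java as follows. This is only an
illustration of the two definitions given above, not Mahout code: the class
StrategyContrast, its methods, the explicit catalogue array and item 6 (an
item with no similarity entries) are all assumptions made for the example.
Which items a real DataModel treats as known is not modeled here. The
similarity values are the ones posted elsewhere in this thread.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

public class StrategyContrast {
    // Known similarities, keyed as "smallerID:largerID".
    static final Map<String, Double> SIMS = new HashMap<>();

    static {
        putSim(1, 2, 0.1);
        putSim(1, 3, 0.2);
        putSim(1, 4, 0.3);
        putSim(2, 3, 0.5);
        putSim(3, 4, 0.5);
        putSim(5, 1, 0.2);
        putSim(5, 2, 1.0);
    }

    static void putSim(long a, long b, double value) {
        SIMS.put(Math.min(a, b) + ":" + Math.max(a, b), value);
    }

    static boolean hasSimilarity(long a, long b) {
        return SIMS.containsKey(Math.min(a, b) + ":" + Math.max(a, b));
    }

    // "All unknown": every catalogue item the user has not rated.
    static Set<Long> allUnknown(Set<Long> rated, long[] catalogue) {
        Set<Long> result = new TreeSet<>();
        for (long item : catalogue) {
            if (!rated.contains(item)) {
                result.add(item);
            }
        }
        return result;
    }

    // "All similar": unrated items similar to at least one rated item.
    static Set<Long> allSimilar(Set<Long> rated, long[] catalogue) {
        Set<Long> result = new TreeSet<>();
        for (long item : allUnknown(rated, catalogue)) {
            for (long known : rated) {
                if (hasSimilarity(item, known)) {
                    result.add(item);
                    break;
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Set<Long> user2 = new TreeSet<>(Arrays.asList(1L, 2L));
        long[] catalogue = {1, 2, 3, 4, 5, 6}; // item 6 is hypothetical
        System.out.println(allSimilar(user2, catalogue)); // [3, 4, 5]
        System.out.println(allUnknown(user2, catalogue)); // [3, 4, 5, 6]
    }
}
```

In this sketch the two sets differ only by item 6. When every unrated item
happens to be similar to something the user has rated, as for user 1 with
items 1 to 5, both sets coincide, which would be consistent with tests that
always show the same output.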

Best,
Sebastian


>
> Thanks.
>
> On Wed, Mar 5, 2014 at 10:23 AM, Sebastian Schelter <ss...@apache.org> wrote:
>>
>> Hi Juan,
>>
>> that is a good catch. CandidateItemsStrategy is the right place to
>> implement this. Maybe we should simply extend its interface to add a
>> parameter that says whether to keep or remove the current user's items?
>>
>> We could even do this in the abstract base class then.
>>
>> --sebastian
>>
>>
>> On 03/05/2014 10:42 AM, Juan José Ramos wrote:
>>>
>>> In case somebody runs into the same situation, the key seems to be in the
>>> CandidateItemStrategy being passed to the constructor
>>> of GenericItemBasedRecommender. Looking into the code, if no
>>> CandidateItemStrategy is specified in the
>>> constructor, PreferredItemsNeighborhoodCandidateItemsStrategy is used and
>>> as the documentation says, the doGetCandidateItems method: "returns all
>>> items that have not been rated by the user and that were preferred by
>>> another user that has preferred at least one item that the current user has
>>> preferred too".
>>>
>>> So, a different CandidateItemStrategy needs to be passed. For this problem,
>>> it seems to me that AllSimilarItemsCandidateItemsStrategy,
>>> AllUnknownItemsCandidateItemsStrategy are good candidates. Does anybody
>>> know where to find some documentation about the different
>>> CandidateItemStrategy? Based on the name I would say that:
>>> 1) AllSimilarItemsCandidateItemsStrategy returns all similar items
>>> regardless of whether they have been already rated by someone or not.
>>> 2) AllUnknownItemsCandidateItemsStrategy returns all similar items that
>>> have not been rated by anyone yet.
>>>
>>> Does anybody know if it works like that?
>>> Thanks.
>>>
>>>
>>> On Tue, Mar 4, 2014 at 9:16 AM, Juan José Ramos <jj...@gmail.com> wrote:
>>>
>>>> First thing is that I know this requirement would not make sense in a CF
>>>> Recommender. In my case, I am trying to use Mahout to create something
>>>> closer to a Content-Based Recommender.
>>>>
>>>> In particular, I am pre-computing a similarity matrix between all the
>>>> documents (items) of my catalogue and using that matrix as the
>>>> ItemSimilarity for my Item-Based Recommender.
>>>>
>>>> So, when a user rates a document, how could I make the recommender outputs
>>>> similar documents to that ones the user has already rated even if no other
>>>> user in the system has rated them yet? Is that even possible in the first
>>>> place?
>>>>
>>>> Thanks a lot.
>>>>
>>>
>>
>


Re: Recommend items not rated by any user

Posted by Juan José Ramos <jj...@gmail.com>.
Thanks for the reply, Sebastian.

I am not sure that should be implemented in the abstract base class,
though, because PreferredItemsNeighborhoodCandidateItemsStrategy, for
instance, by definition returns the items not rated by the user but rated
by somebody else.

Back to my last post, I have been playing around with
AllSimilarItemsCandidateItemsStrategy
and AllUnknownItemsCandidateItemsStrategy, and although they both do what I
wanted (recommend items not previously rated by any user), I honestly can't
tell the difference between the two strategies. In my tests the output was
always the same. If the eventual output of the recommender will not include
items already rated by the user as pointed out here (
http://mail-archives.apache.org/mod_mbox/mahout-user/201403.mbox/%3CCABHkCkuv35dbwF%2B9sK88FR3hg7MAcdv0MP10v-5QWEvwmNdY%2BA%40mail.gmail.com%3E),
AllSimilarItemsCandidateItemsStrategy should be equivalent to
AllUnknownItemsCandidateItemsStrategy, shouldn't it?

Thanks.

On Wed, Mar 5, 2014 at 10:23 AM, Sebastian Schelter <ss...@apache.org> wrote:
>
> Hi Juan,
>
> that is a good catch. CandidateItemsStrategy is the right place to
> implement this. Maybe we should simply extend its interface to add a
> parameter that says whether to keep or remove the current user's items?
>
> We could even do this in the abstract base class then.
>
> --sebastian
>
>
> On 03/05/2014 10:42 AM, Juan José Ramos wrote:
>>
>> In case somebody runs into the same situation, the key seems to be in the
>> CandidateItemStrategy being passed to the constructor
>> of GenericItemBasedRecommender. Looking into the code, if no
>> CandidateItemStrategy is specified in the
>> constructor, PreferredItemsNeighborhoodCandidateItemsStrategy is used and
>> as the documentation says, the doGetCandidateItems method: "returns all
>> items that have not been rated by the user and that were preferred by
>> another user that has preferred at least one item that the current user has
>> preferred too".
>>
>> So, a different CandidateItemStrategy needs to be passed. For this problem,
>> it seems to me that AllSimilarItemsCandidateItemsStrategy,
>> AllUnknownItemsCandidateItemsStrategy are good candidates. Does anybody
>> know where to find some documentation about the different
>> CandidateItemStrategy? Based on the name I would say that:
>> 1) AllSimilarItemsCandidateItemsStrategy returns all similar items
>> regardless of whether they have been already rated by someone or not.
>> 2) AllUnknownItemsCandidateItemsStrategy returns all similar items that
>> have not been rated by anyone yet.
>>
>> Does anybody know if it works like that?
>> Thanks.
>>
>>
>> On Tue, Mar 4, 2014 at 9:16 AM, Juan José Ramos <jj...@gmail.com> wrote:
>>
>>> First thing is that I know this requirement would not make sense in a CF
>>> Recommender. In my case, I am trying to use Mahout to create something
>>> closer to a Content-Based Recommender.
>>>
>>> In particular, I am pre-computing a similarity matrix between all the
>>> documents (items) of my catalogue and using that matrix as the
>>> ItemSimilarity for my Item-Based Recommender.
>>>
>>> So, when a user rates a document, how could I make the recommender outputs
>>> similar documents to that ones the user has already rated even if no other
>>> user in the system has rated them yet? Is that even possible in the first
>>> place?
>>>
>>> Thanks a lot.
>>>
>>
>

Re: Recommend items not rated by any user

Posted by Sebastian Schelter <ss...@apache.org>.
Hi Juan,

That is a good catch. CandidateItemsStrategy is the right place to
implement this. Maybe we should simply extend its interface to add a
parameter that says whether to keep or remove the current user's items?

We could even do this in the abstract base class then.

--sebastian
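
One way to picture the proposal above is the following sketch. It uses plain Java collections and invented names (CandidateStrategySketch, includeKnownItems, and so on); it is not the Mahout interface, only an illustration of how a flag handled once in an abstract base class could decide whether the user's own items stay in the candidate set.

```java
import java.util.*;

// Hypothetical interface with the extra flag; names are illustrative only.
interface CandidateStrategySketch {
    Set<Long> candidates(Set<Long> userItems, Set<Long> catalogue, boolean includeKnownItems);
}

// The base class applies the flag once, so subclasses only pick raw candidates.
abstract class AbstractCandidateStrategySketch implements CandidateStrategySketch {
    @Override
    public final Set<Long> candidates(Set<Long> userItems, Set<Long> catalogue,
                                      boolean includeKnownItems) {
        Set<Long> out = new TreeSet<>(rawCandidates(userItems, catalogue));
        if (!includeKnownItems) {
            out.removeAll(userItems); // drop what the user already knows
        }
        return out;
    }

    protected abstract Set<Long> rawCandidates(Set<Long> userItems, Set<Long> catalogue);
}

// Simplest possible subclass: the whole catalogue is a candidate.
final class WholeCatalogueSketch extends AbstractCandidateStrategySketch {
    @Override
    protected Set<Long> rawCandidates(Set<Long> userItems, Set<Long> catalogue) {
        return catalogue;
    }
}
```

For a catalogue {1, 2, 3} and a user who knows item 1, calling candidates with the flag set to false gives {2, 3}, and with the flag set to true gives {1, 2, 3}. Whether a single flag in the base class suits every strategy is exactly the question raised in the reply to this message.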

On 03/05/2014 10:42 AM, Juan José Ramos wrote:
> In case somebody runs into the same situation, the key seems to be in the
> CandidateItemStrategy being passed to the constructor
> of GenericItemBasedRecommender. Looking into the code, if no
> CandidateItemStrategy is specified in the
> constructor, PreferredItemsNeighborhoodCandidateItemsStrategy is used and
> as the documentation says, the doGetCandidateItems method: "returns all
> items that have not been rated by the user and that were preferred by
> another user that has preferred at least one item that the current user has
> preferred too".
>
> So, a different CandidateItemStrategy needs to be passed. For this problem,
> it seems to me that AllSimilarItemsCandidateItemsStrategy,
> AllUnknownItemsCandidateItemsStrategy are good candidates. Does anybody
> know where to find some documentation about the different
> CandidateItemStrategy? Based on the name I would say that:
> 1) AllSimilarItemsCandidateItemsStrategy returns all similar items
> regardless of whether they have been already rated by someone or not.
> 2) AllUnknownItemsCandidateItemsStrategy returns all similar items that
> have not been rated by anyone yet.
>
> Does anybody know if it works like that?
> Thanks.
>
>
> On Tue, Mar 4, 2014 at 9:16 AM, Juan José Ramos <jj...@gmail.com> wrote:
>
>> First thing is that I know this requirement would not make sense in a CF
>> Recommender. In my case, I am trying to use Mahout to create something
>> closer to a Content-Based Recommender.
>>
>> In particular, I am pre-computing a similarity matrix between all the
>> documents (items) of my catalogue and using that matrix as the
>> ItemSimilarity for my Item-Based Recommender.
>>
>> So, when a user rates a document, how could I make the recommender outputs
>> similar documents to that ones the user has already rated even if no other
>> user in the system has rated them yet? Is that even possible in the first
>> place?
>>
>> Thanks a lot.
>>
>


Re: Recommend items not rated by any user

Posted by Juan José Ramos <jj...@gmail.com>.
In case somebody runs into the same situation, the key seems to be the
CandidateItemsStrategy passed to the constructor of
GenericItemBasedRecommender. Looking into the code, if no
CandidateItemsStrategy is specified in the constructor,
PreferredItemsNeighborhoodCandidateItemsStrategy is used and, as the
documentation says, its doGetCandidateItems method "returns all
items that have not been rated by the user and that were preferred by
another user that has preferred at least one item that the current user has
preferred too".

So, a different CandidateItemsStrategy needs to be passed. For this problem,
it seems to me that AllSimilarItemsCandidateItemsStrategy and
AllUnknownItemsCandidateItemsStrategy are good candidates. Does anybody
know where to find some documentation about the different
CandidateItemsStrategy implementations? Based on the names I would say that:
1) AllSimilarItemsCandidateItemsStrategy returns all similar items
regardless of whether they have been already rated by someone or not.
2) AllUnknownItemsCandidateItemsStrategy returns all similar items that
have not been rated by anyone yet.

Does anybody know if it works like that?
Thanks.
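
The quoted description of the default strategy can be modelled in a few lines, which shows why an item nobody has rated can never become a candidate. This is only a sketch built from that one quoted sentence, using plain Java maps and an invented class name; it is not taken from the Mahout sources.

```java
import java.util.*;

// Hypothetical model of the "preferred items neighborhood" idea.
final class NeighborhoodSketch {

    // prefs maps each user ID to the set of item IDs that user has rated.
    static Set<Long> candidates(Map<Long, Set<Long>> prefs, long userID) {
        Set<Long> mine = prefs.getOrDefault(userID, Collections.emptySet());
        Set<Long> out = new TreeSet<>();
        for (Map.Entry<Long, Set<Long>> other : prefs.entrySet()) {
            if (other.getKey() == userID) {
                continue; // skip the current user
            }
            // Keep the other user's items only if we share at least one item.
            if (!Collections.disjoint(other.getValue(), mine)) {
                out.addAll(other.getValue());
            }
        }
        out.removeAll(mine); // never return items the user already rated
        return out;
    }
}
```

Suppose user 1 rated items 10 and 11, user 2 rated 10 and 12, and user 3 rated 13. The candidates for user 1 are just {12}: item 13 is excluded because user 3 shares nothing with user 1, and any catalogue item without a rating is excluded because it appears in nobody's preferences at all. That is the behaviour a different strategy has to replace.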


On Tue, Mar 4, 2014 at 9:16 AM, Juan José Ramos <jj...@gmail.com> wrote:

> First thing is that I know this requirement would not make sense in a CF
> Recommender. In my case, I am trying to use Mahout to create something
> closer to a Content-Based Recommender.
>
> In particular, I am pre-computing a similarity matrix between all the
> documents (items) of my catalogue and using that matrix as the
> ItemSimilarity for my Item-Based Recommender.
>
> So, when a user rates a document, how could I make the recommender outputs
> similar documents to that ones the user has already rated even if no other
> user in the system has rated them yet? Is that even possible in the first
> place?
>
> Thanks a lot.
>

Re: Recommend items not rated by any user

Posted by Juan José Ramos <jj...@gmail.com>.
@Pat. You described my situation very well. The only additional thing is
that I am also interested in creating some sort of profile of the user
from all the information s/he has provided by interacting with the articles,
and not only in recommending similar items (news) based on a specific input.
That is why I thought using the output of RowSimilarityJob as the
ItemSimilarity of an ItemBasedRecommender would behave as I want, since I use
Mahout's DataModel to create that profile.
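
As a rough illustration of how a user profile and a precomputed item similarity combine, the sketch below scores a candidate item with the usual similarity-weighted average of the user's own ratings. The class and method names are hypothetical and the formula is the textbook item-based estimate, assumed here for illustration; it is not copied from Mahout's implementation.

```java
import java.util.*;

// Hypothetical scoring helper for an item-based recommender.
final class EstimateSketch {

    // userRatings: item ID -> rating given by the user (the "profile").
    // simToCandidate: item ID -> precomputed similarity to the candidate item.
    static double estimate(Map<Long, Double> userRatings, Map<Long, Double> simToCandidate) {
        double weighted = 0.0;
        double total = 0.0;
        for (Map.Entry<Long, Double> rating : userRatings.entrySet()) {
            Double sim = simToCandidate.get(rating.getKey());
            if (sim == null) {
                continue; // no similarity known for this rated item
            }
            weighted += sim * rating.getValue();
            total += Math.abs(sim);
        }
        // NaN signals that the candidate cannot be scored from this profile.
        return total == 0.0 ? Double.NaN : weighted / total;
    }
}
```

For a user who rated item 1 with 4.0 and item 2 with 2.0, and a candidate whose similarity to both is 0.5, the estimate is (0.5 * 4.0 + 0.5 * 2.0) / 1.0 = 3.0. Note that no other user's ratings enter the computation, so the candidate can be scored even if nobody has rated it.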


On Wed, Mar 5, 2014 at 3:40 PM, Pat Ferrel <pa...@occamsmachete.com> wrote:

> I am ignoring the rest of the thread because I suspect it may have gotten
> off track.
>
> Your data is news articles, right? You would like to recommend from known
> articles to any user based on an article they rate or even view. You have
> no collaborative filtering data because the lifetime of a news article is
> short and so there is not enough usage data to create a CF type
> recommender. Is this a correct problem statement? If so I don't believe you
> should be using a CF recommender from Mahout's collection.
>
> However you can use the Mahout text analysis pipeline to find all articles
> that are similar to each other. In this case when a user views any article
> in the training data you can show the most similar items precalculated with
> RowSimilarityJob and the rest of the text prep jobs. The pipeline is
> outlined here:
> https://cwiki.apache.org/confluence/display/MAHOUT/Quick+tour+of+text+analysis+using+the+Mahout+command+line
>
> But this will only work for news articles already in the training data.
> Another approach is to not use Mahout at all. Simply index all docs as they
> come in with Solr. Then when a user rates or even views an article, even if
> it has not been indexed yet, you can use the viewed article as the query on
> the indexed articles and Solr will return articles ranked by similarity.
> This is a content based recommender based solely on Solr.
>
> Does this describe your situation?
>
>
> On Mar 4, 2014, at 1:16 AM, Juan José Ramos <jj...@gmail.com> wrote:
>
> First thing is that I know this requirement would not make sense in a CF
> Recommender. In my case, I am trying to use Mahout to create something
> closer to a Content-Based Recommender.
>
> In particular, I am pre-computing a similarity matrix between all the
> documents (items) of my catalogue and using that matrix as the
> ItemSimilarity for my Item-Based Recommender.
>
> So, when a user rates a document, how could I make the recommender outputs
> similar documents to that ones the user has already rated even if no other
> user in the system has rated them yet? Is that even possible in the first
> place?
>
> Thanks a lot.
>
>

Re: Recommend items not rated by any user

Posted by Pat Ferrel <pa...@occamsmachete.com>.
I am ignoring the rest of the thread because I suspect it may have gotten off track.

Your data is news articles, right? You would like to recommend from known articles to any user based on an article they rate or even view. You have no collaborative filtering data because the lifetime of a news article is short and so there is not enough usage data to create a CF type recommender. Is this a correct problem statement? If so I don’t believe you should be using a CF recommender from Mahout’s collection.

However you can use the Mahout text analysis pipeline to find all articles that are similar to each other. In this case when a user views any article in the training data you can show the most similar items precalculated with RowSimilarityJob and the rest of the text prep jobs. The pipeline is outlined here: https://cwiki.apache.org/confluence/display/MAHOUT/Quick+tour+of+text+analysis+using+the+Mahout+command+line

But this will only work for news articles already in the training data. Another approach is to not use Mahout at all. Simply index all docs as they come in with Solr. Then when a user rates or even views an article, even if it has not been indexed yet, you can use the viewed article as the query on the indexed articles and Solr will return articles ranked by similarity. This is a content-based recommender based solely on Solr.

Does this describe your situation?
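
For readers who want to see what "articles that are similar to each other" can mean in practice, here is a small sketch of content-based similarity: each text becomes a bag of term counts and two texts are compared by cosine similarity. It is a toy built on assumptions (naive tokenization, raw counts, invented class name) and does not reproduce what RowSimilarityJob or Solr do internally.

```java
import java.util.*;

// Hypothetical toy example of content-based similarity between two texts.
final class CosineSketch {

    // Lower-case the text, split on non-word characters, count each term.
    static Map<String, Integer> termCounts(String text) {
        Map<String, Integer> counts = new HashMap<>();
        for (String term : text.toLowerCase().split("\\W+")) {
            if (!term.isEmpty()) {
                counts.merge(term, 1, Integer::sum);
            }
        }
        return counts;
    }

    // Cosine of the angle between the two term-count vectors (0.0 if either is empty).
    static double cosine(Map<String, Integer> a, Map<String, Integer> b) {
        double dot = 0.0;
        double normA = 0.0;
        double normB = 0.0;
        for (Map.Entry<String, Integer> e : a.entrySet()) {
            normA += e.getValue() * e.getValue();
            Integer other = b.get(e.getKey());
            if (other != null) {
                dot += e.getValue() * other;
            }
        }
        for (int v : b.values()) {
            normB += v * v;
        }
        return (normA == 0.0 || normB == 0.0) ? 0.0 : dot / Math.sqrt(normA * normB);
    }
}
```

Two texts with the same words score close to 1.0 and two texts with no words in common score 0.0. Ranking every article against a viewed article by such a score needs no ratings from any user, which is the point of the content-based approaches described above.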


On Mar 4, 2014, at 1:16 AM, Juan José Ramos <jj...@gmail.com> wrote:

First thing is that I know this requirement would not make sense in a CF
Recommender. In my case, I am trying to use Mahout to create something
closer to a Content-Based Recommender.

In particular, I am pre-computing a similarity matrix between all the
documents (items) of my catalogue and using that matrix as the
ItemSimilarity for my Item-Based Recommender.

So, when a user rates a document, how could I make the recommender outputs
similar documents to that ones the user has already rated even if no other
user in the system has rated them yet? Is that even possible in the first
place?

Thanks a lot.