Posted to user@mahout.apache.org by WangRamon <ra...@hotmail.com> on 2011/10/20 11:41:01 UTC

Recommend result contains item which user has already given preference, is that correct?




Hi guys, I finally finished running the RecommenderJob today on the two-node cluster. What surprised me is that the final recommendation output of the RecommenderJob contains items for which the user has already given a preference. Is that correct? If it is wrong, how can I resolve this problem? Thanks a lot. Cheers, Ramon
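
One way to confirm the symptom described above is to compare a user's line in the job output against that user's lines in the input file. The following is only a sketch: the class and method names are made up for illustration, and the line formats ("userID [itemID:score,...]" for output, "userID,itemID,pref" for input) are assumed from the samples posted in this thread.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper, not part of Mahout.
public class RecommendationCheck {

  /** Collects the item IDs a user has rated from "userID,itemID,pref" lines. */
  static Set<Long> preferredItems(List<String> prefLines, long userId) {
    Set<Long> items = new HashSet<>();
    for (String line : prefLines) {
      String[] fields = line.trim().split(",");
      if (fields.length >= 2 && Long.parseLong(fields[0]) == userId) {
        items.add(Long.parseLong(fields[1]));
      }
    }
    return items;
  }

  /** Extracts item IDs from an output line such as "49 [300420:5.0,312611:5.0]". */
  static List<Long> recommendedItems(String outputLine) {
    List<Long> items = new ArrayList<>();
    int open = outputLine.indexOf('[');
    int close = outputLine.lastIndexOf(']');
    if (open < 0 || close < open) {
      return items;
    }
    for (String entry : outputLine.substring(open + 1, close).split(",")) {
      int colon = entry.indexOf(':');
      if (colon > 0) {
        items.add(Long.parseLong(entry.substring(0, colon).trim()));
      }
    }
    return items;
  }

  /** Returns the recommended items that the same user has already rated. */
  static List<Long> alreadyPreferred(String outputLine, List<String> prefLines) {
    long userId = Long.parseLong(outputLine.trim().split("\\s+")[0]);
    Set<Long> known = preferredItems(prefLines, userId);
    List<Long> overlap = new ArrayList<>();
    for (Long item : recommendedItems(outputLine)) {
      if (known.contains(item)) {
        overlap.add(item);
      }
    }
    return overlap;
  }

  public static void main(String[] args) {
    // A few lines copied from the data posted in this thread.
    String output = "49 [300420:5.0,312611:5.0,93087:5.0,494200:5.0]";
    List<String> prefs = Arrays.asList("49,312611,1", "49,336441,4", "49,494200,4");
    System.out.println(alreadyPreferred(output, prefs)); // prints [312611, 494200]
  }
}
```

An empty result for every user would indicate that the job only recommends unknown items, which is the expected behaviour.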

Re: Recommend result contains item which user has already given preference, is that correct?

Posted by Sebastian Schelter <ss...@apache.org>.
Provide me with the data, I will have a look at it.

Can you say with what arguments you invoke RecommenderJob?

--sebastian

On 21.10.2011 04:01, WangRamon wrote:
> 
> Hi Sebastian,
>
> Unfortunately, I still get wrong data from the RecommenderJob after cleaning everything. Check the following part of the recommendation result:
>
> 49 [300420:5.0,312611:5.0,428914:5.0,208617:5.0,345206:5.0,411909:5.0,363683:5.0,248872:5.0,93087:5.0,494200:5.0]
>
> Now look at the input data for user 49: items 312611, 428914, 208617, 345206, 411909, 363683, 248872 and 494200 are wrong recommendations, so nearly all of them are wrong. I would like to send you the test data, but it is 50M+ in size. Can we discuss offline? Thank you very much.
>
> 49,409769,4
> 49,98795,4
> 49,262163,1
> 49,66009,4
> 49,414484,2
> 49,405329,3
> 49,312611,1
> 49,336441,4
> 49,136494,5
> 49,345206,3
> 49,479179,1
> 49,318960,4
> 49,52683,3
> 49,270840,3
> 49,264828,1
> 49,222390,4
> 49,456614,5
> 49,436207,5
> 49,306308,2
> 49,391582,5
> 49,494200,4
> 49,423328,3
> 49,112997,3
> 49,229347,5
> 49,474928,3
> 49,349350,1
> 49,208508,3
> 49,314397,2
> 49,14673,2
> 49,496041,4
> 49,301875,4
> 49,234234,1
> 49,325287,3
> 49,35756,5
> 49,365097,4
> 49,13376,4
> 49,333634,2
> 49,283494,5
> 49,208617,3
> 49,245390,1
> 49,221804,2
> 49,347821,3
> 49,138954,5
> 49,164206,5
> 49,72238,1
> 49,356632,1
> 49,452296,3
> 49,182288,5
> 49,499031,5
> 49,150727,4
> 49,240533,5
> 49,326081,4
> 49,220683,2
> 49,196527,2
> 49,177165,3
> 49,411709,5
> 49,360722,3
> 49,466310,1
> 49,160375,2
> 49,137203,5
> 49,32634,4
> 49,62134,5
> 49,96982,5
> 49,196951,1
> 49,304155,5
> 49,406109,4
> 49,244276,5
> 49,189552,1
> 49,442215,3
> 49,268806,2
> 49,364912,2
> 49,410896,5
> 49,450602,5
> 49,151703,1
> 49,248872,4
> 49,21684,1
> 49,41196,1
> 49,26614,2
> 49,369075,5
> 49,321916,1
> 49,325081,1
> 49,329877,4
> 49,344661,4
> 49,8429,3
> 49,69279,1
> 49,143695,1
> 49,229120,2
> 49,26298,4
> 49,54456,1
> 49,75937,4
> 49,87042,3
> 49,345383,5
> 49,363683,4
> 49,128047,3
> 49,234878,5
> 49,428914,3
> 49,353107,2
> 49,266850,4
> 49,421211,3
> 49,265739,4
> 49,303723,1
> 49,244575,4
> 49,303625,4
> 49,350481,5
> 49,63985,4
> 49,207327,3
> 49,397535,1
> 49,300916,5
> 49,358094,4
> 49,314919,5
> 49,309355,5
> 49,403169,5
> 49,90148,4
> 49,224056,4
> 49,359181,2
> 49,341927,5
> 49,436521,4
> 49,480682,4
> 49,315561,3
> 49,218647,5
> 49,245276,2
> 49,93189,1
> 49,204695,4
> 49,498350,5
> 49,155787,3
> 49,112730,3
> 49,416756,2
> 49,411909,4
> 49,253353,2
> 49,196663,5
> 49,40903,3
> 49,51873,2
> 49,66925,3
>  > Date: Thu, 20 Oct 2011 18:40:38 +0200
>> From: ssc@apache.org
>> To: user@mahout.apache.org
>> Subject: Re: Recommend result contains item which user has already given preference, is that correct?
>>
>> To put it simplified:
>>
>> The vector of recommendations is the sum of the similarity vectors for
>> all preferred items. In each similarity vector for a preferred item the
>> entry for that particular item is set to NaN.
>>
>> That means that in the recommendation vector the entries for all
>> preferred items will be NaN.
>>
>> It's a neat trick that is unfortunately very hard to see in the code.
>>
>> --sebastian
>>
>> On 20.10.2011 18:36, WangRamon wrote:
>>>
>>> Hi Sebastian
>>> "But as the entry for the item itself is set to NaN in its similarity vector and NaN plus something stays always NaN, the predicted preference for an item that was already preferred is NaN. And the NaN entries are dropped later."
>>> Wait a minute here. I can understand that NaN plus something always stays NaN, but how do you explain "the predicted preference for an item that was already preferred is NaN"? Where is the code that checks whether an item was already preferred? The only thing about NaN in SimilarityMatrixRowWrapperMapper is that it sets the similarity of an item to itself (A to A) to NaN, am I right?
>>> Thanks
>>> Ramon
>>>> Date: Thu, 20 Oct 2011 17:04:20 +0200
>>>> From: ssc@apache.org
>>>> To: user@mahout.apache.org
>>>> Subject: Re: Recommend result contains item which user has already given preference, is that correct?
>>>>
>>>> On 20.10.2011 16:57, WangRamon wrote:
>>>>>
>>>>> Hi Sebastian and Sean 
>>>>> Thanks for your help. 
>>>>>
>>>>> I re-read the code (setting up a debugging environment seems very difficult for me) and found this line in SimilarityMatrixRowWrapperMapper; I paste it below with its comment:
>>>>>     /* remove self similarity */
>>>>>     similarityMatrixRow.set(key.get(), Double.NaN);
>>>>> I think the meaning is to mark the similarity between Item X and itself as NaN, but that doesn't exclude Item X from the recommendations. Then AggregateAndRecommendReducer uses simColumn.times(prefValue) as part of the formula to calculate the preferences for all items that are similar to Item i (which could be Item X or some other item), and returns the top 10 (the default) for a user.
>>>>> During this process, I cannot see any code that excludes an item for which the user has already given a preference from the recommendations.
>>>>
>>>> It's a little bit hidden :) For each preferred item, a vector of all its
>>>> similarities is added:
>>>>
>>>>       numerators = numerators == null
>>>>           ? prefValue == BOOLEAN_PREF_VALUE ? simColumn.clone() : simColumn.times(prefValue)
>>>>           : numerators.plus(prefValue == BOOLEAN_PREF_VALUE ? simColumn : simColumn.times(prefValue));
>>>>
>>>> But as the entry for the item itself is set to NaN in its similarity
>>>> vector and NaN plus something stays always NaN, the predicted preference
>>>> for an item that was already preferred is NaN. And the NaN entries are
>>>> dropped later.
>>>>
>>>> --sebastian
>>>>
>>>>
>>>>> Correct me if I missed something. Thank you, guys.
>>>>> Cheers Ramon
>>>>>> Date: Thu, 20 Oct 2011 13:59:28 +0100
>>>>>> Subject: Re: Recommend result contains item which user has already given preference, is that correct?
>>>>>> From: srowen@gmail.com
>>>>>> To: user@mahout.apache.org
>>>>>>
>>>>>> Ah OK, figured as much. WangRamon does that answer your question
>>>>>> and/or can you debug to see if this is happening, not happening for
>>>>>> you in your use case?
>>>>>>
>>>>>> On Thu, Oct 20, 2011 at 1:42 PM, Sebastian Schelter <ss...@apache.org> wrote:
>>>>>>> It's still included in SimilarityMatrixRowWrapperMapper. We also have a
>>>>>>> unit test that checks whether a user is only recommended unknown items
>>>>>>> which still works.
>>>>>  		 	   		  
>>>>
>>>  		 	   		  
>>
>  		 	   		  
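
The NaN trick that Sebastian explains in the quoted messages above can be illustrated with plain arrays. This is only a minimal sketch under stated assumptions: it uses dense double arrays instead of Mahout's Vector classes, the class and method names are invented for illustration, and the small symmetric similarity matrix is made-up sample data.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration, not Mahout code.
public class NanExclusionSketch {

  /** Sums the preference-weighted similarity columns of all preferred items. */
  static double[] sumWeightedColumns(double[][] similarity, int[] preferredItems, double[] prefValues) {
    int n = similarity.length;
    double[] numerators = new double[n];
    for (int k = 0; k < preferredItems.length; k++) {
      int item = preferredItems[k];
      // Copy the similarity column of the preferred item and mark its
      // self-similarity as NaN ("remove self similarity").
      double[] simColumn = similarity[item].clone();
      simColumn[item] = Double.NaN;
      for (int i = 0; i < n; i++) {
        // NaN plus anything stays NaN, so the entry of every preferred
        // item ends up as NaN in the sum.
        numerators[i] += prefValues[k] * simColumn[i];
      }
    }
    return numerators;
  }

  /** Keeps only the item indices whose summed entry is not NaN. */
  static List<Integer> candidates(double[] numerators) {
    List<Integer> result = new ArrayList<>();
    for (int i = 0; i < numerators.length; i++) {
      if (!Double.isNaN(numerators[i])) {
        result.add(i);
      }
    }
    return result;
  }

  public static void main(String[] args) {
    double[][] similarity = {
      {1.0, 0.5, 0.2, 0.0},
      {0.5, 1.0, 0.3, 0.4},
      {0.2, 0.3, 1.0, 0.6},
      {0.0, 0.4, 0.6, 1.0},
    };
    int[] preferred = {0, 2};     // the user has rated items 0 and 2
    double[] prefs = {4.0, 3.0};  // with these preference values
    double[] numerators = sumWeightedColumns(similarity, preferred, prefs);
    System.out.println(candidates(numerators)); // prints [1, 3]
  }
}
```

Because each preferred item contributes exactly one column, and that column carries a NaN at the item's own index, no separate lookup of the user's preferences is needed when the sum is built.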


Re: Recommend result contains item which user has already given preference, is that correct?

Posted by Lance Norskog <go...@gmail.com>.
There is as yet no Mahout 0.6 release. The Mahout trunk is "post 0.5" and
what we're calling 0.6. You want to use the trunk. It gets a lot of testing
and production use, and these days stays pretty solid.

Lance

2011/10/21 WangRamon <ra...@hotmail.com>

>
> Ok Sebastian, I will try Mahout 0.6 next week. I believe it comes from trunk,
> right? Have a nice day/weekend! Cheers, Ramon
>  > Date: Fri, 21 Oct 2011 09:06:50 +0200
> > From: ssc@apache.org
> > To: user@mahout.apache.org
> > Subject: Re: Recommend result contains item which user has already given
> preference, is that correct?
> >
> > As I already said multiple times, please use Mahout 0.6. It contains bug
> > fixes and performance improvements for this particular job.
> >
> > --sebastian
> >
> > On 21.10.2011 09:04, WangRamon wrote:
> > >
> > > Hi Sebastian,
> > >
> > > I made the following change locally to resolve the issue. It is against
> > > Mahout 0.5; maybe I am wrong, but the test result is correct:
> > >
> > > 1) I added an "int itemIdIndex" property with getter/setter methods to
> > >    class PrefAndSimilarityColumnWritable. It holds the index of the item
> > >    that the "prefValue" in this class belongs to.
> > >
> > > 2) I added "prefAndSimilarityColumn.setItemIdIndex(key.get());" at line 51
> > >    of class PartialMultiplyMapper to set the property created in step 1.
> > >
> > > 3) In class AggregateAndRecommendReducer, I added the following code at
> > >    line 147:
> > >
> > >       // item which user has already given preference
> > >       int itemIdIndex = prefAndSimilarityColumn.getItemIdIndex();
> > >       // exclude item user has already given preference
> > >       simColumn.set(itemIdIndex, Double.NaN);
> > >
> > >    This sets the sim column entry at that index to NaN for an item the
> > >    user has already given a preference for. Any later plus or multiply on
> > >    this vector also yields NaN at that index, so items the user has
> > >    already shown a preference for are excluded from the recommendations.
> > >
> > > 4) At line 173 of the same class, I added a check that makes the
> > >    prediction NaN for items the user has already given a preference for:
> > >
> > >       double prediction = Double.NaN;
> > >       if (!Double.isNaN(element.get())) {
> > >         prediction = element.get() / denominators.getQuick(itemIDIndex);
> > >       }
> > >
> > > With these changes I get the correct recommendations. I have thought it
> > > through carefully, but I may still be wrong. I would be glad to hear your
> > > opinion, and again, thank you very much.
> > >
> > > Cheers, Ramon



-- 
Lance Norskog
goksron@gmail.com
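
The idea behind the local workaround Ramon describes in the quoted message above (explicitly forcing the prediction to NaN for items the user has already rated) can be sketched as follows. This is a simplified variation, not his patch: it tracks the rated item indices in a set instead of an itemIdIndex field on the writable, it uses plain arrays rather than Mahout vectors, and all names are hypothetical.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical illustration, not Mahout code.
public class ExplicitExclusionSketch {

  /**
   * Computes numerator / denominator per item index, but yields NaN for
   * items the user has already rated or whose numerator is already NaN.
   */
  static double[] predict(double[] numerators, double[] denominators, Set<Integer> alreadyPreferred) {
    double[] predictions = new double[numerators.length];
    for (int i = 0; i < numerators.length; i++) {
      boolean excluded = alreadyPreferred.contains(i) || Double.isNaN(numerators[i]);
      predictions[i] = excluded ? Double.NaN : numerators[i] / denominators[i];
    }
    return predictions;
  }

  public static void main(String[] args) {
    double[] numerators = {6.0, 2.0, Double.NaN};
    double[] denominators = {2.0, 1.0, 1.0};
    Set<Integer> rated = new HashSet<>(Arrays.asList(1));
    // Index 0 is predicted, index 1 is excluded explicitly, index 2 was already NaN.
    System.out.println(Arrays.toString(predict(numerators, denominators, rated)));
    // prints [3.0, NaN, NaN]
  }
}
```

As the rest of the thread shows, the recommended route was to move to the newer code base rather than to patch 0.5 in this way.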

RE: Recommend result contains item which user has already given preference, is that correct?

Posted by WangRamon <ra...@hotmail.com>.
Ok Sebastian, I will try Mahout 0.6 next week. I believe it comes from trunk, right? Have a nice day/weekend! Cheers, Ramon
 > Date: Fri, 21 Oct 2011 09:06:50 +0200
> From: ssc@apache.org
> To: user@mahout.apache.org
> Subject: Re: Recommend result contains item which user has already given preference, is that correct?
> 
> As I already said multiple times, please use Mahout 0.6. It contains bug
> fixes and performance improvements for this particular job.
> 
> --sebastian

RE: Recommend result contains item which user has already given preference, is that correct?

Posted by WangRamon <ra...@hotmail.com>.
Hi Sebastian, I have tried the Mahout 0.6 SNAPSHOT and it's great. The RecommenderJob test results show a huge performance boost, and the issue described in this mail thread no longer occurs. Thanks. Cheers, Ramon
 > Date: Fri, 21 Oct 2011 09:06:50 +0200
> From: ssc@apache.org
> To: user@mahout.apache.org
> Subject: Re: Recommend result contains item which user has already given preference, is that correct?
> 
> As I already said multiple times, please use Mahout 0.6. It contains bug
> fixes and performance improvements for this particular job.
> 
> --sebastian
> >>>>>>
> >>>>>> I re-read the code again (debug seems to be very difficult for me to setup the environment) and find the line in SimilarityMatrixRowWrapperMapper,  i past it below with the comments: 
> >>>>>>     /* remove self similarity */ 
> >>>>>>     similarityMatrixRow.set(key.get(), Double.NaN); 
> >>>>>> I think the meanning is to mark the similarity between Item X and Item X (the identical one) as NaN, but it doesn't exclude Item X from recommendation, then in AggregateAndRecommendReducer, it uses simColumn.times(prefValue) as part of the formula to calculate the preferences for all items that similar to Item i (it could be Item X or some other item), then return the top 10 (default) for a user. 
> >>>>>> During this process, i cannot see any code to exclude an item which the user has already given preference from recommendation. 
> >>>>>
> >>>>> It's a little bit hidden :) For each preferred item, a vector of all its
> >>>>> similarities is added:
> >>>>>
> >>>>>       numerators = numerators == null
> >>>>>           ? prefValue == BOOLEAN_PREF_VALUE ? simColumn.clone() :
> >>>>> simColumn.times(prefValue)
> >>>>>           : numerators.plus(prefValue == BOOLEAN_PREF_VALUE ? simColumn
> >>>>> : simColumn.times(prefValue));
> >>>>>
> >>>>> But as the entry for the item itself is set to NaN in its similarity
> >>>>> vector and NaN plus something stays always NaN, the predicted preference
> >>>>> for an item that was already preferred is NaN. And the NaN entries are
> >>>>> dropped later.
> >>>>>
> >>>>> --sebastian
> >>>>>
> >>>>>
> >>>>>> Correct me if i miss something, thank you guys. 
> >>>>>> Cheers Ramon
> >>>>>>> Date: Thu, 20 Oct 2011 13:59:28 +0100
> >>>>>>> Subject: Re: Recommend result contains item which user has already given preference, is that correct?
> >>>>>>> From: srowen@gmail.com
> >>>>>>> To: user@mahout.apache.org
> >>>>>>>
> >>>>>>> Ah OK, figured as much. WangRamon does that answer your question
> >>>>>>> and/or can you debug to see if this is happening, not happening for
> >>>>>>> you in your use case?
> >>>>>>>
> >>>>>>> On Thu, Oct 20, 2011 at 1:42 PM, Sebastian Schelter <ss...@apache.org> wrote:
> >>>>>>>> It's still included in SimilarityMatrixRowWrapperMapper. We also have a
> >>>>>>>> unit test that checks whether a user is only recommended unknown items
> >>>>>>>> which still works.

Re: Recommend result contains item which user has already given preference, is that correct?

Posted by Sebastian Schelter <ss...@apache.org>.
As I already said multiple times, please use Mahout 0.6. It contains bug
fixes and performance improvements for this particular job.

--sebastian

On 21.10.2011 09:04, WangRamon wrote:
> 
> Hi Sebastian,
>
> I made the following change locally to resolve the issue. This is on Mahout 0.5; I may be wrong, but the test result is correct:
>
> 1) I added an "int itemIdIndex" property with getter/setter methods to class PrefAndSimilarityColumnWritable. It holds the index of the item that the "prefValue" in this class belongs to.
>
> 2) I added "prefAndSimilarityColumn.setItemIdIndex(key.get());" at line 51 of class PartialMultiplyMapper to set the property created in step 1.
>
> 3) In class AggregateAndRecommendReducer, I added the following code at line 147:
>
>       // item which user has already given preference
>       int itemIdIndex = prefAndSimilarityColumn.getItemIdIndex();
>       // exclude item user has already given preference
>       simColumn.set(itemIdIndex, Double.NaN);
>
> This sets the sim column entry to NaN at the index of an item the user has already rated. Any later plus or times on this vector also yields NaN at that index, so items the user has already rated are excluded from the recommendations.
>
> 4) At line 173 of the same class, AggregateAndRecommendReducer, I added a check that makes the prediction NaN for items the user has already rated:
>
>       double prediction = Double.NaN;
>       if (!Double.isNaN(element.get())) {
>         prediction = element.get() / denominators.getQuick(itemIDIndex);
>       }
>
> With these changes I get the correct recommendations. I have thought it through carefully, but I may still be wrong, so I would be glad to hear your thoughts. Again, thank you very much.
>
> Cheers
> Ramon


RE: Recommend result contains item which user has already given preference, is that correct?

Posted by WangRamon <ra...@hotmail.com>.
Hi Sebastian,

I made the following change locally to resolve the issue. This is on Mahout 0.5; I may be wrong, but the test result is correct:

1) I added an "int itemIdIndex" property with getter/setter methods to class PrefAndSimilarityColumnWritable. It holds the index of the item that the "prefValue" in this class belongs to.

2) I added "prefAndSimilarityColumn.setItemIdIndex(key.get());" at line 51 of class PartialMultiplyMapper to set the property created in step 1.

3) In class AggregateAndRecommendReducer, I added the following code at line 147:

      // item which user has already given preference
      int itemIdIndex = prefAndSimilarityColumn.getItemIdIndex();
      // exclude item user has already given preference
      simColumn.set(itemIdIndex, Double.NaN);

This sets the sim column entry to NaN at the index of an item the user has already rated. Any later plus or times on this vector also yields NaN at that index, so items the user has already rated are excluded from the recommendations.

4) At line 173 of the same class, AggregateAndRecommendReducer, I added a check that makes the prediction NaN for items the user has already rated:

      double prediction = Double.NaN;
      if (!Double.isNaN(element.get())) {
        prediction = element.get() / denominators.getQuick(itemIDIndex);
      }
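
The change in steps 3 and 4 can be tried in isolation with a small sketch. This is not the Mahout code: plain double arrays stand in for Mahout vectors, PrefAndSimColumn is a hypothetical stand-in for PrefAndSimilarityColumnWritable with the extra itemIdIndex field, and normalizing by the sum of absolute similarities is an assumption, since the thread does not show how the denominators are computed.

```java
/** Toy version of the reducer-side change; names mirror the thread but the classes are stand-ins. */
final class ExplicitExclusionSketch {

  /** Hypothetical stand-in for PrefAndSimilarityColumnWritable with the extra itemIdIndex field. */
  static final class PrefAndSimColumn {
    final int itemIdIndex;     // index of the item the preference belongs to
    final double prefValue;    // the user's preference for that item
    final double[] simColumn;  // similarities of that item to all items

    PrefAndSimColumn(int itemIdIndex, double prefValue, double[] simColumn) {
      this.itemIdIndex = itemIdIndex;
      this.prefValue = prefValue;
      this.simColumn = simColumn;
    }
  }

  /** Predicts one preference per item; NaN marks items that must not be recommended. */
  static double[] predict(PrefAndSimColumn[] columns, int numItems) {
    double[] numerators = new double[numItems];
    double[] denominators = new double[numItems];
    for (PrefAndSimColumn c : columns) {
      double[] sim = c.simColumn.clone();
      // step 3: exclude the item the user has already rated
      sim[c.itemIdIndex] = Double.NaN;
      for (int i = 0; i < numItems; i++) {
        numerators[i] += c.prefValue * sim[i];
        // assumption: normalize by the sum of absolute similarities
        denominators[i] += Math.abs(sim[i]);
      }
    }
    double[] predictions = new double[numItems];
    for (int i = 0; i < numItems; i++) {
      // step 4: keep NaN for already rated items, otherwise divide
      double prediction = Double.NaN;
      if (!Double.isNaN(numerators[i])) {
        prediction = numerators[i] / denominators[i];
      }
      predictions[i] = prediction;
    }
    return predictions;
  }

  public static void main(String[] args) {
    // the user has rated items 0 and 2; the similarity values are made up
    PrefAndSimColumn[] columns = {
        new PrefAndSimColumn(0, 4.0, new double[] {1.0, 0.5, 0.2}),
        new PrefAndSimColumn(2, 2.0, new double[] {0.2, 0.5, 1.0})
    };
    double[] predictions = predict(columns, 3);
    for (int i = 0; i < predictions.length; i++) {
      System.out.println("item " + i + " -> " + predictions[i]);
    }
  }
}
```

In Java, NaN divided by any value is already NaN, so in this sketch the check in step 4 mainly makes the intent explicit; the exclusion itself comes from the NaN set in step 3.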

With these changes I get the correct recommendations. I have thought it through carefully, but I may still be wrong, so I would be glad to hear your thoughts. Again, thank you very much.

Cheers
Ramon
> From: ramon_wang@hotmail.com
> To: user@mahout.apache.org
> Subject: RE: Recommend result contains item which user has already given preference, is that correct?
> Date: Fri, 21 Oct 2011 10:01:12 +0800
> 
> 
> Hi Sebastian,
>
> Unfortunately, I still get wrong results from the RecommenderJob after cleaning everything. Here is part of the recommendation output:
>
> 49 [300420:5.0,312611:5.0,428914:5.0,208617:5.0,345206:5.0,411909:5.0,363683:5.0,248872:5.0,93087:5.0,494200:5.0]
>
> Now look at the input data for user 49 below: items 312611, 428914, 208617, 345206, 411909, 363683, 248872 and 494200 are wrong recommendations because the user has already rated them. That is nearly all of them. I would like to send you the test data, but it is 50M+ in size. Can we discuss this offline? Thank you very much.
>
> 49,409769,4
> 49,98795,4
> 49,262163,1
> 49,66009,4
> 49,414484,2
> 49,405329,3
> 49,312611,1
> 49,336441,4
> 49,136494,5
> 49,345206,3
> 49,479179,1
> 49,318960,4
> 49,52683,3
> 49,270840,3
> 49,264828,1
> 49,222390,4
> 49,456614,5
> 49,436207,5
> 49,306308,2
> 49,391582,5
> 49,494200,4
> 49,423328,3
> 49,112997,3
> 49,229347,5
> 49,474928,3
> 49,349350,1
> 49,208508,3
> 49,314397,2
> 49,14673,2
> 49,496041,4
> 49,301875,4
> 49,234234,1
> 49,325287,3
> 49,35756,5
> 49,365097,4
> 49,13376,4
> 49,333634,2
> 49,283494,5
> 49,208617,3
> 49,245390,1
> 49,221804,2
> 49,347821,3
> 49,138954,5
> 49,164206,5
> 49,72238,1
> 49,356632,1
> 49,452296,3
> 49,182288,5
> 49,499031,5
> 49,150727,4
> 49,240533,5
> 49,326081,4
> 49,220683,2
> 49,196527,2
> 49,177165,3
> 49,411709,5
> 49,360722,3
> 49,466310,1
> 49,160375,2
> 49,137203,5
> 49,32634,4
> 49,62134,5
> 49,96982,5
> 49,196951,1
> 49,304155,5
> 49,406109,4
> 49,244276,5
> 49,189552,1
> 49,442215,3
> 49,268806,2
> 49,364912,2
> 49,410896,5
> 49,450602,5
> 49,151703,1
> 49,248872,4
> 49,21684,1
> 49,41196,1
> 49,26614,2
> 49,369075,5
> 49,321916,1
> 49,325081,1
> 49,329877,4
> 49,344661,4
> 49,8429,3
> 49,69279,1
> 49,143695,1
> 49,229120,2
> 49,26298,4
> 49,54456,1
> 49,75937,4
> 49,87042,3
> 49,345383,5
> 49,363683,4
> 49,128047,3
> 49,234878,5
> 49,428914,3
> 49,353107,2
> 49,266850,4
> 49,421211,3
> 49,265739,4
> 49,303723,1
> 49,244575,4
> 49,303625,4
> 49,350481,5
> 49,63985,4
> 49,207327,3
> 49,397535,1
> 49,300916,5
> 49,358094,4
> 49,314919,5
> 49,309355,5
> 49,403169,5
> 49,90148,4
> 49,224056,4
> 49,359181,2
> 49,341927,5
> 49,436521,4
> 49,480682,4
> 49,315561,3
> 49,218647,5
> 49,245276,2
> 49,93189,1
> 49,204695,4
> 49,498350,5
> 49,155787,3
> 49,112730,3
> 49,416756,2
> 49,411909,4
> 49,253353,2
> 49,196663,5
> 49,40903,3
> 49,51873,2
> 49,66925,3

RE: Recommend result contains item which user has already given preference, is that correct?

Posted by WangRamon <ra...@hotmail.com>.
Hi Sebastian,

Unfortunately, I still get wrong results from the RecommenderJob after cleaning everything. Here is part of the recommendation output:

49 [300420:5.0,312611:5.0,428914:5.0,208617:5.0,345206:5.0,411909:5.0,363683:5.0,248872:5.0,93087:5.0,494200:5.0]

Now look at the input data for user 49 below: items 312611, 428914, 208617, 345206, 411909, 363683, 248872 and 494200 are wrong recommendations because the user has already rated them. That is nearly all of them. I would like to send you the test data, but it is 50M+ in size. Can we discuss this offline? Thank you very much.

49,409769,4
49,98795,4
49,262163,1
49,66009,4
49,414484,2
49,405329,3
49,312611,1
49,336441,4
49,136494,5
49,345206,3
49,479179,1
49,318960,4
49,52683,3
49,270840,3
49,264828,1
49,222390,4
49,456614,5
49,436207,5
49,306308,2
49,391582,5
49,494200,4
49,423328,3
49,112997,3
49,229347,5
49,474928,3
49,349350,1
49,208508,3
49,314397,2
49,14673,2
49,496041,4
49,301875,4
49,234234,1
49,325287,3
49,35756,5
49,365097,4
49,13376,4
49,333634,2
49,283494,5
49,208617,3
49,245390,1
49,221804,2
49,347821,3
49,138954,5
49,164206,5
49,72238,1
49,356632,1
49,452296,3
49,182288,5
49,499031,5
49,150727,4
49,240533,5
49,326081,4
49,220683,2
49,196527,2
49,177165,3
49,411709,5
49,360722,3
49,466310,1
49,160375,2
49,137203,5
49,32634,4
49,62134,5
49,96982,5
49,196951,1
49,304155,5
49,406109,4
49,244276,5
49,189552,1
49,442215,3
49,268806,2
49,364912,2
49,410896,5
49,450602,5
49,151703,1
49,248872,4
49,21684,1
49,41196,1
49,26614,2
49,369075,5
49,321916,1
49,325081,1
49,329877,4
49,344661,4
49,8429,3
49,69279,1
49,143695,1
49,229120,2
49,26298,4
49,54456,1
49,75937,4
49,87042,3
49,345383,5
49,363683,4
49,128047,3
49,234878,5
49,428914,3
49,353107,2
49,266850,4
49,421211,3
49,265739,4
49,303723,1
49,244575,4
49,303625,4
49,350481,5
49,63985,4
49,207327,3
49,397535,1
49,300916,5
49,358094,4
49,314919,5
49,309355,5
49,403169,5
49,90148,4
49,224056,4
49,359181,2
49,341927,5
49,436521,4
49,480682,4
49,315561,3
49,218647,5
49,245276,2
49,93189,1
49,204695,4
49,498350,5
49,155787,3
49,112730,3
49,416756,2
49,411909,4
49,253353,2
49,196663,5
49,40903,3
49,51873,2
49,66925,3
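
A check like the one described above can be automated instead of done by eye. The following is a minimal sketch, assuming the output line has the form "user [item:score,...]" and the input lines have the form "user,item,pref" as shown in this message; the class and method names are made up for illustration and are not part of Mahout.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Finds recommended items that the user has already rated. */
final class AlreadyRatedCheck {

  /** Parses one output line such as "49 [300420:5.0,312611:5.0]" into item IDs. */
  static List<Long> recommendedItems(String outputLine) {
    String list = outputLine.substring(outputLine.indexOf('[') + 1, outputLine.lastIndexOf(']'));
    List<Long> items = new ArrayList<>();
    for (String entry : list.split(",")) {
      if (entry.isEmpty()) {
        continue; // empty recommendation list
      }
      items.add(Long.parseLong(entry.substring(0, entry.indexOf(':')).trim()));
    }
    return items;
  }

  /** Collects the item IDs rated by userId from "user,item,pref" lines. */
  static Set<Long> ratedItems(long userId, List<String> inputLines) {
    Set<Long> rated = new HashSet<>();
    for (String line : inputLines) {
      String[] fields = line.trim().split(",");
      if (fields.length >= 2 && Long.parseLong(fields[0]) == userId) {
        rated.add(Long.parseLong(fields[1]));
      }
    }
    return rated;
  }

  /** Returns the recommended items that the user has already rated, in output order. */
  static List<Long> alreadyRated(String outputLine, long userId, List<String> inputLines) {
    Set<Long> rated = ratedItems(userId, inputLines);
    List<Long> overlap = new ArrayList<>();
    for (Long item : recommendedItems(outputLine)) {
      if (rated.contains(item)) {
        overlap.add(item);
      }
    }
    return overlap;
  }

  public static void main(String[] args) {
    // shortened copies of the output and input lines from this message
    String output = "49 [300420:5.0,312611:5.0,428914:5.0]";
    List<String> input = List.of("49,312611,1", "49,428914,3", "49,409769,4");
    System.out.println(alreadyRated(output, 49L, input)); // prints [312611, 428914]
  }
}
```

An empty result means none of the recommended items appear in the user's input preferences.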
> Date: Thu, 20 Oct 2011 18:40:38 +0200
> From: ssc@apache.org
> To: user@mahout.apache.org
> Subject: Re: Recommend result contains item which user has already given preference, is that correct?
> 
> To put it simply:
> 
> The vector of recommendations is the sum of the similarity vectors for
> all preferred items. In each similarity vector for a preferred item the
> entry for that particular item is set to NaN.
> 
> That means that in the recommendation vector the entries for all
> preferred items will be NaN.
> 
> It's a neat trick that is unfortunately very hard to see in the code.
> 
> --sebastian

Re: Recommend result contains item which user has already given preference, is that correct?

Posted by Sebastian Schelter <ss...@apache.org>.
To put it simply:

The vector of recommendations is the sum of the similarity vectors for
all preferred items. In each similarity vector for a preferred item the
entry for that particular item is set to NaN.

That means that in the recommendation vector the entries for all
preferred items will be NaN.

It's a neat trick that is unfortunately very hard to see in the code.

--sebastian
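
The explanation above can be reproduced with a few lines of Java. This is only a sketch under stated assumptions: plain double arrays stand in for Mahout's vectors, the similarity values are made up, and the class and method names are hypothetical. The point it illustrates is that the sum only runs over the similarity vectors of items the user has rated, so every rated item contributes its own NaN and no separate "already preferred" check is needed.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy illustration of NaN propagation; plain arrays stand in for Mahout vectors. */
final class NanMaskSketch {

  /** Sums the similarity vectors of the preferred items, weighted by the preference value. */
  static double[] sumSimilarityColumns(double[][] similarity, int[] preferredItems, double[] prefValues) {
    int numItems = similarity.length;
    double[] sum = new double[numItems];
    for (int p = 0; p < preferredItems.length; p++) {
      double[] column = similarity[preferredItems[p]].clone();
      // "remove self similarity": the item's own entry becomes NaN
      column[preferredItems[p]] = Double.NaN;
      for (int i = 0; i < numItems; i++) {
        // NaN plus anything is NaN, so a masked entry stays masked
        sum[i] += prefValues[p] * column[i];
      }
    }
    return sum;
  }

  /** Returns the indices whose summed value is not NaN, i.e. items that can still be recommended. */
  static List<Integer> candidates(double[] sum) {
    List<Integer> result = new ArrayList<>();
    for (int i = 0; i < sum.length; i++) {
      if (!Double.isNaN(sum[i])) {
        result.add(i);
      }
    }
    return result;
  }

  public static void main(String[] args) {
    // symmetric toy similarity matrix for 4 items (values are made up)
    double[][] similarity = {
        {1.0, 0.5, 0.2, 0.0},
        {0.5, 1.0, 0.3, 0.4},
        {0.2, 0.3, 1.0, 0.6},
        {0.0, 0.4, 0.6, 1.0}
    };
    int[] preferred = {0, 2};     // the user has rated items 0 and 2
    double[] prefs = {4.0, 3.0};
    double[] sum = sumSimilarityColumns(similarity, preferred, prefs);
    System.out.println(candidates(sum)); // prints [1, 3]
  }
}
```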

On 20.10.2011 18:36, WangRamon wrote:
> 
> Hi Sebastian
> "But as the entry for the item itself is set to NaN in its similarity vector and NaN plus something stays always NaN, the predicted preference for an item that was already preferred is NaN. And the NaN entries are dropped later."
> Wait a minute. I understand that NaN plus something always stays NaN, but how do you explain "the predicted preference for an item that was already preferred is NaN"? Where is the code that checks whether an item was already preferred? The only NaN-related thing in SimilarityMatrixRowWrapperMapper is marking the similarity of an item to itself (A to A) as NaN, am I right?
> Thanks
> Ramon


RE: Recommend result contains item which user has already given preference, is that correct?

Posted by WangRamon <ra...@hotmail.com>.
Hi Sebastian
"But as the entry for the item itself is set to NaN in its similarity vector and NaN plus something stays always NaN, the predicted preference for an item that was already preferred is NaN. And the NaN entries are dropped later."
Wait a minute. I understand that NaN plus something always stays NaN, but how do you explain "the predicted preference for an item that was already preferred is NaN"? Where is the code that checks whether an item was already preferred? The only NaN-related thing in SimilarityMatrixRowWrapperMapper is marking the similarity of an item to itself (A to A) as NaN, am I right?
Thanks
Ramon
> Date: Thu, 20 Oct 2011 17:04:20 +0200
> From: ssc@apache.org
> To: user@mahout.apache.org
> Subject: Re: Recommend result contains item which user has already given preference, is that correct?
> 
> On 20.10.2011 16:57, WangRamon wrote:
> > 
> > Hi Sebastian and Sean 
> > Thanks for your help. 
> > 
> > I re-read the code (setting up a debugging environment seems very difficult for me) and found the line in SimilarityMatrixRowWrapperMapper. I have pasted it below with its comment:
> >     /* remove self similarity */
> >     similarityMatrixRow.set(key.get(), Double.NaN);
> > I think the intent is to mark the similarity between Item X and itself as NaN, but that doesn't exclude Item X from the recommendations. Then AggregateAndRecommendReducer uses simColumn.times(prefValue) as part of the formula to calculate the preferences for all items that are similar to Item i (which could be Item X or some other item), and returns the top 10 (the default) for a user.
> > During this process, I cannot see any code that excludes an item for which the user has already given a preference from the recommendations.
> 
> It's a little bit hidden :) For each preferred item, a vector of all its
> similarities is added:
> 
>       numerators = numerators == null
>           ? prefValue == BOOLEAN_PREF_VALUE ? simColumn.clone() : simColumn.times(prefValue)
>           : numerators.plus(prefValue == BOOLEAN_PREF_VALUE ? simColumn : simColumn.times(prefValue));
> 
> But as the entry for the item itself is set to NaN in its similarity
> vector and NaN plus something stays always NaN, the predicted preference
> for an item that was already preferred is NaN. And the NaN entries are
> dropped later.
> 
> --sebastian
> 
> 
> > Correct me if I'm missing something. Thank you, guys.
> > Cheers, Ramon
> >> Date: Thu, 20 Oct 2011 13:59:28 +0100
> >> Subject: Re: Recommend result contains item which user has already given preference, is that correct?
> >> From: srowen@gmail.com
> >> To: user@mahout.apache.org
> >>
> >> Ah OK, figured as much. WangRamon does that answer your question
> >> and/or can you debug to see if this is happening, not happening for
> >> you in your use case?
> >>
> >> On Thu, Oct 20, 2011 at 1:42 PM, Sebastian Schelter <ss...@apache.org> wrote:
> >>> It's still included in SimilarityMatrixRowWrapperMapper. We also have a
> >>> unit test that checks whether a user is only recommended unknown items
> >>> which still works.

Re: Recommend result contains item which user has already given preference, is that correct?

Posted by Sebastian Schelter <ss...@apache.org>.
On 20.10.2011 16:57, WangRamon wrote:
> 
> Hi Sebastian and Sean 
> Thanks for your help. 
> 
> I re-read the code (setting up a debugging environment seems very difficult for me) and found the line in SimilarityMatrixRowWrapperMapper. I have pasted it below with its comment:
>     /* remove self similarity */
>     similarityMatrixRow.set(key.get(), Double.NaN);
> I think the intent is to mark the similarity between Item X and itself as NaN, but that doesn't exclude Item X from the recommendations. Then AggregateAndRecommendReducer uses simColumn.times(prefValue) as part of the formula to calculate the preferences for all items that are similar to Item i (which could be Item X or some other item), and returns the top 10 (the default) for a user.
> During this process, I cannot see any code that excludes an item for which the user has already given a preference from the recommendations.

It's a little bit hidden :) For each preferred item, a vector of all its
similarities is added:

      numerators = numerators == null
          ? prefValue == BOOLEAN_PREF_VALUE ? simColumn.clone() : simColumn.times(prefValue)
          : numerators.plus(prefValue == BOOLEAN_PREF_VALUE ? simColumn : simColumn.times(prefValue));

But as the entry for the item itself is set to NaN in its similarity
vector and NaN plus something stays always NaN, the predicted preference
for an item that was already preferred is NaN. And the NaN entries are
dropped later.
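
To make the mechanism concrete, here is a minimal standalone sketch in plain Java. It does not use the Mahout Vector classes; the class NaNExclusionSketch, its methods, and the similarity values are made up for illustration. It only assumes what is described above: each preferred item's similarity column has its own entry set to NaN, the columns are summed weighted by the preference value, and NaN entries are dropped afterwards.

```java
import java.util.ArrayList;
import java.util.List;

public class NaNExclusionSketch {

    /** Sums prefValue * similarityColumn over every item the user has rated. */
    public static double[] numerators(double[][] similarity, int[] preferredItems, double[] prefValues) {
        int numItems = similarity.length;
        double[] sum = new double[numItems];
        for (int k = 0; k < preferredItems.length; k++) {
            int item = preferredItems[k];
            // Copy the column and mark the self-similarity as NaN, mirroring
            // similarityMatrixRow.set(key.get(), Double.NaN) in the mapper.
            double[] simColumn = similarity[item].clone();
            simColumn[item] = Double.NaN;
            for (int i = 0; i < numItems; i++) {
                // NaN plus anything is NaN, so index `item` is poisoned for good.
                sum[i] += prefValues[k] * simColumn[i];
            }
        }
        return sum;
    }

    /** Keeps only the item indexes whose summed score is not NaN. */
    public static List<Integer> candidates(double[] numerators) {
        List<Integer> result = new ArrayList<>();
        for (int i = 0; i < numerators.length; i++) {
            if (!Double.isNaN(numerators[i])) {
                result.add(i);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Hypothetical symmetric item-item similarities for four items.
        double[][] similarity = {
            {1.0, 0.5, 0.2, 0.1},
            {0.5, 1.0, 0.3, 0.4},
            {0.2, 0.3, 1.0, 0.6},
            {0.1, 0.4, 0.6, 1.0}
        };
        int[] preferred = {0, 2};      // the user has rated items 0 and 2
        double[] prefs = {4.0, 3.0};   // with these preference values
        double[] sums = numerators(similarity, preferred, prefs);
        System.out.println(candidates(sums)); // prints [1, 3]
    }
}
```

With items 0 and 2 preferred, their entries in the summed vector are NaN, so only items 1 and 3 remain as recommendation candidates. No explicit "already preferred" check is needed, which is why it is hard to spot in the code.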

--sebastian


> Correct me if I'm missing something. Thank you, guys.
> Cheers, Ramon
>> Date: Thu, 20 Oct 2011 13:59:28 +0100
>> Subject: Re: Recommend result contains item which user has already given preference, is that correct?
>> From: srowen@gmail.com
>> To: user@mahout.apache.org
>>
>> Ah OK, figured as much. WangRamon does that answer your question
>> and/or can you debug to see if this is happening, not happening for
>> you in your use case?
>>
>> On Thu, Oct 20, 2011 at 1:42 PM, Sebastian Schelter <ss...@apache.org> wrote:
>>> It's still included in SimilarityMatrixRowWrapperMapper. We also have a
>>> unit test that checks whether a user is only recommended unknown items
>>> which still works.


RE: Recommend result contains item which user has already given preference, is that correct?

Posted by WangRamon <ra...@hotmail.com>.
Hi Sebastian and Sean,
Thanks for your help.

I re-read the code (setting up a debugging environment seems very difficult for me) and found the line in SimilarityMatrixRowWrapperMapper. I have pasted it below with its comment:
    /* remove self similarity */
    similarityMatrixRow.set(key.get(), Double.NaN);
I think the intent is to mark the similarity between Item X and itself as NaN, but that doesn't exclude Item X from the recommendations. Then AggregateAndRecommendReducer uses simColumn.times(prefValue) as part of the formula to calculate the preferences for all items that are similar to Item i (which could be Item X or some other item), and returns the top 10 (the default) for a user.
During this process, I cannot see any code that excludes an item for which the user has already given a preference from the recommendations.
Correct me if I'm missing something. Thank you, guys.
Cheers, Ramon
> Date: Thu, 20 Oct 2011 13:59:28 +0100
> Subject: Re: Recommend result contains item which user has already given preference, is that correct?
> From: srowen@gmail.com
> To: user@mahout.apache.org
> 
> Ah OK, figured as much. WangRamon does that answer your question
> and/or can you debug to see if this is happening, not happening for
> you in your use case?
> 
> On Thu, Oct 20, 2011 at 1:42 PM, Sebastian Schelter <ss...@apache.org> wrote:
> > It's still included in SimilarityMatrixRowWrapperMapper. We also have a
> > unit test that checks whether a user is only recommended unknown items
> > which still works.
 		 	   		  

Re: Recommend result contains item which user has already given preference, is that correct?

Posted by Sean Owen <sr...@gmail.com>.
Ah OK, figured as much. WangRamon does that answer your question
and/or can you debug to see if this is happening, not happening for
you in your use case?

On Thu, Oct 20, 2011 at 1:42 PM, Sebastian Schelter <ss...@apache.org> wrote:
> It's still included in SimilarityMatrixRowWrapperMapper. We also have a
> unit test that checks whether a user is only recommended unknown items
> which still works.

Re: Recommend result contains item which user has already given preference, is that correct?

Posted by Sebastian Schelter <ss...@apache.org>.
It's still included in SimilarityMatrixRowWrapperMapper. We also have a
unit test that checks whether a user is only recommended unknown items
which still works.

--sebastian

On 20.10.2011 12:10, Sean Owen wrote:
> Hmm, Sebastian do you know where this went? In RecommenderJob, I only
> see ItemFilterAsVectorAndPrefsReducer doing this, but it only applies
> to the filters file. I don't see where the original input is also used
> to filter. Did this get lost or am I missing it?
> 
> 2011/10/20 WangRamon <ra...@hotmail.com>:
>>
>> Yes, I'm pretty sure about this. Does this NaN setting exist in Mahout 0.5, or is it only in Mahout 0.6? Could you please show me the line? Thank you very much. Thanks, Ramon
>>> Date: Thu, 20 Oct 2011 10:44:01 +0100
>>> Subject: Re: Recommend result contains item which user has already given preference, is that correct?
>>> From: srowen@gmail.com
>>> To: user@mahout.apache.org
>>>
>>> No, it shouldn't. Are you sure about this? You can see the stage where
>>> it excludes these items by setting their values to NaN, just before
>>> AggregateAndRecommend. Did you modify this stage or exclude it?
>>>
>>> 2011/10/20 WangRamon <ra...@hotmail.com>:
>>>>
>>>>
>>>>
>>>>
>>>> Hi guys, I finally finished running the RecommenderJob today on the two-node cluster. What surprised me is that the final recommendation output of the RecommenderJob contains items for which the user has already given a preference. Is that correct? If not, how can I resolve this problem? Thanks a lot. Cheers, Ramon
>>


Re: Recommend result contains item which user has already given preference, is that correct?

Posted by Sean Owen <sr...@gmail.com>.
Hmm, Sebastian do you know where this went? In RecommenderJob, I only
see ItemFilterAsVectorAndPrefsReducer doing this, but it only applies
to the filters file. I don't see where the original input is also used
to filter. Did this get lost or am I missing it?

2011/10/20 WangRamon <ra...@hotmail.com>:
>
> Yes, I'm pretty sure about this. Does this NaN setting exist in Mahout 0.5, or is it only in Mahout 0.6? Could you please show me the line? Thank you very much. Thanks, Ramon
>> Date: Thu, 20 Oct 2011 10:44:01 +0100
>> Subject: Re: Recommend result contains item which user has already given preference, is that correct?
>> From: srowen@gmail.com
>> To: user@mahout.apache.org
>>
>> No, it shouldn't. Are you sure about this? You can see the stage where
>> it excludes these items by setting their values to NaN, just before
>> AggregateAndRecommend. Did you modify this stage or exclude it?
>>
>> 2011/10/20 WangRamon <ra...@hotmail.com>:
>> >
>> >
>> >
>> >
>> > Hi guys, I finally finished running the RecommenderJob today on the two-node cluster. What surprised me is that the final recommendation output of the RecommenderJob contains items for which the user has already given a preference. Is that correct? If not, how can I resolve this problem? Thanks a lot. Cheers, Ramon
>

RE: Recommend result contains item which user has already given preference, is that correct?

Posted by WangRamon <ra...@hotmail.com>.
Yes, I'm pretty sure about this. Does this NaN setting exist in Mahout 0.5, or is it only in Mahout 0.6? Could you please show me the line? Thank you very much. Thanks, Ramon
> Date: Thu, 20 Oct 2011 10:44:01 +0100
> Subject: Re: Recommend result contains item which user has already given preference, is that correct?
> From: srowen@gmail.com
> To: user@mahout.apache.org
> 
> No, it shouldn't. Are you sure about this? You can see the stage where
> it excludes these items by setting their values to NaN, just before
> AggregateAndRecommend. Did you modify this stage or exclude it?
> 
> 2011/10/20 WangRamon <ra...@hotmail.com>:
> >
> >
> >
> >
> > Hi guys, I finally finished running the RecommenderJob today on the two-node cluster. What surprised me is that the final recommendation output of the RecommenderJob contains items for which the user has already given a preference. Is that correct? If not, how can I resolve this problem? Thanks a lot. Cheers, Ramon

Re: Recommend result contains item which user has already given preference, is that correct?

Posted by Sean Owen <sr...@gmail.com>.
No, it shouldn't. Are you sure about this? You can see the stage where
it excludes these items by setting their values to NaN, just before
AggregateAndRecommend. Did you modify this stage or exclude it?

2011/10/20 WangRamon <ra...@hotmail.com>:
>
>
>
>
> Hi guys, I finally finished running the RecommenderJob today on the two-node cluster. What surprised me is that the final recommendation output of the RecommenderJob contains items for which the user has already given a preference. Is that correct? If not, how can I resolve this problem? Thanks a lot. Cheers, Ramon