Posted to user@mahout.apache.org by Wei Li <we...@gmail.com> on 2014/08/25 08:22:08 UTC

RecommenderJob

Hi Mahout users:

    We have tried the item-based CF recommender with (user_id, item_id,
rating) data, but the recommendation output contains fewer records than we
expected. For example, if we have 1000 users, the output should have 1000
records, one for each user, right?

Best
Wei

Re: RecommenderJob

Posted by Wei Li <we...@gmail.com>.

Got your point, thanks Sharma and Peng for your replies.


On Mon, Aug 25, 2014 at 4:50 PM, Yash Sharma <ya...@gmail.com> wrote:

> Mahout collab filtering would remove the items you have already rated since
> you would not want to see the same items which you have already used and
> rated.
>
>
> On Mon, Aug 25, 2014 at 2:14 PM, Wei Li <we...@gmail.com> wrote:
>
> > OK, why not just output the items uses clicked or rated before? does it
> > output the these records if we provide the userFile option? thanks.
> >
> >
> > On Mon, Aug 25, 2014 at 4:41 PM, Peng Zhang <pz...@gmail.com>
> wrote:
> >
> > > If there are no suitable recommendations for a user, the output will
> not
> > > contain any records related to this user.
> > >
> > >
> > > Peng Zhang
> > >
> > >
> > > On Aug 25, 2014, at 4:38 PM, Wei Li <we...@gmail.com> wrote:
> > >
> > > > thanks Peng's answers. Yes, I know this case, but RecommenderJob does
> > not
> > > > output these records?
> > > >
> > > >
> > > > On Mon, Aug 25, 2014 at 3:37 PM, Peng Zhang <pz...@gmail.com>
> > > wrote:
> > > >
> > > >> If an item is not similar to anyone else, and a user only connects
> > with
> > > >> this item, this user doesnt get any recommended items.
> > > >>
> > > >> This is just one example.
> > > >>
> > > >> Peng Zhang
> > > >>
> > > >> --
> > > >> Sent from my iPhone
> > > >>
> > > >>> On Aug 25, 2014, at 2:22 PM, Wei Li <we...@gmail.com> wrote:
> > > >>>
> > > >>> Hi Mahout users:
> > > >>>
> > > >>>   We have tried the item-based CF recommender with a user_id,
> > item_id,
> > > >>> rating data. while the recommendation output is less than our
> > expected,
> > > >> for
> > > >>> example, if we have 1000 users, the output should have 1000
> records,
> > > one
> > > >>> for each user, right?
> > > >>>
> > > >>> Best
> > > >>> Wei
> > > >>
> > >
> > >
> >
>

Re: RecommenderJob

Posted by Yash Sharma <ya...@gmail.com>.

Mahout collaborative filtering removes the items you have already rated,
since you would not want to be recommended items that you have already used
and rated.


On Mon, Aug 25, 2014 at 2:14 PM, Wei Li <we...@gmail.com> wrote:

> OK, why not just output the items uses clicked or rated before? does it
> output the these records if we provide the userFile option? thanks.
>
>
> On Mon, Aug 25, 2014 at 4:41 PM, Peng Zhang <pz...@gmail.com> wrote:
>
> > If there are no suitable recommendations for a user, the output will not
> > contain any records related to this user.
> >
> >
> > Peng Zhang
> >
> >
> > On Aug 25, 2014, at 4:38 PM, Wei Li <we...@gmail.com> wrote:
> >
> > > thanks Peng's answers. Yes, I know this case, but RecommenderJob does
> not
> > > output these records?
> > >
> > >
> > > On Mon, Aug 25, 2014 at 3:37 PM, Peng Zhang <pz...@gmail.com>
> > wrote:
> > >
> > >> If an item is not similar to anyone else, and a user only connects
> with
> > >> this item, this user doesnt get any recommended items.
> > >>
> > >> This is just one example.
> > >>
> > >> Peng Zhang
> > >>
> > >> --
> > >> Sent from my iPhone
> > >>
> > >>> On Aug 25, 2014, at 2:22 PM, Wei Li <we...@gmail.com> wrote:
> > >>>
> > >>> Hi Mahout users:
> > >>>
> > >>>   We have tried the item-based CF recommender with a user_id,
> item_id,
> > >>> rating data. while the recommendation output is less than our
> expected,
> > >> for
> > >>> example, if we have 1000 users, the output should have 1000 records,
> > one
> > >>> for each user, right?
> > >>>
> > >>> Best
> > >>> Wei
> > >>
> >
> >
>

Re: RecommenderJob

Posted by Wei Li <we...@gmail.com>.

OK, why not just output the items users clicked or rated before? Does it
output these records if we provide the usersFile option? Thanks.


On Mon, Aug 25, 2014 at 4:41 PM, Peng Zhang <pz...@gmail.com> wrote:

> If there are no suitable recommendations for a user, the output will not
> contain any records related to this user.
>
>
> Peng Zhang
>
>
> On Aug 25, 2014, at 4:38 PM, Wei Li <we...@gmail.com> wrote:
>
> > thanks Peng's answers. Yes, I know this case, but RecommenderJob does not
> > output these records?
> >
> >
> > On Mon, Aug 25, 2014 at 3:37 PM, Peng Zhang <pz...@gmail.com>
> wrote:
> >
> >> If an item is not similar to anyone else, and a user only connects with
> >> this item, this user doesnt get any recommended items.
> >>
> >> This is just one example.
> >>
> >> Peng Zhang
> >>
> >> --
> >> Sent from my iPhone
> >>
> >>> On Aug 25, 2014, at 2:22 PM, Wei Li <we...@gmail.com> wrote:
> >>>
> >>> Hi Mahout users:
> >>>
> >>>   We have tried the item-based CF recommender with a user_id, item_id,
> >>> rating data. while the recommendation output is less than our expected,
> >> for
> >>> example, if we have 1000 users, the output should have 1000 records,
> one
> >>> for each user, right?
> >>>
> >>> Best
> >>> Wei
> >>
>
>

Re: RecommenderJob

Posted by Wei Li <we...@gmail.com>.

Thanks for your useful suggestions, Ferrel. 1000 users is just an example;
we actually have 10 million users in total. We will try the method you
mentioned and report back to you. Thanks. :)


On Tue, Aug 26, 2014 at 12:05 AM, Pat Ferrel <pa...@occamsmachete.com> wrote:

> I always use SIMILARITY_LOGLIKELIHOOD. LLR almost always works best for
> places that call for “similarity” or “distance”.
>
> 1000 people isn’t very many. How many items? Look in the data and count
> unique number of users * number of items, this tell you the cardinality of
> your data. The number of interactions will tell you how sparse the data is.
> So it you have 1000 items you have a 1000 by 1000 input matrix, most of
> which will be empty and therefore “sparse”. But if there aren’t enough
> interactions or non blank spots in the matrix you will not have enough data
> to return recs for every user.
>
> Collaborative filtering works well if you have long lived items and enough
> users interacting with them. To get a handle on whether your data supports
> CF ask youself: How many interactions? Every unique input (userID, itemID)
> is an interaction. How many people interacted with each item?  How many
> people total?  How many people interacted with more than one item?
>
> Another way is to run the hadoop version of the recommender (using LLR)
> and see how many people get recommendations. LLR uses the above mentioned
> metrics in calculating recs so the number of people that get recs is an
> indirect way of telling how dense your data is.
>
> On Aug 25, 2014, at 1:51 AM, Wei Li <we...@gmail.com> wrote:
>
> Thanks Sharma, does all similarity measures have this problem or only some
> specific similarity measures have?
>
>
> On Mon, Aug 25, 2014 at 4:48 PM, Yash Sharma <ya...@gmail.com> wrote:
>
> > Pearson Coefficient Similarity does not go very well with small datasets
> > with less similarities - and removes those from output. Since you are
> using
> > co-occurrence similarity this is not the case.
> >
> >
> > On Mon, Aug 25, 2014 at 2:11 PM, Peng Zhang <pz...@gmail.com>
> wrote:
> >
> >> If there are no suitable recommendations for a user, the output will not
> >> contain any records related to this user.
> >>
> >>
> >> Peng Zhang
> >>
> >>
> >> On Aug 25, 2014, at 4:38 PM, Wei Li <we...@gmail.com> wrote:
> >>
> >>> thanks Peng's answers. Yes, I know this case, but RecommenderJob does
> > not
> >>> output these records?
> >>>
> >>>
> >>> On Mon, Aug 25, 2014 at 3:37 PM, Peng Zhang <pz...@gmail.com>
> >> wrote:
> >>>
> >>>> If an item is not similar to anyone else, and a user only connects
> > with
> >>>> this item, this user doesnt get any recommended items.
> >>>>
> >>>> This is just one example.
> >>>>
> >>>> Peng Zhang
> >>>>
> >>>> --
> >>>> Sent from my iPhone
> >>>>
> >>>>> On Aug 25, 2014, at 2:22 PM, Wei Li <we...@gmail.com> wrote:
> >>>>>
> >>>>> Hi Mahout users:
> >>>>>
> >>>>>  We have tried the item-based CF recommender with a user_id,
> > item_id,
> >>>>> rating data. while the recommendation output is less than our
> > expected,
> >>>> for
> >>>>> example, if we have 1000 users, the output should have 1000 records,
> >> one
> >>>>> for each user, right?
> >>>>>
> >>>>> Best
> >>>>> Wei
> >>>>
> >>
> >>
> >
>
>

Re: RecommenderJob

Posted by Pat Ferrel <pa...@occamsmachete.com>.

I always use SIMILARITY_LOGLIKELIHOOD. LLR almost always works best for places that call for “similarity” or “distance”.

1000 people isn’t very many. How many items? Look in the data and count the number of unique users and the number of unique items; their product tells you the cardinality of your data. The number of interactions will tell you how sparse the data is. So if you have 1000 items you have a 1000 by 1000 input matrix, most of which will be empty and therefore “sparse”. But if there aren’t enough interactions (non-blank spots in the matrix), you will not have enough data to return recs for every user.

Collaborative filtering works well if you have long-lived items and enough users interacting with them. To get a handle on whether your data supports CF, ask yourself: How many interactions? Every unique input (userID, itemID) is an interaction. How many people interacted with each item? How many people total? How many people interacted with more than one item?
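
These questions can be answered with a short script before running any job. The following is a minimal Python sketch, assuming the input is available as a list of (userID, itemID) pairs; the function name interaction_stats and the sample data are hypothetical and are not part of Mahout.

```python
from collections import defaultdict

def interaction_stats(pairs):
    """Summarize (user, item) interactions to judge whether CF has enough data."""
    unique = set(pairs)  # every unique (userID, itemID) counts as one interaction
    items_per_user = defaultdict(set)
    users_per_item = defaultdict(set)
    for user, item in unique:
        items_per_user[user].add(item)
        users_per_item[item].add(user)
    n_users = len(items_per_user)
    n_items = len(users_per_item)
    cells = n_users * n_items  # size of the users-by-items matrix
    return {
        "users": n_users,
        "items": n_items,
        "interactions": len(unique),
        "density": len(unique) / cells if cells else 0.0,
        "users_with_multiple_items": sum(
            1 for items in items_per_user.values() if len(items) > 1
        ),
        "users_per_item": {i: len(u) for i, u in users_per_item.items()},
    }

# Hypothetical sample: user 3 repeats the same interaction, which counts once.
sample = [(1, "a"), (1, "b"), (2, "a"), (3, "c"), (3, "c")]
stats = interaction_stats(sample)
print(stats)
```

A low density, or few users with more than one item, suggests that many users will end up without recommendations.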

Another way is to run the Hadoop version of the recommender (using LLR) and see how many people get recommendations. LLR uses the above-mentioned metrics in calculating recs, so the number of people that get recs is an indirect way of telling how dense your data is.
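
To make the dependence on those counts concrete, here is a minimal Python sketch of the log-likelihood ratio for a 2x2 contingency table, written in the usual entropy form. It is an illustration under stated assumptions, not Mahout's actual code: the function names are hypothetical, and the mapping of counts is assumed to be k11 = users who interacted with both items, k12 = users who interacted with item A only, k21 = item B only, k22 = neither.

```python
import math

def x_log_x(x):
    # By convention 0 * log(0) is treated as 0.
    return 0.0 if x == 0 else x * math.log(x)

def entropy(*counts):
    # Unnormalized entropy of a list of counts.
    return x_log_x(sum(counts)) - sum(x_log_x(c) for c in counts)

def llr(k11, k12, k21, k22):
    """Log-likelihood ratio score for a 2x2 co-occurrence table."""
    row = entropy(k11 + k12, k21 + k22)
    col = entropy(k11 + k21, k12 + k22)
    mat = entropy(k11, k12, k21, k22)
    # Clamp tiny negative values caused by floating-point round-off.
    return max(0.0, 2.0 * (row + col - mat))

independent = llr(10, 10, 10, 10)  # the two items co-occur no more than chance
associated = llr(10, 0, 0, 10)     # the two items always appear together
print(independent, associated)
```

When the two items are independent the score is close to zero, and it grows as the co-occurrence becomes harder to explain by chance.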

On Aug 25, 2014, at 1:51 AM, Wei Li <we...@gmail.com> wrote:

Thanks Sharma, does all similarity measures have this problem or only some
specific similarity measures have?

On Mon, Aug 25, 2014 at 4:48 PM, Yash Sharma <ya...@gmail.com> wrote:

> Pearson Coefficient Similarity does not go very well with small datasets
> with less similarities - and removes those from output. Since you are using
> co-occurrence similarity this is not the case.
> 
> 
> On Mon, Aug 25, 2014 at 2:11 PM, Peng Zhang <pz...@gmail.com> wrote:
> 
>> If there are no suitable recommendations for a user, the output will not
>> contain any records related to this user.
>> 
>> 
>> Peng Zhang
>> 
>> 
>> On Aug 25, 2014, at 4:38 PM, Wei Li <we...@gmail.com> wrote:
>> 
>>> thanks Peng's answers. Yes, I know this case, but RecommenderJob does
> not
>>> output these records?
>>> 
>>> 
>>> On Mon, Aug 25, 2014 at 3:37 PM, Peng Zhang <pz...@gmail.com>
>> wrote:
>>> 
>>>> If an item is not similar to anyone else, and a user only connects
> with
>>>> this item, this user doesnt get any recommended items.
>>>> 
>>>> This is just one example.
>>>> 
>>>> Peng Zhang
>>>> 
>>>> --
>>>> Sent from my iPhone
>>>> 
>>>>> On Aug 25, 2014, at 2:22 PM, Wei Li <we...@gmail.com> wrote:
>>>>> 
>>>>> Hi Mahout users:
>>>>> 
>>>>>  We have tried the item-based CF recommender with a user_id,
> item_id,
>>>>> rating data. while the recommendation output is less than our
> expected,
>>>> for
>>>>> example, if we have 1000 users, the output should have 1000 records,
>> one
>>>>> for each user, right?
>>>>> 
>>>>> Best
>>>>> Wei
>>>> 
>> 
>> 
>

Re: RecommenderJob

Posted by Wei Li <we...@gmail.com>.

Thanks Sharma. Do all similarity measures have this problem, or only some
specific ones?


On Mon, Aug 25, 2014 at 4:48 PM, Yash Sharma <ya...@gmail.com> wrote:

> Pearson Coefficient Similarity does not go very well with small datasets
> with less similarities - and removes those from output. Since you are using
> co-occurrence similarity this is not the case.
>
>
> On Mon, Aug 25, 2014 at 2:11 PM, Peng Zhang <pz...@gmail.com> wrote:
>
> > If there are no suitable recommendations for a user, the output will not
> > contain any records related to this user.
> >
> >
> > Peng Zhang
> >
> >
> > On Aug 25, 2014, at 4:38 PM, Wei Li <we...@gmail.com> wrote:
> >
> > > thanks Peng's answers. Yes, I know this case, but RecommenderJob does
> not
> > > output these records?
> > >
> > >
> > > On Mon, Aug 25, 2014 at 3:37 PM, Peng Zhang <pz...@gmail.com>
> > wrote:
> > >
> > >> If an item is not similar to anyone else, and a user only connects
> with
> > >> this item, this user doesnt get any recommended items.
> > >>
> > >> This is just one example.
> > >>
> > >> Peng Zhang
> > >>
> > >> --
> > >> Sent from my iPhone
> > >>
> > >>> On Aug 25, 2014, at 2:22 PM, Wei Li <we...@gmail.com> wrote:
> > >>>
> > >>> Hi Mahout users:
> > >>>
> > >>>   We have tried the item-based CF recommender with a user_id,
> item_id,
> > >>> rating data. while the recommendation output is less than our
> expected,
> > >> for
> > >>> example, if we have 1000 users, the output should have 1000 records,
> > one
> > >>> for each user, right?
> > >>>
> > >>> Best
> > >>> Wei
> > >>
> >
> >
>

Re: RecommenderJob

Posted by Yash Sharma <ya...@gmail.com>.

Pearson correlation similarity does not work well on small datasets with
few similarities, and removes those from the output. Since you are using
co-occurrence similarity, this is not the case.


On Mon, Aug 25, 2014 at 2:11 PM, Peng Zhang <pz...@gmail.com> wrote:

> If there are no suitable recommendations for a user, the output will not
> contain any records related to this user.
>
>
> Peng Zhang
>
>
> On Aug 25, 2014, at 4:38 PM, Wei Li <we...@gmail.com> wrote:
>
> > thanks Peng's answers. Yes, I know this case, but RecommenderJob does not
> > output these records?
> >
> >
> > On Mon, Aug 25, 2014 at 3:37 PM, Peng Zhang <pz...@gmail.com>
> wrote:
> >
> >> If an item is not similar to anyone else, and a user only connects with
> >> this item, this user doesnt get any recommended items.
> >>
> >> This is just one example.
> >>
> >> Peng Zhang
> >>
> >> --
> >> Sent from my iPhone
> >>
> >>> On Aug 25, 2014, at 2:22 PM, Wei Li <we...@gmail.com> wrote:
> >>>
> >>> Hi Mahout users:
> >>>
> >>>   We have tried the item-based CF recommender with a user_id, item_id,
> >>> rating data. while the recommendation output is less than our expected,
> >> for
> >>> example, if we have 1000 users, the output should have 1000 records,
> one
> >>> for each user, right?
> >>>
> >>> Best
> >>> Wei
> >>
>
>

Re: RecommenderJob

Posted by Peng Zhang <pz...@gmail.com>.

If there are no suitable recommendations for a user, the output will not contain any records related to this user.
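
To see which users were left out, the input and output can be compared directly. The sketch below is in Python and makes two assumptions that should be checked against your own files: the input lines are comma-separated userID,itemID,rating, and each output line is a user ID, a tab, and a bracketed list of item:score pairs. The helper names and the sample data are hypothetical.

```python
def users_in_input(lines, sep=","):
    """Collect user IDs from preference lines of the form userID,itemID,rating."""
    return {line.split(sep)[0].strip() for line in lines if line.strip()}

def users_in_output(lines):
    """Collect user IDs that have at least one recommendation in the output."""
    users = set()
    for line in lines:
        if not line.strip():
            continue
        user, _, recs = line.partition("\t")
        if recs.strip() not in ("", "[]"):
            users.add(user.strip())
    return users

# Hypothetical sample data: user 3 only rated item 103, which nobody else rated.
prefs = ["1,101,5.0", "1,102,3.0", "2,101,4.0", "3,103,2.0"]
recs = ["1\t[103:4.5]", "2\t[102:3.9]"]

missing = sorted(users_in_input(prefs) - users_in_output(recs))
print(missing)
```

Inspecting the preferences of the missing users usually shows why no candidate items were found for them.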


Peng Zhang


On Aug 25, 2014, at 4:38 PM, Wei Li <we...@gmail.com> wrote:

> thanks Peng's answers. Yes, I know this case, but RecommenderJob does not
> output these records?
> 
> 
> On Mon, Aug 25, 2014 at 3:37 PM, Peng Zhang <pz...@gmail.com> wrote:
> 
>> If an item is not similar to anyone else, and a user only connects with
>> this item, this user doesnt get any recommended items.
>> 
>> This is just one example.
>> 
>> Peng Zhang
>> 
>> --
>> Sent from my iPhone
>> 
>>> On Aug 25, 2014, at 2:22 PM, Wei Li <we...@gmail.com> wrote:
>>> 
>>> Hi Mahout users:
>>> 
>>>   We have tried the item-based CF recommender with a user_id, item_id,
>>> rating data. while the recommendation output is less than our expected,
>> for
>>> example, if we have 1000 users, the output should have 1000 records, one
>>> for each user, right?
>>> 
>>> Best
>>> Wei
>>

Re: RecommenderJob

Posted by Wei Li <we...@gmail.com>.

Thanks for your answer, Peng. Yes, I know this case, but does RecommenderJob
not output these records?


On Mon, Aug 25, 2014 at 3:37 PM, Peng Zhang <pz...@gmail.com> wrote:

> If an item is not similar to anyone else, and a user only connects with
> this item, this user doesnt get any recommended items.
>
> This is just one example.
>
> Peng Zhang
>
> --
> Sent from my iPhone
>
> > On Aug 25, 2014, at 2:22 PM, Wei Li <we...@gmail.com> wrote:
> >
> > Hi Mahout users:
> >
> >    We have tried the item-based CF recommender with a user_id, item_id,
> > rating data. while the recommendation output is less than our expected,
> for
> > example, if we have 1000 users, the output should have 1000 records, one
> > for each user, right?
> >
> > Best
> > Wei
>

Re: RecommenderJob

Posted by Peng Zhang <pz...@gmail.com>.

If an item is not similar to any other item, and a user has only interacted with this item, this user doesn't get any recommended items.

This is just one example. 
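
A toy version of this case, as a minimal Python sketch. It is a simplified co-occurrence recommender written for illustration only, not the RecommenderJob implementation; the function name recommend and the sample data are hypothetical. It also drops items the user has already interacted with.

```python
from collections import defaultdict
from itertools import combinations

def recommend(prefs):
    """prefs maps user -> set of items; returns user -> set of recommended items."""
    # Two items are "similar" here if at least one user interacted with both.
    cooccurring = defaultdict(set)
    for items in prefs.values():
        for a, b in combinations(sorted(items), 2):
            cooccurring[a].add(b)
            cooccurring[b].add(a)

    recs = {}
    for user, items in prefs.items():
        candidates = set()
        for item in items:
            candidates |= cooccurring[item]
        candidates -= items  # never recommend what the user already has
        if candidates:       # users with no candidates get no output record
            recs[user] = candidates
    return recs

# u3 only interacted with item "d", which co-occurs with nothing.
prefs = {"u1": {"a", "b"}, "u2": {"a", "c"}, "u3": {"d"}}
recs = recommend(prefs)
print(recs)
```

Here u1 and u2 each receive one recommendation, while u3 does not appear in the result at all.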

Peng Zhang

--
Sent from my iPhone

> On Aug 25, 2014, at 2:22 PM, Wei Li <we...@gmail.com> wrote:
> 
> Hi Mahout users:
> 
>    We have tried the item-based CF recommender with a user_id, item_id,
> rating data. while the recommendation output is less than our expected, for
> example, if we have 1000 users, the output should have 1000 records, one
> for each user, right?
> 
> Best
> Wei

Re: RecommenderJob

Posted by Wei Li <we...@gmail.com>.

Hi Sharma, we have used the SIMILARITY_COOCCURRENCE measure. Does the
similarity method also affect the output results?




On Mon, Aug 25, 2014 at 3:40 PM, Yash Sharma <ya...@gmail.com> wrote:

> Hi Wei, Which similarity class are you using for the same?
>
>
> On Mon, Aug 25, 2014 at 11:52 AM, Wei Li <we...@gmail.com> wrote:
>
> > Hi Mahout users:
> >
> >     We have tried the item-based CF recommender with a user_id, item_id,
> > rating data. while the recommendation output is less than our expected,
> for
> > example, if we have 1000 users, the output should have 1000 records, one
> > for each user, right?
> >
> > Best
> > Wei
> >
>

Re: RecommenderJob

Posted by Yash Sharma <ya...@gmail.com>.

Hi Wei, which similarity class are you using?


On Mon, Aug 25, 2014 at 11:52 AM, Wei Li <we...@gmail.com> wrote:

> Hi Mahout users:
>
>     We have tried the item-based CF recommender with a user_id, item_id,
> rating data. while the recommendation output is less than our expected, for
> example, if we have 1000 users, the output should have 1000 records, one
> for each user, right?
>
> Best
> Wei
>