Posted to user@mahout.apache.org by Niklas Ekvall <ni...@gmail.com> on 2015/11/24 11:16:03 UTC

Mahout - Recommenditemvalue with magnitude of 1

Hello Mahout Users!

I currently use Mahout's recommenditembased with log-likelihood similarity to
produce personal recommendations for trigger emails in an offline mode. But
when I produce e.g. 50 recommendations, the rank values of the recommendations
are always of magnitude 1. Why is this so? And is the first recommendation in
this list the best one, or is there some randomness in this list?

Best regards,

Niklas Ekvall

Re: Mahout - Recommenditemvalue with magnitude of 1

Posted by Pat Ferrel <pa...@occamsmachete.com>.

With the older Hadoop-based recommenders you can use ratings or binary data. As Ted says, binary data is pretty much always better. Your error is in treating any rating as a preference. A rating of 1 is unlikely to indicate a preference. Also, you may have unresolved problems in your user and item ids. More detail below.

Long years of applying ratings to recommendations have not advanced the science, and optimizing the error in predicting ratings is *fundamentally wrong*. Who cares about predicting ratings? The most important thing to be done is to *rank* recommendations, since you will have limited ability to show recommendations, and rating prediction does this very poorly. The user intent captured by a rating is also ambiguous and may be only weakly related to what you want the user to do. Another problem with ratings is their fundamental ambiguity. Does 3 of 5 mean a user preference? It might for some people and not for others. In fact it might for one session of a single user and not for another session of the same user. This fundamental ambiguity does not exist with many implicit indicators like “purchase” or “watch-95%”, or with explicit indicators like “like”. These are all binary indicators.

I would never suggest that anyone use recommenditembased in Mahout. It has many limitations. It recommends for all users, so you have to run it often to capture new items or new user actions in recommendations. Newer methods that use a search engine server will recommend for a particular user using realtime data. This method also requires that the model only be partially calculated, relying on the search engine for the last bit of the calculation, so it is far more efficient with computation time. Precomputing all recs for all users is quite wasteful and is all but impossible to keep up to date with the latest user actions.

All that said, you may have missed my point about encoding ids for the old Hadoop-based Mahout. All user ids need to start with 0 and increase to the number of users - 1, so (0..(users.length - 1)). The same is true of item ids. The ids must be ordinal with no gaps. If your data does not follow these rules it will give highly unreliable results, if any at all.
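A minimal sketch of the id encoding described above, assuming the raw ids are arbitrary integers (or strings) and that a lookup table is kept so recommended ids can be translated back afterwards. The function names `build_index` and `remap` are hypothetical and not part of Mahout; the sample rows are taken from the input posted elsewhere in this thread.

```python
def build_index(raw_ids):
    """Map each distinct raw id to a dense index 0..n-1, in first-seen order."""
    index = {}
    for raw in raw_ids:
        if raw not in index:
            index[raw] = len(index)
    return index


def remap(rows):
    """Re-encode (user_id, item_id) pairs so both id spaces are 0..n-1 with no gaps."""
    rows = list(rows)
    user_index = build_index(u for u, _ in rows)
    item_index = build_index(i for _, i in rows)
    dense = [(user_index[u], item_index[i]) for u, i in rows]
    return dense, user_index, item_index


# A few rows from the sample input in this thread (users 3, 5, 6).
rows = [(3, 7414), (3, 12682), (5, 3), (5, 2123), (6, 3), (6, 456)]
dense, user_index, item_index = remap(rows)

# Inverted table, used to turn a recommended dense item id back into the raw id.
item_lookup = {dense_id: raw_id for raw_id, dense_id in item_index.items()}
```

The dense pairs are what would be written to the input file; `item_lookup` (and the equivalent for users) is only needed when reading the output back.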

Does the old Hadoop-based recommender work better with ratings? No. The real question is whether ratings are better data than booleans, and the answer is no, just the opposite. But treating ratings as booleans may not be good either, since it implies that all ratings represent “likes”, which is surely not true. If you have no binary preference indicator you can’t turn ratings into binary without some other threshold-based approach. I have done this before by using ratings of 4 and 5 as a binary “like”, throwing the rest away as either non-preference or ambiguous preferences. This is simplistic but better than using all ratings as a preference. We’d have to look at how the ratings were captured to make a better threshold, and the effort might better be put into capturing an unambiguous preference indicator.
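The threshold approach described above can be sketched as follows. This is only an illustration: the function name `ratings_to_likes` is hypothetical, the cutoff of 4.0 follows the "ratings of 4 and 5" rule on a 1 to 5 scale, and the sample rows are a subset of the test case posted in this thread.

```python
def ratings_to_likes(rows, threshold=4.0):
    """Keep only (user, item) pairs whose rating meets the threshold.

    Everything below the threshold is dropped as non-preference or ambiguous,
    rather than being kept as a weaker "like".
    """
    return [(user, item) for user, item, rating in rows if rating >= threshold]


rows = [
    (1, 101, 5.0),
    (1, 102, 3.0),
    (1, 103, 2.5),
    (2, 101, 2.0),
    (2, 103, 5.0),
]
likes = ratings_to_likes(rows)
```

The output has no rating column at all, so it is a boolean indicator, which is different from replacing every rating with 1.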

The newer Mahout Correlation Engine, implemented using Spark and available as the spark-itemsimilarity job, calculates a model that must be used with a search engine. A full end-to-end recommender that implements streaming input of events, building models with the Mahout Correlation Engine, and serving of recommendations is here: http://templates.prediction.io/PredictionIO/template-scala-parallel-universal-recommendation Mahout’s spark-itemsimilarity and the search-engine-based recommender make no restrictions on how you represent IDs since they both ingest them as strings.

Re: Mahout - Recommenditemvalue with magnitude of 1

Posted by Ted Dunning <te...@gmail.com>.

On Sun, Nov 29, 2015 at 9:36 PM, Niklas Ekvall <ni...@gmail.com>
wrote:

> My conclusion is that recommenditembased in Mahout works better for ratings
> than binary data. What are your conclusions?
>

Still operator error somewhere.  Binary data works much better as a real
recommender.
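One possible explanation for the constant 1.0 values, offered here as an assumption and not as a statement of what the Mahout job does internally: if a rating-based item recommender estimates a score as the similarity-weighted average of the user's own ratings, then setting every rating to 1 forces every estimate to exactly 1.0, whatever the similarities are. The sketch below shows only that arithmetic. The function names and the similarity values are made up for illustration.

```python
def weighted_average_score(similarities, ratings):
    """Similarity-weighted average of a user's ratings for the neighbouring items."""
    total = sum(similarities)
    if total == 0:
        return None  # no similar items, so no estimate
    return sum(s * r for s, r in zip(similarities, ratings)) / total


def similarity_sum_score(similarities):
    """A boolean-data style score: just add up the similarities."""
    return sum(similarities)


sims = [0.3, 0.9, 0.5]  # hypothetical item-item similarities

with_ratings = weighted_average_score(sims, [5.0, 3.0, 2.5])
all_ones = weighted_average_score(sims, [1.0, 1.0, 1.0])
boolean_style = similarity_sum_score(sims)
```

Under this assumption, an all-ones rating column still leaves a ranking problem, because every candidate ties at 1.0. A score built from the similarities alone, as in `similarity_sum_score`, keeps the candidates distinguishable.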

Re: Mahout - Recommenditemvalue with magnitude of 1

Posted by Niklas Ekvall <ni...@gmail.com>.

Hello again Pat!

I did find a test case that I was able to recreate:

1,101,5.0
1,102,3.0
1,103,2.5
2,101,2.0
2,102,2.5
2,103,5.0
2,104,2.0
3,101,2.5
3,104,4.0
3,105,4.5
3,107,5.0
4,101,5.0
4,103,3.0
4,104,4.5
4,106,4.0
5,101,4.0
5,102,3.0
5,103,2.0
5,104,4.0
5,105,3.5
5,106,4.0

bin/mahout recommenditembased -s SIMILARITY_LOGLIKELIHOOD -i /path/to/input/file -o /path/to/output/folder/ --numRecommendations 1

Output:

1 [104:2.8088317]
2 [105:3.5743618]
3 [103:4.336442]
4 [105:3.6903737]
5 [107:3.663558]

But when I change all the ratings above to 1, I get only 1.0 in the
output file for the recommendation values:

1 [104:1.0]
2 [105:1.0]
3 [103:1.0]
4 [105:1.0]
5 [107:1.0]

My conclusion is that recommenditembased in Mahout works better for ratings
than binary data. What are your conclusions?

Best, Niklas

Re: Mahout - Recommenditemvalue with magnitude of 1

Posted by Pat Ferrel <pa...@occamsmachete.com>.


> On Nov 24, 2015, at 12:21 PM, Niklas Ekvall <ni...@gmail.com> wrote:
> 
> Okay!
> 
> No pre-filter, and the user/item ids should start from 0 and run up to the
> number of users and items. So, all the data we have should go into Mahout and
> we filter inside Mahout... correct?

Yes, but I wouldn't filter. The recs will very likely be better than random with only a small number of events.

> 
> We do the same pre-filter for Spark item-similarity, is that wrong too?

No, spark-itemsimilarity uses string ids.

Re: Mahout - Recommenditemvalue with magnitude of 1

Posted by Niklas Ekvall <ni...@gmail.com>.

Okay!

No pre-filter, and the user/item ids should start from 0 and run up to the
number of users and items. So, all the data we have should go into Mahout and
we filter inside Mahout... correct?

We do the same pre-filter for Spark item-similarity, is that wrong too?

Best regards, Niklas

Re: Mahout - Recommenditemvalue with magnitude of 1

Posted by Pat Ferrel <pa...@occamsmachete.com>.

I wouldn’t pre-filter but in any case the ids input to hadoop-mahout need to follow those rules.

The new recommender I mentioned has no such requirements; it uses string IDs.

Mahout - Recommenditemvalue with magnitude of 1

Posted by Niklas Ekvall <ni...@gmail.com>.

No, it does not start from 0 and does not cover all numbers between 0 and
the number of items/users. We pre-filter the dataset before we use Mahout on
it (a user must have bought at least 5 products and a product must have been
bought by 3 users). Therefore we start with user 3, then it jumps to user 5,
etc.
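The pre-filter described above can be sketched as follows, to show why it leaves gaps in the id space. This is a single-pass illustration with hypothetical names (`prefilter` and its parameters); the actual filtering code is not shown in the thread, and the example uses smaller thresholds than 5 and 3 so the data stays short.

```python
from collections import Counter


def prefilter(rows, min_items_per_user=5, min_users_per_item=3):
    """Drop (user, item) pairs whose user or item has too few purchases.

    Counts are taken once over the unfiltered data, so a user can end up
    below the minimum after some of their items are removed.
    """
    rows = set(rows)  # ignore repeated purchases of the same item
    items_per_user = Counter(user for user, _ in rows)
    users_per_item = Counter(item for _, item in rows)
    return sorted(
        (user, item)
        for user, item in rows
        if items_per_user[user] >= min_items_per_user
        and users_per_item[item] >= min_users_per_item
    )


rows = [(0, 10), (0, 11), (1, 10), (2, 10), (2, 11)]
kept = prefilter(rows, min_items_per_user=2, min_users_per_item=2)
surviving_users = sorted({user for user, _ in kept})
```

User 1 is removed, so the surviving user ids are 0 and 2. Any id scheme that was dense before filtering is no longer dense afterwards, which is how a dataset ends up starting at user 3 and jumping to user 5.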

Is this wrong? Should we use all data as input to Mahout and do the
filtering inside Mahout?

We use the second latest version of Mahout!

Best regards, Niklas

Re: Mahout - Recommenditemvalue with magnitude of 1

Posted by Pat Ferrel <pa...@occamsmachete.com>.

Do your ids start with 0 and cover all numbers between 0 and the number of items - 1 (same for user ids)?
The old hadoop-mahout code required ordinal ids starting at 0.
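A quick way to answer that question for a given input file is to check each id column directly. The helper below is a hypothetical sketch, not a Mahout utility; it treats an id column as valid only if its distinct values are exactly 0 through n - 1.

```python
def is_dense_from_zero(ids):
    """True if the distinct ids are exactly 0, 1, ..., n-1 with no gaps."""
    distinct = set(ids)
    if not distinct:
        return False
    # n distinct integers with min 0 and max n-1 must be exactly 0..n-1.
    return min(distinct) == 0 and max(distinct) == len(distinct) - 1


sample_user_ids = [3, 3, 5, 5, 6]  # user column of the kind posted in this thread
dense_user_ids = [0, 0, 1, 1, 2]

sample_ok = is_dense_from_zero(sample_user_ids)
dense_ok = is_dense_from_zero(dense_user_ids)
```

The check assumes the ids are already integers; string ids would need to be mapped to integers first.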


On Nov 24, 2015, at 8:19 AM, Niklas Ekvall <ni...@gmail.com> wrote:

Hi Pat,

Here is some input:

3       7414
3       12682
3       18947
3       19980
3       26975
3       54635
3       67789
3       73212
3       118932
3       138846
3       141268
5       3
5       2123
5       37955
5       39975
5       113289
6       3
6       456
6       2188
6       2496
6       6194
6       6361
6       6768
6       6919
6       6920
6       7257
6       7705
6       7706
6       11788

And some output:

3
[122086:1.0,1846:1.0,74638:1.0,63240:1.0,87540:1.0,2742:1.0,2981:1.0,8325:1.0,145598:1.0,49675:1.0,131388:1.0,72113:1.0,3493:1.0,56131:1.0,30422:1.0,87829:1.0,111190:1.0,13597:1.0,83436:1.0,61772:1.0]
5
[32349:1.0,29413:1.0,111896:1.0,61845:1.0,50016:1.0,1607:1.0,15237:1.0,133229:1.0,65805:1.0,34034:1.0,133071:1.0,28894:1.0,18658:1.0,32095:1.0,4402:1.0,47522:1.0,31022:1.0,23936:1.0,6243:1.0,53214:1.0]
6
[40756:1.0,34420:1.0,31153:1.0,114717:1.0,53945:1.0,71148:1.0,26095:1.0,112941:1.0,55284:1.0,111346:1.0,112201:1.0,65759:1.0,133127:1.0,61378:1.0,16413:1.0,113289:1.0,49675:1.0,14995:1.0,141028:1.0,27506:1.0]

Best regards, Niklas


Re: Mahout - Recommenditemvalue with magnitude of 1

Posted by Niklas Ekvall <ni...@gmail.com>.

Hi Pat,

Here is some input:

3       7414
3       12682
3       18947
3       19980
3       26975
3       54635
3       67789
3       73212
3       118932
3       138846
3       141268
5       3
5       2123
5       37955
5       39975
5       113289
6       3
6       456
6       2188
6       2496
6       6194
6       6361
6       6768
6       6919
6       6920
6       7257
6       7705
6       7706
6       11788

And some output:

3
 [122086:1.0,1846:1.0,74638:1.0,63240:1.0,87540:1.0,2742:1.0,2981:1.0,8325:1.0,145598:1.0,49675:1.0,131388:1.0,72113:1.0,3493:1.0,56131:1.0,30422:1.0,87829:1.0,111190:1.0,13597:1.0,83436:1.0,61772:1.0]
5
 [32349:1.0,29413:1.0,111896:1.0,61845:1.0,50016:1.0,1607:1.0,15237:1.0,133229:1.0,65805:1.0,34034:1.0,133071:1.0,28894:1.0,18658:1.0,32095:1.0,4402:1.0,47522:1.0,31022:1.0,23936:1.0,6243:1.0,53214:1.0]
6
 [40756:1.0,34420:1.0,31153:1.0,114717:1.0,53945:1.0,71148:1.0,26095:1.0,112941:1.0,55284:1.0,111346:1.0,112201:1.0,65759:1.0,133127:1.0,61378:1.0,16413:1.0,113289:1.0,49675:1.0,14995:1.0,141028:1.0,27506:1.0]

Best regards, Niklas

2015-11-24 16:48 GMT+01:00 Pat Ferrel <pa...@occamsmachete.com>:

> Sounds like you may not have the input right. Recommendations should be
> sorted by strength, so they shouldn’t all be 1 unless the data is very
> odd.
>
> Can you give us a small sample of the input?
>
>
> BTW, a newer recommender using Mahout’s Spark-based code and a search
> engine is here:
> https://github.com/PredictionIO/template-scala-parallel-universal-recommendation
> A single-machine install script is here: https://docs.prediction.io/start/
>
> On Nov 24, 2015, at 2:16 AM, Niklas Ekvall <ni...@gmail.com>
> wrote:
>
> Hello Mahout Users!
>
> I currently use Mahout - Recommenditembased with Log-similarity to produce
> personal recommendations for trigger emails in an offline mode. But when I
> produce e.g. 50 recommendations, the rank values of the recommendations are
> always of magnitude 1. Why is this so? And is the first recommendation in
> this list the best one, or is there some randomness in this list?
>
> Best regards,
>
> Niklas Ekvall
>
>

Re: Mahout - Recommenditemvalue with magnitude of 1

Posted by Pat Ferrel <pa...@occamsmachete.com>.

Sounds like you may not have the input right. Recommendations should be sorted by strength, so they shouldn’t all be 1 unless the data is very odd.
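
A quick way to confirm the symptom is to parse one line of recommender output and count the distinct scores. The Python sketch below assumes the bracketed `item:score` list format seen in the output posted in this thread; the function names are hypothetical and not part of Mahout.

```python
def parse_recommendations(line):
    """Parse '[item:score,item:score,...]' into a list of (item_id, score)."""
    body = line.strip().strip("[]")
    if not body:
        return []
    pairs = []
    for entry in body.split(","):
        item, score = entry.split(":")
        pairs.append((int(item), float(score)))
    return pairs


def diagnose(pairs):
    """Summarize whether the scores carry any ranking information."""
    scores = [score for _, score in pairs]
    return {
        "count": len(scores),
        "distinct_scores": len(set(scores)),
        # Trivially True when every score is identical.
        "sorted_descending": all(a >= b for a, b in zip(scores, scores[1:])),
    }


# First three entries of one output line from this thread.
sample = "[122086:1.0,1846:1.0,74638:1.0]"
report = diagnose(parse_recommendations(sample))
print(report)
```

A `distinct_scores` of 1 means the list order says nothing about which recommendation is strongest, which is the situation described in the original question.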

Can you give us a small sample of the input?


BTW, a newer recommender using Mahout’s Spark-based code and a search engine is here: https://github.com/PredictionIO/template-scala-parallel-universal-recommendation
A single-machine install script is here: https://docs.prediction.io/start/
 
On Nov 24, 2015, at 2:16 AM, Niklas Ekvall <ni...@gmail.com> wrote:

Hello Mahout Users!

I currently use Mahout - Recommenditembased with Log-similarity to produce
personal recommendations for trigger emails in an offline mode. But when I
produce e.g. 50 recommendations, the rank values of the recommendations are
always of magnitude 1. Why is this so? And is the first recommendation in
this list the best one, or is there some randomness in this list?

Best regards,

Niklas Ekvall