Posted to user@mahout.apache.org by Jake Vang <va...@googlemail.com> on 2011/03/12 06:03:15 UTC

no output for RecommenderJob, v0.4

hi,

I am testing the RecommenderJob. According to the v0.4 javadocs, it
requires input in the format userID,itemID[,preferencevalue]. I have a
very simple input that I want to test before running the job on the real
dataset. My toy data set consists of the following lines.

1,10
1,20
2,10
2,30
2,40
3,10
3,20

(user 1 likes item 10, user 1 likes item 20, and so on).

i then run the job.

hadoop jar mahout-core-0.4-job.jar \
  org.apache.mahout.cf.taste.hadoop.item.RecommenderJob \
  -Dmapred.input.dir=/input/toy/toydata.txt \
  -Dmapred.output.dir=/output/toy01

However, when I look at the results (part-r-00000), I see nothing.
The file is empty. Why is this happening?

I am running this on Cygwin, and I can run the Hadoop examples correctly.
Is there something that I am doing wrong?

Re: no output for RecommenderJob, v0.4

Posted by Ted Dunning <te...@gmail.com>.
There isn't just one.

Many, many improvements have been made.

On Mon, Mar 14, 2011 at 8:24 AM, Jake Vang <va...@googlemail.com> wrote:

> could you let me know what the serious bug in v0.4 is?
>

Re: no output for RecommenderJob, v0.4

Posted by Sebastian Schelter <ss...@apache.org>.
Hi Jake,

https://issues.apache.org/jira/browse/MAHOUT-610 has the details about 
the bug.

--sebastian




Re: no output for RecommenderJob, v0.4

Posted by Jake Vang <va...@googlemail.com>.
Sebastian,

thanks. i tried the --booleanData option and now i do get some output.

could you let me know what the serious bug in v0.4 is?

thanks.

On Sat, Mar 12, 2011 at 4:12 AM, Sebastian Schelter <ss...@apache.org> wrote:
> Hello Jake,
>
> my first advice would be to use the RecommenderJob from the current trunk,
> the 0.4 version has a serious bug unfortunately.

Re: no output for RecommenderJob, v0.4

Posted by Sebastian Schelter <ss...@apache.org>.
Hello Jake,

My first advice would be to use the RecommenderJob from the current
trunk; unfortunately, the 0.4 version has a serious bug.

Your toy data is too small to produce any output. Let me explain why.

The first thing that RecommenderJob does is compute all pairs of
similar items (all pairs of items that co-occurred within the preferences
of a single user):

10,20
10,30
10,40
30,40
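
A minimal sketch of this pair-generation step in plain Python may help. It is not Mahout's actual MapReduce implementation, and the names `prefs` and `cooccurring_pairs` are made up for illustration:

```python
from collections import defaultdict
from itertools import combinations

# Toy input from the original post: (userID, itemID) lines.
prefs = [(1, 10), (1, 20), (2, 10), (2, 30), (2, 40), (3, 10), (3, 20)]


def cooccurring_pairs(prefs):
    """Return the sorted list of item pairs that share at least one user."""
    items_by_user = defaultdict(set)
    for user, item in prefs:
        items_by_user[user].add(item)

    pairs = set()
    for items in items_by_user.values():
        # Every unordered pair of items in one user's preferences co-occurs.
        pairs.update(combinations(sorted(items), 2))
    return sorted(pairs)


print(cooccurring_pairs(prefs))
# [(10, 20), (10, 30), (10, 40), (30, 40)]
```

Users 1 and 3 both contribute the pair 10,20, so it appears only once in the result.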

Next, RecommenderJob tries to predict how much each user will like the
items that might be recommended to them. To do this for a single
(user, item) pair, we need to look at all items similar to the
"candidate" item that have also been liked by the user. The formula used
is a weighted sum defined like this:

u = a user
i = an item not yet rated by u
N = all items similar to i that have been rated by u

Prediction(u,i) = sum(all n from N: similarity(i,n) * rating(u,n)) / 
sum(all n from N: abs(similarity(i,n)))

This formula has one drawback. If we only know a single similar item,
the prediction will just be the "rating" value for that single similar
item. In order to avoid this, we throw out all predictions that were
based on a single item only.
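
The weighted sum and its single-item drawback can be sketched in a few lines of Python. This is only an illustration of the formula above, not Mahout's code; the function name `predict` and the similarity and rating values are hypothetical:

```python
def predict(similarities, ratings):
    """Weighted-sum prediction for one (user, candidate item) pair.

    similarities: {neighbour item: similarity(candidate, neighbour)}
    ratings:      {item: the user's rating of that item}
    Only neighbours the user has rated contribute to the sums.
    Returns None when there is no rated neighbour to base a prediction on.
    """
    rated = [n for n in similarities if n in ratings]
    denominator = sum(abs(similarities[n]) for n in rated)
    if denominator == 0:
        return None
    numerator = sum(similarities[n] * ratings[n] for n in rated)
    return numerator / denominator


# Two rated neighbours (made-up values): a genuine weighted average.
print(predict({10: 0.5, 20: 0.5}, {10: 5.0, 20: 1.0}))  # 3.0

# One rated neighbour: the positive similarity cancels out, so the
# "prediction" is simply the user's rating of that one item.
print(predict({10: 0.25}, {10: 4.0}))  # 4.0
```

In the second call the result carries no information beyond the existing rating, which is why such predictions are discarded.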

Unfortunately, your toy data is so small that there is no prediction
that can be based on more than one item, so everything is thrown away
and the output is empty.
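
To check this on the toy data, here is a rough Python sketch that counts, for every (user, candidate item) pair, how many similar items the user already has. It assumes two items count as similar whenever they co-occur for at least one user; the helper name `rated_neighbour_counts` is made up for illustration:

```python
from collections import defaultdict
from itertools import combinations

# Toy input from the original post: (userID, itemID) lines.
prefs = [(1, 10), (1, 20), (2, 10), (2, 30), (2, 40), (3, 10), (3, 20)]


def rated_neighbour_counts(prefs):
    """Map each (user, candidate item) to the number of items that are
    similar to the candidate and already preferred by the user."""
    items_by_user = defaultdict(set)
    for user, item in prefs:
        items_by_user[user].add(item)

    # Assumption: items are similar if they co-occur for some user.
    neighbours = defaultdict(set)
    for items in items_by_user.values():
        for a, b in combinations(items, 2):
            neighbours[a].add(b)
            neighbours[b].add(a)

    counts = {}
    for user, liked in items_by_user.items():
        # Candidates are items the user has not expressed a preference for.
        for candidate in set(neighbours) - liked:
            support = len(neighbours[candidate] & liked)
            if support > 0:
                counts[(user, candidate)] = support
    return counts


counts = rated_neighbour_counts(prefs)
for key in sorted(counts):
    print(key, counts[key])
# (1, 30) 1
# (1, 40) 1
# (2, 20) 1
# (3, 30) 1
# (3, 40) 1
```

Every candidate is supported by exactly one item, so each prediction falls under the single-item rule and is dropped.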

As you only have boolean data in your example (no ratings), you could 
use --booleanData to make RecommenderJob treat the input as boolean (it 
does not do this automatically). In that case you should see output as 
the previously described problem doesn't exist for boolean data.

--sebastian
