Posted to user@mahout.apache.org by Florent Empis <fl...@gmail.com> on 2010/08/09 16:18:33 UTC

Mahout in Action / Distributed recommendation

Hi,

I just tried to follow Mahout in Action, section 6.4.2, "Running
recommendations with Hadoop".

When I launch

 bin/hadoop jar ~/mahout/trunk/core/target/mahout-core-0.4-SNAPSHOT.job \
   org.apache.mahout.cf.taste.hadoop.item.RecommenderJob \
   -Dmapred.input.dir=input/input.txt -Dmapred.output.dir=output \
   --usersFile input/users.txt --booleanData

it fails, and the job's log contains many occurrences of the following
exception. I understand why the number parsing fails, but I don't
understand why it is attempted in the first place. Has anyone had success
running this example?

MapAttempt TASK_TYPE="MAP" TASKID="task_201008091547_0003_m_000000"
TASK_ATTEMPT_ID="attempt_201008091547_0003_m_000000_0" TASK_STATUS="FAILED"
FINISH_TIME="1281362639468" HOSTNAME="localhost"
ERROR="java\.lang\.NumberFormatException: For input string: \"1: 1664968\"
at java\.lang\.NumberFormatException\.forInputString(NumberFormatException\.java:48)
at java\.lang\.Long\.parseLong(Long\.java:419)
at java\.lang\.Long\.parseLong(Long\.java:468)
at org\.apache\.mahout\.cf\.taste\.hadoop\.similarity\.item\.CountUsersMapper\.map(CountUsersMapper\.java:40)
at org\.apache\.mahout\.cf\.taste\.hadoop\.similarity\.item\.CountUsersMapper\.map(CountUsersMapper\.java:31)
at org\.apache\.hadoop\.mapreduce\.Mapper\.run(Mapper\.java:144)
at org\.apache\.hadoop\.mapred\.MapTask\.runNewMapper(MapTask\.java:621)
at org\.apache\.hadoop\.mapred\.MapTask\.run(MapTask\.java:305)
at org\.apache\.hadoop\.mapred\.Child\.main(Child\.java:170)

Many thanks,

Florent

Re: Mahout in Action / Distributed recommendation

Posted by Sebastian Schelter <ss...@googlemail.com>.
Hi Florent,

I guess the example in Mahout in Action is not valid anymore, because
RecommenderJob has undergone substantial changes. (I'm not sure, though:
I have not yet bought a copy, as I don't have PayPal.)

If you could supply some details about what you're trying to do (format
of your input data, etc), I can tell you how to make it work.

--sebastian




Re: Mahout in Action / Distributed recommendation

Posted by Sean Owen <sr...@gmail.com>.
Ah, OK. You are trying to run the regular Mahout classes on the
Wikipedia data set. This won't work, since the format is wrong.

Listing 6.1 in the book presents an alternate Mapper that parses the
Wikipedia format. You could easily substitute it for the usual Mapper
in RecommenderJob rather than building your own pipeline. Everything
else should be the same.

On Mon, Aug 9, 2010 at 12:29 PM, Florent Empis <fl...@gmail.com> wrote:
> Actually, Mahout In Action example (using wikipedia article set) states that
> input file is of the form
>
> user item1, item2, itemN
>
> The first step of the job is described as splitting this line with a regex,
> constructing lines of the form
> user item1
> ..
> user itemN
>
> (no preferences as this is a boolean preference dataset)
>
> I've yet to have some time to dive into the code, but I suspect either the
> splitting step has been omitted in the example, or the author assumed the
> job did it but doesn't anymore.
> It should be quite straightforward to investigate this tomorrow.
>
> I turned to the ML in the hope someone already had the issue and found the
> problem. I'll dig the problem out and report in the following days ! ;)

Re: Mahout in Action / Distributed recommendation

Posted by Florent Empis <fl...@gmail.com>.
Actually, the Mahout in Action example (using the Wikipedia article set)
states that the input file is of the form

user item1, item2, itemN

The first step of the job is described as splitting this line with a regex,
constructing lines of the form
user item1
..
user itemN

(no preferences as this is a boolean preference dataset)
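
For illustration, the splitting step described above can be sketched in a few lines of Python. This is not the book's Mapper and not Mahout code. It assumes each line is a user ID followed by item IDs separated by any non-digit characters, which covers both "user item1, item2" and the "1: 1664968" shape seen in the exception. The names split_line and to_csv_lines are made up for this sketch.

```python
import re

# Assumption: a line holds a user ID followed by item IDs, separated by any
# non-digit characters (e.g. "1: 1664968 1691047" or "1 2, 3, 4").
NUMBERS = re.compile(r"\d+")


def split_line(line):
    """Return (user_id, item_id) pairs for one line; empty list if no items."""
    ids = [int(token) for token in NUMBERS.findall(line)]
    if not ids:
        return []
    user_id, item_ids = ids[0], ids[1:]
    return [(user_id, item_id) for item_id in item_ids]


def to_csv_lines(lines):
    """Yield 'userID,itemID' strings (boolean preferences, so no third field)."""
    for line in lines:
        for user_id, item_id in split_line(line):
            yield f"{user_id},{item_id}"


if __name__ == "__main__":
    for out in to_csv_lines(["1: 1664968 1691047", "2: 3"]):
        print(out)
```

Running such a conversion once over the input file would produce one user/item pair per line, so the job no longer has to parse a whole multi-item line as a single number.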

I haven't yet had time to dive into the code, but I suspect that either
the splitting step has been omitted from the example, or the author assumed
the job did it and it no longer does.
It should be quite straightforward to investigate this tomorrow.

I turned to the mailing list in the hope that someone had already hit this
issue and found the cause. I'll dig into it and report back in the next few
days! ;)

2010/8/9 Sean Owen <sr...@gmail.com>

> The input file format looks wrong. It should be of the form
> "userID,itemID[,preference]". I think that's your problem here?

Re: Mahout in Action / Distributed recommendation

Posted by Sean Owen <sr...@gmail.com>.
Sebastian, should we make that default to something, like a simple
co-occurrence count? That would be more consistent with the past
behavior.
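
For readers unfamiliar with the term, a simple co-occurrence count between items can be sketched as below. This is plain Python for illustration only, and it is neither Mahout's DistributedVectorSimilarity interface nor a claim about how RecommenderJob computes it. The function name cooccurrence_counts is hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def cooccurrence_counts(pairs):
    """Count, for each unordered item pair, how many users have both items.

    `pairs` is an iterable of (user_id, item_id) tuples (boolean preferences).
    """
    items_by_user = defaultdict(set)
    for user_id, item_id in pairs:
        items_by_user[user_id].add(item_id)

    counts = defaultdict(int)
    for items in items_by_user.values():
        # sorted() gives each pair a canonical (smaller, larger) key.
        for a, b in combinations(sorted(items), 2):
            counts[(a, b)] += 1
    return dict(counts)


if __name__ == "__main__":
    data = [(1, 10), (1, 20), (2, 10), (2, 20), (2, 30)]
    print(cooccurrence_counts(data))
```

The resulting counts play the role of the item-item similarity values: the more users two items share, the more similar they are considered.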

On Mon, Aug 9, 2010 at 10:32 AM, Sebastian Schelter
<ss...@googlemail.com> wrote:
> It's also necessary to supply the name of a class implementing
> org.apache.mahout.math.hadoop.similarity.vector.DistributedVectorSimilarity
> (as parameter --similarityClassname). This similarity implementation
> will be used to compute the item-item-similarity matrix used for the
> recommendation process.
>
> Implementations for common similarity measures can be found in
> org.apache.mahout.math.hadoop.similarity.vector.

Re: Mahout in Action / Distributed recommendation

Posted by Sebastian Schelter <ss...@googlemail.com>.
It's also necessary to supply the name of a class implementing
org.apache.mahout.math.hadoop.similarity.vector.DistributedVectorSimilarity
(as the parameter --similarityClassname). This similarity implementation
is used to compute the item-item similarity matrix on which the
recommendation process is based.

Implementations for common similarity measures can be found in
org.apache.mahout.math.hadoop.similarity.vector.

--sebastian

On 09.08.2010 17:18, Sean Owen wrote:
> The input file format looks wrong. It should be of the form
> "userID,itemID[,preference]". I think that's your problem here?


Re: Mahout in Action / Distributed recommendation

Posted by Sean Owen <sr...@gmail.com>.
The input file format looks wrong. It should be of the form
"userID,itemID[,preference]". I think that's your problem here?
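
As a rough illustration of that format, here is a small Python sketch of a parser for "userID,itemID[,preference]" lines. It is not the Mahout parsing code, and the name parse_preference is made up for the example. It shows why a token such as "1: 1664968" cannot be read as a numeric ID, much as Long.parseLong rejects it in the stack trace.

```python
def parse_preference(line):
    """Parse 'userID,itemID[,preference]' into (user_id, item_id, preference).

    preference is None when the optional third field is absent.
    Raises ValueError for malformed lines.
    """
    fields = line.strip().split(",")
    if len(fields) not in (2, 3):
        raise ValueError(f"expected 2 or 3 comma-separated fields: {line!r}")
    user_id = int(fields[0])   # raises ValueError on e.g. "1: 1664968"
    item_id = int(fields[1])
    preference = float(fields[2]) if len(fields) == 3 else None
    return user_id, item_id, preference


if __name__ == "__main__":
    print(parse_preference("1,1664968"))
    print(parse_preference("1,1664968,4.5"))
    try:
        parse_preference("1: 1664968")
    except ValueError as err:
        print("rejected:", err)
```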

On Mon, Aug 9, 2010 at 9:18 AM, Florent Empis <fl...@gmail.com> wrote:
> Hi,
>
> I just tried to follow Mahout In Action, 6.4.2 Running recommendations with
> Hadoop
>
> When I launch
>  bin/hadoop jar ~/mahout/trunk/core/target/mahout-core-0.4-SNAPSHOT.job
>  org.apache.mahout.cf.taste.hadoop.item.RecommenderJob
> -Dmapred.input.dir=input/input.txt -Dmapred.output.dir=output --usersFile
> input/users.txt --booleanData
>
> It fails and the job's log contains many occurences of the following
> exception. I do understand why the number format fails, but I don't
> understand why it's attempting it in the first place....Anyone had success
> running this example?
>
> MapAttempt TASK_TYPE="MAP" TASKID="task_201008091547_0003_m_000000"
> TASK_ATTEMPT_ID="attempt_201008091547_0003_m_000000_0" TASK_STATUS="FAILED"
> FINISH_TIME="1281362639468" HOSTNAME="localhost"
> ERROR="java\.lang\.NumberFormatException: For input string: \"1: 1664968\"
> at
> java\.lang\.NumberFormatException\.forInputString(NumberFormatException\.java:48)
> at java\.lang\.Long\.parseLong(Long\.java:419)
> at java\.lang\.Long\.parseLong(Long\.java:468)
> at
> org\.apache\.mahout\.cf\.taste\.hadoop\.similarity\.item\.CountUsersMapper\.map(CountUsersMapper\.java:40)
> at
> org\.apache\.mahout\.cf\.taste\.hadoop\.similarity\.item\.CountUsersMapper\.map(CountUsersMapper\.java:31)
> at org\.apache\.hadoop\.mapreduce\.Mapper\.run(Mapper\.java:144)
> at org\.apache\.hadoop\.mapred\.MapTask\.runNewMapper(MapTask\.java:621)
> at org\.apache\.hadoop\.mapred\.MapTask\.run(MapTask\.java:305)
> at org\.apache\.hadoop\.mapred\.Child\.main(Child\.java:170)
>
> Many thanks,
>
> Florent
>