Posted to user@mahout.apache.org by Niall Riddell <ni...@xspca.com> on 2010/12/10 16:50:43 UTC

RecommenderJob - ArrayIndexOutOfBoundsException

Hi,

I've been studiously working through Mahout In Action and I'm currently
trying to execute the RecommenderJob on my local Hadoop instance.

Hadoop 0.20.2 is up and running and HDFS is working fine.

I've downloaded the links-simple-sorted.txt input file and uploaded this
successfully to input/input.txt using

bin/hadoop fs -put <PATH TO TARGET>/links-simple-sorted.txt input/input.txt

I'm also running off revision 1044404 from trunk.  Everything has
compiled beautifully using mvn clean package.

I kick off the RecommenderJob on Hadoop:

bin/hadoop jar $MAHOUT_HOME/core/target/mahout-core-0.5-SNAPSHOT-job.jar \
  org.apache.mahout.cf.taste.hadoop.item.RecommenderJob \
  -Dmapred.input.dir=input/input.txt -Dmapred.output.dir=output --usersFile \
  input/users.txt --booleanData true

Mahout in Action says to run it without the true after --booleanData,
so I've tried it both ways and get the same result, which is:

10/12/10 15:30:15 INFO common.AbstractJob: Command line arguments:
{--booleanData=true, --endPhase=2147483647, --maxCooccurrencesPerItem=100,
--maxPrefsPerUser=10, --maxSimilaritiesPerItem=100, --numRecommendations=10,
--similarityClassname=SIMILARITY_COOCCURRENCE, --startPhase=0,
--tempDir=temp, --usersFile=input/users.txt}
10/12/10 15:30:16 INFO input.FileInputFormat: Total input paths to process : 1
10/12/10 15:30:16 INFO mapred.JobClient: Running job: job_201012101239_0016
10/12/10 15:30:17 INFO mapred.JobClient:  map 0% reduce 0%
10/12/10 15:30:27 INFO mapred.JobClient: Task Id : attempt_201012101239_0016_m_000000_0, Status : FAILED
java.lang.ArrayIndexOutOfBoundsException: 1
    at org.apache.mahout.cf.taste.hadoop.item.ItemIDIndexMapper.map(ItemIDIndexMapper.java:47)
    at org.apache.mahout.cf.taste.hadoop.item.ItemIDIndexMapper.map(ItemIDIndexMapper.java:31)
    at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:621)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
    at org.apache.hadoop.mapred.Child.main(Child.java:170)

.........

Now I've googled a lot, and the suggestions seem to be that a malformed input
file could cause this.  However, the input file (links-simple-sorted.txt,
downloaded from the stated site) seems fine when opened with vim.  I've also
tried different formats for users.txt, just in case it needed a newline
after I edited it to add 3.

I'm really going nowhere now so any pointers would be great, please.

Thanks in advance.

-- 
Niall Riddell
*xSpace Analytics Ltd*

Re: RecommenderJob - ArrayIndexOutOfBoundsException

Posted by Niall Riddell <ni...@xspca.com>.
Hi Sean,

I will try and swap in the correct mapper for the input data as externally
converting the file is not something I want to do at the moment.

Grateful for the pointers.

Thanks

Niall


On 10 December 2010 16:34, Sean Owen <sr...@gmail.com> wrote:

> Yes it needs to be in "user,item[,rating]" format instead to use the
> regular implementation. See the discussion in 6.3.2. The listing under it
> shows a different Mapper called WikipediaToItemPrefsMapper, which will read
> this input format though. You can swap that in. Or you can externally
> convert the data file into standard form.


-- 
Niall Riddell
*xSpace Analytics Ltd*
T: +44 161 408 3830
M: +44 778 696 3830
Skype: niall.riddell

Re: RecommenderJob - ArrayIndexOutOfBoundsException

Posted by Sean Owen <sr...@gmail.com>.
Yes it needs to be in "user,item[,rating]" format instead to use the regular
implementation. See the discussion in 6.3.2. The listing under it shows a
different Mapper called WikipediaToItemPrefsMapper, which will read this
input format though. You can swap that in. Or you can externally convert the
data file into standard form.
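
For the external-conversion route, here is a minimal sketch in Python. It
assumes every line of the input file looks like "user: item item ...", as in
the extract posted in this thread (the extract is wrapped by the mail client;
the sketch assumes one user per physical line in the real file). The function
name links_to_prefs is hypothetical and not part of Mahout. The output has no
rating column, which fits the optional rating in "user,item[,rating]".

```python
def links_to_prefs(lines):
    """Yield 'user,item' strings from lines shaped like 'user: item item ...'."""
    for line in lines:
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        user, sep, items = line.partition(":")
        if not sep:
            continue  # ignore lines without the 'user:' prefix
        for item in items.split():
            yield f"{user.strip()},{item}"


# Two lines taken from the extract posted in this thread.
sample = [
    "1: 1664968",
    "2: 3 747213 1664968 1691047 4095634 5535664",
]
for pref in links_to_prefs(sample):
    print(pref)
```

To convert a whole file, the same generator could be fed an open file handle
and its output written line by line to a new file before uploading it to HDFS.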

On Fri, Dec 10, 2010 at 4:30 PM, Niall Riddell <ni...@xspca.com> wrote:

> Hi Sean
>
> Here is an extract of the first 20 users of
>
> 1: 1664968
> 2: 3 747213 1664968 1691047 4095634 5535664
> [...]

Re: RecommenderJob - ArrayIndexOutOfBoundsException

Posted by Niall Riddell <ni...@xspca.com>.
Hi Sean

Here is an extract of the first 20 users of the input file:

1: 1664968
2: 3 747213 1664968 1691047 4095634 5535664
3: 9 77935 79583 84707 564578 594898 681805 681886 835470 880698 1109091
1125108 1279972 1463445 1497566 1783284 1997564 2006526 2070954 2250217
2268713 2276203 2374802 2571397 2640902 2647217 2732378 2821237 3088028
3092827 3211549 3283735 3491412 3492254 3498305 3505664 3547201 3603437
3617913 3793767 3907547 4021634 4025897 4086017 4183126 4184025 4189168
4192731 4395141 4899940 4987592 4999120 5017477 5149173 5149311 5158741
5223097 5302153 5474252 5535280
4: 145
5: 8 57544 58089 60048 65880 284186 313376 564578 717529 729993 1097284
1204280 1204407 1255317 1670218 1720928 1850305 2269887 2333350 2359764
2640693 2743982 3303009 3322952 3492254 3573013 3721693 3797343 3797349
3797359 3849461 4033556 4173124 4189215 4207986 4669945 4817900 4901416
5010479 5062062 5072938 5098953 5292042 5429924 5599862 5599863 5689049
6: 8
7: 8
8: 5 57544 58089 59375 64985 313376 704624 717529 729993 1204280 1204407
1254637 1255317 1497566 1720928 1850305 2269887 2333350 2359764 2496900
2640848 2743982 3303009 3322952 3492254 3573013 3797343 3797349 3797359
4033556 4173124 4189168 4206743 4207986 4393611 4813259 4901416 5010479
5062062 5072938 5098953 5292042 5429924 5599862 5599863
9: 3 74106 75221 275656 313376 1279972 1565872 1613838 1997564 2640650
3092827 3491412 3492254 3956845 3973207 4025897 4189168 4189215 4813259
10: 3
11: 60956 313376 322893 497519 499246 594399 801968 806840 1123171 1228259
1463265 1892998 2022036 2070954 2639079 3492254 3594794 3967074 4096317
4189168 4189215 4273212 4611415 4708418 4813259 5300058 5575496
12: 5
13: 5534647
14: 4116750
15: 4095634
16: 5534647
17: 5703728
18: 4207272
19: 2402613
20: 2402613

Thanks

Niall

On 10 December 2010 15:56, Sean Owen <sr...@gmail.com> wrote:

> Yes it is a problem with the input -- would be helpful to see (part of) it.


-- 
Niall Riddell
*xSpace Analytics Ltd*
T: +44 161 408 3830
M: +44 778 696 3830
Skype: niall.riddell

Re: RecommenderJob - ArrayIndexOutOfBoundsException

Posted by Sean Owen <sr...@gmail.com>.
Yes it is a problem with the input -- would be helpful to see (part of) it.
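
One way an input problem can produce "ArrayIndexOutOfBoundsException: 1" is
sketched below in Python. It rests on an assumption that is not confirmed by
this thread: that the mapper splits each line on a comma or tab and then reads
the second field. The delimiter pattern and the helper name item_field are
illustrative only and are not taken from the Mahout source.

```python
import re

# Assumed delimiter (comma or tab); an illustration, not the Mahout code.
DELIMITER = re.compile(r"[,\t]")


def item_field(line):
    """Return the second delimited field, or None if the line has fewer than two."""
    tokens = DELIMITER.split(line)
    return tokens[1] if len(tokens) > 1 else None


# A line with no comma or tab yields a single token, so there is no index 1.
print(item_field("2: 3 747213 1664968"))  # None
# A "user,item" line yields two tokens.
print(item_field("2,747213"))             # 747213
```

Running a check like this over the first few lines of an input file is a quick
way to see whether it is in a delimited "user,item" shape at all.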
