Posted to user@mahout.apache.org by Sean Owen <sr...@gmail.com> on 2010/11/26 09:33:42 UTC

Re: error in itemsimilarity

It says it right there -- text files *with the preference data*. This is a
collaborative filtering tool, which is quite different from computing
document similarity.
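
For illustration only, here is a minimal sketch of what "preference data" could look like and how a line might be checked before running the job. It assumes one preference per line in the form userID,itemID[,preferenceValue], with numeric IDs and a comma or tab as the separator. The class and method names (PreferenceLineCheck, isPreferenceLine) are hypothetical and not part of Mahout.

```java
import java.util.regex.Pattern;

public class PreferenceLineCheck {

    // Assumption: fields are separated by a comma or a tab.
    private static final Pattern DELIMITER = Pattern.compile("[\t,]");

    /** Returns true if the line looks like "userID,itemID[,preferenceValue]". */
    public static boolean isPreferenceLine(String line) {
        String[] tokens = DELIMITER.split(line.trim());
        if (tokens.length < 2 || tokens.length > 3) {
            return false; // free text without delimiters ends up here
        }
        try {
            Long.parseLong(tokens[0].trim()); // user ID must be numeric
            Long.parseLong(tokens[1].trim()); // item ID must be numeric
            if (tokens.length == 3) {
                Double.parseDouble(tokens[2].trim()); // optional preference value
            }
            return true;
        } catch (NumberFormatException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(isPreferenceLine("1,101,5.0")); // true
        System.out.println(isPreferenceLine("For a young person who is years and above")); // false
    }
}
```

A line of ordinary prose has no second field and does not parse as a number, which is consistent with the ArrayIndexOutOfBoundsException and NumberFormatException in the quoted log.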

On Fri, Nov 26, 2010 at 8:25 AM, Divya <di...@k2associates.com.sg> wrote:

> Hi,
>
> But the Javadoc of ItemSimilarityJob says:
> "-Dmapred.input.dir=(path): Directory containing one or more text files with
> the preference data"
> So I assumed that it could take plain text files as well.
>
> Is there any way to compute similarity between documents?
> I explored Mahout but couldn't find anything.
>
>
> Thanks
> Regards,
> Divya
>
> -----Original Message-----
> From: Sebastian Schelter [mailto:ssc.open@googlemail.com]
> Sent: Friday, November 26, 2010 3:54 PM
> To: user@mahout.apache.org
> Subject: Re: error in itemsimilarity
>
> ItemSimilarityJob cannot be used to compute the similarity between text
> documents. It is intended for collaborative filtering, as
> described here:
>
> https://cwiki.apache.org/confluence/display/MAHOUT/Itembased+Collaborative+Filtering
>
> On 26.11.2010 08:50, Divya wrote:
> > Hi,
> >
> > I am getting the following exception when I try to run itemsimilarity from
> > the command line.
> >
> > My input data is a text file which has just one line of text.
> >
> > Can anyone please help me resolve the error?
> >
> >
> >
> >
> >
> > $ bin/mahout itemsimilarity -i D:/MahoutResult/ItemSimilarity/Input_Data -o D:/MahoutResult/ItemSimilarity/Output -s DistributedUncenteredCosineVectorSimilarity.class
> >
> > Running on hadoop, using HADOOP_HOME=C:\cygwin\home\Divya\hadoop-0.20.2
> > HADOOP_CONF_DIR=C:\cygwin\home\Divya\hadoop-0.20.2\conf
> > 10/11/26 15:43:50 INFO common.AbstractJob: Command line arguments: {--booleanData=false, --endPhase=2147483647, --input=D:/MahoutResult/ItemSimilarity/Input_Data, --maxCooccurrencesPerItem=100, --maxSimilaritiesPerItem=100, --output=D:/MahoutResult/ItemSimilarity/Output, --similarityClassname=DistributedUncenteredCosineVectorSimilarity.class, --startPhase=0, --tempDir=temp}
> >
> > 10/11/26 15:43:51 INFO jvm.JvmMetrics: Initializing JVM Metrics with processName=JobTracker, sessionId=
> > 10/11/26 15:43:52 INFO input.FileInputFormat: Total input paths to process : 2
> > 10/11/26 15:43:53 INFO mapred.JobClient: Running job: job_local_0001
> > 10/11/26 15:43:53 INFO input.FileInputFormat: Total input paths to process : 2
> > 10/11/26 15:43:53 INFO mapred.MapTask: io.sort.mb = 100
> > 10/11/26 15:43:53 INFO mapred.MapTask: data buffer = 79691776/99614720
> > 10/11/26 15:43:53 INFO mapred.MapTask: record buffer = 262144/327680
> > 10/11/26 15:43:53 WARN mapred.LocalJobRunner: job_local_0001
> > java.lang.ArrayIndexOutOfBoundsException: 1
> >         at org.apache.mahout.cf.taste.hadoop.item.ItemIDIndexMapper.map(ItemIDIndexMapper.java:47)
> >         at org.apache.mahout.cf.taste.hadoop.item.ItemIDIndexMapper.map(ItemIDIndexMapper.java:31)
> >         at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
> >         at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:621)
> >         at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
> >         at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:177)
> > 10/11/26 15:43:54 INFO mapred.JobClient:  map 0% reduce 0%
> > 10/11/26 15:43:54 INFO mapred.JobClient: Job complete: job_local_0001
> > 10/11/26 15:43:54 INFO mapred.JobClient: Counters: 0
> >
> > 10/11/26 15:43:54 INFO jvm.JvmMetrics: Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
> > 10/11/26 15:43:55 INFO input.FileInputFormat: Total input paths to process : 2
> > 10/11/26 15:43:55 INFO mapred.JobClient: Running job: job_local_0002
> > 10/11/26 15:43:55 INFO input.FileInputFormat: Total input paths to process : 2
> > 10/11/26 15:43:56 INFO mapred.MapTask: io.sort.mb = 100
> > 10/11/26 15:43:56 INFO mapred.MapTask: data buffer = 79691776/99614720
> > 10/11/26 15:43:56 INFO mapred.MapTask: record buffer = 262144/327680
> > 10/11/26 15:43:56 WARN mapred.LocalJobRunner: job_local_0002
> > java.lang.NumberFormatException: For input string: "For a young person who is years and above and below  years he may be employed in an industrial undertaking His employer however is required to notify "
> >         at java.lang.NumberFormatException.forInputString(NumberFormatException.java:48)
> >         at java.lang.Long.parseLong(Long.java:410)
> >         at java.lang.Long.parseLong(Long.java:468)
> >         at org.apache.mahout.cf.taste.hadoop.similarity.item.CountUsersMapper.map(CountUsersMapper.java:40)
> >         at org.apache.mahout.cf.taste.hadoop.similarity.item.CountUsersMapper.map(CountUsersMapper.java:31)
> >         at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
> >         at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:621)
> >         at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
> >         at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:177)
> > 10/11/26 15:43:56 INFO mapred.JobClient:  map 0% reduce 0%
> > 10/11/26 15:43:56 INFO mapred.JobClient: Job complete: job_local_0002
> > 10/11/26 15:43:56 INFO mapred.JobClient: Counters: 0
> >
> > 10/11/26 15:43:56 INFO jvm.JvmMetrics: Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
> > 10/11/26 15:43:57 INFO input.FileInputFormat: Total input paths to process : 2
> > 10/11/26 15:43:57 INFO mapred.JobClient: Running job: job_local_0003
> > 10/11/26 15:43:57 INFO input.FileInputFormat: Total input paths to process : 2
> > 10/11/26 15:43:57 INFO mapred.MapTask: io.sort.mb = 100
> > 10/11/26 15:43:57 INFO mapred.MapTask: data buffer = 79691776/99614720
> > 10/11/26 15:43:57 INFO mapred.MapTask: record buffer = 262144/327680
> > 10/11/26 15:43:58 WARN mapred.LocalJobRunner: job_local_0003
> > java.lang.NumberFormatException: For input string: "For a young person who is years and above and below  years he may be employed in an industrial undertaking His employer however is required to notify "
> >         at java.lang.NumberFormatException.forInputString(NumberFormatException.java:48)
> >         at java.lang.Long.parseLong(Long.java:410)
> >         at java.lang.Long.parseLong(Long.java:468)
> >         at org.apache.mahout.cf.taste.hadoop.ToEntityPrefsMapper.map(ToEntityPrefsMapper.java:57)
> >         at org.apache.mahout.cf.taste.hadoop.ToEntityPrefsMapper.map(ToEntityPrefsMapper.java:30)
> >         at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
> >         at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:621)
> >         at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
> >         at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:177)
> > 10/11/26 15:43:58 INFO mapred.JobClient:  map 0% reduce 0%
> > 10/11/26 15:43:58 INFO mapred.JobClient: Job complete: job_local_0003
> > 10/11/26 15:43:58 INFO mapred.JobClient: Counters: 0
> >
> > 10/11/26 15:43:58 INFO jvm.JvmMetrics: Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
> > 10/11/26 15:43:59 INFO input.FileInputFormat: Total input paths to process : 0
> > 10/11/26 15:43:59 INFO mapred.LocalJobRunner:
> > 10/11/26 15:43:59 INFO mapred.JobClient: Running job: job_local_0004
> > 10/11/26 15:43:59 INFO input.FileInputFormat: Total input paths to process : 0
> > 10/11/26 15:43:59 WARN mapred.LocalJobRunner: job_local_0004
> > java.lang.IndexOutOfBoundsException: Index: 0, Size: 0
> >         at java.util.ArrayList.RangeCheck(ArrayList.java:547)
> >         at java.util.ArrayList.get(ArrayList.java:322)
> >         at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:124)
> > 10/11/26 15:44:00 INFO mapred.JobClient:  map 0% reduce 0%
> > 10/11/26 15:44:00 INFO mapred.JobClient: Job complete: job_local_0004
> > 10/11/26 15:44:00 INFO mapred.JobClient: Counters: 0
> >
> > Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: 0
> >         at org.apache.mahout.cf.taste.hadoop.TasteHadoopUtils.readIntFromFile(TasteHadoopUtils.java:103)
> >         at org.apache.mahout.cf.taste.hadoop.similarity.item.ItemSimilarityJob.run(ItemSimilarityJob.java:187)
> >         at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
> >         at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
> >         at org.apache.mahout.cf.taste.hadoop.similarity.item.ItemSimilarityJob.main(ItemSimilarityJob.java:92)
> >         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> >         at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> >         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> >         at java.lang.reflect.Method.invoke(Method.java:597)
> >         at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:68)
> >         at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:139)
> >         at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:184)
> >         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> >         at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> >         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> >         at java.lang.reflect.Method.invoke(Method.java:597)
> >         at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
> >
> >
> > Thanks
> > Regards,
> > Divya