Posted to dev@mahout.apache.org by Divya <di...@k2associates.com.sg> on 2010/10/29 10:58:43 UTC
RE: RowSimilarityjob
How can I convert my SequenceFile<Text,VectorWritable> to a
SequenceFile<IntWritable,VectorWritable>?

Is there any other way I can parse my documents directory to get vectors and
then find similar documents?

As I understand it, RowSimilarityJob gives me similar rows (i.e. documents
that are similar in their text terms). Am I correct?
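A minimal, Hadoop-free sketch of the re-keying step being asked about. The class and method names below are made up for illustration; the real conversion would iterate over the SequenceFile's (Text, VectorWritable) pairs with SequenceFile.Reader and re-emit each vector under a dense integer row id via SequenceFile.Writer. Only the index-assignment logic is shown here:

```java
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Illustrative sketch (hypothetical class): assigns each document key
 * a dense integer row id, which is what the IntWritable keys expected
 * by RowSimilarityJob represent. Hadoop I/O is deliberately omitted.
 */
public class DocIndex {

    // Maps each document key to a dense row index, preserving insertion order
    // so the mapping can later be written out as a dictionary file.
    private final Map<String, Integer> docIds = new LinkedHashMap<>();

    /** Returns the row index for a document key, assigning a new one if unseen. */
    public int idFor(String docKey) {
        return docIds.computeIfAbsent(docKey, k -> docIds.size());
    }

    /** Number of distinct documents indexed so far. */
    public int size() {
        return docIds.size();
    }

    public static void main(String[] args) {
        DocIndex index = new DocIndex();
        // In the real job one would re-emit (new IntWritable(id), vector)
        // for every (Text, VectorWritable) pair read from the input.
        System.out.println(index.idFor("doc-a.txt")); // 0
        System.out.println(index.idFor("doc-b.txt")); // 1
        System.out.println(index.idFor("doc-a.txt")); // 0 again, same document
        System.out.println(index.size());             // 2
    }
}
```

Keeping the Text-to-int mapping around also lets you translate RowSimilarityJob's integer row ids back to document names afterwards.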
-----Original Message-----
From: Sebastian Schelter [mailto:ssc@apache.org]
Sent: Friday, October 29, 2010 3:45 PM
To: user@mahout.apache.org
Subject: Re: RowSimilarityjob
+user
-dev
The input files need to be SequenceFile<IntWritable,VectorWritable>.
RowSimilarityJob is intended to become a method invokable on
DistributedRowMatrix as soon as that is ported to the new Hadoop API.
--sebastian
Am 29.10.2010 08:33, schrieb Divya:
> Hi,
>
> What will be the input to RowSimilarityJob?
>
> When I passed the tfidf-vectors files as the input parameter,
> I got the following error:
>
> Oct 29, 2010 2:21:35 PM org.apache.hadoop.mapred.LocalJobRunner$Job run
> WARNING: job_local_0001
>
> java.lang.ClassCastException: org.apache.hadoop.io.Text cannot be cast to org.apache.hadoop.io.IntWritable
> at org.apache.mahout.math.hadoop.similarity.RowSimilarityJob$RowWeightMapper.map(RowSimilarityJob.java:1)
> at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
> at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:621)
> at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
> at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:177)
>
> Oct 29, 2010 2:21:36 PM org.apache.hadoop.mapred.JobClient monitorAndPrintJob
> INFO: Job complete: job_local_0001
>
> Oct 29, 2010 2:21:36 PM org.apache.hadoop.mapred.Counters log
> INFO: Counters: 0
>
> Oct 29, 2010 2:21:36 PM org.apache.hadoop.metrics.jvm.JvmMetrics init
> INFO: Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
>
> Oct 29, 2010 2:21:36 PM org.apache.hadoop.mapred.JobClient configureCommandLineOptions
> WARNING: No job jar file set. User classes may not be found. See JobConf(Class) or JobConf#setJar(String).
>
> Oct 29, 2010 2:21:36 PM org.apache.hadoop.mapreduce.lib.input.FileInputFormat listStatus
> INFO: Total input paths to process : 0
>
> Oct 29, 2010 2:21:36 PM org.apache.hadoop.mapred.JobClient monitorAndPrintJob
> INFO: Running job: job_local_0002
>
> Oct 29, 2010 2:21:36 PM org.apache.hadoop.mapreduce.lib.input.FileInputFormat listStatus
> INFO: Total input paths to process : 0
>
> Oct 29, 2010 2:21:36 PM org.apache.hadoop.mapred.LocalJobRunner$Job run
> WARNING: job_local_0002
>
> java.lang.IndexOutOfBoundsException: Index: 0, Size: 0
> at java.util.ArrayList.RangeCheck(ArrayList.java:547)
> at java.util.ArrayList.get(ArrayList.java:322)
> at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:124)
>
> Oct 29, 2010 2:21:37 PM org.apache.hadoop.mapred.JobClient monitorAndPrintJob
> INFO: map 0% reduce 0%
>
> Oct 29, 2010 2:21:37 PM org.apache.hadoop.mapred.JobClient monitorAndPrintJob
> INFO: Job complete: job_local_0002
>
> Oct 29, 2010 2:21:37 PM org.apache.hadoop.mapred.Counters log
> INFO: Counters: 0
>
> Oct 29, 2010 2:21:37 PM org.apache.hadoop.metrics.jvm.JvmMetrics init
> INFO: Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
>
> Oct 29, 2010 2:21:38 PM org.apache.hadoop.mapred.JobClient configureCommandLineOptions
> WARNING: No job jar file set. User classes may not be found. See JobConf(Class) or JobConf#setJar(String).
>
> Exception in thread "main" org.apache.hadoop.mapreduce.lib.input.InvalidInputException: Input path does not exist: temp/pairwiseSimilarity
> at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.listStatus(FileInputFormat.java:224)
> at org.apache.hadoop.mapreduce.lib.input.SequenceFileInputFormat.listStatus(SequenceFileInputFormat.java:55)
> at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.getSplits(FileInputFormat.java:241)
> at org.apache.hadoop.mapred.JobClient.writeNewSplits(JobClient.java:885)
> at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:779)
> at org.apache.hadoop.mapreduce.Job.submit(Job.java:432)
> at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:447)
> at org.apache.mahout.math.hadoop.similarity.RowSimilarityJob.run(RowSimilarityJob.java:174)
> at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
> at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
> at org.apache.mahout.math.hadoop.similarity.RowSimilarityJob.main(RowSimilarityJob.java:86)
>
> It's creating the temp/weights directory, but that directory is empty,
> and it's not creating pairwiseSimilarity at all,
> so that part of the error I can figure out.
>
> But I am unable to work out why I get
> java.lang.ClassCastException: org.apache.hadoop.io.Text cannot be cast to org.apache.hadoop.io.IntWritable
>
> Wondering whether my input is correct or not?
>
> Regards,
>
> Divya