Posted to dev@mahout.apache.org by Grant Ingersoll <gs...@apache.org> on 2011/08/01 15:36:56 UTC

Re: seq2sparse issue?

It seems strange that, if you are running on your cluster, there are references to LocalJobRunner and RawLocalFileSystem in the log.
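
For what it's worth, Hadoop's out-of-the-box defaults are fs.default.name=file:/// and mapred.job.tracker=local, so a client that does not pick up the cluster's configuration would plausibly produce exactly these class names. The sketch below is only an analogy using java.net.URI, not Hadoop's actual path-qualification code: it shows how the same scheme-less path ends up on a different filesystem depending on the default URI. The class name DefaultFsDemo, the method qualify, and the address hdfs://namenode:8020/ are made up for illustration.

```java
import java.net.URI;

public class DefaultFsDemo {
    // Resolve a scheme-less path against a default filesystem URI.
    // This only mimics the idea of qualifying a path; it is NOT Hadoop code.
    static URI qualify(String defaultFs, String path) {
        return URI.create(defaultFs).resolve(path);
    }

    public static void main(String[] args) {
        String path = "/user/jake/status_parsed/dictionary.file-0";

        // With the local default, the path is looked up on the local disk.
        URI local = qualify("file:///", path);
        System.out.println(local.getScheme() + " -> " + local.getPath());

        // With a (hypothetical) HDFS default, the same path points at the cluster.
        URI hdfs = qualify("hdfs://namenode:8020/", path);
        System.out.println(hdfs.getScheme() + " -> " + hdfs.getPath());
    }
}
```

If that is what is happening here, checking which configuration directory is on the classpath of the process that launches seq2sparse would be a reasonable first step.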

On Jul 29, 2011, at 11:04 PM, Jake Mannix wrote:

> Not sure if this is something with my prod cluster, or a bug, but when
> running seq2sparse on my production Hadoop cluster, I keep making it all the
> way through the tokenization, dictionary creation, etc., but then the
> TFPartialVectorReducer blows up:
> 
> 11/07/30 06:00:04 INFO mapred.LocalJobRunner:
> 11/07/30 06:00:04 INFO mapred.TaskRunner: Task 'attempt_local_0003_m_000003_0' done.
> 11/07/30 06:00:04 INFO mapred.LocalJobRunner:
> 11/07/30 06:00:04 INFO mapred.Merger: Merging 4 sorted segments
> 11/07/30 06:00:04 INFO mapred.Merger: Down to the last merge-pass, with 4 segments left of total size: 243328920 bytes
> 11/07/30 06:00:04 INFO mapred.LocalJobRunner:
> 11/07/30 06:00:04 WARN mapred.LocalJobRunner: job_local_0003
> java.lang.IllegalStateException: /user/jake/status_parsed/dictionary.file-0
> at org.apache.mahout.common.iterator.sequencefile.SequenceFileIterable.iterator(SequenceFileIterable.java:63)
> at org.apache.mahout.vectorizer.term.TFPartialVectorReducer.setup(TFPartialVectorReducer.java:130)
> at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:174)
> at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:566)
> at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:408)
> at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:215)
> Caused by: java.io.FileNotFoundException: File file:/user/jake/status_parsed/dictionary.file-0 does not exist.
> at org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:372)
> at org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:251)
> at org.apache.hadoop.fs.FileSystem.getLength(FileSystem.java:718)
> at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1417)
> at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1412)
> at org.apache.mahout.common.iterator.sequencefile.SequenceFileIterator.<init>(SequenceFileIterator.java:58)
> at org.apache.mahout.common.iterator.sequencefile.SequenceFileIterable.iterator(SequenceFileIterable.java:61)
> ... 5 more
> 11/07/30 06:00:04 INFO mapred.JobClient: Job complete: job_local_0003
> 
> 
> The file listed (without a filesystem URI!)
> "/user/jake/status_parsed/dictionary.file-0" exists on the cluster, but it's
> probably not showing up in the DistributedCache properly somehow.
> 
> Anyone run into anything like this before?  It's been a while since I've run
> seq2sparse on a real-hardware / managed cluster, so I'm not sure if it's me,
> Mahout, or a configuration setting somehow.
> 
>  -jake

--------------------------------------------
Grant Ingersoll