Posted to dev@mahout.apache.org by "Grant Ingersoll (JIRA)" <ji...@apache.org> on 2013/06/09 16:46:20 UTC

[jira] [Commented] (MAHOUT-1247) cluster-reuters doesn't work on Hadoop

    [ https://issues.apache.org/jira/browse/MAHOUT-1247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13679074#comment-13679074 ] 

Grant Ingersoll commented on MAHOUT-1247:
-----------------------------------------

Here's the first error I'm getting: https://paste.apache.org/cik6
{quote}
java.lang.IllegalStateException: /tmp/hadoop-grantingersoll/mapred/local/taskTracker/distcache/4475940891381251304_1262960862_693852121/localhostdicVec/dictionary.file-0
	at org.apache.mahout.common.iterator.sequencefile.SequenceFileIterable.iterator(SequenceFileIterable.java:63)
	at org.apache.mahout.vectorizer.term.TFPartialVectorReducer.setup(TFPartialVectorReducer.java:146)
	at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:174)
	at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:650)
	at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:418)
	at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:396)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1149)
	at org.apache.hadoop.mapred.Child.main(Child.java:249)
Caused by: java.io.FileNotFoundException: File does not exist: hdfs://localhost:9000/tmp/hadoop-grantingersoll/mapred/local/taskTracker/distcache/4475940891381251304_1262960862_693852121/localhostdicVec/dictionary.file-0
	at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:528)
	at org.apache.hadoop.fs.FileSystem.getLength(FileSystem.java:796)
	at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1479)
	at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1474)
	at org.apache.mahout.common.iterator.sequencefile.SequenceFileIterator.<init>(SequenceFileIterator.java:58)
	at org.apache.mahout.common.iterator.sequencefile.SequenceFileIterable.iterator(SequenceFileIterable.java:61)
	... 9 more
{quote}

This might be related to MAHOUT-992, but I'm not sure. I added a main method to DictionaryVectorizer that lets you reproduce the error from the output of a prior cluster-reuters run, without having to re-run everything.
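
One possible reading of the trace, offered as a guess rather than a diagnosis: the path in the IllegalStateException is a local taskTracker distcache path with no scheme, while the FileNotFoundException reports the same path prefixed with hdfs://localhost:9000. That is what you would see if a scheme-less local path were qualified against the default filesystem. The sketch below only illustrates that effect with plain java.net.URI. PathQualifier and qualify are hypothetical names for illustration, not Mahout or Hadoop APIs, and this does not claim to be how Hadoop resolves paths internally.

```java
import java.net.URI;

// Hypothetical illustration only: not part of Mahout or Hadoop.
class PathQualifier {

    // Returns the path unchanged if it already has a scheme; otherwise
    // resolves it against the given default filesystem URI, so that it
    // inherits that URI's scheme and authority.
    static URI qualify(String path, URI defaultFs) {
        URI raw = URI.create(path);
        if (raw.getScheme() != null) {
            return raw; // already fully qualified, e.g. file:///...
        }
        return defaultFs.resolve(raw);
    }

    public static void main(String[] args) {
        // Default filesystem as it appears in the FileNotFoundException.
        URI defaultFs = URI.create("hdfs://localhost:9000");

        // Local distributed-cache path copied from the IllegalStateException.
        String cached = "/tmp/hadoop-grantingersoll/mapred/local/taskTracker/distcache/"
                + "4475940891381251304_1262960862_693852121/localhostdicVec/dictionary.file-0";

        // Scheme-less path: picks up the hdfs:// scheme and authority.
        System.out.println(qualify(cached, defaultFs));

        // Explicit file: scheme: stays on the local filesystem.
        System.out.println(qualify("file://" + cached, defaultFs));
    }
}
```

The first line printed is the same hdfs://localhost:9000/tmp/... location that the FileNotFoundException complains about, while the second keeps the file: scheme. If this reading is right, the fix would involve opening the cached dictionary file through the local filesystem rather than the default one, but that is an assumption I have not verified against the code.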
                
> cluster-reuters doesn't work on Hadoop
> --------------------------------------
>
>                 Key: MAHOUT-1247
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-1247
>             Project: Mahout
>          Issue Type: Bug
>            Reporter: Grant Ingersoll
>            Assignee: Grant Ingersoll
>             Fix For: 0.8
>
>
> At least two issues:
> 1. MAHOUT-992 messed up the Distributed Cache stuff somehow
> 2. The ExtractReuters data is not being moved to HDFS.
