You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@mahout.apache.org by "Grant Ingersoll (JIRA)" <ji...@apache.org> on 2013/06/09 16:46:20 UTC
[jira] [Commented] (MAHOUT-1247) cluster-reuters doesn't work on
Hadoop
[ https://issues.apache.org/jira/browse/MAHOUT-1247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13679074#comment-13679074 ]
Grant Ingersoll commented on MAHOUT-1247:
-----------------------------------------
Here's the first error I'm getting: https://paste.apache.org/cik6
{quote}
java.lang.IllegalStateException: /tmp/hadoop-grantingersoll/mapred/local/taskTracker/distcache/4475940891381251304_1262960862_693852121/localhostdicVec/dictionary.file-0
at org.apache.mahout.common.iterator.sequencefile.SequenceFileIterable.iterator(SequenceFileIterable.java:63)
at org.apache.mahout.vectorizer.term.TFPartialVectorReducer.setup(TFPartialVectorReducer.java:146)
at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:174)
at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:650)
at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:418)
at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1149)
at org.apache.hadoop.mapred.Child.main(Child.java:249)
Caused by: java.io.FileNotFoundException: File does not exist: hdfs://localhost:9000/tmp/hadoop-grantingersoll/mapred/local/taskTracker/distcache/4475940891381251304_1262960862_693852121/localhostdicVec/dictionary.file-0
at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:528)
at org.apache.hadoop.fs.FileSystem.getLength(FileSystem.java:796)
at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1479)
at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1474)
at org.apache.mahout.common.iterator.sequencefile.SequenceFileIterator.<init>(SequenceFileIterator.java:58)
at org.apache.mahout.common.iterator.sequencefile.SequenceFileIterable.iterator(SequenceFileIterable.java:61)
... 9 more
{quote}
Might be related to MAHOUT-992, but not sure. I added a main to DictionaryVectorizer that allows you to reproduce this off of the prior run of cluster-reuters without having to go re-run everything.
> cluster-reuters doesn't work on Hadoop
> --------------------------------------
>
> Key: MAHOUT-1247
> URL: https://issues.apache.org/jira/browse/MAHOUT-1247
> Project: Mahout
> Issue Type: Bug
> Reporter: Grant Ingersoll
> Assignee: Grant Ingersoll
> Fix For: 0.8
>
>
> At least two issues:
> 1. MAHOUT-992 messed up the Distributed Cache stuff somehow
> 2. The ExtractReuters data is not being moved to HDFS.
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira