Posted to dev@mahout.apache.org by "Grant Ingersoll (JIRA)" <ji...@apache.org> on 2013/06/09 17:48:20 UTC
[jira] [Comment Edited] (MAHOUT-1247) cluster-reuters doesn't work on Hadoop
[ https://issues.apache.org/jira/browse/MAHOUT-1247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13679090#comment-13679090 ]
Grant Ingersoll edited comment on MAHOUT-1247 at 6/9/13 3:47 PM:
-----------------------------------------------------------------
I think I see the issue. The cache file is "local", but the Iterator has a Hadoop conf that expects an HDFS file, so it can't find it.
For instance, the logs show:
{quote}11:38:49,638 INFO org.apache.mahout.vectorizer.term.TFPartialVectorReducer: Cache Files: [/tmp/hadoop-grantingersoll/mapred/local/taskTracker/distcache/2677051046998143225_1262960862_697707077/localhostdicVec/dictionary.file-0]
2013{quote}
Notice that the path is missing the scheme. I'm going to try explicitly setting the scheme to file://
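To illustrate the missing-scheme point, here is a minimal sketch using only the JDK (java.net.URI and java.nio.file), not Hadoop's Path or FileSystem classes. The class name, the method name qualifyAsLocal, and the shortened example path are all hypothetical and chosen for illustration. It shows that a bare absolute path carries no scheme, and one way such a path could be qualified as file: while leaving already-qualified URIs untouched.

```java
import java.net.URI;
import java.nio.file.Paths;

// Hypothetical helper, plain JDK only; this is not Mahout or Hadoop code.
class CachePathScheme {

    /**
     * Returns the input as a URI. If it already has a scheme (hdfs:, file:, ...)
     * it is returned unchanged; otherwise it is treated as a local path and
     * converted to an absolute file: URI.
     * Assumes rawPath has no characters that are illegal in a URI (e.g. spaces).
     */
    public static URI qualifyAsLocal(String rawPath) {
        URI uri = URI.create(rawPath);
        if (uri.getScheme() != null) {
            return uri;
        }
        return Paths.get(rawPath).toAbsolutePath().toUri();
    }

    public static void main(String[] args) {
        // Shortened, made-up stand-in for the cache path seen in the log.
        String cached = "/tmp/distcache/dictionary.file-0";

        // A bare path has no scheme, so a consumer has to guess the file system.
        System.out.println("scheme before: " + URI.create(cached).getScheme());

        // After qualification the scheme is explicit.
        System.out.println("scheme after:  " + qualifyAsLocal(cached).getScheme());
    }
}
```

With the scheme left out, whichever component resolves the path falls back to its own default file system, which is presumably how a local cache file ends up being looked for on HDFS.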
was (Author: gsingers):
I think I see the issue. The cache file is "local", but the Iterator has a Hadoop conf that expects an HDFS file, so it can't find it.
> cluster-reuters doesn't work on Hadoop
> --------------------------------------
>
> Key: MAHOUT-1247
> URL: https://issues.apache.org/jira/browse/MAHOUT-1247
> Project: Mahout
> Issue Type: Bug
> Reporter: Grant Ingersoll
> Assignee: Grant Ingersoll
> Fix For: 0.8
>
>
> At least two issues:
> 1. MAHOUT-992 messed up the Distributed Cache stuff somehow
> 2. The ExtractReuters data is not being moved to HDFS.
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
Re: [jira] [Comment Edited] (MAHOUT-1247) cluster-reuters doesn't work on Hadoop
Posted by Sebastian Schelter <ss...@googlemail.com>.
A makeQualified call should help in case the file is not found:
LocalFileSystem localFs = FileSystem.getLocal(conf);
Path localCacheFile = localFs.makeQualified(localFiles[0]);
If you run in local mode (i.e., not on a cluster), you may need a fallback that loads the file directly, as is done in
org.apache.mahout.cf.taste.hadoop.als.ALS#readMatrixByRowsFromDistributedCache
Best,
Sebastian
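The fallback idea mentioned above can be sketched roughly as follows. This uses only java.nio.file and is not the code in the ALS class; the class and method names (CacheFileFallback, pickReadable, readLines) are hypothetical, and "localized copy" and "original location" are assumed stand-ins for the distributed-cache copy and the file's original path. The idea shown is simply: prefer the localized copy, and read the original location directly when that copy does not exist.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Collections;
import java.util.List;

// Hypothetical sketch, plain JDK only; this is not Mahout or Hadoop code.
class CacheFileFallback {

    /** Returns the localized copy if it exists, otherwise the original location. */
    public static Path pickReadable(Path localizedCopy, Path originalLocation) {
        return Files.exists(localizedCopy) ? localizedCopy : originalLocation;
    }

    /** Reads all lines from whichever of the two locations pickReadable selects. */
    public static List<String> readLines(Path localizedCopy, Path originalLocation)
            throws IOException {
        Path chosen = pickReadable(localizedCopy, originalLocation);
        return Files.readAllLines(chosen, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        // Create a small stand-in for the original file.
        Path original = Files.createTempFile("dictionary", ".txt");
        Files.write(original, Collections.singletonList("term\t0"));

        // Simulate local mode: the localized copy was never created.
        Path missingCopy = original.resolveSibling("not-localized-" + original.getFileName());

        // The read falls back to the original location.
        System.out.println(readLines(missingCopy, original));

        Files.delete(original);
    }
}
```

In a real job the existence check and the read would go through the appropriate Hadoop FileSystem instances rather than java.nio.file; the plain-JDK version only illustrates the control flow.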