Posted to dev@mahout.apache.org by "Suneel Marthi (JIRA)" <ji...@apache.org> on 2015/08/12 01:48:49 UTC

[jira] [Resolved] (MAHOUT-1658) Kmeans fails when running on HDFS

     [ https://issues.apache.org/jira/browse/MAHOUT-1658?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Suneel Marthi resolved MAHOUT-1658.
-----------------------------------
    Resolution: Cannot Reproduce

Not able to reproduce. Please open a fresh JIRA if the problem recurs with 0.11.0.

> Kmeans fails when running on HDFS
> ---------------------------------
>
>                 Key: MAHOUT-1658
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-1658
>             Project: Mahout
>          Issue Type: Bug
>          Components: Clustering
>    Affects Versions: 0.9
>         Environment: CentOS 6.6 with HDP 2.2
>            Reporter: Ha Son Hai
>            Assignee: Andrew Musselman
>              Labels: hadoop
>             Fix For: 0.11.0
>
>   Original Estimate: 1h
>  Remaining Estimate: 1h
>
> Hi,
> I was trying to run some of the Mahout examples on a Hadoop platform and saw that when kmeans ran on the local host, it completed successfully. However, when it ran against HDFS with relative paths, Mahout looked for the intermediate results on the local filesystem instead of on HDFS.
> I have to use absolute paths for the input and output if I want kmeans to run correctly.
> Here is a typical error when running on HDFS:
> 15/03/26 12:15:07 INFO mapreduce.Job: Task Id : attempt_1426848955524_0062_m_000000_2, Status : FAILED
> Error: java.lang.IllegalStateException: output/clusters-0
>         at org.apache.mahout.common.iterator.sequencefile.SequenceFileDirValueIterable.iterator(SequenceFileDirValueIterable.java:78)
>         at org.apache.mahout.clustering.classify.ClusterClassifier.readFromSeqFiles(ClusterClassifier.java:208)
>         at org.apache.mahout.clustering.iterator.CIMapper.setup(CIMapper.java:44)
>         at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:142)
>         at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:784)
>         at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
>         at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163)
>         at java.security.AccessController.doPrivileged(Native Method)
>         at javax.security.auth.Subject.doAs(Subject.java:415)
>         at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
>         at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
> Caused by: java.io.FileNotFoundException: File output/clusters-0 does not exist
>         at org.apache.hadoop.fs.RawLocalFileSystem.listStatus(RawLocalFileSystem.java:376)
>         at org.apache.hadoop.fs.FileSystem.listStatus(FileSystem.java:1485)
>         at org.apache.hadoop.fs.FileSystem.listStatus(FileSystem.java:1525)
>         at org.apache.hadoop.fs.ChecksumFileSystem.listStatus(ChecksumFileSystem.java:570)
>         at org.apache.hadoop.fs.FileSystem.listStatus(FileSystem.java:1485)
>         at org.apache.hadoop.fs.FileSystem.listStatus(FileSystem.java:1525)
>         at org.apache.mahout.common.iterator.sequencefile.SequenceFileDirValueIterator.<init>(SequenceFileDirValueIterator.java:70)
>         at org.apache.mahout.common.iterator.sequencefile.SequenceFileDirValueIterable.iterator(SequenceFileDirValueIterable.java:76)
>         ... 10 more
> 15/03/26 12:15:16 INFO mapreduce.Job:  map 100% reduce 0%
> 15/03/26 12:15:17 INFO mapreduce.Job:  map 100% reduce 100%
> 15/03/26 12:15:17 INFO mapreduce.Job: Job job_1426848955524_0062 failed with state FAILED due to: Task failed task_1426848955524_0062_m_000000
> Job failed as tasks failed. failedMaps:1 failedReduces:0
> 15/03/26 12:15:17 INFO mapreduce.Job: Counters: 9
>         Job Counters
>                 Failed map tasks=4
>                 Launched map tasks=4
>                 Other local map tasks=3
>                 Rack-local map tasks=1
>                 Total time spent by all maps in occupied slots (ms)=23087
>                 Total time spent by all reduces in occupied slots (ms)=0
>                 Total time spent by all map tasks (ms)=23087
>                 Total vcore-seconds taken by all map tasks=23087
>                 Total megabyte-seconds taken by all map tasks=23641088
> Exception in thread "main" java.lang.InterruptedException: Cluster Iteration 1 failed processing output/clusters-1
>         at org.apache.mahout.clustering.iterator.ClusterIterator.iterateMR(ClusterIterator.java:183)
>         at org.apache.mahout.clustering.kmeans.KMeansDriver.buildClusters(KMeansDriver.java:224)
>         at org.apache.mahout.clustering.kmeans.KMeansDriver.run(KMeansDriver.java:147)
>         at org.apache.mahout.clustering.syntheticcontrol.kmeans.Job.run(Job.java:135)
>         at org.apache.mahout.clustering.syntheticcontrol.kmeans.Job.main(Job.java:60)
>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>         at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>         at java.lang.reflect.Method.invoke(Method.java:606)
>         at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:71)
>         at org.apache.hadoop.util.ProgramDriver.run(ProgramDriver.java:144)
>         at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:152)
>         at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:195)
>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>         at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>         at java.lang.reflect.Method.invoke(Method.java:606)
>         at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
>         at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
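
The report above says that a relative path such as output/clusters-0 ended up being looked up on the local filesystem (note RawLocalFileSystem in the "Caused by" trace), while absolute paths worked. The following is a minimal sketch of why the same relative string can point at two different places. It models path qualification with java.net.URI only and is not Hadoop's or Mahout's actual code. The class name, the working-directory URIs, the host namenode.example and the user directory are all hypothetical. In Hadoop itself, FileSystem.makeQualified(Path) plays a comparable role, but whether qualifying the path up front would have avoided this particular failure is an assumption, since the issue was closed as not reproducible.

```java
import java.net.URI;

// Illustrative model only: shows how one relative path string resolves to
// different locations depending on which filesystem's working directory
// it is resolved against. All URIs below are hypothetical.
class RelativePathDemo {

    // Resolve a possibly relative path against a working-directory URI.
    // A path that already carries a scheme is returned unchanged.
    static URI qualify(URI workingDir, String path) {
        return workingDir.resolve(path);
    }

    public static void main(String[] args) {
        // Hypothetical working directories for a local and an HDFS filesystem.
        URI localWd = URI.create("file:///home/hypothetical-user/");
        URI hdfsWd = URI.create("hdfs://namenode.example:8020/user/hypothetical-user/");

        String relative = "output/clusters-0";

        // The same relative string lands in two unrelated places.
        System.out.println(qualify(localWd, relative));
        System.out.println(qualify(hdfsWd, relative));

        // A fully qualified path is independent of the working directory,
        // which matches the reporter's workaround of using absolute paths.
        String absolute =
            "hdfs://namenode.example:8020/user/hypothetical-user/output/clusters-0";
        System.out.println(qualify(localWd, absolute));
    }
}
```

In this model the driver and the map task only agree on a location when either both resolve against the same filesystem or the path is qualified before it is handed over.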



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)