Posted to dev@mahout.apache.org by "Ha Son Hai (JIRA)" <ji...@apache.org> on 2015/03/26 15:54:52 UTC
[jira] [Created] (MAHOUT-1658) Kmeans fails when running on HDFS

Ha Son Hai created MAHOUT-1658:
----------------------------------

             Summary: Kmeans fails when running on HDFS
                 Key: MAHOUT-1658
                 URL: https://issues.apache.org/jira/browse/MAHOUT-1658
             Project: Mahout
          Issue Type: Bug
          Components: Clustering
    Affects Versions: 0.9
         Environment: CentOS 6.6 with HDP 2.2
            Reporter: Ha Son Hai


Hi,
I was trying to run some of the Mahout examples on a Hadoop platform and saw that k-means succeeds when it runs on the local host. However, when it runs against HDFS with relative paths, Mahout looks for the intermediate results on the local filesystem instead of on HDFS.
I have to use absolute paths for the input and output for k-means to run correctly.
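
To illustrate why a relative path can end up in different places, here is a minimal sketch in plain Java. It is a simplified model, not Hadoop's or Mahout's actual code: the class PathQualifier, the method qualify, the host namenode:8020 and the directories /user/alice and /home/alice are all hypothetical names chosen for the example. It assumes that a path without a scheme is resolved against whatever default filesystem and working directory the current process is configured with, which would be consistent with the RawLocalFileSystem frames in the trace.

```java
import java.net.URI;

// Simplified model only: NOT Hadoop's implementation of path qualification.
final class PathQualifier {

    // Qualifies a path against a default filesystem URI and a working directory.
    // All three inputs are illustrative assumptions.
    static URI qualify(String path, URI defaultFs, String workingDir) {
        URI parsed = URI.create(path);
        if (parsed.getScheme() != null) {
            // Already fully qualified (e.g. hdfs://...), so the default is ignored.
            return parsed;
        }
        // A relative path is resolved against the working directory.
        String absolute = path.startsWith("/") ? path : workingDir + "/" + path;
        // file:/// has no authority; hdfs://host:port does.
        String authority = defaultFs.getAuthority() == null ? "" : defaultFs.getAuthority();
        return URI.create(defaultFs.getScheme() + "://" + authority + absolute);
    }

    public static void main(String[] args) {
        String relative = "output/clusters-0";

        // Process whose default filesystem is HDFS (hypothetical host and home directory).
        System.out.println(qualify(relative, URI.create("hdfs://namenode:8020"), "/user/alice"));

        // Process whose default filesystem is the local one (hypothetical directory).
        System.out.println(qualify(relative, URI.create("file:///"), "/home/alice"));

        // A fully qualified path resolves to the same location in both cases.
        String qualified = "hdfs://namenode:8020/user/alice/output/clusters-0";
        System.out.println(qualify(qualified, URI.create("file:///"), "/home/alice"));
    }
}
```

In this model the same relative string names two different locations depending on the default filesystem of the process that resolves it, while a fully qualified path names one location everywhere. That matches the observation that fully specified input and output paths avoid the failure, though it does not by itself identify where in Mahout the mismatch occurs.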

Here is a typical error when running on HDFS:

15/03/26 12:15:07 INFO mapreduce.Job: Task Id : attempt_1426848955524_0062_m_000000_2, Status : FAILED
Error: java.lang.IllegalStateException: output/clusters-0
        at org.apache.mahout.common.iterator.sequencefile.SequenceFileDirValueIterable.iterator(SequenceFileDirValueIterable.java:78)
        at org.apache.mahout.clustering.classify.ClusterClassifier.readFromSeqFiles(ClusterClassifier.java:208)
        at org.apache.mahout.clustering.iterator.CIMapper.setup(CIMapper.java:44)
        at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:142)
        at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:784)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
        at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:415)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
        at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
Caused by: java.io.FileNotFoundException: File output/clusters-0 does not exist
        at org.apache.hadoop.fs.RawLocalFileSystem.listStatus(RawLocalFileSystem.java:376)
        at org.apache.hadoop.fs.FileSystem.listStatus(FileSystem.java:1485)
        at org.apache.hadoop.fs.FileSystem.listStatus(FileSystem.java:1525)
        at org.apache.hadoop.fs.ChecksumFileSystem.listStatus(ChecksumFileSystem.java:570)
        at org.apache.hadoop.fs.FileSystem.listStatus(FileSystem.java:1485)
        at org.apache.hadoop.fs.FileSystem.listStatus(FileSystem.java:1525)
        at org.apache.mahout.common.iterator.sequencefile.SequenceFileDirValueIterator.<init>(SequenceFileDirValueIterator.java:70)
        at org.apache.mahout.common.iterator.sequencefile.SequenceFileDirValueIterable.iterator(SequenceFileDirValueIterable.java:76)
        ... 10 more

15/03/26 12:15:16 INFO mapreduce.Job:  map 100% reduce 0%
15/03/26 12:15:17 INFO mapreduce.Job:  map 100% reduce 100%
15/03/26 12:15:17 INFO mapreduce.Job: Job job_1426848955524_0062 failed with state FAILED due to: Task failed task_1426848955524_0062_m_000000
Job failed as tasks failed. failedMaps:1 failedReduces:0

15/03/26 12:15:17 INFO mapreduce.Job: Counters: 9
        Job Counters
                Failed map tasks=4
                Launched map tasks=4
                Other local map tasks=3
                Rack-local map tasks=1
                Total time spent by all maps in occupied slots (ms)=23087
                Total time spent by all reduces in occupied slots (ms)=0
                Total time spent by all map tasks (ms)=23087
                Total vcore-seconds taken by all map tasks=23087
                Total megabyte-seconds taken by all map tasks=23641088
Exception in thread "main" java.lang.InterruptedException: Cluster Iteration 1 failed processing output/clusters-1
        at org.apache.mahout.clustering.iterator.ClusterIterator.iterateMR(ClusterIterator.java:183)
        at org.apache.mahout.clustering.kmeans.KMeansDriver.buildClusters(KMeansDriver.java:224)
        at org.apache.mahout.clustering.kmeans.KMeansDriver.run(KMeansDriver.java:147)
        at org.apache.mahout.clustering.syntheticcontrol.kmeans.Job.run(Job.java:135)
        at org.apache.mahout.clustering.syntheticcontrol.kmeans.Job.main(Job.java:60)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:606)
        at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:71)
        at org.apache.hadoop.util.ProgramDriver.run(ProgramDriver.java:144)
        at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:152)
        at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:195)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:606)
        at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
        at org.apache.hadoop.util.RunJar.main(RunJar.java:136)


