Posted to user@mahout.apache.org by Cyril Bogus <cy...@gmail.com> on 2013/04/11 21:25:00 UTC

KMeans

Hi everyone,

I am running Hadoop 1.0.4 with Mahout 0.7.

I am currently trying to run a k-means job on some data that I stored in
HDFS.

I already ran canopy clustering to get the initial clusters, and that ran fine.
Now I am trying to run k-means and I get the errors below.

My vectors are NamedVector(DenseVector, String).

Also, when I check for the output directory, I find two copies of it: one in
HDFS and one under the Java project's path. Judging from the error output,
though, k-means is reading its input from the right location.

13/04/11 15:09:58 INFO kmeans.KMeansDriver: Input: datafileimporter/data Clusters In: datafileimporter/clusters/clusters-0-final/part-r-00000 Out: datafileimporter/kmeans Distance: org.apache.mahout.common.distance.EuclideanDistanceMeasure
13/04/11 15:09:58 INFO kmeans.KMeansDriver: convergence: 0.01 max Iterations: 20 num Reduce Tasks: org.apache.mahout.math.VectorWritable Input Vectors: {}
13/04/11 15:09:58 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Cluster Iterator running iteration 1 over priorPath: datafileimporter/kmeans/clusters-0
13/04/11 15:09:58 INFO input.FileInputFormat: Total input paths to process : 1
13/04/11 15:09:59 INFO mapred.JobClient: Running job: job_201304111243_0022
13/04/11 15:10:00 INFO mapred.JobClient:  map 0% reduce 0%
13/04/11 15:10:14 INFO mapred.JobClient: Task Id : attempt_201304111243_0022_m_000000_0, Status : FAILED
java.lang.IndexOutOfBoundsException: Index: 0, Size: 0
    at java.util.ArrayList.rangeCheck(ArrayList.java:571)
    at java.util.ArrayList.get(ArrayList.java:349)
    at org.apache.mahout.clustering.classify.ClusterClassifier.readFromSeqFiles(ClusterClassifier.java:215)
    at org.apache.mahout.clustering.iterator.CIMapper.setup(CIMapper.java:36)
    at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:142)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
    at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:416)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1121)
    at org.apache.hadoop.mapred.Child.main(Child.java:249)

attempt_201304111243_0022_m_000000_0: SLF4J: Class path contains multiple SLF4J bindings.
attempt_201304111243_0022_m_000000_0: SLF4J: Found binding in [jar:file:/home/hadoop/lib/mahout-examples-0.7-job.jar!/org/slf4j/impl/StaticLoggerBinder.class]
attempt_201304111243_0022_m_000000_0: SLF4J: Found binding in [jar:file:/home/hadoop/lib/slf4j-jcl-1.6.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
attempt_201304111243_0022_m_000000_0: SLF4J: Found binding in [jar:file:/home/hadoop/lib/slf4j-log4j12-1.4.3.jar!/org/slf4j/impl/StaticLoggerBinder.class]
attempt_201304111243_0022_m_000000_0: SLF4J: Found binding in [jar:file:/home/hadoop/lib/slf4j-log4j12-1.6.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
attempt_201304111243_0022_m_000000_0: SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
13/04/11 15:10:20 INFO mapred.JobClient: Task Id : attempt_201304111243_0022_m_000000_1, Status : FAILED
java.lang.IndexOutOfBoundsException: Index: 0, Size: 0
    at java.util.ArrayList.rangeCheck(ArrayList.java:571)
    at java.util.ArrayList.get(ArrayList.java:349)
    at org.apache.mahout.clustering.classify.ClusterClassifier.readFromSeqFiles(ClusterClassifier.java:215)
    at org.apache.mahout.clustering.iterator.CIMapper.setup(CIMapper.java:36)
    at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:142)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
    at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:416)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1121)
    at org.apache.hadoop.mapred.Child.main(Child.java:249)

attempt_201304111243_0022_m_000000_1: SLF4J: Class path contains multiple SLF4J bindings.
attempt_201304111243_0022_m_000000_1: SLF4J: Found binding in [jar:file:/home/hadoop/lib/mahout-examples-0.7-job.jar!/org/slf4j/impl/StaticLoggerBinder.class]
attempt_201304111243_0022_m_000000_1: SLF4J: Found binding in [jar:file:/home/hadoop/lib/slf4j-jcl-1.6.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
attempt_201304111243_0022_m_000000_1: SLF4J: Found binding in [jar:file:/home/hadoop/lib/slf4j-log4j12-1.4.3.jar!/org/slf4j/impl/StaticLoggerBinder.class]
attempt_201304111243_0022_m_000000_1: SLF4J: Found binding in [jar:file:/home/hadoop/lib/slf4j-log4j12-1.6.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
attempt_201304111243_0022_m_000000_1: SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
13/04/11 15:10:26 INFO mapred.JobClient: Task Id : attempt_201304111243_0022_m_000000_2, Status : FAILED
java.lang.IndexOutOfBoundsException: Index: 0, Size: 0
    at java.util.ArrayList.rangeCheck(ArrayList.java:571)
    at java.util.ArrayList.get(ArrayList.java:349)
    at org.apache.mahout.clustering.classify.ClusterClassifier.readFromSeqFiles(ClusterClassifier.java:215)
    at org.apache.mahout.clustering.iterator.CIMapper.setup(CIMapper.java:36)
    at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:142)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
    at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:416)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1121)
    at org.apache.hadoop.mapred.Child.main(Child.java:249)

attempt_201304111243_0022_m_000000_2: SLF4J: Class path contains multiple SLF4J bindings.
attempt_201304111243_0022_m_000000_2: SLF4J: Found binding in [jar:file:/home/hadoop/lib/mahout-examples-0.7-job.jar!/org/slf4j/impl/StaticLoggerBinder.class]
attempt_201304111243_0022_m_000000_2: SLF4J: Found binding in [jar:file:/home/hadoop/lib/slf4j-jcl-1.6.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
attempt_201304111243_0022_m_000000_2: SLF4J: Found binding in [jar:file:/home/hadoop/lib/slf4j-log4j12-1.4.3.jar!/org/slf4j/impl/StaticLoggerBinder.class]
attempt_201304111243_0022_m_000000_2: SLF4J: Found binding in [jar:file:/home/hadoop/lib/slf4j-log4j12-1.6.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
attempt_201304111243_0022_m_000000_2: SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
13/04/11 15:10:38 INFO mapred.JobClient: Job complete: job_201304111243_0022
13/04/11 15:10:38 INFO mapred.JobClient: Counters: 7
13/04/11 15:10:38 INFO mapred.JobClient:   Job Counters
13/04/11 15:10:38 INFO mapred.JobClient:     SLOTS_MILLIS_MAPS=26654
13/04/11 15:10:38 INFO mapred.JobClient:     Total time spent by all reduces waiting after reserving slots (ms)=0
13/04/11 15:10:38 INFO mapred.JobClient:     Total time spent by all maps waiting after reserving slots (ms)=0
13/04/11 15:10:38 INFO mapred.JobClient:     Launched map tasks=4
13/04/11 15:10:38 INFO mapred.JobClient:     Data-local map tasks=4
13/04/11 15:10:38 INFO mapred.JobClient:     SLOTS_MILLIS_REDUCES=0
13/04/11 15:10:38 INFO mapred.JobClient:     Failed map tasks=1
java.lang.InterruptedException: Cluster Iteration 1 failed processing datafileimporter/kmeans/clusters-1
    at org.apache.mahout.clustering.iterator.ClusterIterator.iterateMR(ClusterIterator.java:186)
    at org.apache.mahout.clustering.kmeans.KMeansDriver.buildClusters(KMeansDriver.java:229)
    at org.apache.mahout.clustering.kmeans.KMeansDriver.run(KMeansDriver.java:149)
    at DataImporter.main(DataImporter.java:64)
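
For reference, here is how I read the exception, as a stand-alone sketch and
not Mahout code. The trace shows ArrayList.get being called with Index: 0 on a
list of Size: 0 inside ClusterClassifier.readFromSeqFiles, so whatever list it
built there was empty. My assumption, which I have not confirmed against the
Mahout source, is that this list is supposed to hold the clusters read from the
prior path. The class and method names below are hypothetical and only
reproduce the same kind of failure with plain java.util classes.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only: none of these names come from Mahout.
public class EmptyClusterListDemo {

    // Stand-in for "take the first cluster that was read".
    // On an empty list, get(0) throws IndexOutOfBoundsException.
    public static String firstCluster(List<String> clusters) {
        return clusters.get(0);
    }

    // Returns true when taking the first element fails with
    // IndexOutOfBoundsException, false when it succeeds.
    public static boolean failsOnEmpty(List<String> clusters) {
        try {
            firstCluster(clusters);
            return false;
        } catch (IndexOutOfBoundsException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        List<String> none = new ArrayList<String>();
        List<String> one = new ArrayList<String>();
        one.add("cluster-0");

        System.out.println("empty list fails: " + failsOnEmpty(none));
        System.out.println("one-element list fails: " + failsOnEmpty(one));
    }
}
```

If that reading is right, the question becomes why nothing was read for the
first iteration, given that the log reports the prior path as
datafileimporter/kmeans/clusters-0.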

Best Regards.