Posted to user@mahout.apache.org by Paritosh Ranjan <pr...@xebia.com> on 2012/03/01 05:17:45 UTC

Re: Getting error while running k-means clustering

I think you are trying to run the example given in the quickstart section.
It says "Finally, run kMeans with 20 clusters.", which is what your -k 20
argument specifies.

There are two ways you can run K-Means:
a) by providing the initial clusters, which is done by passing the -c argument;
b) by specifying the initial number of clusters with -k.

You are using both (-k and -c); using just one of them will do.
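To make the difference between the two modes concrete, here is a minimal, in-memory Java sketch of k-means seeding. It is not Mahout's code: the class and method names (`KMeansSeedingSketch`, `randomSeeds`, `run`) are hypothetical, and it uses squared Euclidean distance, the measure shown in your log. Passing explicit centroids to `run` plays the role of -c; calling `randomSeeds` first to pick k input vectors at random plays the role of -k, which is what the `RandomSeedGenerator: Wrote 20 vectors` line in your log reports. Either way the iterations need a non-empty set of starting centroids, which appears to be what the "No clusters found. Check your -c path." check guards against.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Random;

// Toy, in-memory k-means (Lloyd's algorithm) showing two ways to seed it.
// Hypothetical names; this is NOT Mahout's KMeansDriver/KMeansMapper.
class KMeansSeedingSketch {

    // "-k" style: choose k distinct input points at random as starting centroids.
    static List<double[]> randomSeeds(List<double[]> points, int k, long seed) {
        if (k <= 0 || k > points.size()) {
            throw new IllegalArgumentException("k must be between 1 and " + points.size());
        }
        List<double[]> shuffled = new ArrayList<>(points);
        Collections.shuffle(shuffled, new Random(seed));
        List<double[]> seeds = new ArrayList<>();
        for (int i = 0; i < k; i++) seeds.add(shuffled.get(i).clone());
        return seeds;
    }

    static double squaredDistance(double[] a, double[] b) {
        double sum = 0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return sum;
    }

    // Index of the centroid closest to p (ties go to the lower index).
    static int nearest(double[] p, List<double[]> centroids) {
        int best = 0;
        double bestDist = squaredDistance(p, centroids.get(0));
        for (int c = 1; c < centroids.size(); c++) {
            double d = squaredDistance(p, centroids.get(c));
            if (d < bestDist) { best = c; bestDist = d; }
        }
        return best;
    }

    // "-c" style: iterate from the centroids you supply (or from randomSeeds output).
    static List<double[]> run(List<double[]> points, List<double[]> initial, int maxIter) {
        if (initial.isEmpty()) {
            // Analogous to "No clusters found": there is nothing to assign points to.
            throw new IllegalStateException("no initial centroids");
        }
        if (points.isEmpty()) throw new IllegalArgumentException("no input points");
        int dim = points.get(0).length;
        List<double[]> centroids = new ArrayList<>();
        for (double[] c : initial) centroids.add(c.clone());
        for (int iter = 0; iter < maxIter; iter++) {
            double[][] sums = new double[centroids.size()][dim];
            int[] counts = new int[centroids.size()];
            for (double[] p : points) {           // assignment step
                int c = nearest(p, centroids);
                counts[c]++;
                for (int i = 0; i < dim; i++) sums[c][i] += p[i];
            }
            boolean changed = false;
            for (int c = 0; c < centroids.size(); c++) {  // update step
                if (counts[c] == 0) continue;     // an empty cluster keeps its previous centroid
                double[] updated = new double[dim];
                for (int i = 0; i < dim; i++) updated[i] = sums[c][i] / counts[c];
                changed |= !Arrays.equals(updated, centroids.get(c));
                centroids.set(c, updated);
            }
            if (!changed) break;                  // centroids did not move: converged
        }
        return centroids;
    }

    public static void main(String[] args) {
        List<double[]> points = Arrays.asList(
                new double[] {0, 0}, new double[] {0, 1}, new double[] {10, 10}, new double[] {10, 11});
        List<double[]> given = Arrays.asList(new double[] {0, 0}, new double[] {10, 10});
        for (double[] c : run(points, given, 10)) {
            System.out.println("from given seeds:  " + Arrays.toString(c));
        }
        for (double[] c : run(points, randomSeeds(points, 2, 42L), 10)) {
            System.out.println("from random seeds: " + Arrays.toString(c));
        }
    }
}
```

With the given seeds, `main` prints the centroids `[0.0, 0.5]` and `[10.0, 10.5]`. The design point is that both modes end up calling the same iteration loop; they only differ in where the starting centroids come from.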

You either have to give the initial cluster centroids with -c (which can be
generated by the Canopy algorithm,
https://cwiki.apache.org/confluence/display/MAHOUT/Canopy+Clustering),
or just provide -k 20 (the number of initial, randomly generated clusters).
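For the -c route, the Canopy page linked above is the suggested way to generate initial centroids. The Java sketch below shows the idea on a single machine, assuming the classic two-threshold formulation (T1 > T2: points closer than T1 join a canopy, points closer than T2 can no longer start their own) and, as a simplification, taking the mean of each canopy's members as its centroid. `CanopySketch` and `canopyCentroids` are hypothetical names, not Mahout classes; in a real run, the clusters written by Mahout's canopy job would be what you then pass to -c.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.LinkedList;
import java.util.List;

// Toy, single-machine canopy clustering used to produce starting centroids for k-means.
// Hypothetical names; NOT Mahout's canopy implementation. Requires t1 > t2.
class CanopySketch {

    static double distance(double[] a, double[] b) {
        double sum = 0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return Math.sqrt(sum); // Euclidean distance
    }

    static List<double[]> canopyCentroids(List<double[]> points, double t1, double t2) {
        if (t1 <= t2) {
            throw new IllegalArgumentException("canopy clustering needs t1 > t2");
        }
        LinkedList<double[]> remaining = new LinkedList<>(points);
        List<double[]> centroids = new ArrayList<>();
        while (!remaining.isEmpty()) {
            double[] center = remaining.removeFirst(); // an unclaimed point starts a new canopy
            double[] sum = center.clone();
            int members = 1;
            Iterator<double[]> it = remaining.iterator();
            while (it.hasNext()) {
                double[] p = it.next();
                double d = distance(center, p);
                if (d < t1) { // loosely close: p is a member of this canopy
                    for (int i = 0; i < sum.length; i++) sum[i] += p[i];
                    members++;
                }
                if (d < t2) { // tightly close: p may not start a canopy of its own
                    it.remove();
                }
            }
            for (int i = 0; i < sum.length; i++) sum[i] /= members;
            centroids.add(sum); // simplification: canopy centroid = mean of its members
        }
        return centroids;
    }

    public static void main(String[] args) {
        List<double[]> points = Arrays.asList(
                new double[] {0, 0}, new double[] {0, 1}, new double[] {10, 10}, new double[] {10, 11});
        for (double[] c : canopyCentroids(points, 3.0, 2.0)) {
            System.out.println(Arrays.toString(c)); // candidate starting centroids
        }
    }
}
```

On these four points with T1 = 3 and T2 = 2, `main` prints `[0.0, 0.5]` and `[10.0, 10.5]`. Unlike -k, the number of centroids is not fixed in advance; it falls out of the chosen thresholds.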

On 29-02-2012 23:52, manish dunani wrote:
> Hi,
> I am doing k-means clustering on a Hadoop cluster using this page:
> https://cwiki.apache.org/confluence/display/MAHOUT/K-Means+Clustering
>
> While running k-means clustering on Hadoop with the following command, I got
> this error:
>
> hduser@ubuntu:/opt/mahout$ bin/mahout kmeans \
>   -i ./examples/bin/work/reuters-out-seqdir-sparse/tfidf-vectors/ \
>   -c ./examples/bin/work/clusters \
>   -o ./examples/bin/work/reuters-kmeans \
>   -x 10 -k 20 -ow
> MAHOUT_LOCAL is not set; adding HADOOP_CONF_DIR to classpath.
> Running on hadoop, using HADOOP_HOME=/usr/local/hadoop
> HADOOP_CONF_DIR=/usr/local/hadoop/conf
> MAHOUT-JOB: /opt/mahout/examples/target/mahout-examples-0.7-SNAPSHOT-job.jar
> 12/02/29 12:42:23 INFO common.AbstractJob: Command line arguments:
> {--clusters=[./examples/bin/work/clusters], --convergenceDelta=[0.5],
> --distanceMeasure=[org.apache.mahout.common.distance.SquaredEuclideanDistanceMeasure],
> --endPhase=[2147483647],
> --input=[./examples/bin/work/reuters-out-seqdir-sparse/tfidf-vectors/],
> --maxIter=[10], --method=[mapreduce], --numClusters=[20],
> --output=[./examples/bin/work/reuters-kmeans], --overwrite=null,
> --startPhase=[0], --tempDir=[temp]}
> 12/02/29 12:42:23 INFO common.HadoopUtil: Deleting
> examples/bin/work/reuters-kmeans
> 12/02/29 12:42:23 INFO common.HadoopUtil: Deleting
> examples/bin/work/clusters
> 12/02/29 12:42:24 INFO util.NativeCodeLoader: Loaded the native-hadoop
> library
> 12/02/29 12:42:24 INFO zlib.ZlibFactory: Successfully loaded & initialized
> native-zlib library
> 12/02/29 12:42:24 INFO compress.CodecPool: Got brand-new compressor
> 12/02/29 12:42:24 INFO kmeans.RandomSeedGenerator: Wrote 20 vectors to
> examples/bin/work/clusters/part-randomSeed
> 12/02/29 12:42:24 INFO kmeans.KMeansDriver: Input:
> examples/bin/work/reuters-out-seqdir-sparse/tfidf-vectors Clusters In:
> examples/bin/work/clusters/part-randomSeed Out:
> examples/bin/work/reuters-kmeans Distance:
> org.apache.mahout.common.distance.SquaredEuclideanDistanceMeasure
> 12/02/29 12:42:24 INFO kmeans.KMeansDriver: convergence: 0.5 max
> Iterations: 10 num Reduce Tasks: org.apache.mahout.math.VectorWritable
> Input Vectors: {}
> 12/02/29 12:42:24 INFO kmeans.KMeansDriver: K-Means Iteration 1
> 12/02/29 12:42:26 INFO input.FileInputFormat: Total input paths to process
> : 1
> 12/02/29 12:42:27 INFO mapred.JobClient: Running job: job_201202290930_0012
> 12/02/29 12:42:28 INFO mapred.JobClient:  map 0% reduce 0%
> 12/02/29 12:42:42 INFO mapred.JobClient: Task Id :
> attempt_201202290930_0012_m_000000_0, Status : FAILED
> java.lang.IllegalStateException: No clusters found. Check your -c path.
>      at
> org.apache.mahout.clustering.kmeans.KMeansMapper.setup(KMeansMapper.java:59)
>      at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:142)
>      at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:621)
>      at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
>      at org.apache.hadoop.mapred.Child.main(Child.java:170)
>
> 12/02/29 12:42:48 INFO mapred.JobClient: Task Id :
> attempt_201202290930_0012_m_000000_1, Status : FAILED
> java.lang.IllegalStateException: No clusters found. Check your -c path.
>      at
> org.apache.mahout.clustering.kmeans.KMeansMapper.setup(KMeansMapper.java:59)
>      at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:142)
>      at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:621)
>      at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
>      at org.apache.hadoop.mapred.Child.main(Child.java:170)
>
> 12/02/29 12:42:54 INFO mapred.JobClient: Task Id :
> attempt_201202290930_0012_m_000000_2, Status : FAILED
> java.lang.IllegalStateException: No clusters found. Check your -c path.
>      at
> org.apache.mahout.clustering.kmeans.KMeansMapper.setup(KMeansMapper.java:59)
>      at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:142)
>      at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:621)
>      at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
>      at org.apache.hadoop.mapred.Child.main(Child.java:170)
>
> 12/02/29 12:43:03 INFO mapred.JobClient: Job complete: job_201202290930_0012
> 12/02/29 12:43:03 INFO mapred.JobClient: Counters: 3
> 12/02/29 12:43:03 INFO mapred.JobClient:   Job Counters
> 12/02/29 12:43:03 INFO mapred.JobClient:     Launched map tasks=4
> 12/02/29 12:43:03 INFO mapred.JobClient:     Data-local map tasks=4
> 12/02/29 12:43:03 INFO mapred.JobClient:     Failed map tasks=1
> Exception in thread "main" java.lang.InterruptedException: K-Means
> Iteration failed processing examples/bin/work/clusters/part-randomSeed
>      at
> org.apache.mahout.clustering.kmeans.KMeansDriver.runIteration(KMeansDriver.java:373)
>      at
> org.apache.mahout.clustering.kmeans.KMeansDriver.buildClustersMR(KMeansDriver.java:317)
>      at
> org.apache.mahout.clustering.kmeans.KMeansDriver.buildClusters(KMeansDriver.java:239)
>      at
> org.apache.mahout.clustering.kmeans.KMeansDriver.run(KMeansDriver.java:154)
>      at
> org.apache.mahout.clustering.kmeans.KMeansDriver.run(KMeansDriver.java:112)
>      at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
>      at
> org.apache.mahout.clustering.kmeans.KMeansDriver.main(KMeansDriver.java:61)
>      at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>      at
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>      at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>      at java.lang.reflect.Method.invoke(Method.java:616)
>      at
> org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:68)
>      at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:139)
>      at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:188)
>      at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>      at
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>      at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>      at java.lang.reflect.Method.invoke(Method.java:616)
>      at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
>
>
> I also created a "clusters" directory in /opt/mahout/examples/work.
>
> After that, I still got the same error.
>
>
> What can I do to solve this error?
>
>