Posted to dev@mahout.apache.org by "Paritosh Ranjan (Commented) (JIRA)" <ji...@apache.org> on 2012/03/23 12:25:28 UTC

[jira] [Commented] (MAHOUT-504) Kmeans clustering error

    [ https://issues.apache.org/jira/browse/MAHOUT-504?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13236517#comment-13236517 ] 

Paritosh Ranjan commented on MAHOUT-504:
----------------------------------------

The Mahout-Examples-Cluster-Reuters build now shows the same problem, and that is why it is failing.
See https://builds.apache.org/job/Mahout-Examples-Cluster-Reuters/79/console. I am also attaching part of the log.

The previous build passed, and there has been no code change since then.

The fifth and sixth lines of the log (the HadoopUtil "Deleting" entries) show that the path containing the clusters is being deleted.

Can anyone think of a reason for this intermittent failure?

12/03/22 19:20:46 INFO common.AbstractJob: Command line arguments: {--clustering=null, --clusters=[/tmp/mahout-work-hudson/reuters-kmeans-clusters], --convergenceDelta=[0.5], --distanceMeasure=[org.apache.mahout.common.distance.CosineDistanceMeasure], --endPhase=[2147483647], --input=[/tmp/mahout-work-hudson/reuters-out-seqdir-sparse-kmeans/tfidf-vectors/], --maxIter=[10], --method=[mapreduce], --numClusters=[20], --output=[/tmp/mahout-work-hudson/reuters-kmeans], --overwrite=null, --startPhase=[0], --tempDir=[temp]}
12/03/22 19:20:46 INFO common.HadoopUtil: Deleting /tmp/mahout-work-hudson/reuters-kmeans
12/03/22 19:20:46 INFO common.HadoopUtil: Deleting /tmp/mahout-work-hudson/reuters-kmeans-clusters
12/03/22 19:20:46 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
12/03/22 19:20:46 INFO compress.CodecPool: Got brand-new compressor
12/03/22 19:20:47 INFO kmeans.RandomSeedGenerator: Wrote 20 vectors to /tmp/mahout-work-hudson/reuters-kmeans-clusters/part-randomSeed
12/03/22 19:20:47 INFO kmeans.KMeansDriver: Input: /tmp/mahout-work-hudson/reuters-out-seqdir-sparse-kmeans/tfidf-vectors Clusters In: /tmp/mahout-work-hudson/reuters-kmeans-clusters/part-randomSeed Out: /tmp/mahout-work-hudson/reuters-kmeans Distance: org.apache.mahout.common.distance.CosineDistanceMeasure
12/03/22 19:20:47 INFO kmeans.KMeansDriver: convergence: 0.5 max Iterations: 10 num Reduce Tasks: org.apache.mahout.math.VectorWritable Input Vectors: {}
12/03/22 19:20:47 INFO kmeans.KMeansDriver: K-Means Iteration 1
12/03/22 19:20:47 INFO input.FileInputFormat: Total input paths to process : 1
12/03/22 19:20:47 INFO mapred.JobClient: Running job: job_local_0001
12/03/22 19:20:47 INFO mapred.MapTask: io.sort.mb = 100
12/03/22 19:20:48 INFO mapred.MapTask: data buffer = 79691776/99614720
12/03/22 19:20:48 INFO mapred.MapTask: record buffer = 262144/327680
12/03/22 19:20:48 INFO compress.CodecPool: Got brand-new decompressor
12/03/22 19:20:48 WARN mapred.LocalJobRunner: job_local_0001
java.lang.IllegalStateException: No clusters found. Check your -c path.
	at org.apache.mahout.clustering.kmeans.KMeansMapper.setup(KMeansMapper.java:59)
	at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:142)
	at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
	at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:212)
12/03/22 19:20:48 INFO mapred.JobClient:  map 0% reduce 0%
12/03/22 19:20:48 INFO mapred.JobClient: Job complete: job_local_0001
12/03/22 19:20:48 INFO mapred.JobClient: Counters: 0
Exception in thread "main" java.lang.InterruptedException: K-Means Iteration failed processing /tmp/mahout-work-hudson/reuters-kmeans-clusters/part-randomSeed
	at org.apache.mahout.clustering.kmeans.KMeansDriver.runIteration(KMeansDriver.java:395)
	at org.apache.mahout.clustering.kmeans.KMeansDriver.buildClustersMR(KMeansDriver.java:339)
	at org.apache.mahout.clustering.kmeans.KMeansDriver.buildClusters(KMeansDriver.java:261)
	at org.apache.mahout.clustering.kmeans.KMeansDriver.run(KMeansDriver.java:169)
	at org.apache.mahout.clustering.kmeans.KMeansDriver.run(KMeansDriver.java:119)
	at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
	at org.apache.mahout.clustering.kmeans.KMeansDriver.main(KMeansDriver.java:63)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
	at java.lang.reflect.Method.invoke(Method.java:597)
	at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:68)
	at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:139)
	at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:188)
Build step 'Execute shell' marked build as failure
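One way to test the hypothesis that the clusters path disappears before the first iteration reads it is to check, right when the failure occurs, whether the seed file named in the "Clusters In" line is still present. The sketch below is a minimal Java diagnostic. It assumes the job runs against the local filesystem, which the job_local_0001 id suggests (LocalJobRunner). The class ClusterPathCheck and its method are hypothetical and are not part of Mahout. The check looks only at existence and non-zero size, so it cannot tell whether a SequenceFile actually contains any cluster vectors.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

/**
 * Hypothetical diagnostic (not a Mahout class): counts non-empty "part-*" files
 * at a local path, which may be either a single seed file or a clusters directory.
 */
final class ClusterPathCheck {

    private ClusterPathCheck() {}

    /** Returns how many regular, non-empty files named "part-*" exist at the given path. */
    static int countNonEmptyPartFiles(Path path) throws IOException {
        if (Files.isRegularFile(path)) {
            // A single file, e.g. .../reuters-kmeans-clusters/part-randomSeed
            String name = String.valueOf(path.getFileName());
            return name.startsWith("part-") && Files.size(path) > 0 ? 1 : 0;
        }
        if (!Files.isDirectory(path)) {
            return 0; // the path is missing, e.g. because it was deleted
        }
        int count = 0;
        try (DirectoryStream<Path> entries = Files.newDirectoryStream(path, "part-*")) {
            for (Path entry : entries) {
                if (Files.isRegularFile(entry) && Files.size(entry) > 0) {
                    count++;
                }
            }
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        // Default path taken from the log above; pass another path as the first argument.
        Path target = Paths.get(args.length > 0 ? args[0]
                : "/tmp/mahout-work-hudson/reuters-kmeans-clusters/part-randomSeed");
        System.out.println(target + ": " + countNonEmptyPartFiles(target) + " non-empty part-* file(s)");
    }
}
```

If the seed file had already been removed when the check ran, the tool would report 0 for /tmp/mahout-work-hudson/reuters-kmeans-clusters/part-randomSeed. A non-zero result would instead point away from deletion and toward the contents of the file.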
                
> Kmeans clustering error
> -----------------------
>
>                 Key: MAHOUT-504
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-504
>             Project: Mahout
>          Issue Type: Bug
>            Reporter: Zhen Guo
>            Assignee: Robin Anil
>             Fix For: 0.4
>
>
> I tried the Kmeans algorithm on the Synthetic Control data and the following error appears. The Canopy algorithm works fine. The error comes from the Mapper. I am using trunk.
> 10/09/20 19:40:06 INFO mapred.JobClient: Task Id : attempt_201008261432_1324_m_000000_0, Status : FAILED
> java.lang.IllegalStateException: Cluster is empty!
> 	at org.apache.mahout.clustering.kmeans.KMeansClusterMapper.setup(KMeansClusterMapper.java:57)
> 	at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:142)
> 	at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:583)
> 	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
> 	at org.apache.hadoop.mapred.Child.main(Child.java:170)

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira