Posted to dev@mahout.apache.org by "Sean Owen (JIRA)" <ji...@apache.org> on 2010/09/25 09:57:33 UTC

[jira] Reopened: (MAHOUT-504) Kmeans clustering error

     [ https://issues.apache.org/jira/browse/MAHOUT-504?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sean Owen reopened MAHOUT-504:
------------------------------


> Kmeans clustering error
> -----------------------
>
>                 Key: MAHOUT-504
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-504
>             Project: Mahout
>          Issue Type: Bug
>            Reporter: Zhen Guo
>            Assignee: Robin Anil
>             Fix For: 0.4
>
>
> I tried the k-Means algorithm on the Synthetic Control data and got the following error. The Canopy algorithm runs fine. The error comes from the Mapper. I am using trunk.
> 10/09/20 19:40:06 INFO mapred.JobClient: Task Id : attempt_201008261432_1324_m_000000_0, Status : FAILED
> java.lang.IllegalStateException: Cluster is empty!
> 	at org.apache.mahout.clustering.kmeans.KMeansClusterMapper.setup(KMeansClusterMapper.java:57)
> 	at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:142)
> 	at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:583)
> 	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
> 	at org.apache.hadoop.mapred.Child.main(Child.java:170)



Re: [jira] Reopened: (MAHOUT-504) Kmeans clustering error

Posted by Jeff Eastman <jd...@windwardsolutions.com>.
This cannot be running on the latest trunk. The job no longer has a -c 
argument, and the initial clusters are always computed by running Canopy 
on the converted data. It is meant to be run with no arguments; default 
values are provided (EuclideanDistanceMeasure, t1=80, t2=55) that work 
consistently. The only variables are the distance measure and the t1 and 
t2 values for Canopy. If these are changed, Canopy will generate 
somewhere between 1 and 600 clusters, and k-Means processes them fine.

Predictably, when I run with t1=800 and t2=550 I get a single cluster 
out; with t1=8 and t2=5.5 I get 600. I cannot imagine any way to get 
0 clusters out of Canopy.
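
To make the argument about cluster counts concrete, here is a minimal sketch of a textbook canopy pass in Python. It is not Mahout's code: the function names, the toy data, and the exact loop structure are assumptions made for illustration, and Mahout's own implementation may differ in detail. The point it shows is that any non-empty input yields at least one canopy, and that the thresholds only move the count between one and the number of input points.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length numeric sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def canopies(points, t1, t2, distance=euclidean):
    """Return a list of (center, members) pairs.

    Hypothetical helper, not a Mahout API. Assumes t1 > t2.
    """
    if t1 <= t2:
        raise ValueError("expected t1 > t2")
    remaining = list(points)
    result = []
    while remaining:
        # The first remaining point always becomes a canopy center, so a
        # non-empty input can never produce zero canopies.
        center = remaining.pop(0)
        members = [center]
        survivors = []
        for p in remaining:
            d = distance(center, p)
            if d < t1:
                members.append(p)    # loosely bound: joins this canopy
            if d >= t2:
                survivors.append(p)  # not tightly bound: may seed a later canopy
        remaining = survivors
        result.append((center, members))
    return result


# Toy one-dimensional data (hypothetical, not the Synthetic Control set).
toy = [(0.0,), (1.0,), (2.0,), (10.0,), (11.0,), (50.0,)]
print(len(canopies(toy, 800, 550)))    # 1: thresholds larger than the data
print(len(canopies(toy, 0.8, 0.55)))   # 6: thresholds smaller than any gap
```

With thresholds much larger than the spread of the data everything collapses into one canopy, and with thresholds smaller than the closest pair every point becomes its own canopy. Only an empty input gives an empty result.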

I think this has been fixed, but show me a command line that can 
generate this error and I will have something to work with.

