
Posted to dev@mahout.apache.org by "Zhen Guo (JIRA)" <ji...@apache.org> on 2010/09/20 21:47:33 UTC

[jira] Created: (MAHOUT-504) Kmeans clustering error

Kmeans clustering error
-----------------------

                 Key: MAHOUT-504
                 URL: https://issues.apache.org/jira/browse/MAHOUT-504
             Project: Mahout
          Issue Type: Bug
            Reporter: Zhen Guo


I tried the Kmeans algorithm on the Synthetic Control data, and the following error appears. I also tried the Canopy algorithm, and it runs fine. The error is from the Mapper. I am using trunk.

10/09/20 19:40:06 INFO mapred.JobClient: Task Id : attempt_201008261432_1324_m_000000_0, Status : FAILED
java.lang.IllegalStateException: Cluster is empty!
	at org.apache.mahout.clustering.kmeans.KMeansClusterMapper.setup(KMeansClusterMapper.java:57)
	at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:142)
	at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:583)
	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
	at org.apache.hadoop.mapred.Child.main(Child.java:170)

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

Re: [jira] Created: (MAHOUT-504) Kmeans clustering error

Posted by Jeff Eastman <je...@Narus.com>.

The input mapper doesn't like the format of your data.
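
For anyone hitting the same NumberFormatException: the stack trace quoted below shows Double.valueOf being called from InputMapper.map, so at least one token in the input file is not a parseable number. Here is a minimal sketch for finding such tokens before submitting the job. It assumes whitespace-separated numeric columns, as in synthetic_control.data. The names InputLineCheck and firstBadToken are made up for illustration and are not part of Mahout.

```java
import java.util.Optional;

// Hypothetical helper, not part of Mahout: checks that every
// whitespace-separated token on a line parses as a double.
class InputLineCheck {

  // Returns the first token that Double.valueOf rejects, if there is one.
  static Optional<String> firstBadToken(String line) {
    String trimmed = line.trim();
    if (trimmed.isEmpty()) {
      return Optional.empty();
    }
    for (String token : trimmed.split("\\s+")) {
      try {
        Double.valueOf(token);
      } catch (NumberFormatException e) {
        return Optional.of(token);
      }
    }
    return Optional.empty();
  }

  public static void main(String[] args) {
    // A well-formed line, then a line with a token holding two decimal points.
    System.out.println(firstBadToken("28.78 34.46 31.33"));
    System.out.println(firstBadToken("28.78 34.4.6 31.33"));
  }
}
```

Running a check like this over each line of the input file should point at the offending value, which can then be fixed in the data.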


On Sep 29, 2011, at 7:14 AM, "ww107826" <16...@qq.com> wrote:

> Hello,
>    I have the same question.
> 11/09/29 10:29:51 INFO mapred.JobClient:  map 6% reduce 0%
> 11/09/29 10:29:54 INFO mapred.JobClient:  map 44% reduce 0%
> 11/09/29 10:29:59 INFO mapred.JobClient: Task Id :
> attempt_201109291022_0001_m_000001_0, Status : FAILED
> java.lang.NumberFormatException: multiple points
>    at sun.misc.FloatingDecimal.readJavaFormatString(FloatingDecimal.java:1082)
>    at java.lang.Double.valueOf(Double.java:475)
>    at
> org.apache.mahout.clustering.conversion.InputMapper.map(InputMapper.java:48)
>    at
> org.apache.mahout.clustering.conversion.InputMapper.map(InputMapper.java:34)
>    at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
>    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:621)
>    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
>    at org.apache.hadoop.mapred.Child.main(Child.java:170)
> 11/09/29 15:16:45 INFO mapred.JobClient: Task Id :
> attempt_201109291420_0020_m_000000_0, Status : FAILED
> java.lang.IllegalStateException:

Re: [jira] Created: (MAHOUT-504) Kmeans clustering error

Posted by ww107826 <16...@qq.com>.

Hello,
    I have the same question.
11/09/29 10:29:51 INFO mapred.JobClient:  map 6% reduce 0%
11/09/29 10:29:54 INFO mapred.JobClient:  map 44% reduce 0%
11/09/29 10:29:59 INFO mapred.JobClient: Task Id :
attempt_201109291022_0001_m_000001_0, Status : FAILED
java.lang.NumberFormatException: multiple points
	at sun.misc.FloatingDecimal.readJavaFormatString(FloatingDecimal.java:1082)
	at java.lang.Double.valueOf(Double.java:475)
	at
org.apache.mahout.clustering.conversion.InputMapper.map(InputMapper.java:48)
	at
org.apache.mahout.clustering.conversion.InputMapper.map(InputMapper.java:34)
	at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
	at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:621)
	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
	at org.apache.hadoop.mapred.Child.main(Child.java:170)
11/09/29 15:16:45 INFO mapred.JobClient: Task Id :
attempt_201109291420_0020_m_000000_0, Status : FAILED
java.lang.IllegalStateException: No clusters found. Check your -c path.
	at
org.apache.mahout.clustering.kmeans.KMeansMapper.setup(KMeansMapper.java:61)
	at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:142)
	at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:621)
	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
	at org.apache.hadoop.mapred.Child.main(Child.java:170)
2011-09-29 15:24:40,505 INFO org.apache.hadoop.metrics.jvm.JvmMetrics:
Initializing JVM Metrics with processName=MAP, sessionId=
2011-09-29 15:24:40,848 WARN org.apache.hadoop.mapred.TaskTracker: Error
running child
java.lang.IllegalStateException: No clusters found. Check your -c path.
	at
org.apache.mahout.clustering.kmeans.KMeansClusterMapper.setup(KMeansClusterMapper.java:57)
	at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:142)
	at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:621)
	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
	at org.apache.hadoop.mapred.Child.main(Child.java:170)
2011-09-29 15:24:40,875 INFO org.apache.hadoop.mapred.TaskRunner: Runnning
cleanup for the task

I do not know how to solve this. Can you tell me? Thank you.

--
View this message in context: http://lucene.472066.n3.nabble.com/jira-Created-MAHOUT-504-Kmeans-clustering-error-tp1531052p3379348.html
Sent from the Mahout Developer List mailing list archive at Nabble.com.

Re: [jira] Created: (MAHOUT-504) Kmeans clustering error

Posted by ww107826 <16...@qq.com>.

I have the same question. Can you tell me how to solve it?

--
View this message in context: http://lucene.472066.n3.nabble.com/jira-Created-MAHOUT-504-Kmeans-clustering-error-tp1531052p3379359.html
Sent from the Mahout Developer List mailing list archive at Nabble.com.

Re: [jira] Created: (MAHOUT-504) Kmeans clustering error

Posted by Jeff Eastman <jd...@windwardsolutions.com>.

  That error is thrown when the mapper is initialized and finds no 
initial clusters (The error message should say "No clusters found"). 
Check your command line -c argument. It should name the directory 
containing the initial clusters (output/clusters-0 if you used canopy to 
produce them). Please post your exact command line arguments if you 
still have problems, and I will help you debug them. K-Means has been 
pretty well tested in some production environments and errors are 
usually caused by incorrect arguments.
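
As a rough illustration of the check described above (this is not Mahout's actual code): a setup-style step that looks for initial clusters in a directory and fails fast when it finds none. The sketch uses the local filesystem through java.nio.file as a stand-in for HDFS, and it assumes the cluster data lives in files whose names start with "part-". ClusterDirCheck, countPartFiles and requireClusters are hypothetical names.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical sketch, not Mahout code. Local paths stand in for HDFS paths.
class ClusterDirCheck {

  // Counts files named part-* directly under dir; returns 0 if dir is missing.
  static int countPartFiles(Path dir) throws IOException {
    if (!Files.isDirectory(dir)) {
      return 0;
    }
    int count = 0;
    try (DirectoryStream<Path> stream = Files.newDirectoryStream(dir, "part-*")) {
      for (Path ignored : stream) {
        count++;
      }
    }
    return count;
  }

  // Fails fast when the directory holds no cluster files.
  static void requireClusters(Path dir) throws IOException {
    if (countPartFiles(dir) == 0) {
      throw new IllegalStateException(
          "No clusters found. Check your -c path: " + dir.toAbsolutePath());
    }
  }

  public static void main(String[] args) throws IOException {
    Path dir = Paths.get(args.length > 0 ? args[0] : "output/clusters-0");
    try {
      requireClusters(dir);
      System.out.println("Found initial clusters in " + dir.toAbsolutePath());
    } catch (IllegalStateException e) {
      System.out.println(e.getMessage());
    }
  }
}
```

Printing the absolute path in the message makes it easier to see where a relative -c argument actually points.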

On 9/20/10 3:47 PM, Zhen Guo (JIRA) wrote:
> Kmeans clustering error
> -----------------------
>
>                   Key: MAHOUT-504
>                   URL: https://issues.apache.org/jira/browse/MAHOUT-504
>               Project: Mahout
>            Issue Type: Bug
>              Reporter: Zhen Guo
>
>
> I tried the Kmeans algorithm on the Synthetic Control data. The following error appears. I tried the Canopy algorithm, it is fine. This error is from Mapper. I am using Trunk.
>
> 10/09/20 19:40:06 INFO mapred.JobClient: Task Id : attempt_201008261432_1324_m_000000_0, Status : FAILED
> java.lang.IllegalStateException: Cluster is empty!
> 	at org.apache.mahout.clustering.kmeans.KMeansClusterMapper.setup(KMeansClusterMapper.java:57)
> 	at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:142)
> 	at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:583)
> 	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
> 	at org.apache.hadoop.mapred.Child.main(Child.java:170)
>

[jira] Commented: (MAHOUT-504) Kmeans clustering error

Posted by "Hudson (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/MAHOUT-504?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12918174#action_12918174 ] 

Hudson commented on MAHOUT-504:
-------------------------------

Integrated in Mahout-Quality #375 (See [https://hudson.apache.org/hudson/job/Mahout-Quality/375/])
    MAHOUT-504: reworded error message in cluster mapper for clarity


> Kmeans clustering error
> -----------------------
>
>                 Key: MAHOUT-504
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-504
>             Project: Mahout
>          Issue Type: Bug
>            Reporter: Zhen Guo
>            Assignee: Robin Anil
>             Fix For: 0.4
>
>
> I tried the Kmeans algorithm on the Synthetic Control data. The following error appears. I tried the Canopy algorithm, it is fine. This error is from Mapper. I am using Trunk.
> 10/09/20 19:40:06 INFO mapred.JobClient: Task Id : attempt_201008261432_1324_m_000000_0, Status : FAILED
> java.lang.IllegalStateException: Cluster is empty!
> 	at org.apache.mahout.clustering.kmeans.KMeansClusterMapper.setup(KMeansClusterMapper.java:57)
> 	at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:142)
> 	at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:583)
> 	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
> 	at org.apache.hadoop.mapred.Child.main(Child.java:170)


[jira] Commented: (MAHOUT-504) Kmeans clustering error

Posted by "Hudson (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/MAHOUT-504?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12915344#action_12915344 ] 

Hudson commented on MAHOUT-504:
-------------------------------

Integrated in Mahout-Quality #339 (See [https://hudson.apache.org/hudson/job/Mahout-Quality/339/])
    MAHOUT-504: improved error message in Fuzzy k-Means


> Kmeans clustering error
> -----------------------
>
>                 Key: MAHOUT-504
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-504
>             Project: Mahout
>          Issue Type: Bug
>            Reporter: Zhen Guo
>            Assignee: Robin Anil
>             Fix For: 0.4
>
>
> I tried the Kmeans algorithm on the Synthetic Control data. The following error appears. I tried the Canopy algorithm, it is fine. This error is from Mapper. I am using Trunk.
> 10/09/20 19:40:06 INFO mapred.JobClient: Task Id : attempt_201008261432_1324_m_000000_0, Status : FAILED
> java.lang.IllegalStateException: Cluster is empty!
> 	at org.apache.mahout.clustering.kmeans.KMeansClusterMapper.setup(KMeansClusterMapper.java:57)
> 	at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:142)
> 	at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:583)
> 	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
> 	at org.apache.hadoop.mapred.Child.main(Child.java:170)


[jira] Resolved: (MAHOUT-504) Kmeans clustering error

Posted by "Sean Owen (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/MAHOUT-504?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sean Owen resolved MAHOUT-504.
------------------------------

    Fix Version/s: 0.4
                       (was: 0.5)
       Resolution: Fixed

> Kmeans clustering error
> -----------------------
>
>                 Key: MAHOUT-504
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-504
>             Project: Mahout
>          Issue Type: Bug
>            Reporter: Zhen Guo
>            Assignee: Robin Anil
>             Fix For: 0.4
>
>
> I tried the Kmeans algorithm on the Synthetic Control data. The following error appears. I tried the Canopy algorithm, it is fine. This error is from Mapper. I am using Trunk.
> 10/09/20 19:40:06 INFO mapred.JobClient: Task Id : attempt_201008261432_1324_m_000000_0, Status : FAILED
> java.lang.IllegalStateException: Cluster is empty!
> 	at org.apache.mahout.clustering.kmeans.KMeansClusterMapper.setup(KMeansClusterMapper.java:57)
> 	at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:142)
> 	at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:583)
> 	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
> 	at org.apache.hadoop.mapred.Child.main(Child.java:170)


Re: [jira] Issue Comment Edited: (MAHOUT-504) Kmeans clustering error

Posted by Jeff Eastman <jd...@windwardsolutions.com>.

  +user@

Thanks so much Pragnesh, for putting your finger so succinctly on the 
problem. I'm cross-posting this to user@ so that it will be part of that 
searchable archive too. I will also append to MAHOUT-504.

I'm glad to hear you are out of the woods on this,
Jeff
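
The diagnosis quoted below comes down to how relative paths are resolved: in HDFS a relative path is resolved against the current user's home directory, which is /user/<username> by default. The sketch below only mimics that rule with plain string handling so that it runs without Hadoop. HomeDirResolver and resolve are hypothetical names, and the user names are the ones mentioned in the quoted message.

```java
// Hypothetical sketch, not Hadoop code: mimics how a relative HDFS path
// ends up under the home directory of whichever user resolves it.
class HomeDirResolver {

  // Absolute paths are kept as-is; relative paths go under /user/<userName>.
  static String resolve(String userName, String path) {
    if (path.startsWith("/")) {
      return path;
    }
    return "/user/" + userName + "/" + path;
  }

  public static void main(String[] args) {
    // The same relative path lands in different places for different users.
    System.out.println(resolve("pragnesh", "output/clusters-0"));
    System.out.println(resolve("hadoop", "output/clusters-0"));
    // An absolute path is the same for everyone.
    System.out.println(resolve("hadoop", "/data/output/clusters-0"));
  }
}
```

Under these assumptions, passing absolute paths, or running the job as the same user that owns the data, would avoid the mismatch.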


On 10/8/10 2:02 AM, pragnesh radadia wrote:
> Finally I am able to run the k-means example on the synthetic control data.
>
> I think the problem is that Hadoop is running as the hadoop user (using Cloudera
> CDH3) and I am trying to run the example as the pragnesh user,
>
> so Hadoop is not able to find the data under "/user/hadoop",
>
> since the example uses relative paths to store the input and clustering data.
>
> -pragnesh
>
>
> On Wed, Oct 6, 2010 at 12:27 AM, Jeff Eastman
> <jd...@windwardsolutions.com>  wrote:
>>   Hi Pragnesh,
>>
>> I really don't know what to suggest to you. I just did a new Mahout checkout
>> and build, followed by uploading the synthetic_control.data file to a local
>> Hadoop instance. The k-means job ran without incident. On a hunch, I also
>> uploaded the file as testdata (not in directory testdata) and that worked
>> too. I'm baffled why I can't duplicate this and suspect it is a local system
>> issue. What OS are you running?
>>
>> If yours works from Eclipse but not from the command line, I wonder if you
>> have done mvn clean build from the command line before you ran the CLI
>> Mahout job? Eclipse compiles its bits into different directories and does
>> not build the necessary job files. Other than that, I suggest checking your
>> file system groups and permissions.
>>
>> If you find something that gets you running again, *please* post your
>> solution so we can advise others who are experiencing the same error
>> message.
>>
>>
>> On 10/5/10 12:06 AM, pragnesh (JIRA) wrote:
>>>      [
>>> https://issues.apache.org/jira/browse/MAHOUT-504?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12917502#action_12917502
>>> ]
>>>
>>> pragnesh edited comment on MAHOUT-504 at 10/5/10 3:05 AM:
>>> ----------------------------------------------------------
>>>
>>> I am also getting the same exception with trunk code.
>>>
>>> 10/10/04 12:42:34 INFO mapred.JobClient: Running job:
>>> job_201010041038_0019
>>> 10/10/04 12:42:35 INFO mapred.JobClient:  map 0% reduce 0%
>>> 10/10/04 12:42:45 INFO mapred.JobClient: Task Id :
>>> attempt_201010041038_0019_m_000000_0, Status : FAILED
>>> java.lang.IllegalStateException: No clusters found. Check your -c path.
>>>         at
>>> org.apache.mahout.clustering.kmeans.KMeansMapper.setup(KMeansMapper.java:61)
>>>         at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:142)
>>>         at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:621)
>>>         at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
>>>         at org.apache.hadoop.mapred.Child.main(Child.java:170)
>>>
>>>
>>> This runs fine from Eclipse,
>>>
>>> but when I try to run it from the command line with Hadoop, I see the
>>> following output,
>>>
>>> while $MAHOUT_HOME/bin/mahout
>>> org.apache.mahout.clustering.syntheticcontrol.dirichlet.Job runs fine
>>> without any error.
>>>
>>> pragnesh-laptop% $MAHOUT_HOME/bin/mahout
>>> org.apache.mahout.clustering.syntheticcontrol.kmeans.Job
>>> Running on hadoop, using HADOOP_HOME=/usr/lib/hadoop/
>>> HADOOP_CONF_DIR=/etc/hadoop/conf.pseudo
>>> 10/10/05 12:26:05 WARN driver.MahoutDriver: No
>>> org.apache.mahout.clustering.syntheticcontrol.kmeans.Job.props found on
>>> classpath, will use command-line arguments only
>>> 10/10/05 12:26:05 INFO kmeans.Job: Running with default arguments
>>> 10/10/05 12:26:06 INFO kmeans.Job: Preparing Input
>>> 10/10/05 12:26:06 WARN mapred.JobClient: Use GenericOptionsParser for
>>> parsing the arguments. Applications should implement Tool for the same.
>>> 10/10/05 12:26:07 INFO input.FileInputFormat: Total input paths to process
>>> : 1
>>> 10/10/05 12:26:09 INFO mapred.JobClient: Running job:
>>> job_201010051117_0005
>>> 10/10/05 12:26:10 INFO mapred.JobClient:  map 0% reduce 0%
>>> 10/10/05 12:26:26 INFO mapred.JobClient:  map 100% reduce 0%
>>> 10/10/05 12:26:28 INFO mapred.JobClient: Job complete:
>>> job_201010051117_0005
>>> 10/10/05 12:26:29 INFO mapred.JobClient: Counters: 7
>>> 10/10/05 12:26:29 INFO mapred.JobClient:   Job Counters
>>> 10/10/05 12:26:29 INFO mapred.JobClient:     Launched map tasks=1
>>> 10/10/05 12:26:29 INFO mapred.JobClient:     Data-local map tasks=1
>>> 10/10/05 12:26:29 INFO mapred.JobClient:   FileSystemCounters
>>> 10/10/05 12:26:29 INFO mapred.JobClient:     HDFS_BYTES_READ=288374
>>> 10/10/05 12:26:29 INFO mapred.JobClient:     HDFS_BYTES_WRITTEN=335470
>>> 10/10/05 12:26:29 INFO mapred.JobClient:   Map-Reduce Framework
>>> 10/10/05 12:26:29 INFO mapred.JobClient:     Map input records=600
>>> 10/10/05 12:26:29 INFO mapred.JobClient:     Spilled Records=0
>>> 10/10/05 12:26:29 INFO mapred.JobClient:     Map output records=600
>>> 10/10/05 12:26:29 INFO kmeans.Job: Running Canopy to get initial clusters
>>> 10/10/05 12:26:29 INFO canopy.CanopyDriver: Build Clusters Input:
>>> output/data Out: output Measure:
>>> org.apache.mahout.common.distance.EuclideanDistanceMeasure@136a43c t1: 80.0
>>> t2: 55.0
>>> 10/10/05 12:26:29 WARN mapred.JobClient: Use GenericOptionsParser for
>>> parsing the arguments. Applications should implement Tool for the same.
>>> 10/10/05 12:26:29 INFO input.FileInputFormat: Total input paths to process
>>> : 1
>>> 10/10/05 12:26:30 INFO mapred.JobClient: Running job:
>>> job_201010051117_0006
>>> 10/10/05 12:26:31 INFO mapred.JobClient:  map 0% reduce 0%
>>> 10/10/05 12:26:42 INFO mapred.JobClient:  map 100% reduce 0%
>>> 10/10/05 12:26:54 INFO mapred.JobClient:  map 100% reduce 100%
>>> 10/10/05 12:26:56 INFO mapred.JobClient: Job complete:
>>> job_201010051117_0006
>>> 10/10/05 12:26:56 INFO mapred.JobClient: Counters: 17
>>> 10/10/05 12:26:56 INFO mapred.JobClient:   Job Counters
>>> 10/10/05 12:26:56 INFO mapred.JobClient:     Launched reduce tasks=1
>>> 10/10/05 12:26:56 INFO mapred.JobClient:     Launched map tasks=1
>>> 10/10/05 12:26:56 INFO mapred.JobClient:     Data-local map tasks=1
>>> 10/10/05 12:26:56 INFO mapred.JobClient:   FileSystemCounters
>>> 10/10/05 12:26:56 INFO mapred.JobClient:     FILE_BYTES_READ=13906
>>> 10/10/05 12:26:56 INFO mapred.JobClient:     HDFS_BYTES_READ=335470
>>> 10/10/05 12:26:56 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=27844
>>> 10/10/05 12:26:56 INFO mapred.JobClient:     HDFS_BYTES_WRITTEN=7131
>>> 10/10/05 12:26:56 INFO mapred.JobClient:   Map-Reduce Framework
>>> 10/10/05 12:26:56 INFO mapred.JobClient:     Reduce input groups=1
>>> 10/10/05 12:26:56 INFO mapred.JobClient:     Combine output records=0
>>> 10/10/05 12:26:56 INFO mapred.JobClient:     Map input records=600
>>> 10/10/05 12:26:56 INFO mapred.JobClient:     Reduce shuffle bytes=0
>>> 10/10/05 12:26:56 INFO mapred.JobClient:     Reduce output records=6
>>> 10/10/05 12:26:56 INFO mapred.JobClient:     Spilled Records=50
>>> 10/10/05 12:26:56 INFO mapred.JobClient:     Map output bytes=13800
>>> 10/10/05 12:26:56 INFO mapred.JobClient:     Combine input records=0
>>> 10/10/05 12:26:56 INFO mapred.JobClient:     Map output records=25
>>> 10/10/05 12:26:56 INFO mapred.JobClient:     Reduce input records=25
>>> 10/10/05 12:26:56 INFO kmeans.Job: Running KMeans
>>> 10/10/05 12:26:56 INFO kmeans.KMeansDriver: Input: output/data Clusters
>>> In: output/clusters-0 Out: output Distance:
>>> org.apache.mahout.common.distance.EuclideanDistanceMeasure
>>> 10/10/05 12:26:56 INFO kmeans.KMeansDriver: convergence: 0.5 max
>>> Iterations: 10 num Reduce Tasks: org.apache.mahout.math.VectorWritable Input
>>> Vectors: {}
>>> 10/10/05 12:26:56 INFO kmeans.KMeansDriver: K-Means Iteration 1
>>> 10/10/05 12:26:56 WARN mapred.JobClient: Use GenericOptionsParser for
>>> parsing the arguments. Applications should implement Tool for the same.
>>> 10/10/05 12:26:57 INFO input.FileInputFormat: Total input paths to process
>>> : 1
>>> 10/10/05 12:26:58 INFO mapred.JobClient: Running job:
>>> job_201010051117_0007
>>> 10/10/05 12:26:59 INFO mapred.JobClient:  map 0% reduce 0%
>>> 10/10/05 12:27:08 INFO mapred.JobClient: Task Id :
>>> attempt_201010051117_0007_m_000000_0, Status : FAILED
>>> java.lang.IllegalStateException: No clusters found. Check your -c path.
>>>         at
>>> org.apache.mahout.clustering.kmeans.KMeansMapper.setup(KMeansMapper.java:61)
>>>         at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:142)
>>>         at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:621)
>>>         at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
>>>         at org.apache.hadoop.mapred.Child.main(Child.java:170)
>>>
>>> 10/10/05 12:27:14 INFO mapred.JobClient: Task Id :
>>> attempt_201010051117_0007_m_000000_1, Status : FAILED
>>> java.lang.IllegalStateException: No clusters found. Check your -c path.
>>>         at
>>> org.apache.mahout.clustering.kmeans.KMeansMapper.setup(KMeansMapper.java:61)
>>>         at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:142)
>>>         at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:621)
>>>         at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
>>>         at org.apache.hadoop.mapred.Child.main(Child.java:170)
>>>
>>> 10/10/05 12:27:23 INFO mapred.JobClient: Task Id :
>>> attempt_201010051117_0007_m_000000_2, Status : FAILED
>>> java.lang.IllegalStateException: No clusters found. Check your -c path.
>>>         at
>>> org.apache.mahout.clustering.kmeans.KMeansMapper.setup(KMeansMapper.java:61)
>>>         at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:142)
>>>         at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:621)
>>>         at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
>>>         at org.apache.hadoop.mapred.Child.main(Child.java:170)
>>>
>>> 10/10/05 12:27:35 INFO mapred.JobClient: Job complete:
>>> job_201010051117_0007
>>> 10/10/05 12:27:35 INFO mapred.JobClient: Counters: 3
>>> 10/10/05 12:27:35 INFO mapred.JobClient:   Job Counters
>>> 10/10/05 12:27:35 INFO mapred.JobClient:     Launched map tasks=4
>>> 10/10/05 12:27:35 INFO mapred.JobClient:     Data-local map tasks=4
>>> 10/10/05 12:27:35 INFO mapred.JobClient:     Failed map tasks=1
>>> 10/10/05 12:27:35 INFO kmeans.KMeansDriver: Clustering data
>>> 10/10/05 12:27:35 INFO kmeans.KMeansDriver: Running Clustering
>>> 10/10/05 12:27:35 INFO kmeans.KMeansDriver: Input: output/data Clusters
>>> In: output/clusters-1 Out: output/clusteredPoints Distance:
>>> org.apache.mahout.common.distance.EuclideanDistanceMeasure@136a43c
>>> 10/10/05 12:27:35 INFO kmeans.KMeansDriver: convergence: 0.5 Input
>>> Vectors: org.apache.mahout.math.VectorWritable
>>> 10/10/05 12:27:35 WARN mapred.JobClient: Use GenericOptionsParser for
>>> parsing the arguments. Applications should implement Tool for the same.
>>> 10/10/05 12:27:36 INFO input.FileInputFormat: Total input paths to process
>>> : 1
>>> 10/10/05 12:27:37 INFO mapred.JobClient: Running job:
>>> job_201010051117_0008
>>> 10/10/05 12:27:38 INFO mapred.JobClient:  map 0% reduce 0%
>>> 10/10/05 12:27:47 INFO mapred.JobClient: Task Id :
>>> attempt_201010051117_0008_m_000000_0, Status : FAILED
>>> java.lang.IllegalStateException: Cluster is empty!
>>>         at
>>> org.apache.mahout.clustering.kmeans.KMeansClusterMapper.setup(KMeansClusterMapper.java:57)
>>>         at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:142)
>>>         at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:621)
>>>         at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
>>>         at org.apache.hadoop.mapred.Child.main(Child.java:170)
>>>
>>> 10/10/05 12:27:53 INFO mapred.JobClient: Task Id :
>>> attempt_201010051117_0008_m_000000_1, Status : FAILED
>>> java.lang.IllegalStateException: Cluster is empty!
>>>         at
>>> org.apache.mahout.clustering.kmeans.KMeansClusterMapper.setup(KMeansClusterMapper.java:57)
>>>         at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:142)
>>>         at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:621)
>>>         at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
>>>         at org.apache.hadoop.mapred.Child.main(Child.java:170)
>>>
>>> 10/10/05 12:27:59 INFO mapred.JobClient: Task Id :
>>> attempt_201010051117_0008_m_000000_2, Status : FAILED
>>> java.lang.IllegalStateException: Cluster is empty!
>>>         at
>>> org.apache.mahout.clustering.kmeans.KMeansClusterMapper.setup(KMeansClusterMapper.java:57)
>>>         at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:142)
>>>         at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:621)
>>>         at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
>>>         at org.apache.hadoop.mapred.Child.main(Child.java:170)
>>>
>>> 10/10/05 12:28:11 INFO mapred.JobClient: Job complete:
>>> job_201010051117_0008
>>> 10/10/05 12:28:11 INFO mapred.JobClient: Counters: 3
>>> 10/10/05 12:28:11 INFO mapred.JobClient:   Job Counters
>>> 10/10/05 12:28:11 INFO mapred.JobClient:     Launched map tasks=4
>>> 10/10/05 12:28:11 INFO mapred.JobClient:     Data-local map tasks=4
>>> 10/10/05 12:28:11 INFO mapred.JobClient:     Failed map tasks=1
>>> 10/10/05 12:28:12 INFO driver.MahoutDriver: Program took 126495 ms
>>>
>>>        was (Author: pgradadia):
>>>      I am also getting the same exception with trunk code.
>>>
>>> 10/10/04 12:42:34 INFO mapred.JobClient: Running job:
>>> job_201010041038_0019
>>> 10/10/04 12:42:35 INFO mapred.JobClient:  map 0% reduce 0%
>>> 10/10/04 12:42:45 INFO mapred.JobClient: Task Id :
>>> attempt_201010041038_0019_m_000000_0, Status : FAILED
>>> java.lang.IllegalStateException: No clusters found. Check your -c path.
>>>         at
>>> org.apache.mahout.clustering.kmeans.KMeansMapper.setup(KMeansMapper.java:61)
>>>         at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:142)
>>>         at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:621)
>>>         at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
>>>         at org.apache.hadoop.mapred.Child.main(Child.java:170)
>>>
>>>> Kmeans clustering error
>>>> -----------------------
>>>>
>>>>                  Key: MAHOUT-504
>>>>                  URL: https://issues.apache.org/jira/browse/MAHOUT-504
>>>>              Project: Mahout
>>>>           Issue Type: Bug
>>>>             Reporter: Zhen Guo
>>>>             Assignee: Robin Anil
>>>>              Fix For: 0.4
>>>>
>>>>
>>>> I tried the Kmeans algorithm on the Synthetic Control data. The following
>>>> error appears. I tried the Canopy algorithm, it is fine. This error is from
>>>> Mapper. I am using Trunk.
>>>> 10/09/20 19:40:06 INFO mapred.JobClient: Task Id :
>>>> attempt_201008261432_1324_m_000000_0, Status : FAILED
>>>> java.lang.IllegalStateException: Cluster is empty!
>>>>         at
>>>> org.apache.mahout.clustering.kmeans.KMeansClusterMapper.setup(KMeansClusterMapper.java:57)
>>>>         at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:142)
>>>>         at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:583)
>>>>         at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
>>>>         at org.apache.hadoop.mapred.Child.main(Child.java:170)
>>

Re: [jira] Issue Comment Edited: (MAHOUT-504) Kmeans clustering error

Posted by Jeff Eastman <jd...@windwardsolutions.com>.

  +user@

Thanks so much Pragnesh, for putting your finger so succinctly on the 
problem. I'm cross-posting this to user@ so that it will be part of that 
searchable archive too. I will also append to MAHOUT-504.

I'm glad to hear you are out of the woods on this,
Jeff


On 10/8/10 2:02 AM, pragnesh radadia wrote:
> Finally I am able to run the k-means example on the synthetic control data.
>
> I think the problem is that Hadoop is running as the hadoop user (using Cloudera
> CDH3) and I am trying to run the example as the pragnesh user,
>
> so Hadoop is not able to find the data under "/user/hadoop",
>
> since the example uses relative paths to store the input and clustering data.
>
> -pragnesh
>
>
> On Wed, Oct 6, 2010 at 12:27 AM, Jeff Eastman
> <jd...@windwardsolutions.com>  wrote:
>>   Hi Pragnesh,
>>
>> I really don't know what to suggest to you. I just did a new Mahout checkout
>> and build, followed by uploading the synthetic_control.data file to a local
>> Hadoop instance. The k-means job ran without incident. On a hunch, I also
>> uploaded the file as testdata (not in directory testdata) and that worked
>> too. I'm baffled why I can't duplicate this and suspect it is a local system
>> issue. What OS are you running?
>>
>> If yours works from Eclipse but not from the command line, I wonder if you
>> have done mvn clean build from the command line before you ran the CLI
>> Mahout job? Eclipse compiles its bits into different directories and does
>> not build the necessary job files. Other than that, I suggest checking your
>> file system groups and permissions.
>>
>> If you find something that gets you running again, *please* post your
>> solution so we can advise others who are experiencing the same error
>> message.
>>
>>
>> On 10/5/10 12:06 AM, pragnesh (JIRA) wrote:
>>>      [
>>> https://issues.apache.org/jira/browse/MAHOUT-504?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12917502#action_12917502
>>> ]
>>>
>>> pragnesh edited comment on MAHOUT-504 at 10/5/10 3:05 AM:
>>> ----------------------------------------------------------
>>>
>>> I am also getting the same exception with trunk code.
>>>
>>> 10/10/04 12:42:34 INFO mapred.JobClient: Running job:
>>> job_201010041038_0019
>>> 10/10/04 12:42:35 INFO mapred.JobClient:  map 0% reduce 0%
>>> 10/10/04 12:42:45 INFO mapred.JobClient: Task Id :
>>> attempt_201010041038_0019_m_000000_0, Status : FAILED
>>> java.lang.IllegalStateException: No clusters found. Check your -c path.
>>>         at
>>> org.apache.mahout.clustering.kmeans.KMeansMapper.setup(KMeansMapper.java:61)
>>>         at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:142)
>>>         at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:621)
>>>         at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
>>>         at org.apache.hadoop.mapred.Child.main(Child.java:170)
>>>
>>>
>>> This runs fine from Eclipse,
>>>
>>> but when I try to run it from the command line with Hadoop, I see the
>>> following output,
>>>
>>> while $MAHOUT_HOME/bin/mahout
>>> org.apache.mahout.clustering.syntheticcontrol.dirichlet.Job runs fine
>>> without any error.
>>>
>>> pragnesh-laptop% $MAHOUT_HOME/bin/mahout
>>> org.apache.mahout.clustering.syntheticcontrol.kmeans.Job
>>> Running on hadoop, using HADOOP_HOME=/usr/lib/hadoop/
>>> HADOOP_CONF_DIR=/etc/hadoop/conf.pseudo
>>> 10/10/05 12:26:05 WARN driver.MahoutDriver: No
>>> org.apache.mahout.clustering.syntheticcontrol.kmeans.Job.props found on
>>> classpath, will use command-line arguments only
>>> 10/10/05 12:26:05 INFO kmeans.Job: Running with default arguments
>>> 10/10/05 12:26:06 INFO kmeans.Job: Preparing Input
>>> 10/10/05 12:26:06 WARN mapred.JobClient: Use GenericOptionsParser for
>>> parsing the arguments. Applications should implement Tool for the same.
>>> 10/10/05 12:26:07 INFO input.FileInputFormat: Total input paths to process
>>> : 1
>>> 10/10/05 12:26:09 INFO mapred.JobClient: Running job:
>>> job_201010051117_0005
>>> 10/10/05 12:26:10 INFO mapred.JobClient:  map 0% reduce 0%
>>> 10/10/05 12:26:26 INFO mapred.JobClient:  map 100% reduce 0%
>>> 10/10/05 12:26:28 INFO mapred.JobClient: Job complete:
>>> job_201010051117_0005
>>> 10/10/05 12:26:29 INFO mapred.JobClient: Counters: 7
>>> 10/10/05 12:26:29 INFO mapred.JobClient:   Job Counters
>>> 10/10/05 12:26:29 INFO mapred.JobClient:     Launched map tasks=1
>>> 10/10/05 12:26:29 INFO mapred.JobClient:     Data-local map tasks=1
>>> 10/10/05 12:26:29 INFO mapred.JobClient:   FileSystemCounters
>>> 10/10/05 12:26:29 INFO mapred.JobClient:     HDFS_BYTES_READ=288374
>>> 10/10/05 12:26:29 INFO mapred.JobClient:     HDFS_BYTES_WRITTEN=335470
>>> 10/10/05 12:26:29 INFO mapred.JobClient:   Map-Reduce Framework
>>> 10/10/05 12:26:29 INFO mapred.JobClient:     Map input records=600
>>> 10/10/05 12:26:29 INFO mapred.JobClient:     Spilled Records=0
>>> 10/10/05 12:26:29 INFO mapred.JobClient:     Map output records=600
>>> 10/10/05 12:26:29 INFO kmeans.Job: Running Canopy to get initial clusters
>>> 10/10/05 12:26:29 INFO canopy.CanopyDriver: Build Clusters Input:
>>> output/data Out: output Measure:
>>> org.apache.mahout.common.distance.EuclideanDistanceMeasure@136a43c t1: 80.0
>>> t2: 55.0
>>> 10/10/05 12:26:29 WARN mapred.JobClient: Use GenericOptionsParser for
>>> parsing the arguments. Applications should implement Tool for the same.
>>> 10/10/05 12:26:29 INFO input.FileInputFormat: Total input paths to process
>>> : 1
>>> 10/10/05 12:26:30 INFO mapred.JobClient: Running job:
>>> job_201010051117_0006
>>> 10/10/05 12:26:31 INFO mapred.JobClient:  map 0% reduce 0%
>>> 10/10/05 12:26:42 INFO mapred.JobClient:  map 100% reduce 0%
>>> 10/10/05 12:26:54 INFO mapred.JobClient:  map 100% reduce 100%
>>> 10/10/05 12:26:56 INFO mapred.JobClient: Job complete:
>>> job_201010051117_0006
>>> 10/10/05 12:26:56 INFO mapred.JobClient: Counters: 17
>>> 10/10/05 12:26:56 INFO mapred.JobClient:   Job Counters
>>> 10/10/05 12:26:56 INFO mapred.JobClient:     Launched reduce tasks=1
>>> 10/10/05 12:26:56 INFO mapred.JobClient:     Launched map tasks=1
>>> 10/10/05 12:26:56 INFO mapred.JobClient:     Data-local map tasks=1
>>> 10/10/05 12:26:56 INFO mapred.JobClient:   FileSystemCounters
>>> 10/10/05 12:26:56 INFO mapred.JobClient:     FILE_BYTES_READ=13906
>>> 10/10/05 12:26:56 INFO mapred.JobClient:     HDFS_BYTES_READ=335470
>>> 10/10/05 12:26:56 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=27844
>>> 10/10/05 12:26:56 INFO mapred.JobClient:     HDFS_BYTES_WRITTEN=7131
>>> 10/10/05 12:26:56 INFO mapred.JobClient:   Map-Reduce Framework
>>> 10/10/05 12:26:56 INFO mapred.JobClient:     Reduce input groups=1
>>> 10/10/05 12:26:56 INFO mapred.JobClient:     Combine output records=0
>>> 10/10/05 12:26:56 INFO mapred.JobClient:     Map input records=600
>>> 10/10/05 12:26:56 INFO mapred.JobClient:     Reduce shuffle bytes=0
>>> 10/10/05 12:26:56 INFO mapred.JobClient:     Reduce output records=6
>>> 10/10/05 12:26:56 INFO mapred.JobClient:     Spilled Records=50
>>> 10/10/05 12:26:56 INFO mapred.JobClient:     Map output bytes=13800
>>> 10/10/05 12:26:56 INFO mapred.JobClient:     Combine input records=0
>>> 10/10/05 12:26:56 INFO mapred.JobClient:     Map output records=25
>>> 10/10/05 12:26:56 INFO mapred.JobClient:     Reduce input records=25
>>> 10/10/05 12:26:56 INFO kmeans.Job: Running KMeans
>>> 10/10/05 12:26:56 INFO kmeans.KMeansDriver: Input: output/data Clusters
>>> In: output/clusters-0 Out: output Distance:
>>> org.apache.mahout.common.distance.EuclideanDistanceMeasure
>>> 10/10/05 12:26:56 INFO kmeans.KMeansDriver: convergence: 0.5 max
>>> Iterations: 10 num Reduce Tasks: org.apache.mahout.math.VectorWritable Input
>>> Vectors: {}
>>> 10/10/05 12:26:56 INFO kmeans.KMeansDriver: K-Means Iteration 1
>>> 10/10/05 12:26:56 WARN mapred.JobClient: Use GenericOptionsParser for
>>> parsing the arguments. Applications should implement Tool for the same.
>>> 10/10/05 12:26:57 INFO input.FileInputFormat: Total input paths to process
>>> : 1
>>> 10/10/05 12:26:58 INFO mapred.JobClient: Running job:
>>> job_201010051117_0007
>>> 10/10/05 12:26:59 INFO mapred.JobClient:  map 0% reduce 0%
>>> 10/10/05 12:27:08 INFO mapred.JobClient: Task Id :
>>> attempt_201010051117_0007_m_000000_0, Status : FAILED
>>> java.lang.IllegalStateException: No clusters found. Check your -c path.
>>>         at
>>> org.apache.mahout.clustering.kmeans.KMeansMapper.setup(KMeansMapper.java:61)
>>>         at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:142)
>>>         at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:621)
>>>         at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
>>>         at org.apache.hadoop.mapred.Child.main(Child.java:170)
>>>
>>> 10/10/05 12:27:14 INFO mapred.JobClient: Task Id :
>>> attempt_201010051117_0007_m_000000_1, Status : FAILED
>>> java.lang.IllegalStateException: No clusters found. Check your -c path.
>>>         at
>>> org.apache.mahout.clustering.kmeans.KMeansMapper.setup(KMeansMapper.java:61)
>>>         at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:142)
>>>         at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:621)
>>>         at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
>>>         at org.apache.hadoop.mapred.Child.main(Child.java:170)
>>>
>>> 10/10/05 12:27:23 INFO mapred.JobClient: Task Id :
>>> attempt_201010051117_0007_m_000000_2, Status : FAILED
>>> java.lang.IllegalStateException: No clusters found. Check your -c path.
>>>         at
>>> org.apache.mahout.clustering.kmeans.KMeansMapper.setup(KMeansMapper.java:61)
>>>         at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:142)
>>>         at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:621)
>>>         at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
>>>         at org.apache.hadoop.mapred.Child.main(Child.java:170)
>>>
>>> 10/10/05 12:27:35 INFO mapred.JobClient: Job complete:
>>> job_201010051117_0007
>>> 10/10/05 12:27:35 INFO mapred.JobClient: Counters: 3
>>> 10/10/05 12:27:35 INFO mapred.JobClient:   Job Counters
>>> 10/10/05 12:27:35 INFO mapred.JobClient:     Launched map tasks=4
>>> 10/10/05 12:27:35 INFO mapred.JobClient:     Data-local map tasks=4
>>> 10/10/05 12:27:35 INFO mapred.JobClient:     Failed map tasks=1
>>> 10/10/05 12:27:35 INFO kmeans.KMeansDriver: Clustering data
>>> 10/10/05 12:27:35 INFO kmeans.KMeansDriver: Running Clustering
>>> 10/10/05 12:27:35 INFO kmeans.KMeansDriver: Input: output/data Clusters
>>> In: output/clusters-1 Out: output/clusteredPoints Distance:
>>> org.apache.mahout.common.distance.EuclideanDistanceMeasure@136a43c
>>> 10/10/05 12:27:35 INFO kmeans.KMeansDriver: convergence: 0.5 Input
>>> Vectors: org.apache.mahout.math.VectorWritable
>>> 10/10/05 12:27:35 WARN mapred.JobClient: Use GenericOptionsParser for
>>> parsing the arguments. Applications should implement Tool for the same.
>>> 10/10/05 12:27:36 INFO input.FileInputFormat: Total input paths to process
>>> : 1
>>> 10/10/05 12:27:37 INFO mapred.JobClient: Running job:
>>> job_201010051117_0008
>>> 10/10/05 12:27:38 INFO mapred.JobClient:  map 0% reduce 0%
>>> 10/10/05 12:27:47 INFO mapred.JobClient: Task Id :
>>> attempt_201010051117_0008_m_000000_0, Status : FAILED
>>> java.lang.IllegalStateException: Cluster is empty!
>>>         at
>>> org.apache.mahout.clustering.kmeans.KMeansClusterMapper.setup(KMeansClusterMapper.java:57)
>>>         at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:142)
>>>         at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:621)
>>>         at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
>>>         at org.apache.hadoop.mapred.Child.main(Child.java:170)
>>>
>>> 10/10/05 12:27:53 INFO mapred.JobClient: Task Id :
>>> attempt_201010051117_0008_m_000000_1, Status : FAILED
>>> java.lang.IllegalStateException: Cluster is empty!
>>>         at
>>> org.apache.mahout.clustering.kmeans.KMeansClusterMapper.setup(KMeansClusterMapper.java:57)
>>>         at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:142)
>>>         at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:621)
>>>         at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
>>>         at org.apache.hadoop.mapred.Child.main(Child.java:170)
>>>
>>> 10/10/05 12:27:59 INFO mapred.JobClient: Task Id :
>>> attempt_201010051117_0008_m_000000_2, Status : FAILED
>>> java.lang.IllegalStateException: Cluster is empty!
>>>         at
>>> org.apache.mahout.clustering.kmeans.KMeansClusterMapper.setup(KMeansClusterMapper.java:57)
>>>         at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:142)
>>>         at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:621)
>>>         at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
>>>         at org.apache.hadoop.mapred.Child.main(Child.java:170)
>>>
>>> 10/10/05 12:28:11 INFO mapred.JobClient: Job complete:
>>> job_201010051117_0008
>>> 10/10/05 12:28:11 INFO mapred.JobClient: Counters: 3
>>> 10/10/05 12:28:11 INFO mapred.JobClient:   Job Counters
>>> 10/10/05 12:28:11 INFO mapred.JobClient:     Launched map tasks=4
>>> 10/10/05 12:28:11 INFO mapred.JobClient:     Data-local map tasks=4
>>> 10/10/05 12:28:11 INFO mapred.JobClient:     Failed map tasks=1
>>> 10/10/05 12:28:12 INFO driver.MahoutDriver: Program took 126495 ms
>>>
>>>        was (Author: pgradadia):
>>>      i am also getting same exption with trunk code
>>>
>>> 10/10/04 12:42:34 INFO mapred.JobClient: Running job:
>>> job_201010041038_0019
>>> 10/10/04 12:42:35 INFO mapred.JobClient:  map 0% reduce 0%
>>> 10/10/04 12:42:45 INFO mapred.JobClient: Task Id :
>>> attempt_201010041038_0019_m_000000_0, Status : FAILED
>>> java.lang.IllegalStateException: No clusters found. Check your -c path.
>>>         at
>>> org.apache.mahout.clustering.kmeans.KMeansMapper.setup(KMeansMapper.java:61)
>>>         at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:142)
>>>         at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:621)
>>>         at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
>>>         at org.apache.hadoop.mapred.Child.main(Child.java:170)
>>>
>>>> Kmeans clustering error
>>>> -----------------------
>>>>
>>>>                  Key: MAHOUT-504
>>>>                  URL: https://issues.apache.org/jira/browse/MAHOUT-504
>>>>              Project: Mahout
>>>>           Issue Type: Bug
>>>>             Reporter: Zhen Guo
>>>>             Assignee: Robin Anil
>>>>              Fix For: 0.4
>>>>
>>>>
>>>> I tried the Kmeans algorithm on the Synthetic Control data. The following
>>>> error appears. I tried the Canopy algorithm, it is fine. This error is from
>>>> Mapper. I am using Trunk.
>>>> 10/09/20 19:40:06 INFO mapred.JobClient: Task Id :
>>>> attempt_201008261432_1324_m_000000_0, Status : FAILED
>>>> java.lang.IllegalStateException: Cluster is empty!
>>>>         at
>>>> org.apache.mahout.clustering.kmeans.KMeansClusterMapper.setup(KMeansClusterMapper.java:57)
>>>>         at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:142)
>>>>         at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:583)
>>>>         at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
>>>>         at org.apache.hadoop.mapred.Child.main(Child.java:170)
>>

Re: [jira] Issue Comment Edited: (MAHOUT-504) Kmeans clustering error

Posted by pragnesh radadia <pr...@gmail.com>.

Finally, I am able to run the k-means example from "Clustering of synthetic
control data".

I think the problem was that Hadoop is running as the hadoop user (using
Cloudera CDH3) while I was trying to run the example as the pragnesh user,

so Hadoop was not able to find the data under "/user/hadoop",

since the example uses relative paths to store the input and clustering data.

-pragnesh
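
Editorial sketch of the relative-path explanation above. HDFS resolves a
relative path against the current user's home directory, which is
/user/<username> by default, so the same relative path can point to two
different places for two different users. The code below is only a model of
that convention, not Hadoop or Mahout code; the class and method names
(HdfsPathSketch, resolve) are hypothetical, and whether this was the actual
cause here is the poster's own diagnosis.

```java
// Illustrative model only: this does NOT call the Hadoop API. It mimics the
// HDFS convention that a relative path resolves under /user/<username>.
final class HdfsPathSketch {

    // Absolute paths are returned unchanged; relative paths are placed
    // under the given user's home directory.
    static String resolve(String user, String path) {
        if (path.startsWith("/")) {
            return path;
        }
        return "/user/" + user + "/" + path;
    }

    public static void main(String[] args) {
        // Relative "Clusters In" path as printed by KMeansDriver in the log.
        String clustersIn = "output/clusters-0";
        System.out.println(resolve("pragnesh", clustersIn));
        System.out.println(resolve("hadoop", clustersIn));
    }
}
```

Under this model the job submitter and the task processes look in different
directories unless an absolute path is used or both run as the same user.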


On Wed, Oct 6, 2010 at 12:27 AM, Jeff Eastman
<jd...@windwardsolutions.com> wrote:
>  Hi Pragnesh,
>
> I really don't know what to suggest to you. I just did a new Mahout checkout
> and build, followed by uploading the synthetic_control.data file to a local
> Hadoop instance. The k-means job ran without incident. On a hunch, I also
> uploaded the file as testdata (not in directory testdata) and that worked
> too. I'm baffled why I can't duplicate this and suspect it is a local system
> issue. What OS are you running?
>
> If yours works from Eclipse but not from the command line, I wonder if you
> have done mvn clean build from the command line before you ran the CLI
> Mahout job? Eclipse compiles its bits into different directories and does
> not build the necessary job files. Other than that, I suggest checking your
> file system groups and permissions.
>
> If you find something that gets you running again, *please* post your
> solution so we can advise others who are experiencing the same error
> message.

Re: [jira] Issue Comment Edited: (MAHOUT-504) Kmeans clustering error

Posted by Joe Kumar <jo...@gmail.com>.

Pragnesh,

I got the latest code from the repo and ran mvn clean and mvn install.
Then I followed the instructions in the wiki link I had mentioned below, and
the k-means clustering task on the synthetic control data executed just fine.
Please let me know if you face issues following the steps in the wiki.

regards
Joe.

On Tue, Oct 5, 2010 at 4:34 PM, Joe Kumar <jo...@gmail.com> wrote:

> Hi Pragnesh,
>
> Just wondering if you tried the steps in
> https://cwiki.apache.org/confluence/display/MAHOUT/Clustering+of+synthetic+control+data
> .
> It was working just fine like 2 weeks ago. I'll probably verify it tonite
> (with the latest code from trunk) and let you know.
>
> regards,
> Joe.
>
>
> On Tue, Oct 5, 2010 at 2:57 PM, Jeff Eastman <jd...@windwardsolutions.com> wrote:
>
>>  Hi Pragnesh,
>>
>> I really don't know what to suggest to you. I just did a new Mahout
>> checkout and build, followed by uploading the synthetic_control.data file to
>> a local Hadoop instance. The k-means job ran without incident. On a hunch, I
>> also uploaded the file as testdata (not in directory testdata) and that
>> worked too. I'm baffled why I can't duplicate this and suspect it is a local
>> system issue. What OS are you running?
>>
>> If yours works from Eclipse but not from the command line, I wonder if you
>> have done mvn clean build from the command line before you ran the CLI
>> Mahout job? Eclipse compiles its bits into different directories and does
>> not build the necessary job files. Other than that, I suggest checking your
>> file system groups and permissions.
>>
>> If you find something that gets you running again, *please* post your
>> solution so we can advise others who are experiencing the same error
>> message.
>>
>>
>>
>> On 10/5/10 12:06 AM, pragnesh (JIRA) wrote:
>>
>>>     [
>>> https://issues.apache.org/jira/browse/MAHOUT-504?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12917502#action_12917502]
>>>
>>> pragnesh edited comment on MAHOUT-504 at 10/5/10 3:05 AM:
>>> ----------------------------------------------------------
>>>
>>> i am also getting same exption with trunk code
>>>
>>> 10/10/04 12:42:34 INFO mapred.JobClient: Running job:
>>> job_201010041038_0019
>>> 10/10/04 12:42:35 INFO mapred.JobClient:  map 0% reduce 0%
>>> 10/10/04 12:42:45 INFO mapred.JobClient: Task Id :
>>> attempt_201010041038_0019_m_000000_0, Status : FAILED
>>> java.lang.IllegalStateException: No clusters found. Check your -c path.
>>>        at
>>> org.apache.mahout.clustering.kmeans.KMeansMapper.setup(KMeansMapper.java:61)
>>>        at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:142)
>>>        at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:621)
>>>        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
>>>        at org.apache.hadoop.mapred.Child.main(Child.java:170)
>>>
>>>
>>> this run fine from eclipse
>>>
>>> but when i try to run from command line with hadoop. i see following
>>> output.
>>>
>>> while  $MAHOUT_HOME/bin/mahout
>>> org.apache.mahout.clustering.syntheticcontrol.dirichlet.Job running fine
>>> without any error.
>>>
>>> pragnesh-laptop% $MAHOUT_HOME/bin/mahout
>>> org.apache.mahout.clustering.syntheticcontrol.kmeans.Job
>>> Running on hadoop, using HADOOP_HOME=/usr/lib/hadoop/
>>> HADOOP_CONF_DIR=/etc/hadoop/conf.pseudo
>>> 10/10/05 12:26:05 WARN driver.MahoutDriver: No
>>> org.apache.mahout.clustering.syntheticcontrol.kmeans.Job.props found on
>>> classpath, will use command-line arguments only
>>> 10/10/05 12:26:05 INFO kmeans.Job: Running with default arguments
>>> 10/10/05 12:26:06 INFO kmeans.Job: Preparing Input
>>> 10/10/05 12:26:06 WARN mapred.JobClient: Use GenericOptionsParser for
>>> parsing the arguments. Applications should implement Tool for the same.
>>> 10/10/05 12:26:07 INFO input.FileInputFormat: Total input paths to
>>> process : 1
>>> 10/10/05 12:26:09 INFO mapred.JobClient: Running job:
>>> job_201010051117_0005
>>> 10/10/05 12:26:10 INFO mapred.JobClient:  map 0% reduce 0%
>>> 10/10/05 12:26:26 INFO mapred.JobClient:  map 100% reduce 0%
>>> 10/10/05 12:26:28 INFO mapred.JobClient: Job complete:
>>> job_201010051117_0005
>>> 10/10/05 12:26:29 INFO mapred.JobClient: Counters: 7
>>> 10/10/05 12:26:29 INFO mapred.JobClient:   Job Counters
>>> 10/10/05 12:26:29 INFO mapred.JobClient:     Launched map tasks=1
>>> 10/10/05 12:26:29 INFO mapred.JobClient:     Data-local map tasks=1
>>> 10/10/05 12:26:29 INFO mapred.JobClient:   FileSystemCounters
>>> 10/10/05 12:26:29 INFO mapred.JobClient:     HDFS_BYTES_READ=288374
>>> 10/10/05 12:26:29 INFO mapred.JobClient:     HDFS_BYTES_WRITTEN=335470
>>> 10/10/05 12:26:29 INFO mapred.JobClient:   Map-Reduce Framework
>>> 10/10/05 12:26:29 INFO mapred.JobClient:     Map input records=600
>>> 10/10/05 12:26:29 INFO mapred.JobClient:     Spilled Records=0
>>> 10/10/05 12:26:29 INFO mapred.JobClient:     Map output records=600
>>> 10/10/05 12:26:29 INFO kmeans.Job: Running Canopy to get initial clusters
>>> 10/10/05 12:26:29 INFO canopy.CanopyDriver: Build Clusters Input:
>>> output/data Out: output Measure:
>>> org.apache.mahout.common.distance.EuclideanDistanceMeasure@136a43c t1:
>>> 80.0 t2: 55.0
>>> 10/10/05 12:26:29 WARN mapred.JobClient: Use GenericOptionsParser for
>>> parsing the arguments. Applications should implement Tool for the same.
>>> 10/10/05 12:26:29 INFO input.FileInputFormat: Total input paths to
>>> process : 1
>>> 10/10/05 12:26:30 INFO mapred.JobClient: Running job:
>>> job_201010051117_0006
>>> 10/10/05 12:26:31 INFO mapred.JobClient:  map 0% reduce 0%
>>> 10/10/05 12:26:42 INFO mapred.JobClient:  map 100% reduce 0%
>>> 10/10/05 12:26:54 INFO mapred.JobClient:  map 100% reduce 100%
>>> 10/10/05 12:26:56 INFO mapred.JobClient: Job complete:
>>> job_201010051117_0006
>>> 10/10/05 12:26:56 INFO mapred.JobClient: Counters: 17
>>> 10/10/05 12:26:56 INFO mapred.JobClient:   Job Counters
>>> 10/10/05 12:26:56 INFO mapred.JobClient:     Launched reduce tasks=1
>>> 10/10/05 12:26:56 INFO mapred.JobClient:     Launched map tasks=1
>>> 10/10/05 12:26:56 INFO mapred.JobClient:     Data-local map tasks=1
>>> 10/10/05 12:26:56 INFO mapred.JobClient:   FileSystemCounters
>>> 10/10/05 12:26:56 INFO mapred.JobClient:     FILE_BYTES_READ=13906
>>> 10/10/05 12:26:56 INFO mapred.JobClient:     HDFS_BYTES_READ=335470
>>> 10/10/05 12:26:56 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=27844
>>> 10/10/05 12:26:56 INFO mapred.JobClient:     HDFS_BYTES_WRITTEN=7131
>>> 10/10/05 12:26:56 INFO mapred.JobClient:   Map-Reduce Framework
>>> 10/10/05 12:26:56 INFO mapred.JobClient:     Reduce input groups=1
>>> 10/10/05 12:26:56 INFO mapred.JobClient:     Combine output records=0
>>> 10/10/05 12:26:56 INFO mapred.JobClient:     Map input records=600
>>> 10/10/05 12:26:56 INFO mapred.JobClient:     Reduce shuffle bytes=0
>>> 10/10/05 12:26:56 INFO mapred.JobClient:     Reduce output records=6
>>> 10/10/05 12:26:56 INFO mapred.JobClient:     Spilled Records=50
>>> 10/10/05 12:26:56 INFO mapred.JobClient:     Map output bytes=13800
>>> 10/10/05 12:26:56 INFO mapred.JobClient:     Combine input records=0
>>> 10/10/05 12:26:56 INFO mapred.JobClient:     Map output records=25
>>> 10/10/05 12:26:56 INFO mapred.JobClient:     Reduce input records=25
>>> 10/10/05 12:26:56 INFO kmeans.Job: Running KMeans
>>> 10/10/05 12:26:56 INFO kmeans.KMeansDriver: Input: output/data Clusters
>>> In: output/clusters-0 Out: output Distance:
>>> org.apache.mahout.common.distance.EuclideanDistanceMeasure
>>> 10/10/05 12:26:56 INFO kmeans.KMeansDriver: convergence: 0.5 max
>>> Iterations: 10 num Reduce Tasks: org.apache.mahout.math.VectorWritable Input
>>> Vectors: {}
>>> 10/10/05 12:26:56 INFO kmeans.KMeansDriver: K-Means Iteration 1
>>> 10/10/05 12:26:56 WARN mapred.JobClient: Use GenericOptionsParser for
>>> parsing the arguments. Applications should implement Tool for the same.
>>> 10/10/05 12:26:57 INFO input.FileInputFormat: Total input paths to
>>> process : 1
>>> 10/10/05 12:26:58 INFO mapred.JobClient: Running job:
>>> job_201010051117_0007
>>> 10/10/05 12:26:59 INFO mapred.JobClient:  map 0% reduce 0%
>>> 10/10/05 12:27:08 INFO mapred.JobClient: Task Id :
>>> attempt_201010051117_0007_m_000000_0, Status : FAILED
>>> java.lang.IllegalStateException: No clusters found. Check your -c path.
>>>        at
>>> org.apache.mahout.clustering.kmeans.KMeansMapper.setup(KMeansMapper.java:61)
>>>        at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:142)
>>>        at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:621)
>>>        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
>>>        at org.apache.hadoop.mapred.Child.main(Child.java:170)
>>>
>>> 10/10/05 12:27:14 INFO mapred.JobClient: Task Id :
>>> attempt_201010051117_0007_m_000000_1, Status : FAILED
>>> java.lang.IllegalStateException: No clusters found. Check your -c path.
>>>        at
>>> org.apache.mahout.clustering.kmeans.KMeansMapper.setup(KMeansMapper.java:61)
>>>        at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:142)
>>>        at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:621)
>>>        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
>>>        at org.apache.hadoop.mapred.Child.main(Child.java:170)
>>>
>>> 10/10/05 12:27:23 INFO mapred.JobClient: Task Id :
>>> attempt_201010051117_0007_m_000000_2, Status : FAILED
>>> java.lang.IllegalStateException: No clusters found. Check your -c path.
>>>        at
>>> org.apache.mahout.clustering.kmeans.KMeansMapper.setup(KMeansMapper.java:61)
>>>        at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:142)
>>>        at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:621)
>>>        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
>>>        at org.apache.hadoop.mapred.Child.main(Child.java:170)
>>>
>>> 10/10/05 12:27:35 INFO mapred.JobClient: Job complete:
>>> job_201010051117_0007
>>> 10/10/05 12:27:35 INFO mapred.JobClient: Counters: 3
>>> 10/10/05 12:27:35 INFO mapred.JobClient:   Job Counters
>>> 10/10/05 12:27:35 INFO mapred.JobClient:     Launched map tasks=4
>>> 10/10/05 12:27:35 INFO mapred.JobClient:     Data-local map tasks=4
>>> 10/10/05 12:27:35 INFO mapred.JobClient:     Failed map tasks=1
>>> 10/10/05 12:27:35 INFO kmeans.KMeansDriver: Clustering data
>>> 10/10/05 12:27:35 INFO kmeans.KMeansDriver: Running Clustering
>>> 10/10/05 12:27:35 INFO kmeans.KMeansDriver: Input: output/data Clusters
>>> In: output/clusters-1 Out: output/clusteredPoints Distance:
>>> org.apache.mahout.common.distance.EuclideanDistanceMeasure@136a43c
>>> 10/10/05 12:27:35 INFO kmeans.KMeansDriver: convergence: 0.5 Input
>>> Vectors: org.apache.mahout.math.VectorWritable
>>> 10/10/05 12:27:35 WARN mapred.JobClient: Use GenericOptionsParser for
>>> parsing the arguments. Applications should implement Tool for the same.
>>> 10/10/05 12:27:36 INFO input.FileInputFormat: Total input paths to
>>> process : 1
>>> 10/10/05 12:27:37 INFO mapred.JobClient: Running job:
>>> job_201010051117_0008
>>> 10/10/05 12:27:38 INFO mapred.JobClient:  map 0% reduce 0%
>>> 10/10/05 12:27:47 INFO mapred.JobClient: Task Id :
>>> attempt_201010051117_0008_m_000000_0, Status : FAILED
>>> java.lang.IllegalStateException: Cluster is empty!
>>>        at
>>> org.apache.mahout.clustering.kmeans.KMeansClusterMapper.setup(KMeansClusterMapper.java:57)
>>>        at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:142)
>>>        at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:621)
>>>        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
>>>        at org.apache.hadoop.mapred.Child.main(Child.java:170)
>>>
>>> 10/10/05 12:27:53 INFO mapred.JobClient: Task Id :
>>> attempt_201010051117_0008_m_000000_1, Status : FAILED
>>> java.lang.IllegalStateException: Cluster is empty!
>>>        at
>>> org.apache.mahout.clustering.kmeans.KMeansClusterMapper.setup(KMeansClusterMapper.java:57)
>>>        at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:142)
>>>        at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:621)
>>>        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
>>>        at org.apache.hadoop.mapred.Child.main(Child.java:170)
>>>
>>> 10/10/05 12:27:59 INFO mapred.JobClient: Task Id :
>>> attempt_201010051117_0008_m_000000_2, Status : FAILED
>>> java.lang.IllegalStateException: Cluster is empty!
>>>        at
>>> org.apache.mahout.clustering.kmeans.KMeansClusterMapper.setup(KMeansClusterMapper.java:57)
>>>        at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:142)
>>>        at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:621)
>>>        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
>>>        at org.apache.hadoop.mapred.Child.main(Child.java:170)
>>>
>>> 10/10/05 12:28:11 INFO mapred.JobClient: Job complete:
>>> job_201010051117_0008
>>> 10/10/05 12:28:11 INFO mapred.JobClient: Counters: 3
>>> 10/10/05 12:28:11 INFO mapred.JobClient:   Job Counters
>>> 10/10/05 12:28:11 INFO mapred.JobClient:     Launched map tasks=4
>>> 10/10/05 12:28:11 INFO mapred.JobClient:     Data-local map tasks=4
>>> 10/10/05 12:28:11 INFO mapred.JobClient:     Failed map tasks=1
>>> 10/10/05 12:28:12 INFO driver.MahoutDriver: Program took 126495 ms
>>>
>>>       was (Author: pgradadia):
>>>     I am also getting the same exception with trunk code
>>>
>>> 10/10/04 12:42:34 INFO mapred.JobClient: Running job:
>>> job_201010041038_0019
>>> 10/10/04 12:42:35 INFO mapred.JobClient:  map 0% reduce 0%
>>> 10/10/04 12:42:45 INFO mapred.JobClient: Task Id :
>>> attempt_201010041038_0019_m_000000_0, Status : FAILED
>>> java.lang.IllegalStateException: No clusters found. Check your -c path.
>>>        at
>>> org.apache.mahout.clustering.kmeans.KMeansMapper.setup(KMeansMapper.java:61)
>>>        at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:142)
>>>        at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:621)
>>>        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
>>>        at org.apache.hadoop.mapred.Child.main(Child.java:170)
>>>
>>>  Kmeans clustering error
>>>> -----------------------
>>>>
>>>>                 Key: MAHOUT-504
>>>>                 URL: https://issues.apache.org/jira/browse/MAHOUT-504
>>>>             Project: Mahout
>>>>          Issue Type: Bug
>>>>            Reporter: Zhen Guo
>>>>            Assignee: Robin Anil
>>>>             Fix For: 0.4
>>>>
>>>>
>>>> I tried the Kmeans algorithm on the Synthetic Control data. The
>>>> following error appears. I tried the Canopy algorithm, it is fine. This
>>>> error is from Mapper. I am using Trunk.
>>>> 10/09/20 19:40:06 INFO mapred.JobClient: Task Id :
>>>> attempt_201008261432_1324_m_000000_0, Status : FAILED
>>>> java.lang.IllegalStateException: Cluster is empty!
>>>>        at
>>>> org.apache.mahout.clustering.kmeans.KMeansClusterMapper.setup(KMeansClusterMapper.java:57)
>>>>        at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:142)
>>>>        at
>>>> org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:583)
>>>>        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
>>>>        at org.apache.hadoop.mapred.Child.main(Child.java:170)
>>>>
>>>
>>
>
>
>
>

Re: [jira] Issue Comment Edited: (MAHOUT-504) Kmeans clustering error

Posted by Joe Kumar <jo...@gmail.com>.

Hi Pragnesh,

Just wondering whether you tried the steps in
https://cwiki.apache.org/confluence/display/MAHOUT/Clustering+of+synthetic+control+data
It was working fine about two weeks ago. I'll probably verify it tonight
(with the latest code from trunk) and let you know.

regards,
Joe.


On Tue, Oct 5, 2010 at 2:57 PM, Jeff Eastman <jd...@windwardsolutions.com> wrote:

>  Hi Pragnesh,
>
> I really don't know what to suggest to you. I just did a new Mahout
> checkout and build, followed by uploading the synthetic_control.data file to
> a local Hadoop instance. The k-means job ran without incident. On a hunch, I
> also uploaded the file as testdata (not in directory testdata) and that
> worked too. I'm baffled why I can't duplicate this and suspect it is a local
> system issue. What OS are you running?
>
> If yours works from Eclipse but not from the command line, I wonder whether
> you ran a clean Maven build (e.g. mvn clean install) from the command line
> before you ran the CLI Mahout job? Eclipse compiles its classes into
> different directories and does not build the necessary job files. Other than
> that, I suggest checking your file system groups and permissions.
>
> If you find something that gets you running again, *please* post your
> solution so we can advise others who are experiencing the same error
> message.
>
>
>
> On 10/5/10 12:06 AM, pragnesh (JIRA) wrote:
>
>>     [
>> https://issues.apache.org/jira/browse/MAHOUT-504?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12917502#action_12917502]
>>
>> pragnesh edited comment on MAHOUT-504 at 10/5/10 3:05 AM:
>> ----------------------------------------------------------
>>
>> I am also getting the same exception with trunk code
>>
>> 10/10/04 12:42:34 INFO mapred.JobClient: Running job:
>> job_201010041038_0019
>> 10/10/04 12:42:35 INFO mapred.JobClient:  map 0% reduce 0%
>> 10/10/04 12:42:45 INFO mapred.JobClient: Task Id :
>> attempt_201010041038_0019_m_000000_0, Status : FAILED
>> java.lang.IllegalStateException: No clusters found. Check your -c path.
>>        at
>> org.apache.mahout.clustering.kmeans.KMeansMapper.setup(KMeansMapper.java:61)
>>        at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:142)
>>        at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:621)
>>        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
>>        at org.apache.hadoop.mapred.Child.main(Child.java:170)
>>
>>
>> This runs fine from Eclipse, but when I try to run it from the command
>> line with Hadoop I see the output below.
>>
>> Meanwhile, $MAHOUT_HOME/bin/mahout
>> org.apache.mahout.clustering.syntheticcontrol.dirichlet.Job runs fine
>> without any error.
>>
>> pragnesh-laptop% $MAHOUT_HOME/bin/mahout
>> org.apache.mahout.clustering.syntheticcontrol.kmeans.Job
>> Running on hadoop, using HADOOP_HOME=/usr/lib/hadoop/
>> HADOOP_CONF_DIR=/etc/hadoop/conf.pseudo
>> 10/10/05 12:26:05 WARN driver.MahoutDriver: No
>> org.apache.mahout.clustering.syntheticcontrol.kmeans.Job.props found on
>> classpath, will use command-line arguments only
>> 10/10/05 12:26:05 INFO kmeans.Job: Running with default arguments
>> 10/10/05 12:26:06 INFO kmeans.Job: Preparing Input
>> 10/10/05 12:26:06 WARN mapred.JobClient: Use GenericOptionsParser for
>> parsing the arguments. Applications should implement Tool for the same.
>> 10/10/05 12:26:07 INFO input.FileInputFormat: Total input paths to process
>> : 1
>> 10/10/05 12:26:09 INFO mapred.JobClient: Running job:
>> job_201010051117_0005
>> 10/10/05 12:26:10 INFO mapred.JobClient:  map 0% reduce 0%
>> 10/10/05 12:26:26 INFO mapred.JobClient:  map 100% reduce 0%
>> 10/10/05 12:26:28 INFO mapred.JobClient: Job complete:
>> job_201010051117_0005
>> 10/10/05 12:26:29 INFO mapred.JobClient: Counters: 7
>> 10/10/05 12:26:29 INFO mapred.JobClient:   Job Counters
>> 10/10/05 12:26:29 INFO mapred.JobClient:     Launched map tasks=1
>> 10/10/05 12:26:29 INFO mapred.JobClient:     Data-local map tasks=1
>> 10/10/05 12:26:29 INFO mapred.JobClient:   FileSystemCounters
>> 10/10/05 12:26:29 INFO mapred.JobClient:     HDFS_BYTES_READ=288374
>> 10/10/05 12:26:29 INFO mapred.JobClient:     HDFS_BYTES_WRITTEN=335470
>> 10/10/05 12:26:29 INFO mapred.JobClient:   Map-Reduce Framework
>> 10/10/05 12:26:29 INFO mapred.JobClient:     Map input records=600
>> 10/10/05 12:26:29 INFO mapred.JobClient:     Spilled Records=0
>> 10/10/05 12:26:29 INFO mapred.JobClient:     Map output records=600
>> 10/10/05 12:26:29 INFO kmeans.Job: Running Canopy to get initial clusters
>> 10/10/05 12:26:29 INFO canopy.CanopyDriver: Build Clusters Input:
>> output/data Out: output Measure:
>> org.apache.mahout.common.distance.EuclideanDistanceMeasure@136a43c t1:
>> 80.0 t2: 55.0
>> 10/10/05 12:26:29 WARN mapred.JobClient: Use GenericOptionsParser for
>> parsing the arguments. Applications should implement Tool for the same.
>> 10/10/05 12:26:29 INFO input.FileInputFormat: Total input paths to process
>> : 1
>> 10/10/05 12:26:30 INFO mapred.JobClient: Running job:
>> job_201010051117_0006
>> 10/10/05 12:26:31 INFO mapred.JobClient:  map 0% reduce 0%
>> 10/10/05 12:26:42 INFO mapred.JobClient:  map 100% reduce 0%
>> 10/10/05 12:26:54 INFO mapred.JobClient:  map 100% reduce 100%
>> 10/10/05 12:26:56 INFO mapred.JobClient: Job complete:
>> job_201010051117_0006
>> 10/10/05 12:26:56 INFO mapred.JobClient: Counters: 17
>> 10/10/05 12:26:56 INFO mapred.JobClient:   Job Counters
>> 10/10/05 12:26:56 INFO mapred.JobClient:     Launched reduce tasks=1
>> 10/10/05 12:26:56 INFO mapred.JobClient:     Launched map tasks=1
>> 10/10/05 12:26:56 INFO mapred.JobClient:     Data-local map tasks=1
>> 10/10/05 12:26:56 INFO mapred.JobClient:   FileSystemCounters
>> 10/10/05 12:26:56 INFO mapred.JobClient:     FILE_BYTES_READ=13906
>> 10/10/05 12:26:56 INFO mapred.JobClient:     HDFS_BYTES_READ=335470
>> 10/10/05 12:26:56 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=27844
>> 10/10/05 12:26:56 INFO mapred.JobClient:     HDFS_BYTES_WRITTEN=7131
>> 10/10/05 12:26:56 INFO mapred.JobClient:   Map-Reduce Framework
>> 10/10/05 12:26:56 INFO mapred.JobClient:     Reduce input groups=1
>> 10/10/05 12:26:56 INFO mapred.JobClient:     Combine output records=0
>> 10/10/05 12:26:56 INFO mapred.JobClient:     Map input records=600
>> 10/10/05 12:26:56 INFO mapred.JobClient:     Reduce shuffle bytes=0
>> 10/10/05 12:26:56 INFO mapred.JobClient:     Reduce output records=6
>> 10/10/05 12:26:56 INFO mapred.JobClient:     Spilled Records=50
>> 10/10/05 12:26:56 INFO mapred.JobClient:     Map output bytes=13800
>> 10/10/05 12:26:56 INFO mapred.JobClient:     Combine input records=0
>> 10/10/05 12:26:56 INFO mapred.JobClient:     Map output records=25
>> 10/10/05 12:26:56 INFO mapred.JobClient:     Reduce input records=25
>> 10/10/05 12:26:56 INFO kmeans.Job: Running KMeans
>> 10/10/05 12:26:56 INFO kmeans.KMeansDriver: Input: output/data Clusters
>> In: output/clusters-0 Out: output Distance:
>> org.apache.mahout.common.distance.EuclideanDistanceMeasure
>> 10/10/05 12:26:56 INFO kmeans.KMeansDriver: convergence: 0.5 max
>> Iterations: 10 num Reduce Tasks: org.apache.mahout.math.VectorWritable Input
>> Vectors: {}
>> 10/10/05 12:26:56 INFO kmeans.KMeansDriver: K-Means Iteration 1
>> 10/10/05 12:26:56 WARN mapred.JobClient: Use GenericOptionsParser for
>> parsing the arguments. Applications should implement Tool for the same.
>> 10/10/05 12:26:57 INFO input.FileInputFormat: Total input paths to process
>> : 1
>> 10/10/05 12:26:58 INFO mapred.JobClient: Running job:
>> job_201010051117_0007
>> 10/10/05 12:26:59 INFO mapred.JobClient:  map 0% reduce 0%
>> 10/10/05 12:27:08 INFO mapred.JobClient: Task Id :
>> attempt_201010051117_0007_m_000000_0, Status : FAILED
>> java.lang.IllegalStateException: No clusters found. Check your -c path.
>>        at
>> org.apache.mahout.clustering.kmeans.KMeansMapper.setup(KMeansMapper.java:61)
>>        at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:142)
>>        at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:621)
>>        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
>>        at org.apache.hadoop.mapred.Child.main(Child.java:170)
>>
>> 10/10/05 12:27:14 INFO mapred.JobClient: Task Id :
>> attempt_201010051117_0007_m_000000_1, Status : FAILED
>> java.lang.IllegalStateException: No clusters found. Check your -c path.
>>        at
>> org.apache.mahout.clustering.kmeans.KMeansMapper.setup(KMeansMapper.java:61)
>>        at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:142)
>>        at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:621)
>>        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
>>        at org.apache.hadoop.mapred.Child.main(Child.java:170)
>>
>> 10/10/05 12:27:23 INFO mapred.JobClient: Task Id :
>> attempt_201010051117_0007_m_000000_2, Status : FAILED
>> java.lang.IllegalStateException: No clusters found. Check your -c path.
>>        at
>> org.apache.mahout.clustering.kmeans.KMeansMapper.setup(KMeansMapper.java:61)
>>        at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:142)
>>        at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:621)
>>        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
>>        at org.apache.hadoop.mapred.Child.main(Child.java:170)
>>
>> 10/10/05 12:27:35 INFO mapred.JobClient: Job complete:
>> job_201010051117_0007
>> 10/10/05 12:27:35 INFO mapred.JobClient: Counters: 3
>> 10/10/05 12:27:35 INFO mapred.JobClient:   Job Counters
>> 10/10/05 12:27:35 INFO mapred.JobClient:     Launched map tasks=4
>> 10/10/05 12:27:35 INFO mapred.JobClient:     Data-local map tasks=4
>> 10/10/05 12:27:35 INFO mapred.JobClient:     Failed map tasks=1
>> 10/10/05 12:27:35 INFO kmeans.KMeansDriver: Clustering data
>> 10/10/05 12:27:35 INFO kmeans.KMeansDriver: Running Clustering
>> 10/10/05 12:27:35 INFO kmeans.KMeansDriver: Input: output/data Clusters
>> In: output/clusters-1 Out: output/clusteredPoints Distance:
>> org.apache.mahout.common.distance.EuclideanDistanceMeasure@136a43c
>> 10/10/05 12:27:35 INFO kmeans.KMeansDriver: convergence: 0.5 Input
>> Vectors: org.apache.mahout.math.VectorWritable
>> 10/10/05 12:27:35 WARN mapred.JobClient: Use GenericOptionsParser for
>> parsing the arguments. Applications should implement Tool for the same.
>> 10/10/05 12:27:36 INFO input.FileInputFormat: Total input paths to process
>> : 1
>> 10/10/05 12:27:37 INFO mapred.JobClient: Running job:
>> job_201010051117_0008
>> 10/10/05 12:27:38 INFO mapred.JobClient:  map 0% reduce 0%
>> 10/10/05 12:27:47 INFO mapred.JobClient: Task Id :
>> attempt_201010051117_0008_m_000000_0, Status : FAILED
>> java.lang.IllegalStateException: Cluster is empty!
>>        at
>> org.apache.mahout.clustering.kmeans.KMeansClusterMapper.setup(KMeansClusterMapper.java:57)
>>        at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:142)
>>        at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:621)
>>        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
>>        at org.apache.hadoop.mapred.Child.main(Child.java:170)
>>
>> 10/10/05 12:27:53 INFO mapred.JobClient: Task Id :
>> attempt_201010051117_0008_m_000000_1, Status : FAILED
>> java.lang.IllegalStateException: Cluster is empty!
>>        at
>> org.apache.mahout.clustering.kmeans.KMeansClusterMapper.setup(KMeansClusterMapper.java:57)
>>        at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:142)
>>        at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:621)
>>        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
>>        at org.apache.hadoop.mapred.Child.main(Child.java:170)
>>
>> 10/10/05 12:27:59 INFO mapred.JobClient: Task Id :
>> attempt_201010051117_0008_m_000000_2, Status : FAILED
>> java.lang.IllegalStateException: Cluster is empty!
>>        at
>> org.apache.mahout.clustering.kmeans.KMeansClusterMapper.setup(KMeansClusterMapper.java:57)
>>        at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:142)
>>        at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:621)
>>        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
>>        at org.apache.hadoop.mapred.Child.main(Child.java:170)
>>
>> 10/10/05 12:28:11 INFO mapred.JobClient: Job complete:
>> job_201010051117_0008
>> 10/10/05 12:28:11 INFO mapred.JobClient: Counters: 3
>> 10/10/05 12:28:11 INFO mapred.JobClient:   Job Counters
>> 10/10/05 12:28:11 INFO mapred.JobClient:     Launched map tasks=4
>> 10/10/05 12:28:11 INFO mapred.JobClient:     Data-local map tasks=4
>> 10/10/05 12:28:11 INFO mapred.JobClient:     Failed map tasks=1
>> 10/10/05 12:28:12 INFO driver.MahoutDriver: Program took 126495 ms
>>
>>       was (Author: pgradadia):
>>     I am also getting the same exception with trunk code
>>
>> 10/10/04 12:42:34 INFO mapred.JobClient: Running job:
>> job_201010041038_0019
>> 10/10/04 12:42:35 INFO mapred.JobClient:  map 0% reduce 0%
>> 10/10/04 12:42:45 INFO mapred.JobClient: Task Id :
>> attempt_201010041038_0019_m_000000_0, Status : FAILED
>> java.lang.IllegalStateException: No clusters found. Check your -c path.
>>        at
>> org.apache.mahout.clustering.kmeans.KMeansMapper.setup(KMeansMapper.java:61)
>>        at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:142)
>>        at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:621)
>>        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
>>        at org.apache.hadoop.mapred.Child.main(Child.java:170)
>>
>>  Kmeans clustering error
>>> -----------------------
>>>
>>>                 Key: MAHOUT-504
>>>                 URL: https://issues.apache.org/jira/browse/MAHOUT-504
>>>             Project: Mahout
>>>          Issue Type: Bug
>>>            Reporter: Zhen Guo
>>>            Assignee: Robin Anil
>>>             Fix For: 0.4
>>>
>>>
>>> I tried the Kmeans algorithm on the Synthetic Control data. The following
>>> error appears. I tried the Canopy algorithm, it is fine. This error is from
>>> Mapper. I am using Trunk.
>>> 10/09/20 19:40:06 INFO mapred.JobClient: Task Id :
>>> attempt_201008261432_1324_m_000000_0, Status : FAILED
>>> java.lang.IllegalStateException: Cluster is empty!
>>>        at
>>> org.apache.mahout.clustering.kmeans.KMeansClusterMapper.setup(KMeansClusterMapper.java:57)
>>>        at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:142)
>>>        at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:583)
>>>        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
>>>        at org.apache.hadoop.mapred.Child.main(Child.java:170)
>>>
>>
>

Re: [jira] Issue Comment Edited: (MAHOUT-504) Kmeans clustering error

Posted by Jeff Eastman <jd...@windwardsolutions.com>.

  Hi Pragnesh,

I really don't know what to suggest to you. I just did a new Mahout 
checkout and build, followed by uploading the synthetic_control.data 
file to a local Hadoop instance. The k-means job ran without incident. 
On a hunch, I also uploaded the file as testdata (not in directory 
testdata) and that worked too. I'm baffled why I can't duplicate this 
and suspect it is a local system issue. What OS are you running?

If yours works from Eclipse but not from the command line, I wonder 
whether you ran a clean Maven build (e.g. mvn clean install) from the 
command line before you ran the CLI Mahout job? Eclipse compiles its 
classes into different directories and does not build the necessary job 
files. Other than that, I suggest checking your file system groups and 
permissions.
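
The build check suggested above can be sketched as a small shell helper. 
This is only a sketch under stated assumptions: the function name 
find_mahout_job is made up for illustration, and the location and name 
pattern of the job file (examples/target/mahout-examples-*.job) is an 
assumption about the trunk layout that may not match your checkout, so the 
pattern can be overridden. Maven has no "build" phase, so the rebuild is 
shown as mvn clean install, which is presumably what was meant.

```shell
#!/bin/sh
# Hypothetical helper (not part of Mahout): print the first job file found
# under a Mahout checkout, or return non-zero if none exists.
# ASSUMPTION: job files live in examples/target and match the pattern below;
# pass a different pattern as the second argument if your layout differs.
find_mahout_job() {
  mahout_home="$1"
  pattern="${2:-mahout-examples-*.job}"
  # $pattern is left unquoted on purpose so the shell expands the glob.
  for f in "$mahout_home"/examples/target/$pattern; do
    if [ -f "$f" ]; then
      printf '%s\n' "$f"
      return 0
    fi
  done
  return 1
}

# Usage, commented out so that sourcing this snippet has no side effects:
#   cd "$MAHOUT_HOME" && mvn clean install      # rebuild outside Eclipse
#   find_mahout_job "$MAHOUT_HOME" || echo "no job file found; rebuild first"
#   hadoop fs -ls output/clusters-0             # the "Clusters In" path from the log
```

If the helper finds nothing after an Eclipse-only build, that would be 
consistent with the job files never having been produced. If the 
clusters-0 listing is empty or unreadable, permissions on the output 
directory are the next thing to look at.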

If you find something that gets you running again, *please* post your 
solution so we can advise others who are experiencing the same error 
message.


On 10/5/10 12:06 AM, pragnesh (JIRA) wrote:
>      [ https://issues.apache.org/jira/browse/MAHOUT-504?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12917502#action_12917502 ]
>
> pragnesh edited comment on MAHOUT-504 at 10/5/10 3:05 AM:
> ----------------------------------------------------------
>
> I am also getting the same exception with trunk code
>
> 10/10/04 12:42:34 INFO mapred.JobClient: Running job: job_201010041038_0019
> 10/10/04 12:42:35 INFO mapred.JobClient:  map 0% reduce 0%
> 10/10/04 12:42:45 INFO mapred.JobClient: Task Id : attempt_201010041038_0019_m_000000_0, Status : FAILED
> java.lang.IllegalStateException: No clusters found. Check your -c path.
> 	at org.apache.mahout.clustering.kmeans.KMeansMapper.setup(KMeansMapper.java:61)
> 	at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:142)
> 	at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:621)
> 	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
> 	at org.apache.hadoop.mapred.Child.main(Child.java:170)
>
>
> This runs fine from Eclipse, but when I try to run it from the command line with Hadoop I see the output below.
>
> Meanwhile, $MAHOUT_HOME/bin/mahout org.apache.mahout.clustering.syntheticcontrol.dirichlet.Job runs fine without any error.
>
> pragnesh-laptop% $MAHOUT_HOME/bin/mahout org.apache.mahout.clustering.syntheticcontrol.kmeans.Job
> Running on hadoop, using HADOOP_HOME=/usr/lib/hadoop/
> HADOOP_CONF_DIR=/etc/hadoop/conf.pseudo
> 10/10/05 12:26:05 WARN driver.MahoutDriver: No org.apache.mahout.clustering.syntheticcontrol.kmeans.Job.props found on classpath, will use command-line arguments only
> 10/10/05 12:26:05 INFO kmeans.Job: Running with default arguments
> 10/10/05 12:26:06 INFO kmeans.Job: Preparing Input
> 10/10/05 12:26:06 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
> 10/10/05 12:26:07 INFO input.FileInputFormat: Total input paths to process : 1
> 10/10/05 12:26:09 INFO mapred.JobClient: Running job: job_201010051117_0005
> 10/10/05 12:26:10 INFO mapred.JobClient:  map 0% reduce 0%
> 10/10/05 12:26:26 INFO mapred.JobClient:  map 100% reduce 0%
> 10/10/05 12:26:28 INFO mapred.JobClient: Job complete: job_201010051117_0005
> 10/10/05 12:26:29 INFO mapred.JobClient: Counters: 7
> 10/10/05 12:26:29 INFO mapred.JobClient:   Job Counters
> 10/10/05 12:26:29 INFO mapred.JobClient:     Launched map tasks=1
> 10/10/05 12:26:29 INFO mapred.JobClient:     Data-local map tasks=1
> 10/10/05 12:26:29 INFO mapred.JobClient:   FileSystemCounters
> 10/10/05 12:26:29 INFO mapred.JobClient:     HDFS_BYTES_READ=288374
> 10/10/05 12:26:29 INFO mapred.JobClient:     HDFS_BYTES_WRITTEN=335470
> 10/10/05 12:26:29 INFO mapred.JobClient:   Map-Reduce Framework
> 10/10/05 12:26:29 INFO mapred.JobClient:     Map input records=600
> 10/10/05 12:26:29 INFO mapred.JobClient:     Spilled Records=0
> 10/10/05 12:26:29 INFO mapred.JobClient:     Map output records=600
> 10/10/05 12:26:29 INFO kmeans.Job: Running Canopy to get initial clusters
> 10/10/05 12:26:29 INFO canopy.CanopyDriver: Build Clusters Input: output/data Out: output Measure: org.apache.mahout.common.distance.EuclideanDistanceMeasure@136a43c t1: 80.0 t2: 55.0
> 10/10/05 12:26:29 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
> 10/10/05 12:26:29 INFO input.FileInputFormat: Total input paths to process : 1
> 10/10/05 12:26:30 INFO mapred.JobClient: Running job: job_201010051117_0006
> 10/10/05 12:26:31 INFO mapred.JobClient:  map 0% reduce 0%
> 10/10/05 12:26:42 INFO mapred.JobClient:  map 100% reduce 0%
> 10/10/05 12:26:54 INFO mapred.JobClient:  map 100% reduce 100%
> 10/10/05 12:26:56 INFO mapred.JobClient: Job complete: job_201010051117_0006
> 10/10/05 12:26:56 INFO mapred.JobClient: Counters: 17
> 10/10/05 12:26:56 INFO mapred.JobClient:   Job Counters
> 10/10/05 12:26:56 INFO mapred.JobClient:     Launched reduce tasks=1
> 10/10/05 12:26:56 INFO mapred.JobClient:     Launched map tasks=1
> 10/10/05 12:26:56 INFO mapred.JobClient:     Data-local map tasks=1
> 10/10/05 12:26:56 INFO mapred.JobClient:   FileSystemCounters
> 10/10/05 12:26:56 INFO mapred.JobClient:     FILE_BYTES_READ=13906
> 10/10/05 12:26:56 INFO mapred.JobClient:     HDFS_BYTES_READ=335470
> 10/10/05 12:26:56 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=27844
> 10/10/05 12:26:56 INFO mapred.JobClient:     HDFS_BYTES_WRITTEN=7131
> 10/10/05 12:26:56 INFO mapred.JobClient:   Map-Reduce Framework
> 10/10/05 12:26:56 INFO mapred.JobClient:     Reduce input groups=1
> 10/10/05 12:26:56 INFO mapred.JobClient:     Combine output records=0
> 10/10/05 12:26:56 INFO mapred.JobClient:     Map input records=600
> 10/10/05 12:26:56 INFO mapred.JobClient:     Reduce shuffle bytes=0
> 10/10/05 12:26:56 INFO mapred.JobClient:     Reduce output records=6
> 10/10/05 12:26:56 INFO mapred.JobClient:     Spilled Records=50
> 10/10/05 12:26:56 INFO mapred.JobClient:     Map output bytes=13800
> 10/10/05 12:26:56 INFO mapred.JobClient:     Combine input records=0
> 10/10/05 12:26:56 INFO mapred.JobClient:     Map output records=25
> 10/10/05 12:26:56 INFO mapred.JobClient:     Reduce input records=25
> 10/10/05 12:26:56 INFO kmeans.Job: Running KMeans
> 10/10/05 12:26:56 INFO kmeans.KMeansDriver: Input: output/data Clusters In: output/clusters-0 Out: output Distance: org.apache.mahout.common.distance.EuclideanDistanceMeasure
> 10/10/05 12:26:56 INFO kmeans.KMeansDriver: convergence: 0.5 max Iterations: 10 num Reduce Tasks: org.apache.mahout.math.VectorWritable Input Vectors: {}
> 10/10/05 12:26:56 INFO kmeans.KMeansDriver: K-Means Iteration 1
> 10/10/05 12:26:56 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
> 10/10/05 12:26:57 INFO input.FileInputFormat: Total input paths to process : 1
> 10/10/05 12:26:58 INFO mapred.JobClient: Running job: job_201010051117_0007
> 10/10/05 12:26:59 INFO mapred.JobClient:  map 0% reduce 0%
> 10/10/05 12:27:08 INFO mapred.JobClient: Task Id : attempt_201010051117_0007_m_000000_0, Status : FAILED
> java.lang.IllegalStateException: No clusters found. Check your -c path.
> 	at org.apache.mahout.clustering.kmeans.KMeansMapper.setup(KMeansMapper.java:61)
> 	at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:142)
> 	at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:621)
> 	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
> 	at org.apache.hadoop.mapred.Child.main(Child.java:170)
>
> 10/10/05 12:27:14 INFO mapred.JobClient: Task Id : attempt_201010051117_0007_m_000000_1, Status : FAILED
> java.lang.IllegalStateException: No clusters found. Check your -c path.
> 	at org.apache.mahout.clustering.kmeans.KMeansMapper.setup(KMeansMapper.java:61)
> 	at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:142)
> 	at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:621)
> 	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
> 	at org.apache.hadoop.mapred.Child.main(Child.java:170)
>
> 10/10/05 12:27:23 INFO mapred.JobClient: Task Id : attempt_201010051117_0007_m_000000_2, Status : FAILED
> java.lang.IllegalStateException: No clusters found. Check your -c path.
> 	at org.apache.mahout.clustering.kmeans.KMeansMapper.setup(KMeansMapper.java:61)
> 	at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:142)
> 	at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:621)
> 	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
> 	at org.apache.hadoop.mapred.Child.main(Child.java:170)
>
> 10/10/05 12:27:35 INFO mapred.JobClient: Job complete: job_201010051117_0007
> 10/10/05 12:27:35 INFO mapred.JobClient: Counters: 3
> 10/10/05 12:27:35 INFO mapred.JobClient:   Job Counters
> 10/10/05 12:27:35 INFO mapred.JobClient:     Launched map tasks=4
> 10/10/05 12:27:35 INFO mapred.JobClient:     Data-local map tasks=4
> 10/10/05 12:27:35 INFO mapred.JobClient:     Failed map tasks=1
> 10/10/05 12:27:35 INFO kmeans.KMeansDriver: Clustering data
> 10/10/05 12:27:35 INFO kmeans.KMeansDriver: Running Clustering
> 10/10/05 12:27:35 INFO kmeans.KMeansDriver: Input: output/data Clusters In: output/clusters-1 Out: output/clusteredPoints Distance: org.apache.mahout.common.distance.EuclideanDistanceMeasure@136a43c
> 10/10/05 12:27:35 INFO kmeans.KMeansDriver: convergence: 0.5 Input Vectors: org.apache.mahout.math.VectorWritable
> 10/10/05 12:27:35 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
> 10/10/05 12:27:36 INFO input.FileInputFormat: Total input paths to process : 1
> 10/10/05 12:27:37 INFO mapred.JobClient: Running job: job_201010051117_0008
> 10/10/05 12:27:38 INFO mapred.JobClient:  map 0% reduce 0%
> 10/10/05 12:27:47 INFO mapred.JobClient: Task Id : attempt_201010051117_0008_m_000000_0, Status : FAILED
> java.lang.IllegalStateException: Cluster is empty!
> 	at org.apache.mahout.clustering.kmeans.KMeansClusterMapper.setup(KMeansClusterMapper.java:57)
> 	at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:142)
> 	at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:621)
> 	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
> 	at org.apache.hadoop.mapred.Child.main(Child.java:170)
>
> 10/10/05 12:27:53 INFO mapred.JobClient: Task Id : attempt_201010051117_0008_m_000000_1, Status : FAILED
> java.lang.IllegalStateException: Cluster is empty!
> 	at org.apache.mahout.clustering.kmeans.KMeansClusterMapper.setup(KMeansClusterMapper.java:57)
> 	at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:142)
> 	at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:621)
> 	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
> 	at org.apache.hadoop.mapred.Child.main(Child.java:170)
>
> 10/10/05 12:27:59 INFO mapred.JobClient: Task Id : attempt_201010051117_0008_m_000000_2, Status : FAILED
> java.lang.IllegalStateException: Cluster is empty!
> 	at org.apache.mahout.clustering.kmeans.KMeansClusterMapper.setup(KMeansClusterMapper.java:57)
> 	at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:142)
> 	at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:621)
> 	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
> 	at org.apache.hadoop.mapred.Child.main(Child.java:170)
>
> 10/10/05 12:28:11 INFO mapred.JobClient: Job complete: job_201010051117_0008
> 10/10/05 12:28:11 INFO mapred.JobClient: Counters: 3
> 10/10/05 12:28:11 INFO mapred.JobClient:   Job Counters
> 10/10/05 12:28:11 INFO mapred.JobClient:     Launched map tasks=4
> 10/10/05 12:28:11 INFO mapred.JobClient:     Data-local map tasks=4
> 10/10/05 12:28:11 INFO mapred.JobClient:     Failed map tasks=1
> 10/10/05 12:28:12 INFO driver.MahoutDriver: Program took 126495 ms
>
>> Kmeans clustering error
>> -----------------------
>>
>>                  Key: MAHOUT-504
>>                  URL: https://issues.apache.org/jira/browse/MAHOUT-504
>>              Project: Mahout
>>           Issue Type: Bug
>>             Reporter: Zhen Guo
>>             Assignee: Robin Anil
>>              Fix For: 0.4
>>
>>
>> I tried the Kmeans algorithm on the Synthetic Control data. The following error appears. I tried the Canopy algorithm, it is fine. This error is from Mapper. I am using Trunk.
>> 10/09/20 19:40:06 INFO mapred.JobClient: Task Id : attempt_201008261432_1324_m_000000_0, Status : FAILED
>> java.lang.IllegalStateException: Cluster is empty!
>> 	at org.apache.mahout.clustering.kmeans.KMeansClusterMapper.setup(KMeansClusterMapper.java:57)
>> 	at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:142)
>> 	at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:583)
>> 	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
>> 	at org.apache.hadoop.mapred.Child.main(Child.java:170)

[jira] Issue Comment Edited: (MAHOUT-504) Kmeans clustering error

Posted by "pragnesh (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/MAHOUT-504?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12917502#action_12917502 ] 

pragnesh edited comment on MAHOUT-504 at 10/5/10 3:05 AM:
----------------------------------------------------------

I am also getting the same exception with trunk code:

10/10/04 12:42:34 INFO mapred.JobClient: Running job: job_201010041038_0019
10/10/04 12:42:35 INFO mapred.JobClient:  map 0% reduce 0%
10/10/04 12:42:45 INFO mapred.JobClient: Task Id : attempt_201010041038_0019_m_000000_0, Status : FAILED
java.lang.IllegalStateException: No clusters found. Check your -c path.
	at org.apache.mahout.clustering.kmeans.KMeansMapper.setup(KMeansMapper.java:61)
	at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:142)
	at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:621)
	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
	at org.apache.hadoop.mapred.Child.main(Child.java:170)


This runs fine from Eclipse, but when I run it from the command line with Hadoop I get the output below.

Meanwhile, $MAHOUT_HOME/bin/mahout org.apache.mahout.clustering.syntheticcontrol.dirichlet.Job runs fine without any error.

pragnesh-laptop% $MAHOUT_HOME/bin/mahout org.apache.mahout.clustering.syntheticcontrol.kmeans.Job
Running on hadoop, using HADOOP_HOME=/usr/lib/hadoop/
HADOOP_CONF_DIR=/etc/hadoop/conf.pseudo
10/10/05 12:26:05 WARN driver.MahoutDriver: No org.apache.mahout.clustering.syntheticcontrol.kmeans.Job.props found on classpath, will use command-line arguments only
10/10/05 12:26:05 INFO kmeans.Job: Running with default arguments
10/10/05 12:26:06 INFO kmeans.Job: Preparing Input
10/10/05 12:26:06 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
10/10/05 12:26:07 INFO input.FileInputFormat: Total input paths to process : 1
10/10/05 12:26:09 INFO mapred.JobClient: Running job: job_201010051117_0005
10/10/05 12:26:10 INFO mapred.JobClient:  map 0% reduce 0%
10/10/05 12:26:26 INFO mapred.JobClient:  map 100% reduce 0%
10/10/05 12:26:28 INFO mapred.JobClient: Job complete: job_201010051117_0005
10/10/05 12:26:29 INFO mapred.JobClient: Counters: 7
10/10/05 12:26:29 INFO mapred.JobClient:   Job Counters 
10/10/05 12:26:29 INFO mapred.JobClient:     Launched map tasks=1
10/10/05 12:26:29 INFO mapred.JobClient:     Data-local map tasks=1
10/10/05 12:26:29 INFO mapred.JobClient:   FileSystemCounters
10/10/05 12:26:29 INFO mapred.JobClient:     HDFS_BYTES_READ=288374
10/10/05 12:26:29 INFO mapred.JobClient:     HDFS_BYTES_WRITTEN=335470
10/10/05 12:26:29 INFO mapred.JobClient:   Map-Reduce Framework
10/10/05 12:26:29 INFO mapred.JobClient:     Map input records=600
10/10/05 12:26:29 INFO mapred.JobClient:     Spilled Records=0
10/10/05 12:26:29 INFO mapred.JobClient:     Map output records=600
10/10/05 12:26:29 INFO kmeans.Job: Running Canopy to get initial clusters
10/10/05 12:26:29 INFO canopy.CanopyDriver: Build Clusters Input: output/data Out: output Measure: org.apache.mahout.common.distance.EuclideanDistanceMeasure@136a43c t1: 80.0 t2: 55.0
10/10/05 12:26:29 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
10/10/05 12:26:29 INFO input.FileInputFormat: Total input paths to process : 1
10/10/05 12:26:30 INFO mapred.JobClient: Running job: job_201010051117_0006
10/10/05 12:26:31 INFO mapred.JobClient:  map 0% reduce 0%
10/10/05 12:26:42 INFO mapred.JobClient:  map 100% reduce 0%
10/10/05 12:26:54 INFO mapred.JobClient:  map 100% reduce 100%
10/10/05 12:26:56 INFO mapred.JobClient: Job complete: job_201010051117_0006
10/10/05 12:26:56 INFO mapred.JobClient: Counters: 17
10/10/05 12:26:56 INFO mapred.JobClient:   Job Counters 
10/10/05 12:26:56 INFO mapred.JobClient:     Launched reduce tasks=1
10/10/05 12:26:56 INFO mapred.JobClient:     Launched map tasks=1
10/10/05 12:26:56 INFO mapred.JobClient:     Data-local map tasks=1
10/10/05 12:26:56 INFO mapred.JobClient:   FileSystemCounters
10/10/05 12:26:56 INFO mapred.JobClient:     FILE_BYTES_READ=13906
10/10/05 12:26:56 INFO mapred.JobClient:     HDFS_BYTES_READ=335470
10/10/05 12:26:56 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=27844
10/10/05 12:26:56 INFO mapred.JobClient:     HDFS_BYTES_WRITTEN=7131
10/10/05 12:26:56 INFO mapred.JobClient:   Map-Reduce Framework
10/10/05 12:26:56 INFO mapred.JobClient:     Reduce input groups=1
10/10/05 12:26:56 INFO mapred.JobClient:     Combine output records=0
10/10/05 12:26:56 INFO mapred.JobClient:     Map input records=600
10/10/05 12:26:56 INFO mapred.JobClient:     Reduce shuffle bytes=0
10/10/05 12:26:56 INFO mapred.JobClient:     Reduce output records=6
10/10/05 12:26:56 INFO mapred.JobClient:     Spilled Records=50
10/10/05 12:26:56 INFO mapred.JobClient:     Map output bytes=13800
10/10/05 12:26:56 INFO mapred.JobClient:     Combine input records=0
10/10/05 12:26:56 INFO mapred.JobClient:     Map output records=25
10/10/05 12:26:56 INFO mapred.JobClient:     Reduce input records=25
10/10/05 12:26:56 INFO kmeans.Job: Running KMeans
10/10/05 12:26:56 INFO kmeans.KMeansDriver: Input: output/data Clusters In: output/clusters-0 Out: output Distance: org.apache.mahout.common.distance.EuclideanDistanceMeasure
10/10/05 12:26:56 INFO kmeans.KMeansDriver: convergence: 0.5 max Iterations: 10 num Reduce Tasks: org.apache.mahout.math.VectorWritable Input Vectors: {}
10/10/05 12:26:56 INFO kmeans.KMeansDriver: K-Means Iteration 1
10/10/05 12:26:56 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
10/10/05 12:26:57 INFO input.FileInputFormat: Total input paths to process : 1
10/10/05 12:26:58 INFO mapred.JobClient: Running job: job_201010051117_0007
10/10/05 12:26:59 INFO mapred.JobClient:  map 0% reduce 0%
10/10/05 12:27:08 INFO mapred.JobClient: Task Id : attempt_201010051117_0007_m_000000_0, Status : FAILED
java.lang.IllegalStateException: No clusters found. Check your -c path.
	at org.apache.mahout.clustering.kmeans.KMeansMapper.setup(KMeansMapper.java:61)
	at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:142)
	at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:621)
	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
	at org.apache.hadoop.mapred.Child.main(Child.java:170)

10/10/05 12:27:14 INFO mapred.JobClient: Task Id : attempt_201010051117_0007_m_000000_1, Status : FAILED
java.lang.IllegalStateException: No clusters found. Check your -c path.
	at org.apache.mahout.clustering.kmeans.KMeansMapper.setup(KMeansMapper.java:61)
	at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:142)
	at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:621)
	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
	at org.apache.hadoop.mapred.Child.main(Child.java:170)

10/10/05 12:27:23 INFO mapred.JobClient: Task Id : attempt_201010051117_0007_m_000000_2, Status : FAILED
java.lang.IllegalStateException: No clusters found. Check your -c path.
	at org.apache.mahout.clustering.kmeans.KMeansMapper.setup(KMeansMapper.java:61)
	at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:142)
	at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:621)
	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
	at org.apache.hadoop.mapred.Child.main(Child.java:170)

10/10/05 12:27:35 INFO mapred.JobClient: Job complete: job_201010051117_0007
10/10/05 12:27:35 INFO mapred.JobClient: Counters: 3
10/10/05 12:27:35 INFO mapred.JobClient:   Job Counters 
10/10/05 12:27:35 INFO mapred.JobClient:     Launched map tasks=4
10/10/05 12:27:35 INFO mapred.JobClient:     Data-local map tasks=4
10/10/05 12:27:35 INFO mapred.JobClient:     Failed map tasks=1
10/10/05 12:27:35 INFO kmeans.KMeansDriver: Clustering data
10/10/05 12:27:35 INFO kmeans.KMeansDriver: Running Clustering
10/10/05 12:27:35 INFO kmeans.KMeansDriver: Input: output/data Clusters In: output/clusters-1 Out: output/clusteredPoints Distance: org.apache.mahout.common.distance.EuclideanDistanceMeasure@136a43c
10/10/05 12:27:35 INFO kmeans.KMeansDriver: convergence: 0.5 Input Vectors: org.apache.mahout.math.VectorWritable
10/10/05 12:27:35 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
10/10/05 12:27:36 INFO input.FileInputFormat: Total input paths to process : 1
10/10/05 12:27:37 INFO mapred.JobClient: Running job: job_201010051117_0008
10/10/05 12:27:38 INFO mapred.JobClient:  map 0% reduce 0%
10/10/05 12:27:47 INFO mapred.JobClient: Task Id : attempt_201010051117_0008_m_000000_0, Status : FAILED
java.lang.IllegalStateException: Cluster is empty!
	at org.apache.mahout.clustering.kmeans.KMeansClusterMapper.setup(KMeansClusterMapper.java:57)
	at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:142)
	at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:621)
	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
	at org.apache.hadoop.mapred.Child.main(Child.java:170)

10/10/05 12:27:53 INFO mapred.JobClient: Task Id : attempt_201010051117_0008_m_000000_1, Status : FAILED
java.lang.IllegalStateException: Cluster is empty!
	at org.apache.mahout.clustering.kmeans.KMeansClusterMapper.setup(KMeansClusterMapper.java:57)
	at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:142)
	at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:621)
	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
	at org.apache.hadoop.mapred.Child.main(Child.java:170)

10/10/05 12:27:59 INFO mapred.JobClient: Task Id : attempt_201010051117_0008_m_000000_2, Status : FAILED
java.lang.IllegalStateException: Cluster is empty!
	at org.apache.mahout.clustering.kmeans.KMeansClusterMapper.setup(KMeansClusterMapper.java:57)
	at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:142)
	at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:621)
	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
	at org.apache.hadoop.mapred.Child.main(Child.java:170)

10/10/05 12:28:11 INFO mapred.JobClient: Job complete: job_201010051117_0008
10/10/05 12:28:11 INFO mapred.JobClient: Counters: 3
10/10/05 12:28:11 INFO mapred.JobClient:   Job Counters 
10/10/05 12:28:11 INFO mapred.JobClient:     Launched map tasks=4
10/10/05 12:28:11 INFO mapred.JobClient:     Data-local map tasks=4
10/10/05 12:28:11 INFO mapred.JobClient:     Failed map tasks=1
10/10/05 12:28:12 INFO driver.MahoutDriver: Program took 126495 ms

> Kmeans clustering error
> -----------------------
>
>                 Key: MAHOUT-504
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-504
>             Project: Mahout
>          Issue Type: Bug
>            Reporter: Zhen Guo
>            Assignee: Robin Anil
>             Fix For: 0.4
>
>
> I tried the Kmeans algorithm on the Synthetic Control data. The following error appears. I tried the Canopy algorithm, it is fine. This error is from Mapper. I am using Trunk.
> 10/09/20 19:40:06 INFO mapred.JobClient: Task Id : attempt_201008261432_1324_m_000000_0, Status : FAILED
> java.lang.IllegalStateException: Cluster is empty!
> 	at org.apache.mahout.clustering.kmeans.KMeansClusterMapper.setup(KMeansClusterMapper.java:57)
> 	at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:142)
> 	at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:583)
> 	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
> 	at org.apache.hadoop.mapred.Child.main(Child.java:170)

[jira] Commented: (MAHOUT-504) Kmeans clustering error

Posted by "Hudson (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/MAHOUT-504?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12914619#action_12914619 ] 

Hudson commented on MAHOUT-504:
-------------------------------

Integrated in Mahout-Quality #322 (See [https://hudson.apache.org/hudson/job/Mahout-Quality/322/])
    MAHOUT-504. Fixed CLI arguments and did other refactoring of synthetic control
example. Tested CLI invocation with explicit arguments which was the source of
the problems cited in this issue. All tests run


[jira] Commented: (MAHOUT-504) Kmeans clustering error

Posted by "Hudson (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/MAHOUT-504?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12919381#action_12919381 ] 

Hudson commented on MAHOUT-504:
-------------------------------

Integrated in Mahout-Quality #382 (See [https://hudson.apache.org/hudson/job/Mahout-Quality/382/])
    MAHOUT-504:
- Added job completion tests to break out of iterations if errors occur
- Fixed canopy cluster mapper initialization problem with _log files on Hadoop
- All synthetic control examples run on Hadoop cluster
- All unit tests run
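
The "_log files" item above presumably refers to the bookkeeping entries (such as a _logs directory) that Hadoop writes next to a job's part-* output files. A reader that treats every entry in the clusters directory as cluster data can trip over them. As a rough illustration only, and not the actual Mahout change, the sketch below lists just the part-* files and fails early when there are none. The function names list_cluster_parts and require_clusters are hypothetical, and the sketch works on a local directory; on a cluster the listing would come from HDFS instead.

```shell
#!/bin/sh
# Hypothetical helper: print the part-* files in a local clusters directory,
# ignoring bookkeeping entries such as a _logs subdirectory.
list_cluster_parts() {
    dir="$1"
    for entry in "$dir"/part-*; do
        # An unmatched glob stays literal and fails the -f test, so nothing prints.
        [ -f "$entry" ] && printf '%s\n' "$entry"
    done
    return 0
}

# Hypothetical guard in the spirit of the mapper's complaint: stop before
# launching an iteration when the directory holds no cluster files at all.
require_clusters() {
    if [ -z "$(list_cluster_parts "$1")" ]; then
        echo "No clusters found in $1" >&2
        return 1
    fi
}

# Usage (directory name taken from the log above):
#   require_clusters output/clusters-0 || exit 1
```

Checking the return value before each step is also the simplest way to get the "break out of iterations if errors occur" behaviour in a wrapper script.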


[jira] Commented: (MAHOUT-504) Kmeans clustering error

Posted by "Jeff Eastman (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/MAHOUT-504?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12916487#action_12916487 ] 

Jeff Eastman commented on MAHOUT-504:
-------------------------------------

Trunk is, afaict, working for all synthetic control jobs; both with the default arguments and with user-supplied arguments. There was a problem in 0.3 and some of these issues relate to that edition. This issue should be closed. Does anybody disagree? Zhen?

Re: [jira] Commented: (MAHOUT-504) Kmeans clustering error

Posted by Jeff Eastman <jd...@windwardsolutions.com>.

  On 10/4/10 4:12 PM, Jeff Eastman wrote:
>  On 10/4/10 4:06 PM, Jeff Eastman wrote:
>>  Well, whew, transient horkage. After the last commit, I rebuilt and 
>> now it can be run again from the CLI. But it works on my system. Can 
>> somebody try it on theirs? Just run "$MAHOUT_HOME/bin/mahout 
>> org.apache.mahout.clustering.syntheticcontrol.kmeans.Job" and tell me 
>> I'm not crazy :)
>>
> Oh, yeah, not quite so simple. Need to download the 
> synthetic_control.data file from 
> http://archive.ics.uci.edu/ml/datasets/Synthetic+Control+Chart+Time+Series 
> and put it in a directory named "testdata". Otherwise you will get an 
> entirely different error.
*and* be sure you download the .data file and not one of the others 
(*.html, *.jpeg) by mistake:
synthetic_control.data 
<http://archive.ics.uci.edu/ml/databases/synthetic_control/synthetic_control.data> 
14-Jun-1999 13:41 282K
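
The setup steps above can be sketched as a small shell script. This is a minimal sketch, assuming a POSIX shell, curl on the PATH, and that the direct URL quoted above still serves the file. The function names fetch_control_data and looks_like_control_data are hypothetical. The check only confirms that the first line is made of whitespace-separated plain decimal numbers, which should be enough to catch an HTML page or JPEG saved by mistake. When running against Hadoop, the testdata directory presumably also has to exist in HDFS, which this sketch does not cover.

```shell
#!/bin/sh
# Hypothetical helper: download the raw data file (not the dataset's HTML
# page) into ./testdata. Defined here but not called automatically.
fetch_control_data() {
    mkdir -p testdata &&
    curl -fSL -o testdata/synthetic_control.data \
        "http://archive.ics.uci.edu/ml/databases/synthetic_control/synthetic_control.data"
}

# Hypothetical helper: succeeds only if the file is non-empty and its first
# line consists entirely of whitespace-separated decimal numbers.
looks_like_control_data() {
    file="$1"
    [ -s "$file" ] || return 1
    head -n 1 "$file" | awk '
        NF == 0 { exit 1 }
        {
            for (i = 1; i <= NF; i++)
                if ($i !~ /^-?[0-9]+(\.[0-9]+)?$/) exit 1
        }'
}

# Usage:
#   fetch_control_data &&
#   looks_like_control_data testdata/synthetic_control.data &&
#   "$MAHOUT_HOME/bin/mahout" org.apache.mahout.clustering.syntheticcontrol.kmeans.Job
```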

Re: [jira] Commented: (MAHOUT-504) Kmeans clustering error

Posted by Jeff Eastman <jd...@windwardsolutions.com>.

  On 10/4/10 4:06 PM, Jeff Eastman wrote:
>  Well, whew, transient horkage. After the last commit, I rebuilt and 
> now it can be run again from the CLI. But it works on my system. Can 
> somebody try it on theirs? Just run "$MAHOUT_HOME/bin/mahout 
> org.apache.mahout.clustering.syntheticcontrol.kmeans.Job" and tell me 
> I'm not crazy :)
>
Oh, yeah, not quite so simple. Need to download the 
synthetic_control.data file from 
http://archive.ics.uci.edu/ml/datasets/Synthetic+Control+Chart+Time+Series 
and put it in a directory named "testdata". Otherwise you will get an 
entirely different error.

Re: [jira] Commented: (MAHOUT-504) Kmeans clustering error

Posted by Jeff Eastman <jd...@windwardsolutions.com>.

  Well, whew, transient horkage. After the last commit, I rebuilt and 
now it can be run again from the CLI. But it works on my system. Can 
somebody try it on theirs? Just run "$MAHOUT_HOME/bin/mahout 
org.apache.mahout.clustering.syntheticcontrol.kmeans.Job" and tell me 
I'm not crazy :)


On 10/4/10 3:47 PM, Jeff Eastman wrote:
>  In the very latest trunk, it is not even possible to run 
> "$MAHOUT_HOME/bin/mahout 
> org.apache.mahout.clustering.syntheticcontrol.kmeans.Job". Something 
> in a recent commit seems to have disabled this command form. I did a 
> clean build before running it too. Does anybody have an explanation?
>
> On the error itself, there is no way I can imagine to cause 0 clusters 
> to come out of the Canopy step, unless there is a file-system 
> permission problem preventing it from writing them. With the 
> no-argument job, all the arguments are predefined and Canopy will 
> produce 6 clusters every time: from Eclipse, from CLI and from CLI on 
> Hadoop.
>
> I ran it Saturday in response to a similar posting. Today I can only 
> run it from Eclipse but it worked perfectly. Here's the current CLI 
> output:
>
> jeff-eastmans-macbook-pro:mahout jeff$ $MAHOUT_HOME/bin/mahout 
> org.apache.mahout.clustering.syntheticcontrol.kmeans.Job
> no HADOOP_HOME set, running locally
> Oct 4, 2010 3:33:54 PM org.slf4j.impl.JCLLoggerAdapter warn
> WARNING: Unable to add class: 
> org.apache.mahout.classifier.sgd.RunLogistic
> java.lang.ClassNotFoundException: 
> org.apache.mahout.classifier.sgd.RunLogistic
>     at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
>     at java.security.AccessController.doPrivileged(Native Method)
>     at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
>     at java.lang.ClassLoader.loadClass(ClassLoader.java:307)
>     at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
>     at java.lang.ClassLoader.loadClass(ClassLoader.java:248)
>     at java.lang.Class.forName0(Native Method)
>     at java.lang.Class.forName(Class.java:169)
>     at 
> org.apache.mahout.driver.MahoutDriver.addClass(MahoutDriver.java:198)
>     at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:108)
> Oct 4, 2010 3:33:54 PM org.slf4j.impl.JCLLoggerAdapter warn
> WARNING: Unable to add class: 
> org.apache.mahout.text.WikipediaToSequenceFile
> java.lang.ClassNotFoundException: 
> org.apache.mahout.text.WikipediaToSequenceFile
>     at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
>     at java.security.AccessController.doPrivileged(Native Method)
>     at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
>     at java.lang.ClassLoader.loadClass(ClassLoader.java:307)
>     at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
>     at java.lang.ClassLoader.loadClass(ClassLoader.java:248)
>     at java.lang.Class.forName0(Native Method)
>     at java.lang.Class.forName(Class.java:169)
>     at 
> org.apache.mahout.driver.MahoutDriver.addClass(MahoutDriver.java:198)
>     at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:108)
> Oct 4, 2010 3:33:54 PM org.slf4j.impl.JCLLoggerAdapter warn
> WARNING: Unable to add class: 
> org.apache.mahout.classifier.bayes.WikipediaDatasetCreatorDriver
> java.lang.ClassNotFoundException: 
> org.apache.mahout.classifier.bayes.WikipediaDatasetCreatorDriver
>     at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
>     at java.security.AccessController.doPrivileged(Native Method)
>     at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
>     at java.lang.ClassLoader.loadClass(ClassLoader.java:307)
>     at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
>     at java.lang.ClassLoader.loadClass(ClassLoader.java:248)
>     at java.lang.Class.forName0(Native Method)
>     at java.lang.Class.forName(Class.java:169)
>     at 
> org.apache.mahout.driver.MahoutDriver.addClass(MahoutDriver.java:198)
>     at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:108)
> Oct 4, 2010 3:33:54 PM org.slf4j.impl.JCLLoggerAdapter warn
> WARNING: Unable to add class: 
> org.apache.mahout.classifier.bayes.PrepareTwentyNewsgroups
> java.lang.ClassNotFoundException: 
> org.apache.mahout.classifier.bayes.PrepareTwentyNewsgroups
>     at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
>     at java.security.AccessController.doPrivileged(Native Method)
>     at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
>     at java.lang.ClassLoader.loadClass(ClassLoader.java:307)
>     at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
>     at java.lang.ClassLoader.loadClass(ClassLoader.java:248)
>     at java.lang.Class.forName0(Native Method)
>     at java.lang.Class.forName(Class.java:169)
>     at 
> org.apache.mahout.driver.MahoutDriver.addClass(MahoutDriver.java:198)
>     at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:108)
> Oct 4, 2010 3:33:54 PM org.slf4j.impl.JCLLoggerAdapter warn
> WARNING: Unable to add class: 
> org.apache.mahout.classifier.bayes.WikipediaXmlSplitter
> java.lang.ClassNotFoundException: 
> org.apache.mahout.classifier.bayes.WikipediaXmlSplitter
>     at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
>     at java.security.AccessController.doPrivileged(Native Method)
>     at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
>     at java.lang.ClassLoader.loadClass(ClassLoader.java:307)
>     at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
>     at java.lang.ClassLoader.loadClass(ClassLoader.java:248)
>     at java.lang.Class.forName0(Native Method)
>     at java.lang.Class.forName(Class.java:169)
>     at 
> org.apache.mahout.driver.MahoutDriver.addClass(MahoutDriver.java:198)
>     at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:108)
> Oct 4, 2010 3:33:54 PM org.slf4j.impl.JCLLoggerAdapter warn
> WARNING: Unable to add class: 
> org.apache.mahout.vectorizer.SparseVectorsFromSequenceFiles
> java.lang.ClassNotFoundException: 
> org.apache.mahout.vectorizer.SparseVectorsFromSequenceFiles
>     at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
>     at java.security.AccessController.doPrivileged(Native Method)
>     at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
>     at java.lang.ClassLoader.loadClass(ClassLoader.java:307)
>     at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
>     at java.lang.ClassLoader.loadClass(ClassLoader.java:248)
>     at java.lang.Class.forName0(Native Method)
>     at java.lang.Class.forName(Class.java:169)
>     at 
> org.apache.mahout.driver.MahoutDriver.addClass(MahoutDriver.java:198)
>     at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:108)
> Oct 4, 2010 3:33:54 PM org.slf4j.impl.JCLLoggerAdapter warn
> WARNING: Unable to add class: 
> org.apache.mahout.classifier.sgd.PrintResourceOrFile
> java.lang.ClassNotFoundException: 
> org.apache.mahout.classifier.sgd.PrintResourceOrFile
>     at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
>     at java.security.AccessController.doPrivileged(Native Method)
>     at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
>     at java.lang.ClassLoader.loadClass(ClassLoader.java:307)
>     at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
>     at java.lang.ClassLoader.loadClass(ClassLoader.java:248)
>     at java.lang.Class.forName0(Native Method)
>     at java.lang.Class.forName(Class.java:169)
>     at 
> org.apache.mahout.driver.MahoutDriver.addClass(MahoutDriver.java:198)
>     at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:108)
> Oct 4, 2010 3:33:54 PM org.slf4j.impl.JCLLoggerAdapter warn
> WARNING: Unable to add class: 
> org.apache.mahout.classifier.sgd.TrainLogistic
> java.lang.ClassNotFoundException: 
> org.apache.mahout.classifier.sgd.TrainLogistic
>     at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
>     at java.security.AccessController.doPrivileged(Native Method)
>     at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
>     at java.lang.ClassLoader.loadClass(ClassLoader.java:307)
>     at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
>     at java.lang.ClassLoader.loadClass(ClassLoader.java:248)
>     at java.lang.Class.forName0(Native Method)
>     at java.lang.Class.forName(Class.java:169)
>     at 
> org.apache.mahout.driver.MahoutDriver.addClass(MahoutDriver.java:198)
>     at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:108)
> Oct 4, 2010 3:33:54 PM org.slf4j.impl.JCLLoggerAdapter warn
> WARNING: Unable to add class: 
> org.apache.mahout.clustering.syntheticcontrol.kmeans.Job
> java.lang.ClassNotFoundException: 
> org.apache.mahout.clustering.syntheticcontrol.kmeans.Job
>     at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
>     at java.security.AccessController.doPrivileged(Native Method)
>     at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
>     at java.lang.ClassLoader.loadClass(ClassLoader.java:307)
>     at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
>     at java.lang.ClassLoader.loadClass(ClassLoader.java:248)
>     at java.lang.Class.forName0(Native Method)
>     at java.lang.Class.forName(Class.java:169)
>     at 
> org.apache.mahout.driver.MahoutDriver.addClass(MahoutDriver.java:198)
>     at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:115)
> Oct 4, 2010 3:33:54 PM org.slf4j.impl.JCLLoggerAdapter warn
> WARNING: No 
> org.apache.mahout.clustering.syntheticcontrol.kmeans.Job.props found 
> on classpath, will use command-line arguments only
> Unknown program 
> 'org.apache.mahout.clustering.syntheticcontrol.kmeans.Job' chosen.
> Valid program names are:
>   arff.vector: : Generate Vectors from an ARFF file or directory
>   canopy: : Canopy clustering
>   cleansvd: : Cleanup and verification of SVD output
>   clusterdump: : Dump cluster output to text
>   dirichlet: : Dirichlet Clustering
>   fkmeans: : Fuzzy K-means clustering
>   fpg: : Frequent Pattern Growth
>   itemsimilarity: : Compute the item-item-similarities for item-based 
> collaborative filtering
>   kmeans: : K-means clustering
>   lda: : Latent Dirchlet Allocation
>   ldatopics: : LDA Print Topics
>   lucene.vector: : Generate Vectors from a Lucene index
>   matrixmult: : Take the produc of two matrices
>   meanshift: : Mean Shift clustering
>   rowid: : Map SequenceFile<Text,VectorWritable> to 
> {SequenceFile<IntWritable,VectorWritable>, 
> SequenceFile<IntWritable,Text>}
>   rowsimilarity: : Compute the pairwise similarities of the rows of a 
> matrix
>   seqdirectory: : Generate sequence files (of Text) from a directory
>   seqdumper: : Generic Sequence File dumper
>   svd: : Lanczos Singular Value Decomposition
>   testclassifier: : Test Bayes Classifier
>   trainclassifier: : Train Bayes Classifier
>   transpose: : Take the transpose of a matrix
>   vectordump: : Dump vectors from a sequence file to text
>
>
>
> On 10/4/10 12:25 AM, pragnesh (JIRA) wrote:
>>      [ 
>> https://issues.apache.org/jira/browse/MAHOUT-504?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12917502#action_12917502 
>> ]
>>
>> pragnesh commented on MAHOUT-504:
>> ---------------------------------
>>
>> I am also getting the same exception with trunk code
>>
>> 10/10/04 12:42:34 INFO mapred.JobClient: Running job: 
>> job_201010041038_0019
>> 10/10/04 12:42:35 INFO mapred.JobClient:  map 0% reduce 0%
>> 10/10/04 12:42:45 INFO mapred.JobClient: Task Id : 
>> attempt_201010041038_0019_m_000000_0, Status : FAILED
>> java.lang.IllegalStateException: No clusters found. Check your -c path.
>>     at 
>> org.apache.mahout.clustering.kmeans.KMeansMapper.setup(KMeansMapper.java:61)
>>     at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:142)
>>     at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:621)
>>     at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
>>     at org.apache.hadoop.mapred.Child.main(Child.java:170)
>>
>>> Kmeans clustering error
>>> -----------------------
>>>
>>>                  Key: MAHOUT-504
>>>                  URL: https://issues.apache.org/jira/browse/MAHOUT-504
>>>              Project: Mahout
>>>           Issue Type: Bug
>>>             Reporter: Zhen Guo
>>>             Assignee: Robin Anil
>>>              Fix For: 0.4
>>>
>>>
>>> I tried the Kmeans algorithm on the Synthetic Control data. The 
>>> following error appears. I tried the Canopy algorithm, it is fine. 
>>> This error is from Mapper. I am using Trunk.
>>> 10/09/20 19:40:06 INFO mapred.JobClient: Task Id : 
>>> attempt_201008261432_1324_m_000000_0, Status : FAILED
>>> java.lang.IllegalStateException: Cluster is empty!
>>>     at 
>>> org.apache.mahout.clustering.kmeans.KMeansClusterMapper.setup(KMeansClusterMapper.java:57)
>>>     at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:142)
>>>     at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:583)
>>>     at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
>>>     at org.apache.hadoop.mapred.Child.main(Child.java:170)
>

Re: [jira] Commented: (MAHOUT-504) Kmeans clustering error

Posted by Jeff Eastman <jd...@windwardsolutions.com>.

  In the very latest trunk, it is not even possible to run 
"$MAHOUT_HOME/bin/mahout 
org.apache.mahout.clustering.syntheticcontrol.kmeans.Job". Something in 
a recent commit seems to have disabled this command form. I did a clean 
build before running it too. Does anybody have an explanation?

On the error itself, there is no way I can imagine to cause 0 clusters 
to come out of the Canopy step, unless there is a file-system permission 
problem preventing it from writing them. With the no-argument job, all 
the arguments are predefined and Canopy will produce 6 clusters every 
time: from Eclipse, from CLI and from CLI on Hadoop.

I ran it Saturday in response to a similar posting. Today I can only run 
it from Eclipse but it worked perfectly. Here's the current CLI output:

jeff-eastmans-macbook-pro:mahout jeff$ $MAHOUT_HOME/bin/mahout 
org.apache.mahout.clustering.syntheticcontrol.kmeans.Job
no HADOOP_HOME set, running locally
Oct 4, 2010 3:33:54 PM org.slf4j.impl.JCLLoggerAdapter warn
WARNING: Unable to add class: org.apache.mahout.classifier.sgd.RunLogistic
java.lang.ClassNotFoundException: 
org.apache.mahout.classifier.sgd.RunLogistic
     at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
     at java.security.AccessController.doPrivileged(Native Method)
     at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
     at java.lang.ClassLoader.loadClass(ClassLoader.java:307)
     at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
     at java.lang.ClassLoader.loadClass(ClassLoader.java:248)
     at java.lang.Class.forName0(Native Method)
     at java.lang.Class.forName(Class.java:169)
     at 
org.apache.mahout.driver.MahoutDriver.addClass(MahoutDriver.java:198)
     at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:108)
Oct 4, 2010 3:33:54 PM org.slf4j.impl.JCLLoggerAdapter warn
WARNING: Unable to add class: org.apache.mahout.text.WikipediaToSequenceFile
java.lang.ClassNotFoundException: 
org.apache.mahout.text.WikipediaToSequenceFile
     at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
     at java.security.AccessController.doPrivileged(Native Method)
     at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
     at java.lang.ClassLoader.loadClass(ClassLoader.java:307)
     at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
     at java.lang.ClassLoader.loadClass(ClassLoader.java:248)
     at java.lang.Class.forName0(Native Method)
     at java.lang.Class.forName(Class.java:169)
     at 
org.apache.mahout.driver.MahoutDriver.addClass(MahoutDriver.java:198)
     at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:108)
Oct 4, 2010 3:33:54 PM org.slf4j.impl.JCLLoggerAdapter warn
WARNING: Unable to add class: 
org.apache.mahout.classifier.bayes.WikipediaDatasetCreatorDriver
java.lang.ClassNotFoundException: 
org.apache.mahout.classifier.bayes.WikipediaDatasetCreatorDriver
     at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
     at java.security.AccessController.doPrivileged(Native Method)
     at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
     at java.lang.ClassLoader.loadClass(ClassLoader.java:307)
     at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
     at java.lang.ClassLoader.loadClass(ClassLoader.java:248)
     at java.lang.Class.forName0(Native Method)
     at java.lang.Class.forName(Class.java:169)
     at 
org.apache.mahout.driver.MahoutDriver.addClass(MahoutDriver.java:198)
     at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:108)
Oct 4, 2010 3:33:54 PM org.slf4j.impl.JCLLoggerAdapter warn
WARNING: Unable to add class: 
org.apache.mahout.classifier.bayes.PrepareTwentyNewsgroups
java.lang.ClassNotFoundException: 
org.apache.mahout.classifier.bayes.PrepareTwentyNewsgroups
     at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
     at java.security.AccessController.doPrivileged(Native Method)
     at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
     at java.lang.ClassLoader.loadClass(ClassLoader.java:307)
     at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
     at java.lang.ClassLoader.loadClass(ClassLoader.java:248)
     at java.lang.Class.forName0(Native Method)
     at java.lang.Class.forName(Class.java:169)
     at 
org.apache.mahout.driver.MahoutDriver.addClass(MahoutDriver.java:198)
     at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:108)
Oct 4, 2010 3:33:54 PM org.slf4j.impl.JCLLoggerAdapter warn
WARNING: Unable to add class: 
org.apache.mahout.classifier.bayes.WikipediaXmlSplitter
java.lang.ClassNotFoundException: 
org.apache.mahout.classifier.bayes.WikipediaXmlSplitter
     at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
     at java.security.AccessController.doPrivileged(Native Method)
     at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
     at java.lang.ClassLoader.loadClass(ClassLoader.java:307)
     at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
     at java.lang.ClassLoader.loadClass(ClassLoader.java:248)
     at java.lang.Class.forName0(Native Method)
     at java.lang.Class.forName(Class.java:169)
     at 
org.apache.mahout.driver.MahoutDriver.addClass(MahoutDriver.java:198)
     at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:108)
Oct 4, 2010 3:33:54 PM org.slf4j.impl.JCLLoggerAdapter warn
WARNING: Unable to add class: 
org.apache.mahout.vectorizer.SparseVectorsFromSequenceFiles
java.lang.ClassNotFoundException: 
org.apache.mahout.vectorizer.SparseVectorsFromSequenceFiles
     at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
     at java.security.AccessController.doPrivileged(Native Method)
     at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
     at java.lang.ClassLoader.loadClass(ClassLoader.java:307)
     at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
     at java.lang.ClassLoader.loadClass(ClassLoader.java:248)
     at java.lang.Class.forName0(Native Method)
     at java.lang.Class.forName(Class.java:169)
     at 
org.apache.mahout.driver.MahoutDriver.addClass(MahoutDriver.java:198)
     at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:108)
Oct 4, 2010 3:33:54 PM org.slf4j.impl.JCLLoggerAdapter warn
WARNING: Unable to add class: 
org.apache.mahout.classifier.sgd.PrintResourceOrFile
java.lang.ClassNotFoundException: 
org.apache.mahout.classifier.sgd.PrintResourceOrFile
     at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
     at java.security.AccessController.doPrivileged(Native Method)
     at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
     at java.lang.ClassLoader.loadClass(ClassLoader.java:307)
     at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
     at java.lang.ClassLoader.loadClass(ClassLoader.java:248)
     at java.lang.Class.forName0(Native Method)
     at java.lang.Class.forName(Class.java:169)
     at 
org.apache.mahout.driver.MahoutDriver.addClass(MahoutDriver.java:198)
     at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:108)
Oct 4, 2010 3:33:54 PM org.slf4j.impl.JCLLoggerAdapter warn
WARNING: Unable to add class: org.apache.mahout.classifier.sgd.TrainLogistic
java.lang.ClassNotFoundException: 
org.apache.mahout.classifier.sgd.TrainLogistic
     at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
     at java.security.AccessController.doPrivileged(Native Method)
     at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
     at java.lang.ClassLoader.loadClass(ClassLoader.java:307)
     at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
     at java.lang.ClassLoader.loadClass(ClassLoader.java:248)
     at java.lang.Class.forName0(Native Method)
     at java.lang.Class.forName(Class.java:169)
     at 
org.apache.mahout.driver.MahoutDriver.addClass(MahoutDriver.java:198)
     at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:108)
Oct 4, 2010 3:33:54 PM org.slf4j.impl.JCLLoggerAdapter warn
WARNING: Unable to add class: 
org.apache.mahout.clustering.syntheticcontrol.kmeans.Job
java.lang.ClassNotFoundException: 
org.apache.mahout.clustering.syntheticcontrol.kmeans.Job
     at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
     at java.security.AccessController.doPrivileged(Native Method)
     at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
     at java.lang.ClassLoader.loadClass(ClassLoader.java:307)
     at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
     at java.lang.ClassLoader.loadClass(ClassLoader.java:248)
     at java.lang.Class.forName0(Native Method)
     at java.lang.Class.forName(Class.java:169)
     at 
org.apache.mahout.driver.MahoutDriver.addClass(MahoutDriver.java:198)
     at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:115)
Oct 4, 2010 3:33:54 PM org.slf4j.impl.JCLLoggerAdapter warn
WARNING: No 
org.apache.mahout.clustering.syntheticcontrol.kmeans.Job.props found on 
classpath, will use command-line arguments only
Unknown program 
'org.apache.mahout.clustering.syntheticcontrol.kmeans.Job' chosen.
Valid program names are:
   arff.vector: : Generate Vectors from an ARFF file or directory
   canopy: : Canopy clustering
   cleansvd: : Cleanup and verification of SVD output
   clusterdump: : Dump cluster output to text
   dirichlet: : Dirichlet Clustering
   fkmeans: : Fuzzy K-means clustering
   fpg: : Frequent Pattern Growth
   itemsimilarity: : Compute the item-item-similarities for item-based 
collaborative filtering
   kmeans: : K-means clustering
   lda: : Latent Dirchlet Allocation
   ldatopics: : LDA Print Topics
   lucene.vector: : Generate Vectors from a Lucene index
   matrixmult: : Take the produc of two matrices
   meanshift: : Mean Shift clustering
   rowid: : Map SequenceFile<Text,VectorWritable> to 
{SequenceFile<IntWritable,VectorWritable>, SequenceFile<IntWritable,Text>}
   rowsimilarity: : Compute the pairwise similarities of the rows of a 
matrix
   seqdirectory: : Generate sequence files (of Text) from a directory
   seqdumper: : Generic Sequence File dumper
   svd: : Lanczos Singular Value Decomposition
   testclassifier: : Test Bayes Classifier
   trainclassifier: : Train Bayes Classifier
   transpose: : Take the transpose of a matrix
   vectordump: : Dump vectors from a sequence file to text



On 10/4/10 12:25 AM, pragnesh (JIRA) wrote:
>      [ https://issues.apache.org/jira/browse/MAHOUT-504?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12917502#action_12917502 ]
>
> pragnesh commented on MAHOUT-504:
> ---------------------------------
>
> I am also getting the same exception with trunk code
>
> 10/10/04 12:42:34 INFO mapred.JobClient: Running job: job_201010041038_0019
> 10/10/04 12:42:35 INFO mapred.JobClient:  map 0% reduce 0%
> 10/10/04 12:42:45 INFO mapred.JobClient: Task Id : attempt_201010041038_0019_m_000000_0, Status : FAILED
> java.lang.IllegalStateException: No clusters found. Check your -c path.
> 	at org.apache.mahout.clustering.kmeans.KMeansMapper.setup(KMeansMapper.java:61)
> 	at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:142)
> 	at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:621)
> 	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
> 	at org.apache.hadoop.mapred.Child.main(Child.java:170)
>
>> Kmeans clustering error
>> -----------------------
>>
>>                  Key: MAHOUT-504
>>                  URL: https://issues.apache.org/jira/browse/MAHOUT-504
>>              Project: Mahout
>>           Issue Type: Bug
>>             Reporter: Zhen Guo
>>             Assignee: Robin Anil
>>              Fix For: 0.4
>>
>>
>> I tried the Kmeans algorithm on the Synthetic Control data. The following error appears. I tried the Canopy algorithm, it is fine. This error is from Mapper. I am using Trunk.
>> 10/09/20 19:40:06 INFO mapred.JobClient: Task Id : attempt_201008261432_1324_m_000000_0, Status : FAILED
>> java.lang.IllegalStateException: Cluster is empty!
>> 	at org.apache.mahout.clustering.kmeans.KMeansClusterMapper.setup(KMeansClusterMapper.java:57)
>> 	at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:142)
>> 	at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:583)
>> 	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
>> 	at org.apache.hadoop.mapred.Child.main(Child.java:170)

[jira] Commented: (MAHOUT-504) Kmeans clustering error

Posted by "pragnesh (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/MAHOUT-504?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12917502#action_12917502 ] 

pragnesh commented on MAHOUT-504:
---------------------------------

I am also getting the same exception with trunk code

10/10/04 12:42:34 INFO mapred.JobClient: Running job: job_201010041038_0019
10/10/04 12:42:35 INFO mapred.JobClient:  map 0% reduce 0%
10/10/04 12:42:45 INFO mapred.JobClient: Task Id : attempt_201010041038_0019_m_000000_0, Status : FAILED
java.lang.IllegalStateException: No clusters found. Check your -c path.
	at org.apache.mahout.clustering.kmeans.KMeansMapper.setup(KMeansMapper.java:61)
	at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:142)
	at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:621)
	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
	at org.apache.hadoop.mapred.Child.main(Child.java:170)

> Kmeans clustering error
> -----------------------
>
>                 Key: MAHOUT-504
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-504
>             Project: Mahout
>          Issue Type: Bug
>            Reporter: Zhen Guo
>            Assignee: Robin Anil
>             Fix For: 0.4
>
>
> I tried the Kmeans algorithm on the Synthetic Control data. The following error appears. I tried the Canopy algorithm, it is fine. This error is from Mapper. I am using Trunk.
> 10/09/20 19:40:06 INFO mapred.JobClient: Task Id : attempt_201008261432_1324_m_000000_0, Status : FAILED
> java.lang.IllegalStateException: Cluster is empty!
> 	at org.apache.mahout.clustering.kmeans.KMeansClusterMapper.setup(KMeansClusterMapper.java:57)
> 	at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:142)
> 	at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:583)
> 	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
> 	at org.apache.hadoop.mapred.Child.main(Child.java:170)

Re: [jira] Reopened: (MAHOUT-504) Kmeans clustering error

Posted by Jeff Eastman <jd...@windwardsolutions.com>.

  This cannot be running on the latest trunk. The job no longer has a -c 
argument and the initial clusters are always computed by running Canopy 
on the converted data. It is meant to be run with no arguments; default 
values are provided (EuclideanDM, 80, 55) that work consistently. The 
only variables are the distance measure, t1 and t2 values for Canopy. If 
these are changed there will be somewhere between 1 and 600 clusters 
generated by Canopy and k-Means processes them fine.

Predictably, when I run with t1=800 and t2=550 I get a single cluster 
out; with t1=8 and t2=5.5 I get 600. There is no way I can imagine to 
ever get 0 clusters out of Canopy.

I think this has been fixed, but show me a command line that can 
generate this error and I will have something to work with.
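
To illustrate why a non-empty input should always yield at least one 
canopy, here is a minimal Java sketch of a single-pass canopy assignment. 
It is not Mahout's code: the class CanopySketch, the method canopyCenters 
and the sample points are all hypothetical, and t1 is left out because in 
this simplified version it would only decide which points are loosely 
attached to a canopy, not how many canopies exist.

```java
import java.util.ArrayList;
import java.util.List;

public class CanopySketch {

    // Euclidean distance between two points of equal dimension.
    public static double distance(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return Math.sqrt(sum);
    }

    // Simplified single-pass canopy creation (an assumption, not Mahout's
    // implementation): a point founds a new canopy unless it lies within
    // t2 of an existing canopy center.
    public static List<double[]> canopyCenters(List<double[]> points, double t2) {
        List<double[]> centers = new ArrayList<>();
        for (double[] p : points) {
            boolean stronglyBound = false;
            for (double[] c : centers) {
                if (distance(c, p) < t2) {
                    stronglyBound = true;
                    break;
                }
            }
            if (!stronglyBound) {
                centers.add(p); // the very first point always lands here
            }
        }
        return centers;
    }

    public static void main(String[] args) {
        // Ten made-up points spaced 1 apart on a line.
        List<double[]> points = new ArrayList<>();
        for (int i = 0; i < 10; i++) {
            points.add(new double[] {i, 0.0});
        }
        System.out.println(canopyCenters(points, 100.0).size());            // prints 1
        System.out.println(canopyCenters(points, 0.5).size());              // prints 10
        System.out.println(canopyCenters(new ArrayList<>(), 0.5).size());   // prints 0
    }
}
```

In this sketch a very loose t2 collapses everything into one canopy and 
a very tight t2 gives one canopy per point; only an empty input list 
gives zero canopies.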

On 9/25/10 3:57 AM, Sean Owen (JIRA) wrote:
>       [ https://issues.apache.org/jira/browse/MAHOUT-504?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
>
> Sean Owen reopened MAHOUT-504:
> ------------------------------
>
>
>> Kmeans clustering error
>> -----------------------
>>
>>                  Key: MAHOUT-504
>>                  URL: https://issues.apache.org/jira/browse/MAHOUT-504
>>              Project: Mahout
>>           Issue Type: Bug
>>             Reporter: Zhen Guo
>>             Assignee: Robin Anil
>>              Fix For: 0.4
>>
>>
>> I tried the Kmeans algorithm on the Synthetic Control data. The following error appears. I tried the Canopy algorithm, it is fine. This error is from Mapper. I am using Trunk.
>> 10/09/20 19:40:06 INFO mapred.JobClient: Task Id : attempt_201008261432_1324_m_000000_0, Status : FAILED
>> java.lang.IllegalStateException: Cluster is empty!
>> 	at org.apache.mahout.clustering.kmeans.KMeansClusterMapper.setup(KMeansClusterMapper.java:57)
>> 	at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:142)
>> 	at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:583)
>> 	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
>> 	at org.apache.hadoop.mapred.Child.main(Child.java:170)

[jira] Reopened: (MAHOUT-504) Kmeans clustering error

Posted by "Sean Owen (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/MAHOUT-504?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sean Owen reopened MAHOUT-504:
------------------------------


> Kmeans clustering error
> -----------------------
>
>                 Key: MAHOUT-504
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-504
>             Project: Mahout
>          Issue Type: Bug
>            Reporter: Zhen Guo
>            Assignee: Robin Anil
>             Fix For: 0.4
>
>
> I tried the Kmeans algorithm on the Synthetic Control data. The following error appears. I tried the Canopy algorithm, it is fine. This error is from Mapper. I am using Trunk.
> 10/09/20 19:40:06 INFO mapred.JobClient: Task Id : attempt_201008261432_1324_m_000000_0, Status : FAILED
> java.lang.IllegalStateException: Cluster is empty!
> 	at org.apache.mahout.clustering.kmeans.KMeansClusterMapper.setup(KMeansClusterMapper.java:57)
> 	at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:142)
> 	at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:583)
> 	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
> 	at org.apache.hadoop.mapred.Child.main(Child.java:170)

Re: [jira] Updated: (MAHOUT-504) Kmeans clustering error

Posted by Jeff Eastman <jd...@windwardsolutions.com>.

  My bad. The example should not even have a -c parameter as it uses 
Canopy to populate the initial clusters and they go into a default 
directory. I will fix asap.

On 9/24/10 10:05 AM, Jeff Eastman wrote:
>  This error was likely caused by an incorrect -c parameter. The error 
> message was misleading. I committed a better message earlier this 
> week. Synthetic control works reliably with k-Means when the arguments 
> are given correctly. I think this can be closed.
>
> On 9/24/10 8:34 AM, Sean Owen (JIRA) wrote:
>>       [ 
>> https://issues.apache.org/jira/browse/MAHOUT-504?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel 
>> ]
>>
>> Sean Owen updated MAHOUT-504:
>> -----------------------------
>>
>>           Assignee: Robin Anil
>>      Fix Version/s: 0.5
>>
>>> Kmeans clustering error
>>> -----------------------
>>>
>>>                  Key: MAHOUT-504
>>>                  URL: https://issues.apache.org/jira/browse/MAHOUT-504
>>>              Project: Mahout
>>>           Issue Type: Bug
>>>             Reporter: Zhen Guo
>>>             Assignee: Robin Anil
>>>              Fix For: 0.5
>>>
>>>
>>> I tried the Kmeans algorithm on the Synthetic Control data. The 
>>> following error appears. I tried the Canopy algorithm, it is fine. 
>>> This error is from Mapper. I am using Trunk.
>>> 10/09/20 19:40:06 INFO mapred.JobClient: Task Id : 
>>> attempt_201008261432_1324_m_000000_0, Status : FAILED
>>> java.lang.IllegalStateException: Cluster is empty!
>>>     at 
>>> org.apache.mahout.clustering.kmeans.KMeansClusterMapper.setup(KMeansClusterMapper.java:57)
>>>     at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:142)
>>>     at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:583)
>>>     at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
>>>     at org.apache.hadoop.mapred.Child.main(Child.java:170)
>

Re: [jira] Updated: (MAHOUT-504) Kmeans clustering error

Posted by Jeff Eastman <jd...@windwardsolutions.com>.

  This error was likely caused by an incorrect -c parameter. The error 
message was misleading. I committed a better message earlier this week. 
Synthetic control works reliably with k-Means when the arguments are 
given correctly. I think this can be closed.
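
As a rough illustration of the kind of check behind the improved 
message, here is a small Java sketch of a fail-fast guard on the initial 
clusters. It is an assumption for illustration only, not the committed 
Mahout change: the class ClusterGuardSketch, the method requireClusters 
and the path "output/clusters-0" are hypothetical.

```java
import java.util.Collections;
import java.util.List;

public class ClusterGuardSketch {

    // Hypothetical helper: rejects an empty set of initial clusters and
    // names the path they were supposed to be read from, so the user can
    // tell a bad path apart from a genuine clustering problem.
    public static <T> List<T> requireClusters(List<T> clusters, String path) {
        if (clusters == null || clusters.isEmpty()) {
            throw new IllegalStateException(
                "No clusters found under '" + path + "'. Check the clusters path.");
        }
        return clusters;
    }

    public static void main(String[] args) {
        try {
            // Simulate a clusters path that turned out to hold nothing.
            requireClusters(Collections.<String>emptyList(), "output/clusters-0");
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Naming the offending path in the exception is the design choice here: 
the failure then points at the arguments rather than at the algorithm.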

On 9/24/10 8:34 AM, Sean Owen (JIRA) wrote:
>       [ https://issues.apache.org/jira/browse/MAHOUT-504?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
>
> Sean Owen updated MAHOUT-504:
> -----------------------------
>
>           Assignee: Robin Anil
>      Fix Version/s: 0.5
>
>> Kmeans clustering error
>> -----------------------
>>
>>                  Key: MAHOUT-504
>>                  URL: https://issues.apache.org/jira/browse/MAHOUT-504
>>              Project: Mahout
>>           Issue Type: Bug
>>             Reporter: Zhen Guo
>>             Assignee: Robin Anil
>>              Fix For: 0.5
>>
>>
>> I tried the Kmeans algorithm on the Synthetic Control data. The following error appears. I tried the Canopy algorithm, it is fine. This error is from Mapper. I am using Trunk.
>> 10/09/20 19:40:06 INFO mapred.JobClient: Task Id : attempt_201008261432_1324_m_000000_0, Status : FAILED
>> java.lang.IllegalStateException: Cluster is empty!
>> 	at org.apache.mahout.clustering.kmeans.KMeansClusterMapper.setup(KMeansClusterMapper.java:57)
>> 	at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:142)
>> 	at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:583)
>> 	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
>> 	at org.apache.hadoop.mapred.Child.main(Child.java:170)

[jira] Updated: (MAHOUT-504) Kmeans clustering error

Posted by "Sean Owen (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/MAHOUT-504?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sean Owen updated MAHOUT-504:
-----------------------------

         Assignee: Robin Anil
    Fix Version/s: 0.5

> Kmeans clustering error
> -----------------------
>
>                 Key: MAHOUT-504
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-504
>             Project: Mahout
>          Issue Type: Bug
>            Reporter: Zhen Guo
>            Assignee: Robin Anil
>             Fix For: 0.5
>
>
> I tried the Kmeans algorithm on the Synthetic Control data. The following error appears. I tried the Canopy algorithm, it is fine. This error is from Mapper. I am using Trunk.
> 10/09/20 19:40:06 INFO mapred.JobClient: Task Id : attempt_201008261432_1324_m_000000_0, Status : FAILED
> java.lang.IllegalStateException: Cluster is empty!
> 	at org.apache.mahout.clustering.kmeans.KMeansClusterMapper.setup(KMeansClusterMapper.java:57)
> 	at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:142)
> 	at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:583)
> 	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
> 	at org.apache.hadoop.mapred.Child.main(Child.java:170)

[jira] Resolved: (MAHOUT-504) Kmeans clustering error

Posted by "Sean Owen (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/MAHOUT-504?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sean Owen resolved MAHOUT-504.
------------------------------

    Resolution: Fixed

> Kmeans clustering error
> -----------------------
>
>                 Key: MAHOUT-504
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-504
>             Project: Mahout
>          Issue Type: Bug
>            Reporter: Zhen Guo
>            Assignee: Robin Anil
>             Fix For: 0.4
>
>
> I tried the Kmeans algorithm on the Synthetic Control data. The following error appears. I tried the Canopy algorithm, it is fine. This error is from Mapper. I am using Trunk.
> 10/09/20 19:40:06 INFO mapred.JobClient: Task Id : attempt_201008261432_1324_m_000000_0, Status : FAILED
> java.lang.IllegalStateException: Cluster is empty!
> 	at org.apache.mahout.clustering.kmeans.KMeansClusterMapper.setup(KMeansClusterMapper.java:57)
> 	at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:142)
> 	at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:583)
> 	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
> 	at org.apache.hadoop.mapred.Child.main(Child.java:170)

Re: [jira] Commented: (MAHOUT-504) Kmeans clustering error

Posted by Jeff Eastman <jd...@windwardsolutions.com>.

  Well, it's the same problem but with an improved error message. Can 
you please post your exact command line invocation?

On 9/24/10 9:58 PM, Zhen Guo (JIRA) wrote:
>      [ https://issues.apache.org/jira/browse/MAHOUT-504?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12914737#action_12914737 ]
>
> Zhen Guo commented on MAHOUT-504:
> ---------------------------------
>
> It still fails, but for a different reason.
>
> 10/09/25 01:29:11 INFO mapred.JobClient: Task Id : attempt_201008261432_1574_m_000000_0, Status : FAILED
> java.lang.IllegalStateException: No clusters found. Check your -c path.
> 	at org.apache.mahout.clustering.kmeans.KMeansMapper.setup(KMeansMapper.java:61)
> 	at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:142)
> 	at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:583)
> 	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
> 	at org.apache.hadoop.mapred.Child.main(Child.java:170)
>
>> Kmeans clustering error
>> -----------------------
>>
>>                  Key: MAHOUT-504
>>                  URL: https://issues.apache.org/jira/browse/MAHOUT-504
>>              Project: Mahout
>>           Issue Type: Bug
>>             Reporter: Zhen Guo
>>             Assignee: Robin Anil
>>              Fix For: 0.4
>>
>>
>> I tried the Kmeans algorithm on the Synthetic Control data. The following error appears. I tried the Canopy algorithm, it is fine. This error is from Mapper. I am using Trunk.
>> 10/09/20 19:40:06 INFO mapred.JobClient: Task Id : attempt_201008261432_1324_m_000000_0, Status : FAILED
>> java.lang.IllegalStateException: Cluster is empty!
>> 	at org.apache.mahout.clustering.kmeans.KMeansClusterMapper.setup(KMeansClusterMapper.java:57)
>> 	at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:142)
>> 	at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:583)
>> 	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
>> 	at org.apache.hadoop.mapred.Child.main(Child.java:170)

[jira] Commented: (MAHOUT-504) Kmeans clustering error

Posted by "Zhen Guo (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/MAHOUT-504?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12914737#action_12914737 ] 

Zhen Guo commented on MAHOUT-504:
---------------------------------

It still fails, but for a different reason.

10/09/25 01:29:11 INFO mapred.JobClient: Task Id : attempt_201008261432_1574_m_000000_0, Status : FAILED
java.lang.IllegalStateException: No clusters found. Check your -c path.
	at org.apache.mahout.clustering.kmeans.KMeansMapper.setup(KMeansMapper.java:61)
	at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:142)
	at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:583)
	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
	at org.apache.hadoop.mapred.Child.main(Child.java:170)


Re: [jira] Commented: (MAHOUT-504) Kmeans clustering error

Posted by Jeff Eastman <jd...@windwardsolutions.com>.

  On 9/30/10 5:02 PM, Zhen Guo (JIRA) wrote:
> $MAHOUT_HOME/bin/mahout org.apache.mahout.clustering.syntheticcontrol.kmeans.Job
Yup, I've been running it standalone and on a 1-node cluster every day
for the last several days, and I cannot reproduce your problem. You might
want to check your local filesystem permissions; others have evidently
been tripped up by that.

[jira] Commented: (MAHOUT-504) Kmeans clustering error

Posted by "Zhen Guo (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/MAHOUT-504?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12916657#action_12916657 ] 

Zhen Guo commented on MAHOUT-504:
---------------------------------

Jeff, did you run the following command recently?

$MAHOUT_HOME/bin/mahout org.apache.mahout.clustering.syntheticcontrol.kmeans.Job

I am using the trunk code from Sept. 27, and it does not work for me. I get the following error message:

10/09/30 20:58:07 INFO mapred.JobClient: Task Id : attempt_201008261432_2003_m_000000_0, Status : FAILED
java.lang.IllegalStateException: No clusters found. Check your -c path.
	at org.apache.mahout.clustering.kmeans.KMeansMapper.setup(KMeansMapper.java:61)
	at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:142)
	at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:583)
	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
	at org.apache.hadoop.mapred.Child.main(Child.java:170)




Re: [jira] Commented: (MAHOUT-504) Kmeans clustering error

Posted by Jeff Eastman <jd...@windwardsolutions.com>.

Yes, you must use trunk. I've tested the current build in stand-alone
mode and on a 1-node cluster, and both work correctly. IIRC, 0.3 had
problems with synthetic control, but that was long ago. Mahout is
changing so fast that we always recommend using trunk.

On 9/27/10 2:06 PM, Zhen Guo (JIRA) wrote:
>      [ https://issues.apache.org/jira/browse/MAHOUT-504?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12915400#action_12915400 ]
>
> Zhen Guo commented on MAHOUT-504:
> ---------------------------------
>
> Is this change available in Trunk?
>
> I tested as described in the Quick Start document, using the following command:
>
> $MAHOUT_HOME/bin/mahout org.apache.mahout.clustering.syntheticcontrol.kmeans.Job
>
> It failed with the same error messages as above.

[jira] Commented: (MAHOUT-504) Kmeans clustering error

Posted by "Zhen Guo (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/MAHOUT-504?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12915400#action_12915400 ] 

Zhen Guo commented on MAHOUT-504:
---------------------------------

Is this change available in Trunk?

I tested as described in the Quick Start document, using the following command:

$MAHOUT_HOME/bin/mahout org.apache.mahout.clustering.syntheticcontrol.kmeans.Job

It failed with the same error messages as above.

