Posted to dev@mahout.apache.org by "pragnesh (JIRA)" <ji...@apache.org> on 2010/10/05 09:06:32 UTC

[jira] Issue Comment Edited: (MAHOUT-504) Kmeans clustering error

    [ https://issues.apache.org/jira/browse/MAHOUT-504?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12917502#action_12917502 ] 

pragnesh edited comment on MAHOUT-504 at 10/5/10 3:05 AM:
----------------------------------------------------------

I am also getting the same exception with trunk code:

10/10/04 12:42:34 INFO mapred.JobClient: Running job: job_201010041038_0019
10/10/04 12:42:35 INFO mapred.JobClient:  map 0% reduce 0%
10/10/04 12:42:45 INFO mapred.JobClient: Task Id : attempt_201010041038_0019_m_000000_0, Status : FAILED
java.lang.IllegalStateException: No clusters found. Check your -c path.
	at org.apache.mahout.clustering.kmeans.KMeansMapper.setup(KMeansMapper.java:61)
	at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:142)
	at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:621)
	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
	at org.apache.hadoop.mapred.Child.main(Child.java:170)


This runs fine from Eclipse, but when I run it from the command line with Hadoop I see the output below.

By contrast, $MAHOUT_HOME/bin/mahout org.apache.mahout.clustering.syntheticcontrol.dirichlet.Job runs fine without any error.

pragnesh-laptop% $MAHOUT_HOME/bin/mahout org.apache.mahout.clustering.syntheticcontrol.kmeans.Job
Running on hadoop, using HADOOP_HOME=/usr/lib/hadoop/
HADOOP_CONF_DIR=/etc/hadoop/conf.pseudo
10/10/05 12:26:05 WARN driver.MahoutDriver: No org.apache.mahout.clustering.syntheticcontrol.kmeans.Job.props found on classpath, will use command-line arguments only
10/10/05 12:26:05 INFO kmeans.Job: Running with default arguments
10/10/05 12:26:06 INFO kmeans.Job: Preparing Input
10/10/05 12:26:06 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
10/10/05 12:26:07 INFO input.FileInputFormat: Total input paths to process : 1
10/10/05 12:26:09 INFO mapred.JobClient: Running job: job_201010051117_0005
10/10/05 12:26:10 INFO mapred.JobClient:  map 0% reduce 0%
10/10/05 12:26:26 INFO mapred.JobClient:  map 100% reduce 0%
10/10/05 12:26:28 INFO mapred.JobClient: Job complete: job_201010051117_0005
10/10/05 12:26:29 INFO mapred.JobClient: Counters: 7
10/10/05 12:26:29 INFO mapred.JobClient:   Job Counters 
10/10/05 12:26:29 INFO mapred.JobClient:     Launched map tasks=1
10/10/05 12:26:29 INFO mapred.JobClient:     Data-local map tasks=1
10/10/05 12:26:29 INFO mapred.JobClient:   FileSystemCounters
10/10/05 12:26:29 INFO mapred.JobClient:     HDFS_BYTES_READ=288374
10/10/05 12:26:29 INFO mapred.JobClient:     HDFS_BYTES_WRITTEN=335470
10/10/05 12:26:29 INFO mapred.JobClient:   Map-Reduce Framework
10/10/05 12:26:29 INFO mapred.JobClient:     Map input records=600
10/10/05 12:26:29 INFO mapred.JobClient:     Spilled Records=0
10/10/05 12:26:29 INFO mapred.JobClient:     Map output records=600
10/10/05 12:26:29 INFO kmeans.Job: Running Canopy to get initial clusters
10/10/05 12:26:29 INFO canopy.CanopyDriver: Build Clusters Input: output/data Out: output Measure: org.apache.mahout.common.distance.EuclideanDistanceMeasure@136a43c t1: 80.0 t2: 55.0
10/10/05 12:26:29 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
10/10/05 12:26:29 INFO input.FileInputFormat: Total input paths to process : 1
10/10/05 12:26:30 INFO mapred.JobClient: Running job: job_201010051117_0006
10/10/05 12:26:31 INFO mapred.JobClient:  map 0% reduce 0%
10/10/05 12:26:42 INFO mapred.JobClient:  map 100% reduce 0%
10/10/05 12:26:54 INFO mapred.JobClient:  map 100% reduce 100%
10/10/05 12:26:56 INFO mapred.JobClient: Job complete: job_201010051117_0006
10/10/05 12:26:56 INFO mapred.JobClient: Counters: 17
10/10/05 12:26:56 INFO mapred.JobClient:   Job Counters 
10/10/05 12:26:56 INFO mapred.JobClient:     Launched reduce tasks=1
10/10/05 12:26:56 INFO mapred.JobClient:     Launched map tasks=1
10/10/05 12:26:56 INFO mapred.JobClient:     Data-local map tasks=1
10/10/05 12:26:56 INFO mapred.JobClient:   FileSystemCounters
10/10/05 12:26:56 INFO mapred.JobClient:     FILE_BYTES_READ=13906
10/10/05 12:26:56 INFO mapred.JobClient:     HDFS_BYTES_READ=335470
10/10/05 12:26:56 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=27844
10/10/05 12:26:56 INFO mapred.JobClient:     HDFS_BYTES_WRITTEN=7131
10/10/05 12:26:56 INFO mapred.JobClient:   Map-Reduce Framework
10/10/05 12:26:56 INFO mapred.JobClient:     Reduce input groups=1
10/10/05 12:26:56 INFO mapred.JobClient:     Combine output records=0
10/10/05 12:26:56 INFO mapred.JobClient:     Map input records=600
10/10/05 12:26:56 INFO mapred.JobClient:     Reduce shuffle bytes=0
10/10/05 12:26:56 INFO mapred.JobClient:     Reduce output records=6
10/10/05 12:26:56 INFO mapred.JobClient:     Spilled Records=50
10/10/05 12:26:56 INFO mapred.JobClient:     Map output bytes=13800
10/10/05 12:26:56 INFO mapred.JobClient:     Combine input records=0
10/10/05 12:26:56 INFO mapred.JobClient:     Map output records=25
10/10/05 12:26:56 INFO mapred.JobClient:     Reduce input records=25
10/10/05 12:26:56 INFO kmeans.Job: Running KMeans
10/10/05 12:26:56 INFO kmeans.KMeansDriver: Input: output/data Clusters In: output/clusters-0 Out: output Distance: org.apache.mahout.common.distance.EuclideanDistanceMeasure
10/10/05 12:26:56 INFO kmeans.KMeansDriver: convergence: 0.5 max Iterations: 10 num Reduce Tasks: org.apache.mahout.math.VectorWritable Input Vectors: {}
10/10/05 12:26:56 INFO kmeans.KMeansDriver: K-Means Iteration 1
10/10/05 12:26:56 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
10/10/05 12:26:57 INFO input.FileInputFormat: Total input paths to process : 1
10/10/05 12:26:58 INFO mapred.JobClient: Running job: job_201010051117_0007
10/10/05 12:26:59 INFO mapred.JobClient:  map 0% reduce 0%
10/10/05 12:27:08 INFO mapred.JobClient: Task Id : attempt_201010051117_0007_m_000000_0, Status : FAILED
java.lang.IllegalStateException: No clusters found. Check your -c path.
	at org.apache.mahout.clustering.kmeans.KMeansMapper.setup(KMeansMapper.java:61)
	at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:142)
	at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:621)
	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
	at org.apache.hadoop.mapred.Child.main(Child.java:170)

10/10/05 12:27:14 INFO mapred.JobClient: Task Id : attempt_201010051117_0007_m_000000_1, Status : FAILED
java.lang.IllegalStateException: No clusters found. Check your -c path.
	at org.apache.mahout.clustering.kmeans.KMeansMapper.setup(KMeansMapper.java:61)
	at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:142)
	at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:621)
	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
	at org.apache.hadoop.mapred.Child.main(Child.java:170)

10/10/05 12:27:23 INFO mapred.JobClient: Task Id : attempt_201010051117_0007_m_000000_2, Status : FAILED
java.lang.IllegalStateException: No clusters found. Check your -c path.
	at org.apache.mahout.clustering.kmeans.KMeansMapper.setup(KMeansMapper.java:61)
	at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:142)
	at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:621)
	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
	at org.apache.hadoop.mapred.Child.main(Child.java:170)

10/10/05 12:27:35 INFO mapred.JobClient: Job complete: job_201010051117_0007
10/10/05 12:27:35 INFO mapred.JobClient: Counters: 3
10/10/05 12:27:35 INFO mapred.JobClient:   Job Counters 
10/10/05 12:27:35 INFO mapred.JobClient:     Launched map tasks=4
10/10/05 12:27:35 INFO mapred.JobClient:     Data-local map tasks=4
10/10/05 12:27:35 INFO mapred.JobClient:     Failed map tasks=1
10/10/05 12:27:35 INFO kmeans.KMeansDriver: Clustering data
10/10/05 12:27:35 INFO kmeans.KMeansDriver: Running Clustering
10/10/05 12:27:35 INFO kmeans.KMeansDriver: Input: output/data Clusters In: output/clusters-1 Out: output/clusteredPoints Distance: org.apache.mahout.common.distance.EuclideanDistanceMeasure@136a43c
10/10/05 12:27:35 INFO kmeans.KMeansDriver: convergence: 0.5 Input Vectors: org.apache.mahout.math.VectorWritable
10/10/05 12:27:35 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
10/10/05 12:27:36 INFO input.FileInputFormat: Total input paths to process : 1
10/10/05 12:27:37 INFO mapred.JobClient: Running job: job_201010051117_0008
10/10/05 12:27:38 INFO mapred.JobClient:  map 0% reduce 0%
10/10/05 12:27:47 INFO mapred.JobClient: Task Id : attempt_201010051117_0008_m_000000_0, Status : FAILED
java.lang.IllegalStateException: Cluster is empty!
	at org.apache.mahout.clustering.kmeans.KMeansClusterMapper.setup(KMeansClusterMapper.java:57)
	at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:142)
	at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:621)
	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
	at org.apache.hadoop.mapred.Child.main(Child.java:170)

10/10/05 12:27:53 INFO mapred.JobClient: Task Id : attempt_201010051117_0008_m_000000_1, Status : FAILED
java.lang.IllegalStateException: Cluster is empty!
	at org.apache.mahout.clustering.kmeans.KMeansClusterMapper.setup(KMeansClusterMapper.java:57)
	at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:142)
	at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:621)
	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
	at org.apache.hadoop.mapred.Child.main(Child.java:170)

10/10/05 12:27:59 INFO mapred.JobClient: Task Id : attempt_201010051117_0008_m_000000_2, Status : FAILED
java.lang.IllegalStateException: Cluster is empty!
	at org.apache.mahout.clustering.kmeans.KMeansClusterMapper.setup(KMeansClusterMapper.java:57)
	at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:142)
	at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:621)
	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
	at org.apache.hadoop.mapred.Child.main(Child.java:170)

10/10/05 12:28:11 INFO mapred.JobClient: Job complete: job_201010051117_0008
10/10/05 12:28:11 INFO mapred.JobClient: Counters: 3
10/10/05 12:28:11 INFO mapred.JobClient:   Job Counters 
10/10/05 12:28:11 INFO mapred.JobClient:     Launched map tasks=4
10/10/05 12:28:11 INFO mapred.JobClient:     Data-local map tasks=4
10/10/05 12:28:11 INFO mapred.JobClient:     Failed map tasks=1
10/10/05 12:28:12 INFO driver.MahoutDriver: Program took 126495 ms

      was (Author: pgradadia):
    I am also getting the same exception with trunk code

10/10/04 12:42:34 INFO mapred.JobClient: Running job: job_201010041038_0019
10/10/04 12:42:35 INFO mapred.JobClient:  map 0% reduce 0%
10/10/04 12:42:45 INFO mapred.JobClient: Task Id : attempt_201010041038_0019_m_000000_0, Status : FAILED
java.lang.IllegalStateException: No clusters found. Check your -c path.
	at org.apache.mahout.clustering.kmeans.KMeansMapper.setup(KMeansMapper.java:61)
	at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:142)
	at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:621)
	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
	at org.apache.hadoop.mapred.Child.main(Child.java:170)
  
> Kmeans clustering error
> -----------------------
>
>                 Key: MAHOUT-504
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-504
>             Project: Mahout
>          Issue Type: Bug
>            Reporter: Zhen Guo
>            Assignee: Robin Anil
>             Fix For: 0.4
>
>
> I tried the Kmeans algorithm on the Synthetic Control data. The following error appears. I tried the Canopy algorithm, it is fine. This error is from Mapper. I am using Trunk.
> 10/09/20 19:40:06 INFO mapred.JobClient: Task Id : attempt_201008261432_1324_m_000000_0, Status : FAILED
> java.lang.IllegalStateException: Cluster is empty!
> 	at org.apache.mahout.clustering.kmeans.KMeansClusterMapper.setup(KMeansClusterMapper.java:57)
> 	at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:142)
> 	at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:583)
> 	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
> 	at org.apache.hadoop.mapred.Child.main(Child.java:170)



Re: [jira] Issue Comment Edited: (MAHOUT-504) Kmeans clustering error

Posted by Jeff Eastman <jd...@windwardsolutions.com>.
  +user@

Thanks so much Pragnesh, for putting your finger so succinctly on the 
problem. I'm cross-posting this to user@ so that it will be part of that 
searchable archive too. I will also append to MAHOUT-504.

I'm glad to hear you are out of the woods on this,
Jeff
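
For readers who hit the same error, here is a minimal sketch of the diagnosis in the quoted message below. It is a model, not Hadoop code: it assumes that HDFS resolves a relative path against the home directory of the user doing the lookup (/user/<username> by default), and that the job was submitted as "pragnesh" while the lookup happened as "hadoop", as Pragnesh suspects. The function name resolve_hdfs_path and the /tmp/kmeans path are hypothetical; "output/clusters-0" is the "Clusters In" path from the log.

```python
import posixpath


def resolve_hdfs_path(path: str, user: str, home_root: str = "/user") -> str:
    """Model (not Hadoop's implementation) of qualifying a path for a given user.

    Absolute paths are returned unchanged; relative paths are joined onto
    the user's assumed home directory, <home_root>/<user>.
    """
    if posixpath.isabs(path):
        return posixpath.normpath(path)
    return posixpath.normpath(posixpath.join(home_root, user, path))


# "Clusters In" path reported by KMeansDriver in the log.
clusters_in = "output/clusters-0"

# The same relative string names two different directories.
print(resolve_hdfs_path(clusters_in, "pragnesh"))  # /user/pragnesh/output/clusters-0
print(resolve_hdfs_path(clusters_in, "hadoop"))    # /user/hadoop/output/clusters-0

# An absolute path (hypothetical location) is the same for both users.
absolute = "/tmp/kmeans/output/clusters-0"
print(resolve_hdfs_path(absolute, "pragnesh") == resolve_hdfs_path(absolute, "hadoop"))  # True
```

If this model matches your setup, the clusters written by the Canopy step and the directory searched by KMeansMapper would differ, which is consistent with "No clusters found. Check your -c path." Running the example as the same user that Hadoop runs as, which is what resolved it here, removes the mismatch.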


On 10/8/10 2:02 AM, pragnesh radadia wrote:
> Finally I am able to run the k-means example, Clustering of Synthetic Control data.
>
> I think the problem is that Hadoop is running as the hadoop user (using Cloudera
> CDH3) and I am trying to run the example as the pragnesh user,
>
> so Hadoop is not able to find the data under "/user/hadoop",
>
> since the example uses relative paths to store the input and clustering data.
>
> -pragnesh
>
>
> On Wed, Oct 6, 2010 at 12:27 AM, Jeff Eastman
> <jd...@windwardsolutions.com>  wrote:
>>   Hi Pragnesh,
>>
>> I really don't know what to suggest to you. I just did a new Mahout checkout
>> and build, followed by uploading the synthetic_control.data file to a local
>> Hadoop instance. The k-means job ran without incident. On a hunch, I also
>> uploaded the file as testdata (not in directory testdata) and that worked
>> too. I'm baffled why I can't duplicate this and suspect it is a local system
>> issue. What OS are you running?
>>
>> If yours works from Eclipse but not from the command line, I wonder if you
>> have done mvn clean build from the command line before you ran the CLI
>> Mahout job? Eclipse compiles its bits into different directories and does
>> not build the necessary job files. Other than that, I suggest checking your
>> file system groups and permissions.
>>
>> If you find something that gets you running again, *please* post your
>> solution so we can advise others who are experiencing the same error
>> message.
>>
>>
>> On 10/5/10 12:06 AM, pragnesh (JIRA) wrote:
>>>      [
>>> https://issues.apache.org/jira/browse/MAHOUT-504?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12917502#action_12917502
>>> ]
>>>
>>> pragnesh edited comment on MAHOUT-504 at 10/5/10 3:05 AM:
>>> ----------------------------------------------------------
>>>
>>> I am also getting the same exception with trunk code:
>>>
>>> 10/10/04 12:42:34 INFO mapred.JobClient: Running job:
>>> job_201010041038_0019
>>> 10/10/04 12:42:35 INFO mapred.JobClient:  map 0% reduce 0%
>>> 10/10/04 12:42:45 INFO mapred.JobClient: Task Id :
>>> attempt_201010041038_0019_m_000000_0, Status : FAILED
>>> java.lang.IllegalStateException: No clusters found. Check your -c path.
>>>         at
>>> org.apache.mahout.clustering.kmeans.KMeansMapper.setup(KMeansMapper.java:61)
>>>         at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:142)
>>>         at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:621)
>>>         at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
>>>         at org.apache.hadoop.mapred.Child.main(Child.java:170)
>>>
>>>
>>> This runs fine from Eclipse, but when I run it from the command line
>>> with Hadoop I see the output below.
>>>
>>> By contrast, $MAHOUT_HOME/bin/mahout
>>> org.apache.mahout.clustering.syntheticcontrol.dirichlet.Job runs fine
>>> without any error.
>>>
>>> pragnesh-laptop% $MAHOUT_HOME/bin/mahout
>>> org.apache.mahout.clustering.syntheticcontrol.kmeans.Job
>>> Running on hadoop, using HADOOP_HOME=/usr/lib/hadoop/
>>> HADOOP_CONF_DIR=/etc/hadoop/conf.pseudo
>>> 10/10/05 12:26:05 WARN driver.MahoutDriver: No
>>> org.apache.mahout.clustering.syntheticcontrol.kmeans.Job.props found on
>>> classpath, will use command-line arguments only
>>> 10/10/05 12:26:05 INFO kmeans.Job: Running with default arguments
>>> 10/10/05 12:26:06 INFO kmeans.Job: Preparing Input
>>> 10/10/05 12:26:06 WARN mapred.JobClient: Use GenericOptionsParser for
>>> parsing the arguments. Applications should implement Tool for the same.
>>> 10/10/05 12:26:07 INFO input.FileInputFormat: Total input paths to process
>>> : 1
>>> 10/10/05 12:26:09 INFO mapred.JobClient: Running job:
>>> job_201010051117_0005
>>> 10/10/05 12:26:10 INFO mapred.JobClient:  map 0% reduce 0%
>>> 10/10/05 12:26:26 INFO mapred.JobClient:  map 100% reduce 0%
>>> 10/10/05 12:26:28 INFO mapred.JobClient: Job complete:
>>> job_201010051117_0005
>>> 10/10/05 12:26:29 INFO mapred.JobClient: Counters: 7
>>> 10/10/05 12:26:29 INFO mapred.JobClient:   Job Counters
>>> 10/10/05 12:26:29 INFO mapred.JobClient:     Launched map tasks=1
>>> 10/10/05 12:26:29 INFO mapred.JobClient:     Data-local map tasks=1
>>> 10/10/05 12:26:29 INFO mapred.JobClient:   FileSystemCounters
>>> 10/10/05 12:26:29 INFO mapred.JobClient:     HDFS_BYTES_READ=288374
>>> 10/10/05 12:26:29 INFO mapred.JobClient:     HDFS_BYTES_WRITTEN=335470
>>> 10/10/05 12:26:29 INFO mapred.JobClient:   Map-Reduce Framework
>>> 10/10/05 12:26:29 INFO mapred.JobClient:     Map input records=600
>>> 10/10/05 12:26:29 INFO mapred.JobClient:     Spilled Records=0
>>> 10/10/05 12:26:29 INFO mapred.JobClient:     Map output records=600
>>> 10/10/05 12:26:29 INFO kmeans.Job: Running Canopy to get initial clusters
>>> 10/10/05 12:26:29 INFO canopy.CanopyDriver: Build Clusters Input:
>>> output/data Out: output Measure:
>>> org.apache.mahout.common.distance.EuclideanDistanceMeasure@136a43c t1: 80.0
>>> t2: 55.0
>>> 10/10/05 12:26:29 WARN mapred.JobClient: Use GenericOptionsParser for
>>> parsing the arguments. Applications should implement Tool for the same.
>>> 10/10/05 12:26:29 INFO input.FileInputFormat: Total input paths to process
>>> : 1
>>> 10/10/05 12:26:30 INFO mapred.JobClient: Running job:
>>> job_201010051117_0006
>>> 10/10/05 12:26:31 INFO mapred.JobClient:  map 0% reduce 0%
>>> 10/10/05 12:26:42 INFO mapred.JobClient:  map 100% reduce 0%
>>> 10/10/05 12:26:54 INFO mapred.JobClient:  map 100% reduce 100%
>>> 10/10/05 12:26:56 INFO mapred.JobClient: Job complete:
>>> job_201010051117_0006
>>> 10/10/05 12:26:56 INFO mapred.JobClient: Counters: 17
>>> 10/10/05 12:26:56 INFO mapred.JobClient:   Job Counters
>>> 10/10/05 12:26:56 INFO mapred.JobClient:     Launched reduce tasks=1
>>> 10/10/05 12:26:56 INFO mapred.JobClient:     Launched map tasks=1
>>> 10/10/05 12:26:56 INFO mapred.JobClient:     Data-local map tasks=1
>>> 10/10/05 12:26:56 INFO mapred.JobClient:   FileSystemCounters
>>> 10/10/05 12:26:56 INFO mapred.JobClient:     FILE_BYTES_READ=13906
>>> 10/10/05 12:26:56 INFO mapred.JobClient:     HDFS_BYTES_READ=335470
>>> 10/10/05 12:26:56 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=27844
>>> 10/10/05 12:26:56 INFO mapred.JobClient:     HDFS_BYTES_WRITTEN=7131
>>> 10/10/05 12:26:56 INFO mapred.JobClient:   Map-Reduce Framework
>>> 10/10/05 12:26:56 INFO mapred.JobClient:     Reduce input groups=1
>>> 10/10/05 12:26:56 INFO mapred.JobClient:     Combine output records=0
>>> 10/10/05 12:26:56 INFO mapred.JobClient:     Map input records=600
>>> 10/10/05 12:26:56 INFO mapred.JobClient:     Reduce shuffle bytes=0
>>> 10/10/05 12:26:56 INFO mapred.JobClient:     Reduce output records=6
>>> 10/10/05 12:26:56 INFO mapred.JobClient:     Spilled Records=50
>>> 10/10/05 12:26:56 INFO mapred.JobClient:     Map output bytes=13800
>>> 10/10/05 12:26:56 INFO mapred.JobClient:     Combine input records=0
>>> 10/10/05 12:26:56 INFO mapred.JobClient:     Map output records=25
>>> 10/10/05 12:26:56 INFO mapred.JobClient:     Reduce input records=25
>>> 10/10/05 12:26:56 INFO kmeans.Job: Running KMeans
>>> 10/10/05 12:26:56 INFO kmeans.KMeansDriver: Input: output/data Clusters
>>> In: output/clusters-0 Out: output Distance:
>>> org.apache.mahout.common.distance.EuclideanDistanceMeasure
>>> 10/10/05 12:26:56 INFO kmeans.KMeansDriver: convergence: 0.5 max
>>> Iterations: 10 num Reduce Tasks: org.apache.mahout.math.VectorWritable Input
>>> Vectors: {}
>>> 10/10/05 12:26:56 INFO kmeans.KMeansDriver: K-Means Iteration 1
>>> 10/10/05 12:26:56 WARN mapred.JobClient: Use GenericOptionsParser for
>>> parsing the arguments. Applications should implement Tool for the same.
>>> 10/10/05 12:26:57 INFO input.FileInputFormat: Total input paths to process
>>> : 1
>>> 10/10/05 12:26:58 INFO mapred.JobClient: Running job:
>>> job_201010051117_0007
>>> 10/10/05 12:26:59 INFO mapred.JobClient:  map 0% reduce 0%
>>> 10/10/05 12:27:08 INFO mapred.JobClient: Task Id :
>>> attempt_201010051117_0007_m_000000_0, Status : FAILED
>>> java.lang.IllegalStateException: No clusters found. Check your -c path.
>>>         at
>>> org.apache.mahout.clustering.kmeans.KMeansMapper.setup(KMeansMapper.java:61)
>>>         at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:142)
>>>         at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:621)
>>>         at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
>>>         at org.apache.hadoop.mapred.Child.main(Child.java:170)
>>>
>>> 10/10/05 12:27:14 INFO mapred.JobClient: Task Id :
>>> attempt_201010051117_0007_m_000000_1, Status : FAILED
>>> java.lang.IllegalStateException: No clusters found. Check your -c path.
>>>         at
>>> org.apache.mahout.clustering.kmeans.KMeansMapper.setup(KMeansMapper.java:61)
>>>         at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:142)
>>>         at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:621)
>>>         at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
>>>         at org.apache.hadoop.mapred.Child.main(Child.java:170)
>>>
>>> 10/10/05 12:27:23 INFO mapred.JobClient: Task Id :
>>> attempt_201010051117_0007_m_000000_2, Status : FAILED
>>> java.lang.IllegalStateException: No clusters found. Check your -c path.
>>>         at
>>> org.apache.mahout.clustering.kmeans.KMeansMapper.setup(KMeansMapper.java:61)
>>>         at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:142)
>>>         at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:621)
>>>         at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
>>>         at org.apache.hadoop.mapred.Child.main(Child.java:170)
>>>
>>> 10/10/05 12:27:35 INFO mapred.JobClient: Job complete:
>>> job_201010051117_0007
>>> 10/10/05 12:27:35 INFO mapred.JobClient: Counters: 3
>>> 10/10/05 12:27:35 INFO mapred.JobClient:   Job Counters
>>> 10/10/05 12:27:35 INFO mapred.JobClient:     Launched map tasks=4
>>> 10/10/05 12:27:35 INFO mapred.JobClient:     Data-local map tasks=4
>>> 10/10/05 12:27:35 INFO mapred.JobClient:     Failed map tasks=1
>>> 10/10/05 12:27:35 INFO kmeans.KMeansDriver: Clustering data
>>> 10/10/05 12:27:35 INFO kmeans.KMeansDriver: Running Clustering
>>> 10/10/05 12:27:35 INFO kmeans.KMeansDriver: Input: output/data Clusters
>>> In: output/clusters-1 Out: output/clusteredPoints Distance:
>>> org.apache.mahout.common.distance.EuclideanDistanceMeasure@136a43c
>>> 10/10/05 12:27:35 INFO kmeans.KMeansDriver: convergence: 0.5 Input
>>> Vectors: org.apache.mahout.math.VectorWritable
>>> 10/10/05 12:27:35 WARN mapred.JobClient: Use GenericOptionsParser for
>>> parsing the arguments. Applications should implement Tool for the same.
>>> 10/10/05 12:27:36 INFO input.FileInputFormat: Total input paths to process
>>> : 1
>>> 10/10/05 12:27:37 INFO mapred.JobClient: Running job:
>>> job_201010051117_0008
>>> 10/10/05 12:27:38 INFO mapred.JobClient:  map 0% reduce 0%
>>> 10/10/05 12:27:47 INFO mapred.JobClient: Task Id :
>>> attempt_201010051117_0008_m_000000_0, Status : FAILED
>>> java.lang.IllegalStateException: Cluster is empty!
>>>         at
>>> org.apache.mahout.clustering.kmeans.KMeansClusterMapper.setup(KMeansClusterMapper.java:57)
>>>         at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:142)
>>>         at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:621)
>>>         at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
>>>         at org.apache.hadoop.mapred.Child.main(Child.java:170)
>>>
>>> 10/10/05 12:27:53 INFO mapred.JobClient: Task Id :
>>> attempt_201010051117_0008_m_000000_1, Status : FAILED
>>> java.lang.IllegalStateException: Cluster is empty!
>>>         at
>>> org.apache.mahout.clustering.kmeans.KMeansClusterMapper.setup(KMeansClusterMapper.java:57)
>>>         at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:142)
>>>         at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:621)
>>>         at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
>>>         at org.apache.hadoop.mapred.Child.main(Child.java:170)
>>>
>>> 10/10/05 12:27:59 INFO mapred.JobClient: Task Id :
>>> attempt_201010051117_0008_m_000000_2, Status : FAILED
>>> java.lang.IllegalStateException: Cluster is empty!
>>>         at
>>> org.apache.mahout.clustering.kmeans.KMeansClusterMapper.setup(KMeansClusterMapper.java:57)
>>>         at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:142)
>>>         at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:621)
>>>         at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
>>>         at org.apache.hadoop.mapred.Child.main(Child.java:170)
>>>
>>> 10/10/05 12:28:11 INFO mapred.JobClient: Job complete:
>>> job_201010051117_0008
>>> 10/10/05 12:28:11 INFO mapred.JobClient: Counters: 3
>>> 10/10/05 12:28:11 INFO mapred.JobClient:   Job Counters
>>> 10/10/05 12:28:11 INFO mapred.JobClient:     Launched map tasks=4
>>> 10/10/05 12:28:11 INFO mapred.JobClient:     Data-local map tasks=4
>>> 10/10/05 12:28:11 INFO mapred.JobClient:     Failed map tasks=1
>>> 10/10/05 12:28:12 INFO driver.MahoutDriver: Program took 126495 ms
>>>
>>>        was (Author: pgradadia):
>>>      I am also getting the same exception with trunk code
>>>
>>> 10/10/04 12:42:34 INFO mapred.JobClient: Running job:
>>> job_201010041038_0019
>>> 10/10/04 12:42:35 INFO mapred.JobClient:  map 0% reduce 0%
>>> 10/10/04 12:42:45 INFO mapred.JobClient: Task Id :
>>> attempt_201010041038_0019_m_000000_0, Status : FAILED
>>> java.lang.IllegalStateException: No clusters found. Check your -c path.
>>>         at
>>> org.apache.mahout.clustering.kmeans.KMeansMapper.setup(KMeansMapper.java:61)
>>>         at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:142)
>>>         at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:621)
>>>         at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
>>>         at org.apache.hadoop.mapred.Child.main(Child.java:170)
>>>
>>>> Kmeans clustering error
>>>> -----------------------
>>>>
>>>>                  Key: MAHOUT-504
>>>>                  URL: https://issues.apache.org/jira/browse/MAHOUT-504
>>>>              Project: Mahout
>>>>           Issue Type: Bug
>>>>             Reporter: Zhen Guo
>>>>             Assignee: Robin Anil
>>>>              Fix For: 0.4
>>>>
>>>>
>>>> I tried the Kmeans algorithm on the Synthetic Control data. The following
>>>> error appears. I tried the Canopy algorithm, it is fine. This error is from
>>>> Mapper. I am using Trunk.
>>>> 10/09/20 19:40:06 INFO mapred.JobClient: Task Id :
>>>> attempt_201008261432_1324_m_000000_0, Status : FAILED
>>>> java.lang.IllegalStateException: Cluster is empty!
>>>>         at
>>>> org.apache.mahout.clustering.kmeans.KMeansClusterMapper.setup(KMeansClusterMapper.java:57)
>>>>         at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:142)
>>>>         at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:583)
>>>>         at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
>>>>         at org.apache.hadoop.mapred.Child.main(Child.java:170)
>>


>>> 10/10/05 12:26:56 INFO mapred.JobClient:     Launched reduce tasks=1
>>> 10/10/05 12:26:56 INFO mapred.JobClient:     Launched map tasks=1
>>> 10/10/05 12:26:56 INFO mapred.JobClient:     Data-local map tasks=1
>>> 10/10/05 12:26:56 INFO mapred.JobClient:   FileSystemCounters
>>> 10/10/05 12:26:56 INFO mapred.JobClient:     FILE_BYTES_READ=13906
>>> 10/10/05 12:26:56 INFO mapred.JobClient:     HDFS_BYTES_READ=335470
>>> 10/10/05 12:26:56 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=27844
>>> 10/10/05 12:26:56 INFO mapred.JobClient:     HDFS_BYTES_WRITTEN=7131
>>> 10/10/05 12:26:56 INFO mapred.JobClient:   Map-Reduce Framework
>>> 10/10/05 12:26:56 INFO mapred.JobClient:     Reduce input groups=1
>>> 10/10/05 12:26:56 INFO mapred.JobClient:     Combine output records=0
>>> 10/10/05 12:26:56 INFO mapred.JobClient:     Map input records=600
>>> 10/10/05 12:26:56 INFO mapred.JobClient:     Reduce shuffle bytes=0
>>> 10/10/05 12:26:56 INFO mapred.JobClient:     Reduce output records=6
>>> 10/10/05 12:26:56 INFO mapred.JobClient:     Spilled Records=50
>>> 10/10/05 12:26:56 INFO mapred.JobClient:     Map output bytes=13800
>>> 10/10/05 12:26:56 INFO mapred.JobClient:     Combine input records=0
>>> 10/10/05 12:26:56 INFO mapred.JobClient:     Map output records=25
>>> 10/10/05 12:26:56 INFO mapred.JobClient:     Reduce input records=25
>>> 10/10/05 12:26:56 INFO kmeans.Job: Running KMeans
>>> 10/10/05 12:26:56 INFO kmeans.KMeansDriver: Input: output/data Clusters
>>> In: output/clusters-0 Out: output Distance:
>>> org.apache.mahout.common.distance.EuclideanDistanceMeasure
>>> 10/10/05 12:26:56 INFO kmeans.KMeansDriver: convergence: 0.5 max
>>> Iterations: 10 num Reduce Tasks: org.apache.mahout.math.VectorWritable Input
>>> Vectors: {}
>>> 10/10/05 12:26:56 INFO kmeans.KMeansDriver: K-Means Iteration 1
>>> 10/10/05 12:26:56 WARN mapred.JobClient: Use GenericOptionsParser for
>>> parsing the arguments. Applications should implement Tool for the same.
>>> 10/10/05 12:26:57 INFO input.FileInputFormat: Total input paths to process
>>> : 1
>>> 10/10/05 12:26:58 INFO mapred.JobClient: Running job:
>>> job_201010051117_0007
>>> 10/10/05 12:26:59 INFO mapred.JobClient:  map 0% reduce 0%
>>> 10/10/05 12:27:08 INFO mapred.JobClient: Task Id :
>>> attempt_201010051117_0007_m_000000_0, Status : FAILED
>>> java.lang.IllegalStateException: No clusters found. Check your -c path.
>>>         at
>>> org.apache.mahout.clustering.kmeans.KMeansMapper.setup(KMeansMapper.java:61)
>>>         at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:142)
>>>         at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:621)
>>>         at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
>>>         at org.apache.hadoop.mapred.Child.main(Child.java:170)
>>>
>>> 10/10/05 12:27:14 INFO mapred.JobClient: Task Id :
>>> attempt_201010051117_0007_m_000000_1, Status : FAILED
>>> java.lang.IllegalStateException: No clusters found. Check your -c path.
>>>         at
>>> org.apache.mahout.clustering.kmeans.KMeansMapper.setup(KMeansMapper.java:61)
>>>         at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:142)
>>>         at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:621)
>>>         at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
>>>         at org.apache.hadoop.mapred.Child.main(Child.java:170)
>>>
>>> 10/10/05 12:27:23 INFO mapred.JobClient: Task Id :
>>> attempt_201010051117_0007_m_000000_2, Status : FAILED
>>> java.lang.IllegalStateException: No clusters found. Check your -c path.
>>>         at
>>> org.apache.mahout.clustering.kmeans.KMeansMapper.setup(KMeansMapper.java:61)
>>>         at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:142)
>>>         at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:621)
>>>         at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
>>>         at org.apache.hadoop.mapred.Child.main(Child.java:170)
>>>
>>> 10/10/05 12:27:35 INFO mapred.JobClient: Job complete:
>>> job_201010051117_0007
>>> 10/10/05 12:27:35 INFO mapred.JobClient: Counters: 3
>>> 10/10/05 12:27:35 INFO mapred.JobClient:   Job Counters
>>> 10/10/05 12:27:35 INFO mapred.JobClient:     Launched map tasks=4
>>> 10/10/05 12:27:35 INFO mapred.JobClient:     Data-local map tasks=4
>>> 10/10/05 12:27:35 INFO mapred.JobClient:     Failed map tasks=1
>>> 10/10/05 12:27:35 INFO kmeans.KMeansDriver: Clustering data
>>> 10/10/05 12:27:35 INFO kmeans.KMeansDriver: Running Clustering
>>> 10/10/05 12:27:35 INFO kmeans.KMeansDriver: Input: output/data Clusters
>>> In: output/clusters-1 Out: output/clusteredPoints Distance:
>>> org.apache.mahout.common.distance.EuclideanDistanceMeasure@136a43c
>>> 10/10/05 12:27:35 INFO kmeans.KMeansDriver: convergence: 0.5 Input
>>> Vectors: org.apache.mahout.math.VectorWritable
>>> 10/10/05 12:27:35 WARN mapred.JobClient: Use GenericOptionsParser for
>>> parsing the arguments. Applications should implement Tool for the same.
>>> 10/10/05 12:27:36 INFO input.FileInputFormat: Total input paths to process
>>> : 1
>>> 10/10/05 12:27:37 INFO mapred.JobClient: Running job:
>>> job_201010051117_0008
>>> 10/10/05 12:27:38 INFO mapred.JobClient:  map 0% reduce 0%
>>> 10/10/05 12:27:47 INFO mapred.JobClient: Task Id :
>>> attempt_201010051117_0008_m_000000_0, Status : FAILED
>>> java.lang.IllegalStateException: Cluster is empty!
>>>         at
>>> org.apache.mahout.clustering.kmeans.KMeansClusterMapper.setup(KMeansClusterMapper.java:57)
>>>         at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:142)
>>>         at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:621)
>>>         at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
>>>         at org.apache.hadoop.mapred.Child.main(Child.java:170)
>>>
>>> 10/10/05 12:27:53 INFO mapred.JobClient: Task Id :
>>> attempt_201010051117_0008_m_000000_1, Status : FAILED
>>> java.lang.IllegalStateException: Cluster is empty!
>>>         at
>>> org.apache.mahout.clustering.kmeans.KMeansClusterMapper.setup(KMeansClusterMapper.java:57)
>>>         at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:142)
>>>         at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:621)
>>>         at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
>>>         at org.apache.hadoop.mapred.Child.main(Child.java:170)
>>>
>>> 10/10/05 12:27:59 INFO mapred.JobClient: Task Id :
>>> attempt_201010051117_0008_m_000000_2, Status : FAILED
>>> java.lang.IllegalStateException: Cluster is empty!
>>>         at
>>> org.apache.mahout.clustering.kmeans.KMeansClusterMapper.setup(KMeansClusterMapper.java:57)
>>>         at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:142)
>>>         at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:621)
>>>         at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
>>>         at org.apache.hadoop.mapred.Child.main(Child.java:170)
>>>
>>> 10/10/05 12:28:11 INFO mapred.JobClient: Job complete:
>>> job_201010051117_0008
>>> 10/10/05 12:28:11 INFO mapred.JobClient: Counters: 3
>>> 10/10/05 12:28:11 INFO mapred.JobClient:   Job Counters
>>> 10/10/05 12:28:11 INFO mapred.JobClient:     Launched map tasks=4
>>> 10/10/05 12:28:11 INFO mapred.JobClient:     Data-local map tasks=4
>>> 10/10/05 12:28:11 INFO mapred.JobClient:     Failed map tasks=1
>>> 10/10/05 12:28:12 INFO driver.MahoutDriver: Program took 126495 ms
>>>
>>>        was (Author: pgradadia):
>>>      I am also getting the same exception with trunk code
>>>
>>> 10/10/04 12:42:34 INFO mapred.JobClient: Running job:
>>> job_201010041038_0019
>>> 10/10/04 12:42:35 INFO mapred.JobClient:  map 0% reduce 0%
>>> 10/10/04 12:42:45 INFO mapred.JobClient: Task Id :
>>> attempt_201010041038_0019_m_000000_0, Status : FAILED
>>> java.lang.IllegalStateException: No clusters found. Check your -c path.
>>>         at
>>> org.apache.mahout.clustering.kmeans.KMeansMapper.setup(KMeansMapper.java:61)
>>>         at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:142)
>>>         at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:621)
>>>         at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
>>>         at org.apache.hadoop.mapred.Child.main(Child.java:170)
>>>
>>>> Kmeans clustering error
>>>> -----------------------
>>>>
>>>>                  Key: MAHOUT-504
>>>>                  URL: https://issues.apache.org/jira/browse/MAHOUT-504
>>>>              Project: Mahout
>>>>           Issue Type: Bug
>>>>             Reporter: Zhen Guo
>>>>             Assignee: Robin Anil
>>>>              Fix For: 0.4
>>>>
>>>>
>>>> I tried the Kmeans algorithm on the Synthetic Control data. The following
>>>> error appears. I tried the Canopy algorithm, it is fine. This error is from
>>>> Mapper. I am using Trunk.
>>>> 10/09/20 19:40:06 INFO mapred.JobClient: Task Id :
>>>> attempt_201008261432_1324_m_000000_0, Status : FAILED
>>>> java.lang.IllegalStateException: Cluster is empty!
>>>>         at
>>>> org.apache.mahout.clustering.kmeans.KMeansClusterMapper.setup(KMeansClusterMapper.java:57)
>>>>         at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:142)
>>>>         at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:583)
>>>>         at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
>>>>         at org.apache.hadoop.mapred.Child.main(Child.java:170)
>>


Re: [jira] Issue Comment Edited: (MAHOUT-504) Kmeans clustering error

Posted by pragnesh radadia <pr...@gmail.com>.
Finally, I am able to run the k-means example from "Clustering of synthetic
control data".

I think the problem was that Hadoop is running as the hadoop user (using
Cloudera CDH3) while I was trying to run the example as the pragnesh user,

so Hadoop was not able to find the files under "/user/hadoop",

since the example uses relative paths to store the input and clustering data.

-pragnesh
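
A minimal sketch of the suspected cause, assuming HDFS resolves a relative
path against the home directory /user/<user name> of whoever is accessing it.
The helper name resolve_hdfs_path is hypothetical and only models that rule;
it is not part of Hadoop or Mahout. The path output/clusters-0 is the one
shown in the KMeansDriver log.

```python
import posixpath


def resolve_hdfs_path(path: str, user: str) -> str:
    """Model of HDFS relative-path resolution (hypothetical helper).

    Absolute paths are returned unchanged (after normalisation); relative
    paths are assumed to be resolved against /user/<user>.
    """
    if posixpath.isabs(path):
        return posixpath.normpath(path)
    return posixpath.normpath(posixpath.join("/user", user, path))


# The example job refers to its cluster directory with a relative path.
written_by = resolve_hdfs_path("output/clusters-0", "pragnesh")
read_by = resolve_hdfs_path("output/clusters-0", "hadoop")

print(written_by)
print(read_by)
print(written_by == read_by)
```

If this is indeed what happened, the same relative path names two different
directories depending on the user, which would be consistent with the
"No clusters found" error. Running the job as the user that owns the HDFS
home directory, or using absolute paths, should make both sides agree.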


On Wed, Oct 6, 2010 at 12:27 AM, Jeff Eastman
<jd...@windwardsolutions.com> wrote:
>  Hi Pragnesh,
>
> I really don't know what to suggest to you. I just did a new Mahout checkout
> and build, followed by uploading the synthetic_control.data file to a local
> Hadoop instance. The k-means job ran without incident. On a hunch, I also
> uploaded the file as testdata (not in directory testdata) and that worked
> too. I'm baffled why I can't duplicate this and suspect it is a local system
> issue. What OS are you running?
>
> If yours works from Eclipse but not from the command line, I wonder if you
> have done mvn clean build from the command line before you ran the CLI
> Mahout job? Eclipse compiles its bits into different directories and does
> not build the necessary job files. Other than that, I suggest checking your
> file system groups and permissions.
>
> If you find something that gets you running again, *please* post your
> solution so we can advise others who are experiencing the same error
> message.
>
>
>
>

Re: [jira] Issue Comment Edited: (MAHOUT-504) Kmeans clustering error

Posted by Joe Kumar <jo...@gmail.com>.
Pragnesh,

I got the latest code from repo and did mvn clean and mvn install.
Then I followed the instructions in the wiki link I had mentioned below and
the kmeans clustering task on synthetic control executed just fine.
Please let me know if you face issues following the steps in the wiki.

regards
Joe.

On Tue, Oct 5, 2010 at 4:34 PM, Joe Kumar <jo...@gmail.com> wrote:

> Hi Pragnesh,
>
> Just wondering if you tried the steps in
> https://cwiki.apache.org/confluence/display/MAHOUT/Clustering+of+synthetic+control+data
> .
> It was working just fine like 2 weeks ago. I'll probably verify it tonite
> (with the latest code from trunk) and let you know.
>
> regards,
> Joe.
>
>
> On Tue, Oct 5, 2010 at 2:57 PM, Jeff Eastman <jd...@windwardsolutions.com> wrote:
>
>>  Hi Pragnesh,
>>
>> I really don't know what to suggest to you. I just did a new Mahout
>> checkout and build, followed by uploading the synthetic_control.data file to
>> a local Hadoop instance. The k-means job ran without incident. On a hunch, I
>> also uploaded the file as testdata (not in directory testdata) and that
>> worked too. I'm baffled why I can't duplicate this and suspect it is a local
>> system issue. What OS are you running?
>>
>> If yours works from Eclipse but not from the command line, I wonder if you
>> have done mvn clean build from the command line before you ran the CLI
>> Mahout job? Eclipse compiles its bits into different directories and does
>> not build the necessary job files. Other than that, I suggest checking your
>> file system groups and permissions.
>>
>> If you find something that gets you running again, *please* post your
>> solution so we can advise others who are experiencing the same error
>> message.
>>
>>
>>
>> On 10/5/10 12:06 AM, pragnesh (JIRA) wrote:
>>
>>>     [
>>> https://issues.apache.org/jira/browse/MAHOUT-504?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12917502#action_12917502]
>>>
>>> pragnesh edited comment on MAHOUT-504 at 10/5/10 3:05 AM:
>>> ----------------------------------------------------------
>>>
>>> I am also getting the same exception with trunk code
>>>
>>> 10/10/04 12:42:34 INFO mapred.JobClient: Running job:
>>> job_201010041038_0019
>>> 10/10/04 12:42:35 INFO mapred.JobClient:  map 0% reduce 0%
>>> 10/10/04 12:42:45 INFO mapred.JobClient: Task Id :
>>> attempt_201010041038_0019_m_000000_0, Status : FAILED
>>> java.lang.IllegalStateException: No clusters found. Check your -c path.
>>>        at
>>> org.apache.mahout.clustering.kmeans.KMeansMapper.setup(KMeansMapper.java:61)
>>>        at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:142)
>>>        at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:621)
>>>        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
>>>        at org.apache.hadoop.mapred.Child.main(Child.java:170)
>>>
>>>
>>> This runs fine from Eclipse,
>>> but when I try to run it from the command line with Hadoop, I see the
>>> following output.
>>>
>>> Meanwhile, $MAHOUT_HOME/bin/mahout
>>> org.apache.mahout.clustering.syntheticcontrol.dirichlet.Job runs fine
>>> without any error.
>>>
>>> pragnesh-laptop% $MAHOUT_HOME/bin/mahout
>>> org.apache.mahout.clustering.syntheticcontrol.kmeans.Job
>>> Running on hadoop, using HADOOP_HOME=/usr/lib/hadoop/
>>> HADOOP_CONF_DIR=/etc/hadoop/conf.pseudo
>>> 10/10/05 12:26:05 WARN driver.MahoutDriver: No
>>> org.apache.mahout.clustering.syntheticcontrol.kmeans.Job.props found on
>>> classpath, will use command-line arguments only
>>> 10/10/05 12:26:05 INFO kmeans.Job: Running with default arguments
>>> 10/10/05 12:26:06 INFO kmeans.Job: Preparing Input
>>> 10/10/05 12:26:06 WARN mapred.JobClient: Use GenericOptionsParser for
>>> parsing the arguments. Applications should implement Tool for the same.
>>> 10/10/05 12:26:07 INFO input.FileInputFormat: Total input paths to
>>> process : 1
>>> 10/10/05 12:26:09 INFO mapred.JobClient: Running job:
>>> job_201010051117_0005
>>> 10/10/05 12:26:10 INFO mapred.JobClient:  map 0% reduce 0%
>>> 10/10/05 12:26:26 INFO mapred.JobClient:  map 100% reduce 0%
>>> 10/10/05 12:26:28 INFO mapred.JobClient: Job complete:
>>> job_201010051117_0005
>>> 10/10/05 12:26:29 INFO mapred.JobClient: Counters: 7
>>> 10/10/05 12:26:29 INFO mapred.JobClient:   Job Counters
>>> 10/10/05 12:26:29 INFO mapred.JobClient:     Launched map tasks=1
>>> 10/10/05 12:26:29 INFO mapred.JobClient:     Data-local map tasks=1
>>> 10/10/05 12:26:29 INFO mapred.JobClient:   FileSystemCounters
>>> 10/10/05 12:26:29 INFO mapred.JobClient:     HDFS_BYTES_READ=288374
>>> 10/10/05 12:26:29 INFO mapred.JobClient:     HDFS_BYTES_WRITTEN=335470
>>> 10/10/05 12:26:29 INFO mapred.JobClient:   Map-Reduce Framework
>>> 10/10/05 12:26:29 INFO mapred.JobClient:     Map input records=600
>>> 10/10/05 12:26:29 INFO mapred.JobClient:     Spilled Records=0
>>> 10/10/05 12:26:29 INFO mapred.JobClient:     Map output records=600
>>> 10/10/05 12:26:29 INFO kmeans.Job: Running Canopy to get initial clusters
>>> 10/10/05 12:26:29 INFO canopy.CanopyDriver: Build Clusters Input:
>>> output/data Out: output Measure:
>>> org.apache.mahout.common.distance.EuclideanDistanceMeasure@136a43c t1:
>>> 80.0 t2: 55.0
>>> 10/10/05 12:26:29 WARN mapred.JobClient: Use GenericOptionsParser for
>>> parsing the arguments. Applications should implement Tool for the same.
>>> 10/10/05 12:26:29 INFO input.FileInputFormat: Total input paths to
>>> process : 1
>>> 10/10/05 12:26:30 INFO mapred.JobClient: Running job:
>>> job_201010051117_0006
>>> 10/10/05 12:26:31 INFO mapred.JobClient:  map 0% reduce 0%
>>> 10/10/05 12:26:42 INFO mapred.JobClient:  map 100% reduce 0%
>>> 10/10/05 12:26:54 INFO mapred.JobClient:  map 100% reduce 100%
>>> 10/10/05 12:26:56 INFO mapred.JobClient: Job complete:
>>> job_201010051117_0006
>>> 10/10/05 12:26:56 INFO mapred.JobClient: Counters: 17
>>> 10/10/05 12:26:56 INFO mapred.JobClient:   Job Counters
>>> 10/10/05 12:26:56 INFO mapred.JobClient:     Launched reduce tasks=1
>>> 10/10/05 12:26:56 INFO mapred.JobClient:     Launched map tasks=1
>>> 10/10/05 12:26:56 INFO mapred.JobClient:     Data-local map tasks=1
>>> 10/10/05 12:26:56 INFO mapred.JobClient:   FileSystemCounters
>>> 10/10/05 12:26:56 INFO mapred.JobClient:     FILE_BYTES_READ=13906
>>> 10/10/05 12:26:56 INFO mapred.JobClient:     HDFS_BYTES_READ=335470
>>> 10/10/05 12:26:56 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=27844
>>> 10/10/05 12:26:56 INFO mapred.JobClient:     HDFS_BYTES_WRITTEN=7131
>>> 10/10/05 12:26:56 INFO mapred.JobClient:   Map-Reduce Framework
>>> 10/10/05 12:26:56 INFO mapred.JobClient:     Reduce input groups=1
>>> 10/10/05 12:26:56 INFO mapred.JobClient:     Combine output records=0
>>> 10/10/05 12:26:56 INFO mapred.JobClient:     Map input records=600
>>> 10/10/05 12:26:56 INFO mapred.JobClient:     Reduce shuffle bytes=0
>>> 10/10/05 12:26:56 INFO mapred.JobClient:     Reduce output records=6
>>> 10/10/05 12:26:56 INFO mapred.JobClient:     Spilled Records=50
>>> 10/10/05 12:26:56 INFO mapred.JobClient:     Map output bytes=13800
>>> 10/10/05 12:26:56 INFO mapred.JobClient:     Combine input records=0
>>> 10/10/05 12:26:56 INFO mapred.JobClient:     Map output records=25
>>> 10/10/05 12:26:56 INFO mapred.JobClient:     Reduce input records=25
>>> 10/10/05 12:26:56 INFO kmeans.Job: Running KMeans
>>> 10/10/05 12:26:56 INFO kmeans.KMeansDriver: Input: output/data Clusters
>>> In: output/clusters-0 Out: output Distance:
>>> org.apache.mahout.common.distance.EuclideanDistanceMeasure
>>> 10/10/05 12:26:56 INFO kmeans.KMeansDriver: convergence: 0.5 max
>>> Iterations: 10 num Reduce Tasks: org.apache.mahout.math.VectorWritable Input
>>> Vectors: {}
>>> 10/10/05 12:26:56 INFO kmeans.KMeansDriver: K-Means Iteration 1
>>> 10/10/05 12:26:56 WARN mapred.JobClient: Use GenericOptionsParser for
>>> parsing the arguments. Applications should implement Tool for the same.
>>> 10/10/05 12:26:57 INFO input.FileInputFormat: Total input paths to
>>> process : 1
>>> 10/10/05 12:26:58 INFO mapred.JobClient: Running job:
>>> job_201010051117_0007
>>> 10/10/05 12:26:59 INFO mapred.JobClient:  map 0% reduce 0%
>>> 10/10/05 12:27:08 INFO mapred.JobClient: Task Id :
>>> attempt_201010051117_0007_m_000000_0, Status : FAILED
>>> java.lang.IllegalStateException: No clusters found. Check your -c path.
>>>        at
>>> org.apache.mahout.clustering.kmeans.KMeansMapper.setup(KMeansMapper.java:61)
>>>        at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:142)
>>>        at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:621)
>>>        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
>>>        at org.apache.hadoop.mapred.Child.main(Child.java:170)
>>>
>>> 10/10/05 12:27:14 INFO mapred.JobClient: Task Id :
>>> attempt_201010051117_0007_m_000000_1, Status : FAILED
>>> java.lang.IllegalStateException: No clusters found. Check your -c path.
>>>        at
>>> org.apache.mahout.clustering.kmeans.KMeansMapper.setup(KMeansMapper.java:61)
>>>        at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:142)
>>>        at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:621)
>>>        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
>>>        at org.apache.hadoop.mapred.Child.main(Child.java:170)
>>>
>>> 10/10/05 12:27:23 INFO mapred.JobClient: Task Id :
>>> attempt_201010051117_0007_m_000000_2, Status : FAILED
>>> java.lang.IllegalStateException: No clusters found. Check your -c path.
>>>        at
>>> org.apache.mahout.clustering.kmeans.KMeansMapper.setup(KMeansMapper.java:61)
>>>        at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:142)
>>>        at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:621)
>>>        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
>>>        at org.apache.hadoop.mapred.Child.main(Child.java:170)
>>>
>>> 10/10/05 12:27:35 INFO mapred.JobClient: Job complete:
>>> job_201010051117_0007
>>> 10/10/05 12:27:35 INFO mapred.JobClient: Counters: 3
>>> 10/10/05 12:27:35 INFO mapred.JobClient:   Job Counters
>>> 10/10/05 12:27:35 INFO mapred.JobClient:     Launched map tasks=4
>>> 10/10/05 12:27:35 INFO mapred.JobClient:     Data-local map tasks=4
>>> 10/10/05 12:27:35 INFO mapred.JobClient:     Failed map tasks=1
>>> 10/10/05 12:27:35 INFO kmeans.KMeansDriver: Clustering data
>>> 10/10/05 12:27:35 INFO kmeans.KMeansDriver: Running Clustering
>>> 10/10/05 12:27:35 INFO kmeans.KMeansDriver: Input: output/data Clusters
>>> In: output/clusters-1 Out: output/clusteredPoints Distance:
>>> org.apache.mahout.common.distance.EuclideanDistanceMeasure@136a43c
>>> 10/10/05 12:27:35 INFO kmeans.KMeansDriver: convergence: 0.5 Input
>>> Vectors: org.apache.mahout.math.VectorWritable
>>> 10/10/05 12:27:35 WARN mapred.JobClient: Use GenericOptionsParser for
>>> parsing the arguments. Applications should implement Tool for the same.
>>> 10/10/05 12:27:36 INFO input.FileInputFormat: Total input paths to
>>> process : 1
>>> 10/10/05 12:27:37 INFO mapred.JobClient: Running job:
>>> job_201010051117_0008
>>> 10/10/05 12:27:38 INFO mapred.JobClient:  map 0% reduce 0%
>>> 10/10/05 12:27:47 INFO mapred.JobClient: Task Id :
>>> attempt_201010051117_0008_m_000000_0, Status : FAILED
>>> java.lang.IllegalStateException: Cluster is empty!
>>>        at
>>> org.apache.mahout.clustering.kmeans.KMeansClusterMapper.setup(KMeansClusterMapper.java:57)
>>>        at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:142)
>>>        at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:621)
>>>        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
>>>        at org.apache.hadoop.mapred.Child.main(Child.java:170)
>>>
>>> 10/10/05 12:27:53 INFO mapred.JobClient: Task Id :
>>> attempt_201010051117_0008_m_000000_1, Status : FAILED
>>> java.lang.IllegalStateException: Cluster is empty!
>>>        at
>>> org.apache.mahout.clustering.kmeans.KMeansClusterMapper.setup(KMeansClusterMapper.java:57)
>>>        at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:142)
>>>        at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:621)
>>>        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
>>>        at org.apache.hadoop.mapred.Child.main(Child.java:170)
>>>
>>> 10/10/05 12:27:59 INFO mapred.JobClient: Task Id :
>>> attempt_201010051117_0008_m_000000_2, Status : FAILED
>>> java.lang.IllegalStateException: Cluster is empty!
>>>        at
>>> org.apache.mahout.clustering.kmeans.KMeansClusterMapper.setup(KMeansClusterMapper.java:57)
>>>        at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:142)
>>>        at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:621)
>>>        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
>>>        at org.apache.hadoop.mapred.Child.main(Child.java:170)
>>>
>>> 10/10/05 12:28:11 INFO mapred.JobClient: Job complete:
>>> job_201010051117_0008
>>> 10/10/05 12:28:11 INFO mapred.JobClient: Counters: 3
>>> 10/10/05 12:28:11 INFO mapred.JobClient:   Job Counters
>>> 10/10/05 12:28:11 INFO mapred.JobClient:     Launched map tasks=4
>>> 10/10/05 12:28:11 INFO mapred.JobClient:     Data-local map tasks=4
>>> 10/10/05 12:28:11 INFO mapred.JobClient:     Failed map tasks=1
>>> 10/10/05 12:28:12 INFO driver.MahoutDriver: Program took 126495 ms
>>>
>>>       was (Author: pgradadia):
>>>     I am also getting the same exception with trunk code
>>>
>>> 10/10/04 12:42:34 INFO mapred.JobClient: Running job:
>>> job_201010041038_0019
>>> 10/10/04 12:42:35 INFO mapred.JobClient:  map 0% reduce 0%
>>> 10/10/04 12:42:45 INFO mapred.JobClient: Task Id :
>>> attempt_201010041038_0019_m_000000_0, Status : FAILED
>>> java.lang.IllegalStateException: No clusters found. Check your -c path.
>>>        at
>>> org.apache.mahout.clustering.kmeans.KMeansMapper.setup(KMeansMapper.java:61)
>>>        at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:142)
>>>        at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:621)
>>>        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
>>>        at org.apache.hadoop.mapred.Child.main(Child.java:170)
>>>
>>>  Kmeans clustering error
>>>> -----------------------
>>>>
>>>>                 Key: MAHOUT-504
>>>>                 URL: https://issues.apache.org/jira/browse/MAHOUT-504
>>>>             Project: Mahout
>>>>          Issue Type: Bug
>>>>            Reporter: Zhen Guo
>>>>            Assignee: Robin Anil
>>>>             Fix For: 0.4
>>>>
>>>>
>>>> I tried the Kmeans algorithm on the Synthetic Control data. The
>>>> following error appears. I tried the Canopy algorithm, it is fine. This
>>>> error is from Mapper. I am using Trunk.
>>>> 10/09/20 19:40:06 INFO mapred.JobClient: Task Id :
>>>> attempt_201008261432_1324_m_000000_0, Status : FAILED
>>>> java.lang.IllegalStateException: Cluster is empty!
>>>>        at
>>>> org.apache.mahout.clustering.kmeans.KMeansClusterMapper.setup(KMeansClusterMapper.java:57)
>>>>        at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:142)
>>>>        at
>>>> org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:583)
>>>>        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
>>>>        at org.apache.hadoop.mapred.Child.main(Child.java:170)
>>>>
>>>
>>
>
>
>
>

Re: [jira] Issue Comment Edited: (MAHOUT-504) Kmeans clustering error

Posted by Joe Kumar <jo...@gmail.com>.
Hi Pragnesh,

Just wondering whether you have tried the steps in
https://cwiki.apache.org/confluence/display/MAHOUT/Clustering+of+synthetic+control+data
It was working fine about two weeks ago. I'll verify it tonight (with the
latest code from trunk) and let you know.

regards,
Joe.


On Tue, Oct 5, 2010 at 2:57 PM, Jeff Eastman <jd...@windwardsolutions.com> wrote:

>  Hi Pragnesh,
>
> I really don't know what to suggest to you. I just did a new Mahout
> checkout and build, followed by uploading the synthetic_control.data file to
> a local Hadoop instance. The k-means job ran without incident. On a hunch, I
> also uploaded the file as testdata (not in directory testdata) and that
> worked too. I'm baffled why I can't duplicate this and suspect it is a local
> system issue. What OS are you running?
>
> If yours works from Eclipse but not from the command line, I wonder if you
> have done a clean Maven build (e.g. mvn clean install) from the command line
> before you ran the CLI Mahout job? Eclipse compiles its classes into different
> directories and does not build the necessary job files. Other than that, I
> suggest checking your file system groups and permissions.
>
> If you find something that gets you running again, *please* post your
> solution so we can advise others who are experiencing the same error
> message.
>
>
>
> On 10/5/10 12:06 AM, pragnesh (JIRA) wrote:
>
>>     [
>> https://issues.apache.org/jira/browse/MAHOUT-504?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12917502#action_12917502]
>>
>> pragnesh edited comment on MAHOUT-504 at 10/5/10 3:05 AM:
>> ----------------------------------------------------------
>>
>> I am also getting the same exception with trunk code
>>
>> 10/10/04 12:42:34 INFO mapred.JobClient: Running job:
>> job_201010041038_0019
>> 10/10/04 12:42:35 INFO mapred.JobClient:  map 0% reduce 0%
>> 10/10/04 12:42:45 INFO mapred.JobClient: Task Id :
>> attempt_201010041038_0019_m_000000_0, Status : FAILED
>> java.lang.IllegalStateException: No clusters found. Check your -c path.
>>        at
>> org.apache.mahout.clustering.kmeans.KMeansMapper.setup(KMeansMapper.java:61)
>>        at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:142)
>>        at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:621)
>>        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
>>        at org.apache.hadoop.mapred.Child.main(Child.java:170)
>>
>>
>> This runs fine from Eclipse, but when I try to run it from the command
>> line with Hadoop, I see the following output. Meanwhile,
>> $MAHOUT_HOME/bin/mahout
>> org.apache.mahout.clustering.syntheticcontrol.dirichlet.Job runs fine
>> without any error.
>>
>> pragnesh-laptop% $MAHOUT_HOME/bin/mahout
>> org.apache.mahout.clustering.syntheticcontrol.kmeans.Job
>> Running on hadoop, using HADOOP_HOME=/usr/lib/hadoop/
>> HADOOP_CONF_DIR=/etc/hadoop/conf.pseudo
>> 10/10/05 12:26:05 WARN driver.MahoutDriver: No
>> org.apache.mahout.clustering.syntheticcontrol.kmeans.Job.props found on
>> classpath, will use command-line arguments only
>> 10/10/05 12:26:05 INFO kmeans.Job: Running with default arguments
>> 10/10/05 12:26:06 INFO kmeans.Job: Preparing Input
>> 10/10/05 12:26:06 WARN mapred.JobClient: Use GenericOptionsParser for
>> parsing the arguments. Applications should implement Tool for the same.
>> 10/10/05 12:26:07 INFO input.FileInputFormat: Total input paths to process
>> : 1
>> 10/10/05 12:26:09 INFO mapred.JobClient: Running job:
>> job_201010051117_0005
>> 10/10/05 12:26:10 INFO mapred.JobClient:  map 0% reduce 0%
>> 10/10/05 12:26:26 INFO mapred.JobClient:  map 100% reduce 0%
>> 10/10/05 12:26:28 INFO mapred.JobClient: Job complete:
>> job_201010051117_0005
>> 10/10/05 12:26:29 INFO mapred.JobClient: Counters: 7
>> 10/10/05 12:26:29 INFO mapred.JobClient:   Job Counters
>> 10/10/05 12:26:29 INFO mapred.JobClient:     Launched map tasks=1
>> 10/10/05 12:26:29 INFO mapred.JobClient:     Data-local map tasks=1
>> 10/10/05 12:26:29 INFO mapred.JobClient:   FileSystemCounters
>> 10/10/05 12:26:29 INFO mapred.JobClient:     HDFS_BYTES_READ=288374
>> 10/10/05 12:26:29 INFO mapred.JobClient:     HDFS_BYTES_WRITTEN=335470
>> 10/10/05 12:26:29 INFO mapred.JobClient:   Map-Reduce Framework
>> 10/10/05 12:26:29 INFO mapred.JobClient:     Map input records=600
>> 10/10/05 12:26:29 INFO mapred.JobClient:     Spilled Records=0
>> 10/10/05 12:26:29 INFO mapred.JobClient:     Map output records=600
>> 10/10/05 12:26:29 INFO kmeans.Job: Running Canopy to get initial clusters
>> 10/10/05 12:26:29 INFO canopy.CanopyDriver: Build Clusters Input:
>> output/data Out: output Measure:
>> org.apache.mahout.common.distance.EuclideanDistanceMeasure@136a43c t1:
>> 80.0 t2: 55.0
>> 10/10/05 12:26:29 WARN mapred.JobClient: Use GenericOptionsParser for
>> parsing the arguments. Applications should implement Tool for the same.
>> 10/10/05 12:26:29 INFO input.FileInputFormat: Total input paths to process
>> : 1
>> 10/10/05 12:26:30 INFO mapred.JobClient: Running job:
>> job_201010051117_0006
>> 10/10/05 12:26:31 INFO mapred.JobClient:  map 0% reduce 0%
>> 10/10/05 12:26:42 INFO mapred.JobClient:  map 100% reduce 0%
>> 10/10/05 12:26:54 INFO mapred.JobClient:  map 100% reduce 100%
>> 10/10/05 12:26:56 INFO mapred.JobClient: Job complete:
>> job_201010051117_0006
>> 10/10/05 12:26:56 INFO mapred.JobClient: Counters: 17
>> 10/10/05 12:26:56 INFO mapred.JobClient:   Job Counters
>> 10/10/05 12:26:56 INFO mapred.JobClient:     Launched reduce tasks=1
>> 10/10/05 12:26:56 INFO mapred.JobClient:     Launched map tasks=1
>> 10/10/05 12:26:56 INFO mapred.JobClient:     Data-local map tasks=1
>> 10/10/05 12:26:56 INFO mapred.JobClient:   FileSystemCounters
>> 10/10/05 12:26:56 INFO mapred.JobClient:     FILE_BYTES_READ=13906
>> 10/10/05 12:26:56 INFO mapred.JobClient:     HDFS_BYTES_READ=335470
>> 10/10/05 12:26:56 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=27844
>> 10/10/05 12:26:56 INFO mapred.JobClient:     HDFS_BYTES_WRITTEN=7131
>> 10/10/05 12:26:56 INFO mapred.JobClient:   Map-Reduce Framework
>> 10/10/05 12:26:56 INFO mapred.JobClient:     Reduce input groups=1
>> 10/10/05 12:26:56 INFO mapred.JobClient:     Combine output records=0
>> 10/10/05 12:26:56 INFO mapred.JobClient:     Map input records=600
>> 10/10/05 12:26:56 INFO mapred.JobClient:     Reduce shuffle bytes=0
>> 10/10/05 12:26:56 INFO mapred.JobClient:     Reduce output records=6
>> 10/10/05 12:26:56 INFO mapred.JobClient:     Spilled Records=50
>> 10/10/05 12:26:56 INFO mapred.JobClient:     Map output bytes=13800
>> 10/10/05 12:26:56 INFO mapred.JobClient:     Combine input records=0
>> 10/10/05 12:26:56 INFO mapred.JobClient:     Map output records=25
>> 10/10/05 12:26:56 INFO mapred.JobClient:     Reduce input records=25
>> 10/10/05 12:26:56 INFO kmeans.Job: Running KMeans
>> 10/10/05 12:26:56 INFO kmeans.KMeansDriver: Input: output/data Clusters
>> In: output/clusters-0 Out: output Distance:
>> org.apache.mahout.common.distance.EuclideanDistanceMeasure
>> 10/10/05 12:26:56 INFO kmeans.KMeansDriver: convergence: 0.5 max
>> Iterations: 10 num Reduce Tasks: org.apache.mahout.math.VectorWritable Input
>> Vectors: {}
>> 10/10/05 12:26:56 INFO kmeans.KMeansDriver: K-Means Iteration 1
>> 10/10/05 12:26:56 WARN mapred.JobClient: Use GenericOptionsParser for
>> parsing the arguments. Applications should implement Tool for the same.
>> 10/10/05 12:26:57 INFO input.FileInputFormat: Total input paths to process
>> : 1
>> 10/10/05 12:26:58 INFO mapred.JobClient: Running job:
>> job_201010051117_0007
>> 10/10/05 12:26:59 INFO mapred.JobClient:  map 0% reduce 0%
>> 10/10/05 12:27:08 INFO mapred.JobClient: Task Id :
>> attempt_201010051117_0007_m_000000_0, Status : FAILED
>> java.lang.IllegalStateException: No clusters found. Check your -c path.
>>        at
>> org.apache.mahout.clustering.kmeans.KMeansMapper.setup(KMeansMapper.java:61)
>>        at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:142)
>>        at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:621)
>>        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
>>        at org.apache.hadoop.mapred.Child.main(Child.java:170)
>>
>> 10/10/05 12:27:14 INFO mapred.JobClient: Task Id :
>> attempt_201010051117_0007_m_000000_1, Status : FAILED
>> java.lang.IllegalStateException: No clusters found. Check your -c path.
>>        at
>> org.apache.mahout.clustering.kmeans.KMeansMapper.setup(KMeansMapper.java:61)
>>        at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:142)
>>        at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:621)
>>        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
>>        at org.apache.hadoop.mapred.Child.main(Child.java:170)
>>
>> 10/10/05 12:27:23 INFO mapred.JobClient: Task Id :
>> attempt_201010051117_0007_m_000000_2, Status : FAILED
>> java.lang.IllegalStateException: No clusters found. Check your -c path.
>>        at
>> org.apache.mahout.clustering.kmeans.KMeansMapper.setup(KMeansMapper.java:61)
>>        at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:142)
>>        at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:621)
>>        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
>>        at org.apache.hadoop.mapred.Child.main(Child.java:170)
>>
>> 10/10/05 12:27:35 INFO mapred.JobClient: Job complete:
>> job_201010051117_0007
>> 10/10/05 12:27:35 INFO mapred.JobClient: Counters: 3
>> 10/10/05 12:27:35 INFO mapred.JobClient:   Job Counters
>> 10/10/05 12:27:35 INFO mapred.JobClient:     Launched map tasks=4
>> 10/10/05 12:27:35 INFO mapred.JobClient:     Data-local map tasks=4
>> 10/10/05 12:27:35 INFO mapred.JobClient:     Failed map tasks=1
>> 10/10/05 12:27:35 INFO kmeans.KMeansDriver: Clustering data
>> 10/10/05 12:27:35 INFO kmeans.KMeansDriver: Running Clustering
>> 10/10/05 12:27:35 INFO kmeans.KMeansDriver: Input: output/data Clusters
>> In: output/clusters-1 Out: output/clusteredPoints Distance:
>> org.apache.mahout.common.distance.EuclideanDistanceMeasure@136a43c
>> 10/10/05 12:27:35 INFO kmeans.KMeansDriver: convergence: 0.5 Input
>> Vectors: org.apache.mahout.math.VectorWritable
>> 10/10/05 12:27:35 WARN mapred.JobClient: Use GenericOptionsParser for
>> parsing the arguments. Applications should implement Tool for the same.
>> 10/10/05 12:27:36 INFO input.FileInputFormat: Total input paths to process
>> : 1
>> 10/10/05 12:27:37 INFO mapred.JobClient: Running job:
>> job_201010051117_0008
>> 10/10/05 12:27:38 INFO mapred.JobClient:  map 0% reduce 0%
>> 10/10/05 12:27:47 INFO mapred.JobClient: Task Id :
>> attempt_201010051117_0008_m_000000_0, Status : FAILED
>> java.lang.IllegalStateException: Cluster is empty!
>>        at
>> org.apache.mahout.clustering.kmeans.KMeansClusterMapper.setup(KMeansClusterMapper.java:57)
>>        at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:142)
>>        at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:621)
>>        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
>>        at org.apache.hadoop.mapred.Child.main(Child.java:170)
>>
>> 10/10/05 12:27:53 INFO mapred.JobClient: Task Id :
>> attempt_201010051117_0008_m_000000_1, Status : FAILED
>> java.lang.IllegalStateException: Cluster is empty!
>>        at
>> org.apache.mahout.clustering.kmeans.KMeansClusterMapper.setup(KMeansClusterMapper.java:57)
>>        at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:142)
>>        at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:621)
>>        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
>>        at org.apache.hadoop.mapred.Child.main(Child.java:170)
>>
>> 10/10/05 12:27:59 INFO mapred.JobClient: Task Id :
>> attempt_201010051117_0008_m_000000_2, Status : FAILED
>> java.lang.IllegalStateException: Cluster is empty!
>>        at
>> org.apache.mahout.clustering.kmeans.KMeansClusterMapper.setup(KMeansClusterMapper.java:57)
>>        at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:142)
>>        at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:621)
>>        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
>>        at org.apache.hadoop.mapred.Child.main(Child.java:170)
>>
>> 10/10/05 12:28:11 INFO mapred.JobClient: Job complete:
>> job_201010051117_0008
>> 10/10/05 12:28:11 INFO mapred.JobClient: Counters: 3
>> 10/10/05 12:28:11 INFO mapred.JobClient:   Job Counters
>> 10/10/05 12:28:11 INFO mapred.JobClient:     Launched map tasks=4
>> 10/10/05 12:28:11 INFO mapred.JobClient:     Data-local map tasks=4
>> 10/10/05 12:28:11 INFO mapred.JobClient:     Failed map tasks=1
>> 10/10/05 12:28:12 INFO driver.MahoutDriver: Program took 126495 ms
>>
>>       was (Author: pgradadia):
>>     I am also getting the same exception with trunk code
>>
>> 10/10/04 12:42:34 INFO mapred.JobClient: Running job:
>> job_201010041038_0019
>> 10/10/04 12:42:35 INFO mapred.JobClient:  map 0% reduce 0%
>> 10/10/04 12:42:45 INFO mapred.JobClient: Task Id :
>> attempt_201010041038_0019_m_000000_0, Status : FAILED
>> java.lang.IllegalStateException: No clusters found. Check your -c path.
>>        at
>> org.apache.mahout.clustering.kmeans.KMeansMapper.setup(KMeansMapper.java:61)
>>        at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:142)
>>        at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:621)
>>        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
>>        at org.apache.hadoop.mapred.Child.main(Child.java:170)
>>
>>  Kmeans clustering error
>>> -----------------------
>>>
>>>                 Key: MAHOUT-504
>>>                 URL: https://issues.apache.org/jira/browse/MAHOUT-504
>>>             Project: Mahout
>>>          Issue Type: Bug
>>>            Reporter: Zhen Guo
>>>            Assignee: Robin Anil
>>>             Fix For: 0.4
>>>
>>>
>>> I tried the Kmeans algorithm on the Synthetic Control data. The following
>>> error appears. I tried the Canopy algorithm, it is fine. This error is from
>>> Mapper. I am using Trunk.
>>> 10/09/20 19:40:06 INFO mapred.JobClient: Task Id :
>>> attempt_201008261432_1324_m_000000_0, Status : FAILED
>>> java.lang.IllegalStateException: Cluster is empty!
>>>        at
>>> org.apache.mahout.clustering.kmeans.KMeansClusterMapper.setup(KMeansClusterMapper.java:57)
>>>        at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:142)
>>>        at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:583)
>>>        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
>>>        at org.apache.hadoop.mapred.Child.main(Child.java:170)
>>>
>>
>

Re: [jira] Issue Comment Edited: (MAHOUT-504) Kmeans clustering error

Posted by Jeff Eastman <jd...@windwardsolutions.com>.
  Hi Pragnesh,

I really don't know what to suggest to you. I just did a new Mahout 
checkout and build, followed by uploading the synthetic_control.data 
file to a local Hadoop instance. The k-means job ran without incident. 
On a hunch, I also uploaded the file as testdata (not in directory 
testdata) and that worked too. I'm baffled why I can't duplicate this 
and suspect it is a local system issue. What OS are you running?

If yours works from Eclipse but not from the command line, I wonder if
you have done a clean Maven build (e.g. mvn clean install) from the
command line before you ran the CLI Mahout job? Eclipse compiles its
classes into different directories and does not build the necessary job
files. Other than that, I suggest checking your file system groups and
permissions.
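
A rough way to check the second half of that from the shell, offered only as a sketch: the quoted log names output/clusters-0 and output/clusters-1 as the "Clusters In" directories, so listing them in HDFS shows whether the Canopy and k-means steps actually left cluster files there and what owner and permissions they carry. The helper names has_cluster_files and report_clusters are hypothetical (they are not part of Mahout or Hadoop), and the check assumes cluster files follow Hadoop's usual part-* naming.

```shell
#!/bin/sh
# Sketch only: has_cluster_files and report_clusters are hypothetical helper
# names, not part of Mahout or Hadoop.

# Succeeds when the given HDFS directory lists at least one part-* file.
# Assumes the cluster output uses Hadoop's usual part-* file naming.
has_cluster_files() {
  hadoop fs -ls "$1" 2>/dev/null | grep -q 'part-'
}

# Print one status line per directory passed on the command line.
report_clusters() {
  for dir in "$@"; do
    if has_cluster_files "$dir"; then
      echo "$dir: cluster files present"
    else
      echo "$dir: no cluster files"
    fi
  done
}

# Typical use, after a command-line Maven build such as `mvn clean install`
# run from the Mahout checkout (an assumption about the local setup):
#   report_clusters output/clusters-0 output/clusters-1
```

If a directory is reported as having no cluster files, running hadoop fs -ls on it by hand shows whether it is missing, empty, or unreadable by the user running the job, which narrows down whether the problem is in the build or in the file system permissions.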

If you find something that gets you running again, *please* post your 
solution so we can advise others who are experiencing the same error 
message.


On 10/5/10 12:06 AM, pragnesh (JIRA) wrote:
>      [ https://issues.apache.org/jira/browse/MAHOUT-504?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12917502#action_12917502 ]
>
> pragnesh edited comment on MAHOUT-504 at 10/5/10 3:05 AM:
> ----------------------------------------------------------
>
> I am also getting the same exception with trunk code
>
> 10/10/04 12:42:34 INFO mapred.JobClient: Running job: job_201010041038_0019
> 10/10/04 12:42:35 INFO mapred.JobClient:  map 0% reduce 0%
> 10/10/04 12:42:45 INFO mapred.JobClient: Task Id : attempt_201010041038_0019_m_000000_0, Status : FAILED
> java.lang.IllegalStateException: No clusters found. Check your -c path.
> 	at org.apache.mahout.clustering.kmeans.KMeansMapper.setup(KMeansMapper.java:61)
> 	at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:142)
> 	at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:621)
> 	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
> 	at org.apache.hadoop.mapred.Child.main(Child.java:170)
>
>
> This runs fine from Eclipse, but when I try to run it from the command line with Hadoop, I see the following output.
>
> Meanwhile, $MAHOUT_HOME/bin/mahout org.apache.mahout.clustering.syntheticcontrol.dirichlet.Job runs fine without any error.
>
> pragnesh-laptop% $MAHOUT_HOME/bin/mahout org.apache.mahout.clustering.syntheticcontrol.kmeans.Job
> Running on hadoop, using HADOOP_HOME=/usr/lib/hadoop/
> HADOOP_CONF_DIR=/etc/hadoop/conf.pseudo
> 10/10/05 12:26:05 WARN driver.MahoutDriver: No org.apache.mahout.clustering.syntheticcontrol.kmeans.Job.props found on classpath, will use command-line arguments only
> 10/10/05 12:26:05 INFO kmeans.Job: Running with default arguments
> 10/10/05 12:26:06 INFO kmeans.Job: Preparing Input
> 10/10/05 12:26:06 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
> 10/10/05 12:26:07 INFO input.FileInputFormat: Total input paths to process : 1
> 10/10/05 12:26:09 INFO mapred.JobClient: Running job: job_201010051117_0005
> 10/10/05 12:26:10 INFO mapred.JobClient:  map 0% reduce 0%
> 10/10/05 12:26:26 INFO mapred.JobClient:  map 100% reduce 0%
> 10/10/05 12:26:28 INFO mapred.JobClient: Job complete: job_201010051117_0005
> 10/10/05 12:26:29 INFO mapred.JobClient: Counters: 7
> 10/10/05 12:26:29 INFO mapred.JobClient:   Job Counters
> 10/10/05 12:26:29 INFO mapred.JobClient:     Launched map tasks=1
> 10/10/05 12:26:29 INFO mapred.JobClient:     Data-local map tasks=1
> 10/10/05 12:26:29 INFO mapred.JobClient:   FileSystemCounters
> 10/10/05 12:26:29 INFO mapred.JobClient:     HDFS_BYTES_READ=288374
> 10/10/05 12:26:29 INFO mapred.JobClient:     HDFS_BYTES_WRITTEN=335470
> 10/10/05 12:26:29 INFO mapred.JobClient:   Map-Reduce Framework
> 10/10/05 12:26:29 INFO mapred.JobClient:     Map input records=600
> 10/10/05 12:26:29 INFO mapred.JobClient:     Spilled Records=0
> 10/10/05 12:26:29 INFO mapred.JobClient:     Map output records=600
> 10/10/05 12:26:29 INFO kmeans.Job: Running Canopy to get initial clusters
> 10/10/05 12:26:29 INFO canopy.CanopyDriver: Build Clusters Input: output/data Out: output Measure: org.apache.mahout.common.distance.EuclideanDistanceMeasure@136a43c t1: 80.0 t2: 55.0
> 10/10/05 12:26:29 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
> 10/10/05 12:26:29 INFO input.FileInputFormat: Total input paths to process : 1
> 10/10/05 12:26:30 INFO mapred.JobClient: Running job: job_201010051117_0006
> 10/10/05 12:26:31 INFO mapred.JobClient:  map 0% reduce 0%
> 10/10/05 12:26:42 INFO mapred.JobClient:  map 100% reduce 0%
> 10/10/05 12:26:54 INFO mapred.JobClient:  map 100% reduce 100%
> 10/10/05 12:26:56 INFO mapred.JobClient: Job complete: job_201010051117_0006
> 10/10/05 12:26:56 INFO mapred.JobClient: Counters: 17
> 10/10/05 12:26:56 INFO mapred.JobClient:   Job Counters
> 10/10/05 12:26:56 INFO mapred.JobClient:     Launched reduce tasks=1
> 10/10/05 12:26:56 INFO mapred.JobClient:     Launched map tasks=1
> 10/10/05 12:26:56 INFO mapred.JobClient:     Data-local map tasks=1
> 10/10/05 12:26:56 INFO mapred.JobClient:   FileSystemCounters
> 10/10/05 12:26:56 INFO mapred.JobClient:     FILE_BYTES_READ=13906
> 10/10/05 12:26:56 INFO mapred.JobClient:     HDFS_BYTES_READ=335470
> 10/10/05 12:26:56 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=27844
> 10/10/05 12:26:56 INFO mapred.JobClient:     HDFS_BYTES_WRITTEN=7131
> 10/10/05 12:26:56 INFO mapred.JobClient:   Map-Reduce Framework
> 10/10/05 12:26:56 INFO mapred.JobClient:     Reduce input groups=1
> 10/10/05 12:26:56 INFO mapred.JobClient:     Combine output records=0
> 10/10/05 12:26:56 INFO mapred.JobClient:     Map input records=600
> 10/10/05 12:26:56 INFO mapred.JobClient:     Reduce shuffle bytes=0
> 10/10/05 12:26:56 INFO mapred.JobClient:     Reduce output records=6
> 10/10/05 12:26:56 INFO mapred.JobClient:     Spilled Records=50
> 10/10/05 12:26:56 INFO mapred.JobClient:     Map output bytes=13800
> 10/10/05 12:26:56 INFO mapred.JobClient:     Combine input records=0
> 10/10/05 12:26:56 INFO mapred.JobClient:     Map output records=25
> 10/10/05 12:26:56 INFO mapred.JobClient:     Reduce input records=25
> 10/10/05 12:26:56 INFO kmeans.Job: Running KMeans
> 10/10/05 12:26:56 INFO kmeans.KMeansDriver: Input: output/data Clusters In: output/clusters-0 Out: output Distance: org.apache.mahout.common.distance.EuclideanDistanceMeasure
> 10/10/05 12:26:56 INFO kmeans.KMeansDriver: convergence: 0.5 max Iterations: 10 num Reduce Tasks: org.apache.mahout.math.VectorWritable Input Vectors: {}
> 10/10/05 12:26:56 INFO kmeans.KMeansDriver: K-Means Iteration 1
> 10/10/05 12:26:56 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
> 10/10/05 12:26:57 INFO input.FileInputFormat: Total input paths to process : 1
> 10/10/05 12:26:58 INFO mapred.JobClient: Running job: job_201010051117_0007
> 10/10/05 12:26:59 INFO mapred.JobClient:  map 0% reduce 0%
> 10/10/05 12:27:08 INFO mapred.JobClient: Task Id : attempt_201010051117_0007_m_000000_0, Status : FAILED
> java.lang.IllegalStateException: No clusters found. Check your -c path.
> 	at org.apache.mahout.clustering.kmeans.KMeansMapper.setup(KMeansMapper.java:61)
> 	at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:142)
> 	at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:621)
> 	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
> 	at org.apache.hadoop.mapred.Child.main(Child.java:170)
>
> 10/10/05 12:27:14 INFO mapred.JobClient: Task Id : attempt_201010051117_0007_m_000000_1, Status : FAILED
> java.lang.IllegalStateException: No clusters found. Check your -c path.
> 	at org.apache.mahout.clustering.kmeans.KMeansMapper.setup(KMeansMapper.java:61)
> 	at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:142)
> 	at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:621)
> 	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
> 	at org.apache.hadoop.mapred.Child.main(Child.java:170)
>
> 10/10/05 12:27:23 INFO mapred.JobClient: Task Id : attempt_201010051117_0007_m_000000_2, Status : FAILED
> java.lang.IllegalStateException: No clusters found. Check your -c path.
> 	at org.apache.mahout.clustering.kmeans.KMeansMapper.setup(KMeansMapper.java:61)
> 	at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:142)
> 	at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:621)
> 	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
> 	at org.apache.hadoop.mapred.Child.main(Child.java:170)
>
> 10/10/05 12:27:35 INFO mapred.JobClient: Job complete: job_201010051117_0007
> 10/10/05 12:27:35 INFO mapred.JobClient: Counters: 3
> 10/10/05 12:27:35 INFO mapred.JobClient:   Job Counters
> 10/10/05 12:27:35 INFO mapred.JobClient:     Launched map tasks=4
> 10/10/05 12:27:35 INFO mapred.JobClient:     Data-local map tasks=4
> 10/10/05 12:27:35 INFO mapred.JobClient:     Failed map tasks=1
> 10/10/05 12:27:35 INFO kmeans.KMeansDriver: Clustering data
> 10/10/05 12:27:35 INFO kmeans.KMeansDriver: Running Clustering
> 10/10/05 12:27:35 INFO kmeans.KMeansDriver: Input: output/data Clusters In: output/clusters-1 Out: output/clusteredPoints Distance: org.apache.mahout.common.distance.EuclideanDistanceMeasure@136a43c
> 10/10/05 12:27:35 INFO kmeans.KMeansDriver: convergence: 0.5 Input Vectors: org.apache.mahout.math.VectorWritable
> 10/10/05 12:27:35 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
> 10/10/05 12:27:36 INFO input.FileInputFormat: Total input paths to process : 1
> 10/10/05 12:27:37 INFO mapred.JobClient: Running job: job_201010051117_0008
> 10/10/05 12:27:38 INFO mapred.JobClient:  map 0% reduce 0%
> 10/10/05 12:27:47 INFO mapred.JobClient: Task Id : attempt_201010051117_0008_m_000000_0, Status : FAILED
> java.lang.IllegalStateException: Cluster is empty!
> 	at org.apache.mahout.clustering.kmeans.KMeansClusterMapper.setup(KMeansClusterMapper.java:57)
> 	at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:142)
> 	at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:621)
> 	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
> 	at org.apache.hadoop.mapred.Child.main(Child.java:170)
>
> 10/10/05 12:27:53 INFO mapred.JobClient: Task Id : attempt_201010051117_0008_m_000000_1, Status : FAILED
> java.lang.IllegalStateException: Cluster is empty!
> 	at org.apache.mahout.clustering.kmeans.KMeansClusterMapper.setup(KMeansClusterMapper.java:57)
> 	at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:142)
> 	at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:621)
> 	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
> 	at org.apache.hadoop.mapred.Child.main(Child.java:170)
>
> 10/10/05 12:27:59 INFO mapred.JobClient: Task Id : attempt_201010051117_0008_m_000000_2, Status : FAILED
> java.lang.IllegalStateException: Cluster is empty!
> 	at org.apache.mahout.clustering.kmeans.KMeansClusterMapper.setup(KMeansClusterMapper.java:57)
> 	at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:142)
> 	at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:621)
> 	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
> 	at org.apache.hadoop.mapred.Child.main(Child.java:170)
>
> 10/10/05 12:28:11 INFO mapred.JobClient: Job complete: job_201010051117_0008
> 10/10/05 12:28:11 INFO mapred.JobClient: Counters: 3
> 10/10/05 12:28:11 INFO mapred.JobClient:   Job Counters
> 10/10/05 12:28:11 INFO mapred.JobClient:     Launched map tasks=4
> 10/10/05 12:28:11 INFO mapred.JobClient:     Data-local map tasks=4
> 10/10/05 12:28:11 INFO mapred.JobClient:     Failed map tasks=1
> 10/10/05 12:28:12 INFO driver.MahoutDriver: Program took 126495 ms
>
>        was (Author: pgradadia):
>      I am also getting the same exception with trunk code
>
> 10/10/04 12:42:34 INFO mapred.JobClient: Running job: job_201010041038_0019
> 10/10/04 12:42:35 INFO mapred.JobClient:  map 0% reduce 0%
> 10/10/04 12:42:45 INFO mapred.JobClient: Task Id : attempt_201010041038_0019_m_000000_0, Status : FAILED
> java.lang.IllegalStateException: No clusters found. Check your -c path.
> 	at org.apache.mahout.clustering.kmeans.KMeansMapper.setup(KMeansMapper.java:61)
> 	at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:142)
> 	at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:621)
> 	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
> 	at org.apache.hadoop.mapred.Child.main(Child.java:170)
>
>> Kmeans clustering error
>> -----------------------
>>
>>                  Key: MAHOUT-504
>>                  URL: https://issues.apache.org/jira/browse/MAHOUT-504
>>              Project: Mahout
>>           Issue Type: Bug
>>             Reporter: Zhen Guo
>>             Assignee: Robin Anil
>>              Fix For: 0.4
>>
>>
>> I tried the K-means algorithm on the Synthetic Control data, and the following error appears. The Canopy algorithm runs fine. The error is thrown from the Mapper. I am using trunk.
>> 10/09/20 19:40:06 INFO mapred.JobClient: Task Id : attempt_201008261432_1324_m_000000_0, Status : FAILED
>> java.lang.IllegalStateException: Cluster is empty!
>> 	at org.apache.mahout.clustering.kmeans.KMeansClusterMapper.setup(KMeansClusterMapper.java:57)
>> 	at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:142)
>> 	at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:583)
>> 	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
>> 	at org.apache.hadoop.mapred.Child.main(Child.java:170)