Posted to dev@mahout.apache.org by "qiang xu (Issue Comment Edited) (JIRA)" <ji...@apache.org> on 2012/02/14 16:48:00 UTC

[jira] [Issue Comment Edited] (MAHOUT-504) Kmeans clustering error

    [ https://issues.apache.org/jira/browse/MAHOUT-504?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13207675#comment-13207675 ] 

qiang xu edited comment on MAHOUT-504 at 2/14/12 3:46 PM:
----------------------------------------------------------

This problem still exists in Mahout 0.5 and 0.6:
./bin/mahout kmeans -i ./examples/bin/work/reuters-out-seqdir-sparse/tfidf-vectors/ -c ./examples/bin/work/clusters -o ./examples/bin/work/reuters-kmeans -x 10  -ow
Running on hadoop, using HADOOP_HOME=/data/hadoop_cluster/hadoop-0.20.2/
HADOOP_CONF_DIR=/data/hadoop_cluster/hadoop-0.20.2/conf/
12/02/14 20:56:03 INFO common.AbstractJob: Command line arguments: {--clusters=./examples/bin/work/clusters, --convergenceDelta=0.5, --distanceMeasure=org.apache.mahout.common.distance.SquaredEuclideanDistanceMeasure, --endPhase=2147483647, --input=./examples/bin/work/reuters-out-seqdir-sparse/tfidf-vectors/, --maxIter=10, --method=mapreduce, --output=./examples/bin/work/reuters-kmeans, --overwrite=null, --startPhase=0, --tempDir=temp}
12/02/14 20:56:03 INFO kmeans.KMeansDriver: Input: examples/bin/work/reuters-out-seqdir-sparse/tfidf-vectors Clusters In: examples/bin/work/clusters Out: examples/bin/work/reuters-kmeans Distance: org.apache.mahout.common.distance.SquaredEuclideanDistanceMeasure
12/02/14 20:56:03 INFO kmeans.KMeansDriver: convergence: 0.5 max Iterations: 10 num Reduce Tasks: org.apache.mahout.math.VectorWritable Input Vectors: {}
12/02/14 20:56:03 INFO kmeans.KMeansDriver: K-Means Iteration 1
12/02/14 20:56:05 INFO input.FileInputFormat: Total input paths to process : 1
12/02/14 20:56:06 INFO mapred.JobClient: Running job: job_201202131515_0122
12/02/14 20:56:07 INFO mapred.JobClient:  map 0% reduce 0%
12/02/14 20:56:16 INFO mapred.JobClient: Task Id : attempt_201202131515_0122_m_000000_0, Status : FAILED
java.lang.IllegalStateException: No clusters found. Check your -c path.
        at org.apache.mahout.clustering.kmeans.KMeansMapper.setup(KMeansMapper.java:60)
        at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:142)
        at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:621)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
        at org.apache.hadoop.mapred.Child.main(Child.java:170)
It is really weird, because the clusters directory was generated:
[root@qxutest mahout-distribution-0.5]# hadoop fs -ls /user/root/examples/bin/work/
Found 4 items
drwxr-xr-x   - root supergroup          0 2012-02-14 20:55 /user/root/examples/bin/work/clusters
drwxr-xr-x   - root supergroup          0 2012-02-14 20:56 /user/root/examples/bin/work/reuters-kmeans
drwxr-xr-x   - root supergroup          0 2012-02-14 20:29 /user/root/examples/bin/work/reuters-out-seqdir
drwxr-xr-x   - root supergroup          0 2012-02-14 20:32 /user/root/examples/bin/work/reuters-out-seqdir-sparse
[root@qxutest mahout-distribution-0.5]# hadoop fs -ls /user/root/examples/bin/work/clusters
Found 1 items
-rw-r--r--   2 root supergroup        139 2012-02-14 20:55 /user/root/examples/bin/work/clusters/part-randomSeed

I followed the guide at https://cwiki.apache.org/MAHOUT/k-means-clustering.html
                
> Kmeans clustering error
> -----------------------
>
>                 Key: MAHOUT-504
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-504
>             Project: Mahout
>          Issue Type: Bug
>            Reporter: Zhen Guo
>            Assignee: Robin Anil
>             Fix For: 0.4
>
>
> I tried the K-Means algorithm on the Synthetic Control data, and the following error appears. The Canopy algorithm runs fine. The error comes from the Mapper. I am using trunk.
> 10/09/20 19:40:06 INFO mapred.JobClient: Task Id : attempt_201008261432_1324_m_000000_0, Status : FAILED
> java.lang.IllegalStateException: Cluster is empty!
> 	at org.apache.mahout.clustering.kmeans.KMeansClusterMapper.setup(KMeansClusterMapper.java:57)
> 	at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:142)
> 	at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:583)
> 	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
> 	at org.apache.hadoop.mapred.Child.main(Child.java:170)

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

Re: [jira] [Issue Comment Edited] (MAHOUT-504) Kmeans clustering error

Posted by Jeff Eastman <jd...@windwardsolutions.com>.

As I explained in the post above, the reason for this is historical. I
agree it should be improved.

On 2/15/12 8:46 PM, Lance Norskog wrote:
> Nobody reads the docs. If the program itself can do this, instead of
> just barfing, it should. This is a case of Passive-Aggressive Error
> Reporting.
>
> On Wed, Feb 15, 2012 at 7:20 AM, Jeff Eastman
> <jd...@windwardsolutions.com>  wrote:
>> The error message describes what the algorithm can see: that there are no
>> initial clusters. The wiki documentation seems reasonably clear on the use
>> of -k
>> (https://cwiki.apache.org/confluence/display/MAHOUT/K-Means+Clustering) to
>> obtain them by sampling the input dataset; otherwise, -c needs to contain
>> clusters produced by the user.
>>
>>
>> On 2/14/12 8:04 PM, Lance Norskog wrote:
>>> Could the error message describe the user's mistake?
>>>
>>> On Tue, Feb 14, 2012 at 9:16 AM, Jeff Eastman
>>> <jd...@windwardsolutions.com>    wrote:
>>>> +1 bingo. K-Means is expecting you to provide the prior cluster centers
>>>> in
>>>> -c. If you want it to sample from your input data you need to add the -k
>>>> option to tell it how many you want. This has been a constant part of the
>>>> API for some time, hence 0.4, 0.5 and 0.6 will all give the same error if
>>>> you overlook this argument.
>>>>
>>>>
>>>>
>>>> On 2/14/12 8:56 AM, Suneel Marthi wrote:
>>>>> You are not specifying the number of clusters that need to be generated;
>>>>> try running again with the -k <number of clusters> option. You also
>>>>> need to pass -cl to request that clustering be done.
>>>>>
>>>>> For example:
>>>>>
>>>>> ./bin/mahout kmeans -i ./examples/bin/work/reuters-out-seqdir-sparse/tfidf-vectors/ -c ./examples/bin/work/clusters -o ./examples/bin/work/reuters-kmeans -x 10 -ow -k 20 -cl

Re: [jira] [Issue Comment Edited] (MAHOUT-504) Kmeans clustering error

Posted by Lance Norskog <go...@gmail.com>.

Nobody reads the docs. If the program itself can do this, instead of
just barfing, it should. This is a case of Passive-Aggressive Error
Reporting.
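
A minimal sketch of what such a check could look like, assuming a hypothetical shell function check_kmeans_args (not part of Mahout) that is run on the kmeans arguments before the job is submitted. It relies only on the behaviour described in this thread: without -k, the -c path must already contain initial clusters. It does not inspect the -c path itself, so it can only warn.

```shell
#!/bin/sh
# Hypothetical pre-flight check; not part of Mahout.
# Prints a hint on stderr and returns 1 when -c is given without -k,
# the situation that can lead to "No clusters found. Check your -c path."
check_kmeans_args() {
    have_c=0
    have_k=0
    for arg in "$@"; do
        case "$arg" in
            -c) have_c=1 ;;
            -k) have_k=1 ;;
        esac
    done
    if [ "$have_c" -eq 1 ] && [ "$have_k" -eq 0 ]; then
        echo "Note: no -k given, so the -c path must already contain initial clusters." >&2
        echo "Add -k <number of clusters> to sample them from the input instead." >&2
        return 1
    fi
    return 0
}
```

Called with the arguments from the original report (-i, -c, -o, -x 10, -ow) it prints the hint and returns 1; with -k 20 appended it stays silent and returns 0. Returning a status rather than exiting leaves it to the caller to decide whether a missing -k is an error or whether -c really does hold user-supplied clusters.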

On Wed, Feb 15, 2012 at 7:20 AM, Jeff Eastman
<jd...@windwardsolutions.com> wrote:
> The error message describes what the algorithm can see: that there are no
> initial clusters. The wiki documentation seems reasonably clear on the use
> of -k
> (https://cwiki.apache.org/confluence/display/MAHOUT/K-Means+Clustering) to
> obtain them by sampling the input dataset; otherwise, -c needs to contain
> clusters produced by the user.
>
>
> On 2/14/12 8:04 PM, Lance Norskog wrote:
>>
>> Could the error message describe the user's mistake?
>>
>> On Tue, Feb 14, 2012 at 9:16 AM, Jeff Eastman
>> <jd...@windwardsolutions.com>  wrote:
>>>
>>> +1 bingo. K-Means is expecting you to provide the prior cluster centers
>>> in
>>> -c. If you want it to sample from your input data you need to add the -k
>>> option to tell it how many you want. This has been a constant part of the
>>> API for some time, hence 0.4, 0.5 and 0.6 will all give the same error if
>>> you overlook this argument.
>>>
>>>
>>>
>>> On 2/14/12 8:56 AM, Suneel Marthi wrote:
>>>>
>>>> You are not specifying the number of clusters that need to be generated;
>>>> try running again with the -k <number of clusters> option. You also
>>>> need to pass -cl to request that clustering be done.
>>>>
>>>> For example:
>>>>
>>>> ./bin/mahout kmeans -i ./examples/bin/work/reuters-out-seqdir-sparse/tfidf-vectors/ -c ./examples/bin/work/clusters -o ./examples/bin/work/reuters-kmeans -x 10 -ow -k 20 -cl



-- 
Lance Norskog
goksron@gmail.com

Re: [jira] [Issue Comment Edited] (MAHOUT-504) Kmeans clustering error

Posted by Jeff Eastman <jd...@windwardsolutions.com>.

The error message describes what the algorithm can see: that there are 
no initial clusters. The wiki documentation seems reasonably clear on 
the use of -k 
(https://cwiki.apache.org/confluence/display/MAHOUT/K-Means+Clustering) 
to obtain them by sampling the input dataset; otherwise, -c needs to
contain clusters produced by the user.

On 2/14/12 8:04 PM, Lance Norskog wrote:
> Could the error message describe the user's mistake?
>
> On Tue, Feb 14, 2012 at 9:16 AM, Jeff Eastman
> <jd...@windwardsolutions.com>  wrote:
>> +1 bingo. K-Means is expecting you to provide the prior cluster centers in
>> -c. If you want it to sample from your input data you need to add the -k
>> option to tell it how many you want. This has been a constant part of the
>> API for some time, hence 0.4, 0.5 and 0.6 will all give the same error if
>> you overlook this argument.
>>
>>
>>
>> On 2/14/12 8:56 AM, Suneel Marthi wrote:
>>> You are not specifying the number of clusters that need to be generated;
>>> try running again with the -k <number of clusters> option. You also
>>> need to pass -cl to request that clustering be done.
>>>
>>> For example:
>>>
>>> ./bin/mahout kmeans -i ./examples/bin/work/reuters-out-seqdir-sparse/tfidf-vectors/ -c ./examples/bin/work/clusters -o ./examples/bin/work/reuters-kmeans -x 10 -ow -k 20 -cl
>
>

Re: [jira] [Issue Comment Edited] (MAHOUT-504) Kmeans clustering error

Posted by Lance Norskog <go...@gmail.com>.

Could the error message describe the user's mistake?

On Tue, Feb 14, 2012 at 9:16 AM, Jeff Eastman
<jd...@windwardsolutions.com> wrote:
> +1 bingo. K-Means expects you to provide the prior cluster centers in
> -c. If you want it to sample them from your input data, you need to add the
> -k option to tell it how many you want. This has been a constant part of the
> API for some time, so 0.4, 0.5 and 0.6 will all give the same error if
> you overlook this argument.



-- 
Lance Norskog
goksron@gmail.com

Re: [jira] [Issue Comment Edited] (MAHOUT-504) Kmeans clustering error

Posted by Jeff Eastman <jd...@windwardsolutions.com>.

+1 bingo. K-Means expects you to provide the prior cluster centers
in -c. If you want it to sample them from your input data, you need to add
the -k option to tell it how many you want. This has been a constant part
of the API for some time, so 0.4, 0.5 and 0.6 will all give the same
error if you overlook this argument.
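
A minimal sketch of that rule as a shell pre-flight check, which would also put the user's likely mistake into words. The helper names has_k_option and explain_kmeans_args are hypothetical and not part of Mahout; the check only looks for a literal -k among the arguments and does not inspect what the -c directory actually contains.

```shell
#!/bin/sh
# Hypothetical helpers, not part of Mahout.

# Succeeds if a literal -k appears among the arguments.
has_k_option() {
  for arg in "$@"; do
    [ "$arg" = "-k" ] && return 0
  done
  return 1
}

# Prints a one-line explanation of what kmeans will expect from -c.
explain_kmeans_args() {
  if has_k_option "$@"; then
    echo "ok: -k given, initial clusters will be sampled from the input"
  else
    echo "warning: no -k given, so -c must already contain cluster centers"
  fi
}

# The failing invocation from the report, then the corrected one.
explain_kmeans_args kmeans -i in -c clusters -o out -x 10 -ow
explain_kmeans_args kmeans -i in -c clusters -o out -x 10 -ow -k 20 -cl
```

The placeholder paths (in, clusters, out) stand in for the real ones; only the presence or absence of -k changes the message.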


On 2/14/12 8:56 AM, Suneel Marthi wrote:
> You are not specifying the number of clusters to generate. Try running again with a -k <number of clusters> option. You also need to pass -cl to specify that clustering should be done.
>
> For example:
>
> ./bin/mahout kmeans -i
> ./examples/bin/work/reuters-out-seqdir-sparse/tfidf-vectors/ -c
> ./examples/bin/work/clusters -o ./examples/bin/work/reuters-kmeans -x
> 10  -ow -k 20 -cl

Re: [jira] [Issue Comment Edited] (MAHOUT-504) Kmeans clustering error

Posted by Suneel Marthi <su...@yahoo.com>.

You are not specifying the number of clusters to generate. Try running again with a -k <number of clusters> option. You also need to pass -cl to specify that clustering should be done.

For example:

./bin/mahout kmeans -i ./examples/bin/work/reuters-out-seqdir-sparse/tfidf-vectors/ -c ./examples/bin/work/clusters -o ./examples/bin/work/reuters-kmeans -x 10 -ow -k 20 -cl
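
The same command can be sketched as a small script that assembles the arguments one flag at a time and prints the result instead of running it. The function name kmeans_command and the WORK variable are illustrative assumptions; the paths and values are the ones quoted in this thread.

```shell
#!/bin/sh
# Sketch only: prints the kmeans command rather than executing it.
WORK=./examples/bin/work

kmeans_command() {
  k=$1                                   # number of clusters passed to -k
  set -- ./bin/mahout kmeans
  set -- "$@" -i "$WORK/reuters-out-seqdir-sparse/tfidf-vectors/"  # input vectors
  set -- "$@" -c "$WORK/clusters"        # clusters path
  set -- "$@" -o "$WORK/reuters-kmeans"  # output path
  set -- "$@" -x 10 -ow                  # max iterations, overwrite
  set -- "$@" -k "$k" -cl                # number of clusters, request clustering
  echo "$@"
}

kmeans_command 20
```

To run the command for real rather than print it, the final echo "$@" inside the function would be replaced with "$@".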



________________________________
 From: qiang xu (Issue Comment Edited) (JIRA) <ji...@apache.org>
To: dev@mahout.apache.org 
Sent: Tuesday, February 14, 2012 10:48 AM
Subject: [jira] [Issue Comment Edited] (MAHOUT-504) Kmeans clustering error
 

    [ https://issues.apache.org/jira/browse/MAHOUT-504?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13207675#comment-13207675 ] 

qiang xu edited comment on MAHOUT-504 at 2/14/12 3:46 PM:
----------------------------------------------------------

This problem still exists in Mahout 0.5 and 0.6.
./bin/mahout kmeans -i ./examples/bin/work/reuters-out-seqdir-sparse/tfidf-vectors/ -c ./examples/bin/work/clusters -o ./examples/bin/work/reuters-kmeans -x 10  -ow
Running on hadoop, using HADOOP_HOME=/data/hadoop_cluster/hadoop-0.20.2/
HADOOP_CONF_DIR=/data/hadoop_cluster/hadoop-0.20.2/conf/
12/02/14 20:56:03 INFO common.AbstractJob: Command line arguments: {--clusters=./examples/bin/work/clusters, --convergenceDelta=0.5, --distanceMeasure=org.apache.mahout.common.distance.SquaredEuclideanDistanceMeasure, --endPhase=2147483647, --input=./examples/bin/work/reuters-out-seqdir-sparse/tfidf-vectors/, --maxIter=10, --method=mapreduce, --output=./examples/bin/work/reuters-kmeans, --overwrite=null, --startPhase=0, --tempDir=temp}
12/02/14 20:56:03 INFO kmeans.KMeansDriver: Input: examples/bin/work/reuters-out-seqdir-sparse/tfidf-vectors Clusters In: examples/bin/work/clusters Out: examples/bin/work/reuters-kmeans Distance: org.apache.mahout.common.distance.SquaredEuclideanDistanceMeasure
12/02/14 20:56:03 INFO kmeans.KMeansDriver: convergence: 0.5 max Iterations: 10 num Reduce Tasks: org.apache.mahout.math.VectorWritable Input Vectors: {}
12/02/14 20:56:03 INFO kmeans.KMeansDriver: K-Means Iteration 1
12/02/14 20:56:05 INFO input.FileInputFormat: Total input paths to process : 1
12/02/14 20:56:06 INFO mapred.JobClient: Running job: job_201202131515_0122
12/02/14 20:56:07 INFO mapred.JobClient:  map 0% reduce 0%
12/02/14 20:56:16 INFO mapred.JobClient: Task Id : attempt_201202131515_0122_m_000000_0, Status : FAILED
java.lang.IllegalStateException: No clusters found. Check your -c path.
        at org.apache.mahout.clustering.kmeans.KMeansMapper.setup(KMeansMapper.java:60)
        at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:142)
        at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:621)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
        at org.apache.hadoop.mapred.Child.main(Child.java:170)
It is really weird, because the clusters directory has been generated:
[root@qxutest mahout-distribution-0.5]# hadoop fs -ls /user/root/examples/bin/work/
Found 4 items
drwxr-xr-x   - root supergroup          0 2012-02-14 20:55 /user/root/examples/bin/work/clusters
drwxr-xr-x   - root supergroup          0 2012-02-14 20:56 /user/root/examples/bin/work/reuters-kmeans
drwxr-xr-x   - root supergroup          0 2012-02-14 20:29 /user/root/examples/bin/work/reuters-out-seqdir
drwxr-xr-x   - root supergroup          0 2012-02-14 20:32 /user/root/examples/bin/work/reuters-out-seqdir-sparse
[root@qxutest mahout-distribution-0.5]# hadoop fs -ls /user/root/examples/bin/work/clusters
Found 1 items
-rw-r--r--   2 root supergroup        139 2012-02-14 20:55 /user/root/examples/bin/work/clusters/part-randomSeed

I followed the guide at https://cwiki.apache.org/MAHOUT/k-means-clustering.html
                
                  
> Kmeans clustering error
> -----------------------
>
>                 Key: MAHOUT-504
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-504
>             Project: Mahout
>          Issue Type: Bug
>            Reporter: Zhen Guo
>            Assignee: Robin Anil
>             Fix For: 0.4
>
>
> I tried the Kmeans algorithm on the Synthetic Control data. The following error appears. I tried the Canopy algorithm, it is fine. This error is from Mapper. I am using Trunk.
> 10/09/20 19:40:06 INFO mapred.JobClient: Task Id : attempt_201008261432_1324_m_000000_0, Status : FAILED
> java.lang.IllegalStateException: Cluster is empty!
>     at org.apache.mahout.clustering.kmeans.KMeansClusterMapper.setup(KMeansClusterMapper.java:57)
>     at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:142)
>     at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:583)
>     at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
>     at org.apache.hadoop.mapred.Child.main(Child.java:170)

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira