Posted to dev@mahout.apache.org by "qiang xu (Issue Comment Edited) (JIRA)" <ji...@apache.org> on 2012/02/14 16:48:00 UTC
[jira] [Issue Comment Edited] (MAHOUT-504) Kmeans clustering error
[ https://issues.apache.org/jira/browse/MAHOUT-504?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13207675#comment-13207675 ]
qiang xu edited comment on MAHOUT-504 at 2/14/12 3:46 PM:
----------------------------------------------------------
This problem still exists in Mahout 0.5 and 0.6:
./bin/mahout kmeans -i ./examples/bin/work/reuters-out-seqdir-sparse/tfidf-vectors/ -c ./examples/bin/work/clusters -o ./examples/bin/work/reuters-kmeans -x 10 -ow
Running on hadoop, using HADOOP_HOME=/data/hadoop_cluster/hadoop-0.20.2/
HADOOP_CONF_DIR=/data/hadoop_cluster/hadoop-0.20.2/conf/
12/02/14 20:56:03 INFO common.AbstractJob: Command line arguments: {--clusters=./examples/bin/work/clusters, --convergenceDelta=0.5, --distanceMeasure=org.apache.mahout.common.distance.SquaredEuclideanDistanceMeasure, --endPhase=2147483647, --input=./examples/bin/work/reuters-out-seqdir-sparse/tfidf-vectors/, --maxIter=10, --method=mapreduce, --output=./examples/bin/work/reuters-kmeans, --overwrite=null, --startPhase=0, --tempDir=temp}
12/02/14 20:56:03 INFO kmeans.KMeansDriver: Input: examples/bin/work/reuters-out-seqdir-sparse/tfidf-vectors Clusters In: examples/bin/work/clusters Out: examples/bin/work/reuters-kmeans Distance: org.apache.mahout.common.distance.SquaredEuclideanDistanceMeasure
12/02/14 20:56:03 INFO kmeans.KMeansDriver: convergence: 0.5 max Iterations: 10 num Reduce Tasks: org.apache.mahout.math.VectorWritable Input Vectors: {}
12/02/14 20:56:03 INFO kmeans.KMeansDriver: K-Means Iteration 1
12/02/14 20:56:05 INFO input.FileInputFormat: Total input paths to process : 1
12/02/14 20:56:06 INFO mapred.JobClient: Running job: job_201202131515_0122
12/02/14 20:56:07 INFO mapred.JobClient: map 0% reduce 0%
12/02/14 20:56:16 INFO mapred.JobClient: Task Id : attempt_201202131515_0122_m_000000_0, Status : FAILED
java.lang.IllegalStateException: No clusters found. Check your -c path.
at org.apache.mahout.clustering.kmeans.KMeansMapper.setup(KMeansMapper.java:60)
at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:142)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:621)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
at org.apache.hadoop.mapred.Child.main(Child.java:170)
It is really weird, because the cluster file is generated:
[root@qxutest mahout-distribution-0.5]# hadoop fs -ls /user/root/examples/bin/work/
Found 4 items
drwxr-xr-x - root supergroup 0 2012-02-14 20:55 /user/root/examples/bin/work/clusters
drwxr-xr-x - root supergroup 0 2012-02-14 20:56 /user/root/examples/bin/work/reuters-kmeans
drwxr-xr-x - root supergroup 0 2012-02-14 20:29 /user/root/examples/bin/work/reuters-out-seqdir
drwxr-xr-x - root supergroup 0 2012-02-14 20:32 /user/root/examples/bin/work/reuters-out-seqdir-sparse
[root@qxutest mahout-distribution-0.5]# hadoop fs -ls /user/root/examples/bin/work/clusters
Found 1 items
-rw-r--r-- 2 root supergroup 139 2012-02-14 20:55 /user/root/examples/bin/work/clusters/part-randomSeed
I followed the guide at https://cwiki.apache.org/MAHOUT/k-means-clustering.html
was (Author: skaterxu):
> Kmeans clustering error
> -----------------------
>
> Key: MAHOUT-504
> URL: https://issues.apache.org/jira/browse/MAHOUT-504
> Project: Mahout
> Issue Type: Bug
> Reporter: Zhen Guo
> Assignee: Robin Anil
> Fix For: 0.4
>
>
> I tried the Kmeans algorithm on the Synthetic Control data. The following error appears. I tried the Canopy algorithm, it is fine. This error is from Mapper. I am using Trunk.
> 10/09/20 19:40:06 INFO mapred.JobClient: Task Id : attempt_201008261432_1324_m_000000_0, Status : FAILED
> java.lang.IllegalStateException: Cluster is empty!
> at org.apache.mahout.clustering.kmeans.KMeansClusterMapper.setup(KMeansClusterMapper.java:57)
> at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:142)
> at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:583)
> at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
> at org.apache.hadoop.mapred.Child.main(Child.java:170)
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
Re: [jira] [Issue Comment Edited] (MAHOUT-504) Kmeans clustering error
Posted by Jeff Eastman <jd...@windwardsolutions.com>.
As I explain in the above post, the reason for this is historical. I
agree it should be improved.
On 2/15/12 8:46 PM, Lance Norskog wrote:
> Nobody reads the docs. If the program itself can do this, instead of
> just barfing, it should. This is a case of Passive-Aggressive Error
> Reporting.
Re: [jira] [Issue Comment Edited] (MAHOUT-504) Kmeans clustering error
Posted by Lance Norskog <go...@gmail.com>.
Nobody reads the docs. If the program itself can do this, instead of
just barfing, it should. This is a case of Passive-Aggressive Error
Reporting.
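One way to surface the mistake before a job is submitted is a small pre-flight check on the arguments. The sketch below is hypothetical and not part of Mahout: the function names `has_k_option` and `explain_kmeans_args` are made up for illustration, it only recognizes the short `-k` form used in this thread, and it does not inspect the -c path itself.

```shell
# Hypothetical pre-flight check, not part of Mahout.
# Returns 0 if the short -k option appears among the arguments.
has_k_option() {
  for arg in "$@"; do
    case "$arg" in
      -k) return 0 ;;
    esac
  done
  return 1
}

# Prints what k-means will expect from the -c path for these arguments.
explain_kmeans_args() {
  if has_k_option "$@"; then
    echo "ok: -k given, initial clusters will be sampled from the input"
  else
    echo "warning: no -k given, so the -c path must already contain initial clusters"
  fi
}

# The failing invocation from the JIRA comment has no -k:
explain_kmeans_args -i ./in -c ./clusters -o ./out -x 10 -ow
```

Run against the arguments from the JIRA comment, this prints the warning instead of leaving the user to decode a mapper stack trace.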
On Wed, Feb 15, 2012 at 7:20 AM, Jeff Eastman
<jd...@windwardsolutions.com> wrote:
> The error message describes what the algorithm can see: that there are no
> initial clusters. The wiki documentation seems reasonably clear on the use
> of -k
> (https://cwiki.apache.org/confluence/display/MAHOUT/K-Means+Clustering) to
> obtain them by sampling the input dataset, otherwise -c needs to contain
> clusters produced by the user.
--
Lance Norskog
goksron@gmail.com
Re: [jira] [Issue Comment Edited] (MAHOUT-504) Kmeans clustering error
Posted by Jeff Eastman <jd...@windwardsolutions.com>.
The error message describes what the algorithm can see: that there are
no initial clusters. The wiki documentation seems reasonably clear on
the use of -k
(https://cwiki.apache.org/confluence/display/MAHOUT/K-Means+Clustering)
to obtain them by sampling the input dataset, otherwise -c needs to
contain clusters produced by the user.
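The two modes described here can be sketched as a small command builder. This is an illustration only: `kmeans_cmd` is a hypothetical helper, the paths are placeholders, and the fixed `-x 10 -ow` options are simply copied from the command in the JIRA comment. It prints the command rather than running it.

```shell
# Hypothetical helper, not part of Mahout: prints a kmeans command line.
# Usage: kmeans_cmd INPUT CLUSTERS OUTPUT [K]
kmeans_cmd() {
  input=$1
  clusters=$2
  output=$3
  k=${4:-}
  cmd="./bin/mahout kmeans -i $input -c $clusters -o $output -x 10 -ow"
  if [ -n "$k" ]; then
    # With -k, k-means samples K initial clusters from the input.
    cmd="$cmd -k $k"
  fi
  # Without -k, the CLUSTERS path must already hold user-produced clusters.
  echo "$cmd"
}

kmeans_cmd ./vectors ./clusters ./out 20   # sample 20 initial clusters
kmeans_cmd ./vectors ./clusters ./out      # reuse existing clusters in -c
```

The `-cl` flag suggested elsewhere in this thread could be appended in the same way when clustering of the input points is also wanted.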
On 2/14/12 8:04 PM, Lance Norskog wrote:
> Could the error message describe the user's mistake?
>
> On Tue, Feb 14, 2012 at 9:16 AM, Jeff Eastman
> <jd...@windwardsolutions.com> wrote:
>> +1 bingo. K-Means is expecting you to provide the prior cluster centers in
>> -c. If you want it to sample from your input data you need to add the -k
>> option to tell it how many you want. This has been a constant part of the
>> api for some time, hence 0.4, 0.5 and 0.6 will all give the same error if
>> you overlook this argument.
>>
>>
>>
>> On 2/14/12 8:56 AM, Suneel Marthi wrote:
>>> You are not specifying the number of clusters to generate. Try running
>>> again with a -k <number of clusters> option. You also need to request
>>> that clustering be done, with -cl.
>>>
>>> For example:
>>>
>>> ./bin/mahout kmeans -i
>>> ./examples/bin/work/reuters-out-seqdir-sparse/tfidf-vectors/ -c
>>> ./examples/bin/work/clusters -o ./examples/bin/work/reuters-kmeans -x
>>> 10 -ow -k 20 -cl
>>>
>>>
>>>
>>> ________________________________
>>> From: qiang xu (Issue Comment Edited) (JIRA)<ji...@apache.org>
>>> To: dev@mahout.apache.org
>>> Sent: Tuesday, February 14, 2012 10:48 AM
>>> Subject: [jira] [Issue Comment Edited] (MAHOUT-504) Kmeans clustering
>>> error
>>>
>>>
>>> [
>>> https://issues.apache.org/jira/browse/MAHOUT-504?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13207675#comment-13207675
>>> ]
>>>
>>> qiang xu edited comment on MAHOUT-504 at 2/14/12 3:46 PM:
>>> ----------------------------------------------------------
>>>
>>> This problem still exist in mahout 0.5 and 0.6
>>> ./bin/mahout kmeans -i
>>> ./examples/bin/work/reuters-out-seqdir-sparse/tfidf-vectors/ -c
>>> ./examples/bin/work/clusters -o ./examples/bin/work/reuters-kmeans -x 10
>>> -ow
>>> Running on hadoop, using HADOOP_HOME=/data/hadoop_cluster/hadoop-0.20.2/
>>> HADOOP_CONF_DIR=/data/hadoop_cluster/hadoop-0.20.2/conf/
>>> 12/02/14 20:56:03 INFO common.AbstractJob: Command line arguments:
>>> {--clusters=./examples/bin/work/clusters, --convergenceDelta=0.5,
>>> --distanceMeasure=org.apache.mahout.common.distance.SquaredEuclideanDistanceMeasure,
>>> --endPhase=2147483647,
>>> --input=./examples/bin/work/reuters-out-seqdir-sparse/tfidf-vectors/,
>>> --maxIter=10, --method=mapreduce,
>>> --output=./examples/bin/work/reuters-kmeans, --overwrite=null,
>>> --startPhase=0, --tempDir=temp}
>>> 12/02/14 20:56:03 INFO kmeans.KMeansDriver: Input:
>>> examples/bin/work/reuters-out-seqdir-sparse/tfidf-vectors Clusters In:
>>> examples/bin/work/clusters Out: examples/bin/work/reuters-kmeans Distance:
>>> org.apache.mahout.common.distance.SquaredEuclideanDistanceMeasure
>>> 12/02/14 20:56:03 INFO kmeans.KMeansDriver: convergence: 0.5 max
>>> Iterations: 10 num Reduce Tasks: org.apache.mahout.math.VectorWritable Input
>>> Vectors: {}
>>> 12/02/14 20:56:03 INFO kmeans.KMeansDriver: K-Means Iteration 1
>>> 12/02/14 20:56:05 INFO input.FileInputFormat: Total input paths to process
>>> : 1
>>> 12/02/14 20:56:06 INFO mapred.JobClient: Running job:
>>> job_201202131515_0122
>>> 12/02/14 20:56:07 INFO mapred.JobClient: map 0% reduce 0%
>>> 12/02/14 20:56:16 INFO mapred.JobClient: Task Id :
>>> attempt_201202131515_0122_m_000000_0, Status : FAILED
>>> java.lang.IllegalStateException: No clusters found. Check your -c path.
>>> at
>>> org.apache.mahout.clustering.kmeans.KMeansMapper.setup(KMeansMapper.java:60)
>>> at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:142)
>>> at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:621)
>>> at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
>>> at org.apache.hadoop.mapred.Child.main(Child.java:170)
>>> It is really weired that cluster is gernerated
>>> [root@qxutest mahout-distribution-0.5]# hadoop fs -ls
>>> /user/root/examples/bin/work/
>>> Found 4 items
>>> drwxr-xr-x - root supergroup 0 2012-02-14 20:55
>>> /user/root/examples/bin/work/clusters
>>> drwxr-xr-x - root supergroup 0 2012-02-14 20:56
>>> /user/root/examples/bin/work/reuters-kmeans
>>> drwxr-xr-x - root supergroup 0 2012-02-14 20:29
>>> /user/root/examples/bin/work/reuters-out-seqdir
>>> drwxr-xr-x - root supergroup 0 2012-02-14 20:32
>>> /user/root/examples/bin/work/reuters-out-seqdir-sparse
>>> [root@qxutest mahout-distribution-0.5]# hadoop fs -ls
>>> /user/root/examples/bin/work/clusters
>>> Found 1 items
>>> -rw-r--r-- 2 root supergroup 139 2012-02-14 20:55
>>> /user/root/examples/bin/work/clusters/part-randomSeed
>>>
>>> I followed the guide at
>>> https://cwiki.apache.org/MAHOUT/k-means-clustering.html
Re: [jira] [Issue Comment Edited] (MAHOUT-504) Kmeans clustering error
Posted by Lance Norskog <go...@gmail.com>.
Could the error message describe the user's mistake?
On Tue, Feb 14, 2012 at 9:16 AM, Jeff Eastman
<jd...@windwardsolutions.com> wrote:
> +1 bingo. K-Means is expecting you to provide the prior cluster centers in
> -c. If you want it to sample from your input data you need to add the -k
> option to tell it how many you want. This has been a constant part of the
> api for some time, hence 0.4, 0.5 and 0.6 will all give the same error if
> you overlook this argument.
>
>
>
> On 2/14/12 8:56 AM, Suneel Marthi wrote:
>>
>> You are not specifying the number of clusters that need to be generated,
>> try running again by specifying a -k<number of clusters> option. You also
>> need to specify that you need clustering to be done with -cl.
>>
>> For example:-
>>
>> ./bin/mahout kmeans -i
>> ./examples/bin/work/reuters-out-seqdir-sparse/tfidf-vectors/ -c
>> ./examples/bin/work/clusters -o ./examples/bin/work/reuters-kmeans -x
>> 10 -ow -k 20 -cl
--
Lance Norskog
goksron@gmail.com
Re: [jira] [Issue Comment Edited] (MAHOUT-504) Kmeans clustering error
Posted by Jeff Eastman <jd...@windwardsolutions.com>.
+1 bingo. K-Means expects you to provide the prior cluster centers
in -c. If you want it to sample them from your input data, you need to add the
-k option to tell it how many you want. This has been a constant part of
the API for some time, so 0.4, 0.5 and 0.6 will all give the same
error if you overlook this argument.
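As a rough illustration of the two modes described above (a sketch only, not part of Mahout): the hypothetical helper build_kmeans_cmd below assembles, but does not run, a kmeans command line. The flags -i, -c, -o, -x, -ow and -k are the ones used in this thread; the helper name, its argument order and the short placeholder paths are assumptions made for the example.

```shell
#!/bin/sh
# Hypothetical helper (not part of Mahout): prints a kmeans command line
# instead of running it, so the two modes can be compared side by side.
# Usage: build_kmeans_cmd INPUT CLUSTERS OUTPUT MAX_ITER [K]
build_kmeans_cmd() {
  input=$1
  clusters=$2
  output=$3
  max_iter=$4
  k=${5:-}

  cmd="./bin/mahout kmeans -i $input -c $clusters -o $output -x $max_iter -ow"

  # With K given, add -k so k-means samples K initial centers from the input.
  # Without it, the -c path is expected to already hold the prior centers.
  if [ -n "$k" ]; then
    cmd="$cmd -k $k"
  fi

  printf '%s\n' "$cmd"
}

# Mode 1: prior cluster centers already exist under the -c path.
build_kmeans_cmd tfidf-vectors clusters reuters-kmeans 10

# Mode 2: ask k-means to sample 20 initial centers from the input.
build_kmeans_cmd tfidf-vectors clusters reuters-kmeans 10 20
```

The second call corresponds to the -k usage suggested in this thread; the example command posted there additionally passes -cl, which the sketch leaves out.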
On 2/14/12 8:56 AM, Suneel Marthi wrote:
> You are not specifying the number of clusters that need to be generated, try running again by specifying a -k<number of clusters> option. You also need to specify that you need clustering to be done with -cl.
>
> For example:-
>
> ./bin/mahout kmeans -i
> ./examples/bin/work/reuters-out-seqdir-sparse/tfidf-vectors/ -c
> ./examples/bin/work/clusters -o ./examples/bin/work/reuters-kmeans -x
> 10 -ow -k 20 -cl
Re: [jira] [Issue Comment Edited] (MAHOUT-504) Kmeans clustering error
Posted by Suneel Marthi <su...@yahoo.com>.
You are not specifying the number of clusters to generate. Try running again with a -k <number of clusters> option. You also need to pass -cl to specify that clustering should be done.
For example:
./bin/mahout kmeans -i ./examples/bin/work/reuters-out-seqdir-sparse/tfidf-vectors/ -c ./examples/bin/work/clusters -o ./examples/bin/work/reuters-kmeans -x 10 -ow -k 20 -cl