Posted to user@mahout.apache.org by Ahmad Ammari <am...@gmail.com> on 2011/11/16 10:47:54 UTC

NewsKMeansClustering does not find any clusters!

Hello,

I am practicing the Mahout examples in the clustering part of the book
"Mahout in Action", particularly chapter 9. In section 9.1.4, I am trying
to run the class NewsKMeansClustering, whose source code I got from the
companion source code files. My understanding is that the input directory
"inputDir" should contain the input documents in SequenceFile format.
Therefore, I tried to use the "reuters-seqfiles" directory that we
generated with the seqdirectory program through the mahout launcher
in chapter 8 (page 139). I then ran NewsKMeansClustering, which started
to run fine until I got a java.lang.IllegalStateException
saying that no clusters were found, as follows:

java.lang.IllegalStateException: No clusters found. Check your -c path.
    at org.apache.mahout.clustering.kmeans.KMeansMapper.setup(KMeansMapper.java:60)
    at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:142)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:621)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
    at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:177)
16-Nov-2011 00:49:14 org.apache.hadoop.mapred.JobClient monitorAndPrintJob
INFO: map 0% reduce 0%
16-Nov-2011 00:49:14 org.apache.hadoop.mapred.JobClient monitorAndPrintJob
INFO: Job complete: job_local_0010
16-Nov-2011 00:49:14 org.apache.hadoop.mapred.Counters log
INFO: Counters: 0
Exception in thread "main" java.lang.InterruptedException: K-Means Iteration failed processing reutersClusters/canopy-centroids/clusters-0
    at org.apache.mahout.clustering.kmeans.KMeansDriver.runIteration(KMeansDriver.java:363)
    at org.apache.mahout.clustering.kmeans.KMeansDriver.buildClustersMR(KMeansDriver.java:310)
    at org.apache.mahout.clustering.kmeans.KMeansDriver.buildClusters(KMeansDriver.java:237)
    at org.apache.mahout.clustering.kmeans.KMeansDriver.run(KMeansDriver.java:152)
    at clusterer.NewsKMeansClustering.main(NewsKMeansClustering.java:81)
------------------------------------------------------------------------
BUILD FAILURE
------------------------------------------------------------------------
Total time: 15.391s
Finished at: Wed Nov 16 00:49:14 GMT 2011
Final Memory: 10M/150M
------------------------------------------------------------------------
Failed to execute goal org.codehaus.mojo:exec-maven-plugin:1.2:exec (default-cli) on project mahout-examples: Command execution failed. Process exited with an error: 1(Exit value: 1) -> [Help 1]

To see the full stack trace of the errors, re-run Maven with the -e switch.
Re-run Maven using the -X switch to enable full debug logging.

For more information about the errors and possible solutions, please read
the following articles:
[Help 1]
http://cwiki.apache.org/confluence/display/MAVEN/MojoExecutionException

What does it mean that no clusters were found? Is the input directory wrong? If
so, what input should I give the class? I tried changing the canopy
thresholds (250, 120) to other values and also tried replacing the
EuclideanDistanceMeasure used for canopy clustering with
CosineDistanceMeasure, to no avail.
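Changing the canopy thresholds or the distance measure is unlikely to affect this particular error, which is raised when k-means finds no initial clusters at all. The following is a simplified, in-memory sketch of canopy selection (the class and method names are hypothetical, and this is not Mahout's CanopyDriver), using Euclidean distance with T1 = 250 and T2 = 120. In this simplified form the first remaining point always becomes a canopy center, so any non-empty input yields at least one canopy whatever the thresholds; zero clusters more likely means k-means is reading an empty or wrong directory, or the vectors fed to the canopy step were themselves empty.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, in-memory illustration of canopy clustering; not Mahout's CanopyDriver.
public class CanopySketch {

    static final class Canopy {
        final double[] center;
        int members; // points within T1 of the center when the canopy was formed (center included)

        Canopy(double[] center) {
            this.center = center;
        }
    }

    static double euclidean(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return Math.sqrt(sum);
    }

    /** Builds canopies with the classic two-threshold rule; assumes t1 > t2 > 0. */
    static List<Canopy> buildCanopies(List<double[]> points, double t1, double t2) {
        List<double[]> remaining = new ArrayList<>(points);
        List<Canopy> canopies = new ArrayList<>();
        while (!remaining.isEmpty()) {
            // The first remaining point always starts a new canopy.
            Canopy canopy = new Canopy(remaining.remove(0));
            canopy.members = 1;
            // Points within the loose threshold T1 join this canopy.
            for (double[] p : remaining) {
                if (euclidean(p, canopy.center) < t1) {
                    canopy.members++;
                }
            }
            // Points within the tight threshold T2 can no longer start a canopy of their own.
            remaining.removeIf(p -> euclidean(p, canopy.center) < t2);
            canopies.add(canopy);
        }
        return canopies;
    }

    public static void main(String[] args) {
        List<double[]> points = List.of(
            new double[] {0, 0}, new double[] {50, 50},
            new double[] {400, 400}, new double[] {1000, 0});
        for (Canopy c : buildCanopies(points, 250, 120)) {
            System.out.println("center (" + c.center[0] + ", " + c.center[1] + ") members=" + c.members);
        }
        // Only an empty input produces no canopies at all.
        System.out.println(buildCanopies(new ArrayList<>(), 250, 120).size() + " canopies for empty input");
    }
}
```

With these four sample points the first and second points collapse into one canopy and the other two each start their own, so lowering or raising T1/T2 changes how many canopies appear but never drives the count to zero for non-empty data.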

Many thanks in advance,
Ahmad

Re: NewsKMeansClustering does not find any clusters!

Posted by Ahmad Ammari <am...@gmail.com>.
Hi Grant,

I am running the NewsKMeansClustering class from NetBeans (Run -> Run
File). I did not change anything in the class code except the name of the
input directory, so that the class can see the dataset I want to cluster.
So, I changed the statement:

String inputDir = "inputDir";

to:

String inputDir = "reuters-seqfiles";

The directory (reuters-seqfiles) contains the dataset in SequenceFile
format. This directory and its data were generated by running the
seqdirectory program through the mahout launcher (bin/mahout seqdirectory).
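When the class is launched from an IDE rather than from a shell, a relative path such as "reuters-seqfiles" is resolved against the JVM's working directory (the `user.dir` system property). For Maven-based NetBeans projects this is usually the project folder, but that depends on the run configuration, and Hadoop's local filesystem resolves relative paths against the same directory by default. The stdlib-only check below (the class name is hypothetical) prints where the path actually points and how many data files it holds, skipping Hadoop-style `_` and `.` files, which rules out the simplest input-path mistake before looking at the clustering steps.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.stream.Stream;

// Hypothetical helper: reports where a relative input path resolves and whether it holds data files.
public class InputDirCheck {

    /** Counts regular files directly under dir, ignoring names starting with "_" or "." (e.g. _SUCCESS, .crc). */
    static long countDataFiles(Path dir) throws IOException {
        if (!Files.isDirectory(dir)) {
            return 0;
        }
        try (Stream<Path> entries = Files.list(dir)) {
            return entries
                .filter(Files::isRegularFile)
                .filter(p -> {
                    String name = p.getFileName().toString();
                    return !name.startsWith("_") && !name.startsWith(".");
                })
                .count();
        }
    }

    public static void main(String[] args) throws IOException {
        String inputDir = args.length > 0 ? args[0] : "reuters-seqfiles";
        Path resolved = Paths.get(inputDir).toAbsolutePath();
        System.out.println("working directory: " + System.getProperty("user.dir"));
        System.out.println("input resolves to: " + resolved);
        System.out.println("is a directory: " + Files.isDirectory(resolved)
            + ", data files: " + countDataFiles(resolved));
    }
}
```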

Do you want me to post the code of the NewsKMeansClustering class
from the book, or do you already have it?

Thanks,
Ahmad

On Thu, Nov 17, 2011 at 4:57 PM, Grant Ingersoll <gs...@apache.org> wrote:

> What command did you run?

Re: NewsKMeansClustering does not find any clusters!

Posted by Grant Ingersoll <gs...@apache.org>.
What command did you run?  

--------------------------------------------
Grant Ingersoll
http://www.lucidimagination.com




Re: NewsKMeansClustering does not find any clusters!

Posted by Ahmad Ammari <am...@gmail.com>.
Hi Jeff,

Can you please elaborate on what is meant by the -c path? I am running the
NewsKMeansClustering class normally from NetBeans (neither from a command-line
shell nor from the mahout launcher script), so I am not passing any
options when I run it.
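Without a command line there is no literal -c flag; the "-c path" refers to the directory of initial clusters that k-means reads, which according to the error message in the original post is reutersClusters/canopy-centroids/clusters-0. A quick way to follow Jeff's advice to verify the paths is to list everything the canopy step actually wrote and compare it with that directory name. The sketch below is a hypothetical, stdlib-only diagnostic (the class name is made up, and the default path is taken from the error message, so adjust it if the program writes elsewhere).

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical diagnostic: list what the canopy step wrote so the k-means clusters path can be checked against it.
public class ClustersPathCheck {

    /** Returns every regular file under root as a sorted list of root-relative, '/'-separated paths. */
    static List<String> listFiles(Path root) throws IOException {
        if (!Files.isDirectory(root)) {
            return List.of();
        }
        try (Stream<Path> walk = Files.walk(root)) {
            return walk.filter(Files::isRegularFile)
                .map(p -> root.relativize(p).toString().replace('\\', '/'))
                .sorted()
                .collect(Collectors.toList());
        }
    }

    public static void main(String[] args) throws IOException {
        // Default taken from the error message; pass another directory as the first argument if needed.
        Path canopyOutput = Paths.get(args.length > 0 ? args[0] : "reutersClusters/canopy-centroids");
        Path clustersIn = canopyOutput.resolve("clusters-0");
        System.out.println("k-means clusters path: " + clustersIn.toAbsolutePath());
        System.out.println("that directory exists: " + Files.isDirectory(clustersIn));
        List<String> files = listFiles(canopyOutput);
        if (files.isEmpty()) {
            System.out.println("no files under " + canopyOutput.toAbsolutePath());
        }
        files.forEach(f -> System.out.println("  " + f));
    }
}
```

If the listing shows output under a directory name other than clusters-0, or no output at all, the path handed to k-means and the canopy output do not match.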

Thanks,
Ahmad

On Wed, Nov 16, 2011 at 5:22 PM, Jeff Eastman <je...@narus.com> wrote:

> K-means is attempting to load your initial clusters and is not finding
> any. Have you checked your -c path? You can also add -xm sequential so you
> can run the sequential algorithm. This allows you to use a debugger to
> verify your paths.

RE: NewsKMeansClustering does not find any clusters!

Posted by Jeff Eastman <je...@Narus.com>.
K-means is attempting to load your initial clusters and is not finding any. Have you checked your -c path? You can also add -xm sequential so you can run the sequential algorithm. This allows you to use a debugger to verify your paths.
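In the command-line form of the k-means job (for example `bin/mahout kmeans`), -c names the directory of initial clusters and -xm sequential runs the algorithm in a single process instead of as a MapReduce job, which is what makes stepping through it in a debugger practical. Since NewsKMeansClustering is started from NetBeans without options, the sketch below only assembles an equivalent argument array. The class name, the vectors and output paths, and the -x value are hypothetical; the clusters path is the one from the error message. The flag names are assumptions based on Mahout's k-means options of that period, so check them against `bin/mahout kmeans --help` for the installed version before passing the array to the driver.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: assemble k-means command-line arguments, optionally with -xm sequential.
// Flag names are assumed from Mahout's k-means CLI of that era; verify with `bin/mahout kmeans --help`.
public class KMeansArgs {

    static String[] build(String vectors, String clustersIn, String output, boolean sequential) {
        List<String> args = new ArrayList<>();
        args.add("-i");
        args.add(vectors);      // input vectors
        args.add("-c");
        args.add(clustersIn);   // directory holding the initial clusters (the "-c path")
        args.add("-o");
        args.add(output);       // where k-means writes its iterations
        args.add("-x");
        args.add("20");         // maximum iterations (hypothetical value)
        args.add("-ow");        // overwrite existing output
        if (sequential) {
            args.add("-xm");
            args.add("sequential"); // run in-process instead of as a MapReduce job
        }
        return args.toArray(new String[0]);
    }

    public static void main(String[] unused) {
        String[] args = build(
            "reutersClusters/tfidf-vectors",               // hypothetical vectors path
            "reutersClusters/canopy-centroids/clusters-0", // path from the error message
            "reutersClusters/kmeans-output",               // hypothetical output path
            true);
        System.out.println(String.join(" ", args));
    }
}
```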

-----Original Message-----
From: Ahmad Ammari [mailto:ammariect@gmail.com] 
Sent: Wednesday, November 16, 2011 7:19 AM
To: user@mahout.apache.org
Subject: NewsKMeansClustering does not find any clusters!

