Posted to dev@mahout.apache.org by Raghuveer <al...@yahoo.com.INVALID> on 2015/03/10 06:27:08 UTC

mahout failing with -c as required option

Hi All,
I am trying to run the command:
./mahout kmeans -i hdfs://master:54310/user/netlog/upload/output4/tfidf-vectors/part-r-00000 -o  hdfs://master:54310//user/netlog/upload/output4/tfidf-vectors-kmeans-clusters-raghuveer -c  hdfs://master:54310/user/netlog/upload/mahoutoutput -dm org.apache.mahout.common.distance.CosineDistanceMeasure -x 5 -ow -cl -k 25 -xm mapreduce
Since I don't have any clusters yet to give it as input, the forums suggested that I can remove the -c option. But when I do, I get this error:

Running on hadoop, using /usr/local/hadoop/bin/hadoop and HADOOP_CONF_DIR=
MAHOUT-JOB: /home/raghuveer/trunk/examples/target/mahout-examples-1.0-SNAPSHOT-job.jar
15/03/10 10:52:53 ERROR common.AbstractJob: Missing required option --clusters
Missing required option --clusters                                              
Usage:                                                                          
 [--input <input> --output <output> --distanceMeasure <distanceMeasure>         
--clusters <clusters> --numClusters <k> --randomSeed <randomSeed1>              
[<randomSeed2> ...] --convergenceDelta <convergenceDelta> --maxIter <maxIter>   
--overwrite --clustering --method <method> --outlierThreshold                   
<outlierThreshold> --help --tempDir <tempDir> --startPhase <startPhase>         
--endPhase <endPhase>]                                                          
--clusters (-c) clusters    The input centroids, as Vectors.  Must be a         
                            SequenceFile of Writable, Cluster/Canopy.  If k is  
                            also specified, then a random set of vectors will   
                            be selected and written out to this path first      
15/03/10 10:52:53 INFO driver.MahoutDriver: Program took 370 ms (Minutes: 0.006166666666666667)
Kindly help me out.
Thanks



Re: mahout failing with -c as required option

Posted by Raghuveer <al...@yahoo.com.INVALID>.
I see the error below:
Running on hadoop, using /usr/local/hadoop/bin/hadoop and HADOOP_CONF_DIR=
MAHOUT-JOB: /home/raghuveer/trunk/examples/target/mahout-examples-1.0-SNAPSHOT-job.jar
15/03/10 11:50:20 INFO common.AbstractJob: Command line arguments: {--clustering=null, --clusters=[hdfs://master:54310/user/netlog/upload/mahoutoutput], --convergenceDelta=[0.5], --distanceMeasure=[org.apache.mahout.common.distance.SquaredEuclideanDistanceMeasure], --endPhase=[2147483647], --input=[hdfs://master:54310/user/netlog/upload/output4/tfidf-vectors/part-r-00000], --maxIter=[5], --method=[mapreduce], --numClusters=[25], --output=[hdfs://master:54310/user/netlog/upload/output4/tfidf-vectors-kmeans-clusters-raghuveer], --overwrite=null, --startPhase=[0], --tempDir=[temp]}
15/03/10 11:50:21 INFO common.HadoopUtil: Deleting hdfs://master:54310/user/netlog/upload/mahoutoutput
15/03/10 11:50:21 INFO zlib.ZlibFactory: Successfully loaded & initialized native-zlib library
15/03/10 11:50:21 INFO compress.CodecPool: Got brand-new compressor [.deflate]
15/03/10 11:50:21 INFO kmeans.RandomSeedGenerator: Wrote 25 Klusters to hdfs://master:54310/user/netlog/upload/mahoutoutput/part-randomSeed
15/03/10 11:50:21 INFO kmeans.KMeansDriver: Input: hdfs://master:54310/user/netlog/upload/output4/tfidf-vectors/part-r-00000 Clusters In: hdfs://master:54310/user/netlog/upload/mahoutoutput/part-randomSeed Out: hdfs://master:54310/user/netlog/upload/output4/tfidf-vectors-kmeans-clusters-raghuveer
15/03/10 11:50:21 INFO kmeans.KMeansDriver: convergence: 0.5 max Iterations: 5
15/03/10 11:50:21 INFO compress.CodecPool: Got brand-new decompressor [.deflate]
Exception in thread "main" java.lang.IllegalStateException: No input clusters found in hdfs://master:54310/user/netlog/upload/mahoutoutput/part-randomSeed. Check your -c argument.
    at org.apache.mahout.clustering.kmeans.KMeansDriver.buildClusters(KMeansDriver.java:213)

 

     On Tuesday, March 10, 2015 11:53 AM, Raghuveer <al...@yahoo.com.INVALID> wrote:
   

 I see the error below: 

    On Tuesday, March 10, 2015 11:45 AM, Suneel Marthi <su...@gmail.com> wrote:
  

 Try

./mahout kmeans -i http://master:50070/explorer.html#/user/netlog/upload/output4/tfidf-vectors/part-r-00000 -o /usr/netlog/upload/output4/tfidf-vectors-kmeans-clusters -c <some-folder> -dm org.apache.mahout.common.distance.SquaredEuclideanDistanceMeasure -x 5 -ow -cl -k 25

I don't have a machine before me, so no way to try this out. 

But IIRC the way this works is :

a) u specify an initial seed of centroids via -c , u then don't need to specify k, since the # of centroids specified as seed would be the k

b) u let the algorithm choose random centroids by specifying -k, it needs -c to write the random centroids to hence -c is needed with -k.






On Tue, Mar 10, 2015 at 2:09 AM, Raghuveer <al...@yahoo.com> wrote:

ok so if -c is required then how can i give it or atleast is there a way to remove -k itself?
./mahout kmeans -i http://master:50070/explorer.html#/user/netlog/upload/output4/tfidf-vectors/part-r-00000 -o /usr/netlog/upload/output4/tfidf-vectors-kmeans-clusters -dm org.apache.mahout.common.distance.SquaredEuclideanDistanceMeasure -x 5 -ow -cl -k 25
and 

./mahout kmeans -i http://master:50070/explorer.html#/user/netlog/upload/output4/tfidf-vectors/part-r-00000 -o /usr/netlog/upload/output4/tfidf-vectors-kmeans-clusters -dm org.apache.mahout.common.distance.SquaredEuclideanDistanceMeasure -x 5 -ow -cl
both give the same exception still. Kindly suggest.
 

    On Tuesday, March 10, 2015 11:35 AM, Suneel Marthi <su...@gmail.com> wrote:
  

 Oops! I meant to say that -c is required for the random centroid initialization if -k is specified.
It initializes k random centroids in the folder specified by -c. so yes -c is required.

On Tue, Mar 10, 2015 at 1:42 AM, Raghuveer <al...@yahoo.com.invalid> wrote:

No i have removed the -c option now so i get the mentioned exception that -c is mandatory.


     On Tuesday, March 10, 2015 11:06 AM, Suneel Marthi <su...@gmail.com> wrote:


 R u still specifying the -c option, its only needed if u have initial
centroids to launch the KMEans from otherwise KMeans picks random centroids.

Also CosineDistanceMeasure doesn't make sense with kMeans which is in
Euclidean space -try using SquaredEuclidean or Euclidean distances.

On Tue, Mar 10, 2015 at 1:27 AM, Raghuveer <al...@yahoo.com.invalid>
wrote:

> Hi All,
> I am trying to run the command:
> ./mahout kmeans -i
> hdfs://master:54310/user/netlog/upload/output4/tfidf-vectors/part-r-00000
> -o
> hdfs://master:54310//user/netlog/upload/output4/tfidf-vectors-kmeans-clusters-raghuveer
> -c  hdfs://master:54310/user/netlog/upload/mahoutoutput -dm
> org.apache.mahout.common.distance.CosineDistanceMeasure -x 5 -ow -cl -k 25
> -xm mapreduce
> Since i dont have any clusters yet to give it as an input i can remove it
> is what forums suggested. But now i get the error
>
> Running on hadoop, using /usr/local/hadoop/bin/hadoop and HADOOP_CONF_DIR=
> MAHOUT-JOB:
> /home/raghuveer/trunk/examples/target/mahout-examples-1.0-SNAPSHOT-job.jar
> 15/03/10 10:52:53 ERROR common.AbstractJob: Missing required option
> --clusters
> Missing required option
> --clusters
>
> Usage:
>  [--input <input> --output <output> --distanceMeasure
> <distanceMeasure>
> --clusters <clusters> --numClusters <k> --randomSeed
> <randomSeed1>
> [<randomSeed2> ...] --convergenceDelta <convergenceDelta> --maxIter
> <maxIter>
> --overwrite --clustering --method <method>
> --outlierThreshold
> <outlierThreshold> --help --tempDir <tempDir> --startPhase
> <startPhase>
> --endPhase
> <endPhase>]
> --clusters (-c) clusters    The input centroids, as Vectors.  Must be
> a
>                            SequenceFile of Writable, Cluster/Canopy.  If
> k is
>                            also specified, then a random set of vectors
> will
>                            be selected and written out to this path
> first
> 15/03/10 10:52:53 INFO driver.MahoutDriver: Program took 370 ms (Minutes:
> 0.006166666666666667)
> Kindly help me out.
> Thanks
>
>
>


   



    





   

Re: mahout failing with -c as required option

Posted by Suneel Marthi <su...@gmail.com>.
Try

./mahout kmeans -i
http://master:50070/explorer.html#/user/netlog/upload/output4/tfidf-vectors/part-r-00000
-o /usr/netlog/upload/output4/tfidf-vectors-kmeans-clusters -c
<some-folder> -dm
org.apache.mahout.common.distance.SquaredEuclideanDistanceMeasure -x 5 -ow
-cl -k 25

I don't have a machine before me, so no way to try this out.

But IIRC the way this works is:

a) You specify an initial seed of centroids via -c. You then don't need to
specify -k, since the number of centroids given as the seed becomes k.

b) You let the algorithm choose random centroids by specifying -k. It needs
-c as the location to write the random centroids to, hence -c is needed with -k.
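
The two modes described above can be sketched as a dry-run script that only prints the command lines. This is a minimal illustration, not a tested recipe: the paths are hypothetical placeholders, and the flags are the ones already used in this thread (-i, -o, -c, -dm, -x, -ow, -cl, -k).

```shell
#!/bin/sh
# Dry-run sketch: prints the two invocation styles instead of executing them.
# All paths below are hypothetical placeholders, not taken from a real cluster.
INPUT="/path/to/tfidf-vectors"
OUTPUT="/path/to/kmeans-output"
SEEDS="/path/to/initial-clusters"
DM="org.apache.mahout.common.distance.SquaredEuclideanDistanceMeasure"

# (a) Existing seed centroids: pass them with -c and omit -k.
with_seed_centroids() {
    echo "./mahout kmeans -i $INPUT -o $OUTPUT -c $SEEDS -dm $DM -x 5 -ow -cl"
}

# (b) Random centroids: pass -k, and still pass -c as the folder
#     where the k randomly chosen centroids are written first.
with_random_centroids() {
    echo "./mahout kmeans -i $INPUT -o $OUTPUT -c $SEEDS -dm $DM -x 5 -ow -cl -k $1"
}

with_seed_centroids
with_random_centroids 25
```

In both cases -c is present; the only difference is whether -k is added and whether the folder given to -c already contains centroids.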






On Tue, Mar 10, 2015 at 2:09 AM, Raghuveer <al...@yahoo.com> wrote:

> ok so if -c is required then how can i give it or atleast is there a way
> to remove -k itself?
>
> ./mahout kmeans -i
> http://master:50070/explorer.html#/user/netlog/upload/output4/tfidf-vectors/part-r-00000
> -o /usr/netlog/upload/output4/tfidf-vectors-kmeans-clusters -dm
> org.apache.mahout.common.distance.SquaredEuclideanDistanceMeasure -x 5 -ow
> -cl -k 25
>
> and
>
> ./mahout kmeans -i
> http://master:50070/explorer.html#/user/netlog/upload/output4/tfidf-vectors/part-r-00000
> -o /usr/netlog/upload/output4/tfidf-vectors-kmeans-clusters -dm
> org.apache.mahout.common.distance.SquaredEuclideanDistanceMeasure -x 5 -ow
> -cl
>
> both give the same exception still. Kindly suggest.
>
>
>   On Tuesday, March 10, 2015 11:35 AM, Suneel Marthi <
> suneel.marthi@gmail.com> wrote:
>
>
> Oops! I meant to say that -c is required for the random centroid
> initialization if -k is specified.
> It initializes k random centroids in the folder specified by -c. so yes -c
> is required.
>
> On Tue, Mar 10, 2015 at 1:42 AM, Raghuveer <al...@yahoo.com.invalid>
> wrote:
>
> No i have removed the -c option now so i get the mentioned exception that
> -c is mandatory.
>
>
>      On Tuesday, March 10, 2015 11:06 AM, Suneel Marthi <
> suneel.marthi@gmail.com> wrote:
>
>
>  R u still specifying the -c option, its only needed if u have initial
> centroids to launch the KMEans from otherwise KMeans picks random
> centroids.
>
> Also CosineDistanceMeasure doesn't make sense with kMeans which is in
> Euclidean space -try using SquaredEuclidean or Euclidean distances.
>
> On Tue, Mar 10, 2015 at 1:27 AM, Raghuveer <al...@yahoo.com.invalid>
> wrote:
>
> > Hi All,
> > I am trying to run the command:
> > ./mahout kmeans -i
> > hdfs://master:54310/user/netlog/upload/output4/tfidf-vectors/part-r-00000
> > -o
> >
> hdfs://master:54310//user/netlog/upload/output4/tfidf-vectors-kmeans-clusters-raghuveer
> > -c  hdfs://master:54310/user/netlog/upload/mahoutoutput -dm
> > org.apache.mahout.common.distance.CosineDistanceMeasure -x 5 -ow -cl -k
> 25
> > -xm mapreduce
> > Since i dont have any clusters yet to give it as an input i can remove it
> > is what forums suggested. But now i get the error
> >
> > Running on hadoop, using /usr/local/hadoop/bin/hadoop and
> HADOOP_CONF_DIR=
> > MAHOUT-JOB:
> >
> /home/raghuveer/trunk/examples/target/mahout-examples-1.0-SNAPSHOT-job.jar
> > 15/03/10 10:52:53 ERROR common.AbstractJob: Missing required option
> > --clusters
> > Missing required option
> > --clusters
> >
> > Usage:
> >  [--input <input> --output <output> --distanceMeasure
> > <distanceMeasure>
> > --clusters <clusters> --numClusters <k> --randomSeed
> > <randomSeed1>
> > [<randomSeed2> ...] --convergenceDelta <convergenceDelta> --maxIter
> > <maxIter>
> > --overwrite --clustering --method <method>
> > --outlierThreshold
> > <outlierThreshold> --help --tempDir <tempDir> --startPhase
> > <startPhase>
> > --endPhase
> > <endPhase>]
> > --clusters (-c) clusters    The input centroids, as Vectors.  Must be
> > a
> >                            SequenceFile of Writable, Cluster/Canopy.  If
> > k is
> >                            also specified, then a random set of vectors
> > will
> >                            be selected and written out to this path
> > first
> > 15/03/10 10:52:53 INFO driver.MahoutDriver: Program took 370 ms (Minutes:
> > 0.006166666666666667)
> > Kindly help me out.
> > Thanks
> >
> >
> >
>
>
>
>
>
>
>
>

Re: mahout failing with -c as required option

Posted by Raghuveer <al...@yahoo.com.INVALID>.
OK, so if -c is required, how should I supply it? Or at least, is there a way to remove -k itself?
./mahout kmeans -i http://master:50070/explorer.html#/user/netlog/upload/output4/tfidf-vectors/part-r-00000 -o /usr/netlog/upload/output4/tfidf-vectors-kmeans-clusters -dm org.apache.mahout.common.distance.SquaredEuclideanDistanceMeasure -x 5 -ow -cl -k 25
and 

./mahout kmeans -i http://master:50070/explorer.html#/user/netlog/upload/output4/tfidf-vectors/part-r-00000 -o /usr/netlog/upload/output4/tfidf-vectors-kmeans-clusters -dm org.apache.mahout.common.distance.SquaredEuclideanDistanceMeasure -x 5 -ow -cl
Both still give the same exception. Kindly suggest.
 

     On Tuesday, March 10, 2015 11:35 AM, Suneel Marthi <su...@gmail.com> wrote:
   

 Oops! I meant to say that -c is required for the random centroid initialization if -k is specified.
It initializes k random centroids in the folder specified by -c. so yes -c is required.

On Tue, Mar 10, 2015 at 1:42 AM, Raghuveer <al...@yahoo.com.invalid> wrote:

No i have removed the -c option now so i get the mentioned exception that -c is mandatory.


     On Tuesday, March 10, 2015 11:06 AM, Suneel Marthi <su...@gmail.com> wrote:


 R u still specifying the -c option, its only needed if u have initial
centroids to launch the KMEans from otherwise KMeans picks random centroids.

Also CosineDistanceMeasure doesn't make sense with kMeans which is in
Euclidean space -try using SquaredEuclidean or Euclidean distances.

On Tue, Mar 10, 2015 at 1:27 AM, Raghuveer <al...@yahoo.com.invalid>
wrote:

> Hi All,
> I am trying to run the command:
> ./mahout kmeans -i
> hdfs://master:54310/user/netlog/upload/output4/tfidf-vectors/part-r-00000
> -o
> hdfs://master:54310//user/netlog/upload/output4/tfidf-vectors-kmeans-clusters-raghuveer
> -c  hdfs://master:54310/user/netlog/upload/mahoutoutput -dm
> org.apache.mahout.common.distance.CosineDistanceMeasure -x 5 -ow -cl -k 25
> -xm mapreduce
> Since i dont have any clusters yet to give it as an input i can remove it
> is what forums suggested. But now i get the error
>
> Running on hadoop, using /usr/local/hadoop/bin/hadoop and HADOOP_CONF_DIR=
> MAHOUT-JOB:
> /home/raghuveer/trunk/examples/target/mahout-examples-1.0-SNAPSHOT-job.jar
> 15/03/10 10:52:53 ERROR common.AbstractJob: Missing required option
> --clusters
> Missing required option
> --clusters
>
> Usage:
>  [--input <input> --output <output> --distanceMeasure
> <distanceMeasure>
> --clusters <clusters> --numClusters <k> --randomSeed
> <randomSeed1>
> [<randomSeed2> ...] --convergenceDelta <convergenceDelta> --maxIter
> <maxIter>
> --overwrite --clustering --method <method>
> --outlierThreshold
> <outlierThreshold> --help --tempDir <tempDir> --startPhase
> <startPhase>
> --endPhase
> <endPhase>]
> --clusters (-c) clusters    The input centroids, as Vectors.  Must be
> a
>                            SequenceFile of Writable, Cluster/Canopy.  If
> k is
>                            also specified, then a random set of vectors
> will
>                            be selected and written out to this path
> first
> 15/03/10 10:52:53 INFO driver.MahoutDriver: Program took 370 ms (Minutes:
> 0.006166666666666667)
> Kindly help me out.
> Thanks
>
>
>


   



   

Re: mahout failing with -c as required option

Posted by Suneel Marthi <su...@gmail.com>.
Oops! I meant to say that -c is required for the random centroid
initialization if -k is specified.
It initializes k random centroids in the folder specified by -c, so yes, -c
is required.
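
A small sketch of how one might check that the random centroids were written to the -c folder. The folder is a hypothetical placeholder; the part-randomSeed file name is taken from the RandomSeedGenerator log line quoted in this thread and may differ in other Mahout versions. The listing step is skipped when no hadoop client is installed.

```shell
#!/bin/sh
# Sketch: derive where the random seed centroids are expected to land.
# CLUSTERS_DIR is a hypothetical placeholder for the folder passed to -c.
CLUSTERS_DIR="/path/to/initial-clusters"

# The "part-randomSeed" file name comes from the RandomSeedGenerator
# log line quoted in this thread; other Mahout versions may differ.
seed_file() {
    # Strip one trailing slash, if any, before appending the file name.
    printf '%s/part-randomSeed\n' "${1%/}"
}

SEED_FILE=$(seed_file "$CLUSTERS_DIR")
echo "Expected seed file: $SEED_FILE"

# Only attempt the listing when a hadoop client is on the PATH.
if command -v hadoop >/dev/null 2>&1; then
    hadoop fs -ls "$SEED_FILE" || echo "Seed file not found (yet)"
fi
```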

On Tue, Mar 10, 2015 at 1:42 AM, Raghuveer <al...@yahoo.com.invalid>
wrote:

> No i have removed the -c option now so i get the mentioned exception that
> -c is mandatory.
>
>
>      On Tuesday, March 10, 2015 11:06 AM, Suneel Marthi <
> suneel.marthi@gmail.com> wrote:
>
>
>  R u still specifying the -c option, its only needed if u have initial
> centroids to launch the KMEans from otherwise KMeans picks random
> centroids.
>
> Also CosineDistanceMeasure doesn't make sense with kMeans which is in
> Euclidean space -try using SquaredEuclidean or Euclidean distances.
>
> On Tue, Mar 10, 2015 at 1:27 AM, Raghuveer <al...@yahoo.com.invalid>
> wrote:
>
> > Hi All,
> > I am trying to run the command:
> > ./mahout kmeans -i
> > hdfs://master:54310/user/netlog/upload/output4/tfidf-vectors/part-r-00000
> > -o
> >
> hdfs://master:54310//user/netlog/upload/output4/tfidf-vectors-kmeans-clusters-raghuveer
> > -c  hdfs://master:54310/user/netlog/upload/mahoutoutput -dm
> > org.apache.mahout.common.distance.CosineDistanceMeasure -x 5 -ow -cl -k
> 25
> > -xm mapreduce
> > Since i dont have any clusters yet to give it as an input i can remove it
> > is what forums suggested. But now i get the error
> >
> > Running on hadoop, using /usr/local/hadoop/bin/hadoop and
> HADOOP_CONF_DIR=
> > MAHOUT-JOB:
> >
> /home/raghuveer/trunk/examples/target/mahout-examples-1.0-SNAPSHOT-job.jar
> > 15/03/10 10:52:53 ERROR common.AbstractJob: Missing required option
> > --clusters
> > Missing required option
> > --clusters
> >
> > Usage:
> >  [--input <input> --output <output> --distanceMeasure
> > <distanceMeasure>
> > --clusters <clusters> --numClusters <k> --randomSeed
> > <randomSeed1>
> > [<randomSeed2> ...] --convergenceDelta <convergenceDelta> --maxIter
> > <maxIter>
> > --overwrite --clustering --method <method>
> > --outlierThreshold
> > <outlierThreshold> --help --tempDir <tempDir> --startPhase
> > <startPhase>
> > --endPhase
> > <endPhase>]
> > --clusters (-c) clusters    The input centroids, as Vectors.  Must be
> > a
> >                            SequenceFile of Writable, Cluster/Canopy.  If
> > k is
> >                            also specified, then a random set of vectors
> > will
> >                            be selected and written out to this path
> > first
> > 15/03/10 10:52:53 INFO driver.MahoutDriver: Program took 370 ms (Minutes:
> > 0.006166666666666667)
> > Kindly help me out.
> > Thanks
> >
> >
> >
>
>
>

Re: mahout failing with -c as required option

Posted by Raghuveer <al...@yahoo.com.INVALID>.
No, I have removed the -c option now, so I get the mentioned exception saying that -c is mandatory.
 

     On Tuesday, March 10, 2015 11:06 AM, Suneel Marthi <su...@gmail.com> wrote:
   

 R u still specifying the -c option, its only needed if u have initial
centroids to launch the KMEans from otherwise KMeans picks random centroids.

Also CosineDistanceMeasure doesn't make sense with kMeans which is in
Euclidean space -try using SquaredEuclidean or Euclidean distances.

On Tue, Mar 10, 2015 at 1:27 AM, Raghuveer <al...@yahoo.com.invalid>
wrote:

> Hi All,
> I am trying to run the command:
> ./mahout kmeans -i
> hdfs://master:54310/user/netlog/upload/output4/tfidf-vectors/part-r-00000
> -o
> hdfs://master:54310//user/netlog/upload/output4/tfidf-vectors-kmeans-clusters-raghuveer
> -c  hdfs://master:54310/user/netlog/upload/mahoutoutput -dm
> org.apache.mahout.common.distance.CosineDistanceMeasure -x 5 -ow -cl -k 25
> -xm mapreduce
> Since i dont have any clusters yet to give it as an input i can remove it
> is what forums suggested. But now i get the error
>
> Running on hadoop, using /usr/local/hadoop/bin/hadoop and HADOOP_CONF_DIR=
> MAHOUT-JOB:
> /home/raghuveer/trunk/examples/target/mahout-examples-1.0-SNAPSHOT-job.jar
> 15/03/10 10:52:53 ERROR common.AbstractJob: Missing required option
> --clusters
> Missing required option
> --clusters
>
> Usage:
>  [--input <input> --output <output> --distanceMeasure
> <distanceMeasure>
> --clusters <clusters> --numClusters <k> --randomSeed
> <randomSeed1>
> [<randomSeed2> ...] --convergenceDelta <convergenceDelta> --maxIter
> <maxIter>
> --overwrite --clustering --method <method>
> --outlierThreshold
> <outlierThreshold> --help --tempDir <tempDir> --startPhase
> <startPhase>
> --endPhase
> <endPhase>]
> --clusters (-c) clusters    The input centroids, as Vectors.  Must be
> a
>                            SequenceFile of Writable, Cluster/Canopy.  If
> k is
>                            also specified, then a random set of vectors
> will
>                            be selected and written out to this path
> first
> 15/03/10 10:52:53 INFO driver.MahoutDriver: Program took 370 ms (Minutes:
> 0.006166666666666667)
> Kindly help me out.
> Thanks
>
>
>


   

Re: mahout failing with -c as required option

Posted by Suneel Marthi <su...@gmail.com>.
Are you still specifying the -c option? It's only needed if you have initial
centroids to launch KMeans from; otherwise KMeans picks random centroids.

Also, CosineDistanceMeasure doesn't make sense with KMeans, which operates in
Euclidean space. Try using SquaredEuclidean or Euclidean distances.
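
As a side note on how the two measures relate: if the input vectors are normalized to unit length (an assumption; the thread does not say whether the TF-IDF vectors were normalized), squared Euclidean distance is a monotone function of cosine distance, so both rank neighbors in the same order. Here θ denotes the angle between vectors a and b.

```latex
\|a - b\|^2 = \|a\|^2 + \|b\|^2 - 2\,a \cdot b = 2 - 2\cos\theta = 2\,(1 - \cos\theta),
\qquad \text{when } \|a\| = \|b\| = 1
```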

On Tue, Mar 10, 2015 at 1:27 AM, Raghuveer <al...@yahoo.com.invalid>
wrote:

> Hi All,
> I am trying to run the command:
> ./mahout kmeans -i
> hdfs://master:54310/user/netlog/upload/output4/tfidf-vectors/part-r-00000
> -o
> hdfs://master:54310//user/netlog/upload/output4/tfidf-vectors-kmeans-clusters-raghuveer
> -c  hdfs://master:54310/user/netlog/upload/mahoutoutput -dm
> org.apache.mahout.common.distance.CosineDistanceMeasure -x 5 -ow -cl -k 25
> -xm mapreduce
> Since i dont have any clusters yet to give it as an input i can remove it
> is what forums suggested. But now i get the error
>
> Running on hadoop, using /usr/local/hadoop/bin/hadoop and HADOOP_CONF_DIR=
> MAHOUT-JOB:
> /home/raghuveer/trunk/examples/target/mahout-examples-1.0-SNAPSHOT-job.jar
> 15/03/10 10:52:53 ERROR common.AbstractJob: Missing required option
> --clusters
> Missing required option
> --clusters
>
> Usage:
>  [--input <input> --output <output> --distanceMeasure
> <distanceMeasure>
> --clusters <clusters> --numClusters <k> --randomSeed
> <randomSeed1>
> [<randomSeed2> ...] --convergenceDelta <convergenceDelta> --maxIter
> <maxIter>
> --overwrite --clustering --method <method>
> --outlierThreshold
> <outlierThreshold> --help --tempDir <tempDir> --startPhase
> <startPhase>
> --endPhase
> <endPhase>]
> --clusters (-c) clusters    The input centroids, as Vectors.  Must be
> a
>                             SequenceFile of Writable, Cluster/Canopy.  If
> k is
>                             also specified, then a random set of vectors
> will
>                             be selected and written out to this path
> first
> 15/03/10 10:52:53 INFO driver.MahoutDriver: Program took 370 ms (Minutes:
> 0.006166666666666667)
> Kindly help me out.
> Thanks
>
>
>
