Posted to user@mahout.apache.org by Alfred Dimaunahan <al...@fbmsoftware.com> on 2011/03/16 10:04:57 UTC

Bug in fkmeans?

Not sure if anyone else has encountered this error.

Procedure:
- update build-reuters.sh and use Fuzzy K-Means instead of K-Means by
adding/modifying these lines:
===
# to use k-Means clustering, uncomment the next three lines
./bin/mahout seq2sparse -i ./examples/bin/work/reuters-out-seqdir/ \
  -o ./examples/bin/work/reuters-out-seqdir-sparse
#./bin/mahout kmeans -i ./examples/bin/work/reuters-out-seqdir-sparse/tfidf-vectors/ \
#  -c ./examples/bin/work/clusters -o ./examples/bin/work/reuters-kmeans -x 10 -k 20 -ow
#./bin/mahout clusterdump -s examples/bin/work/reuters-kmeans/clusters-10 \
#  -d examples/bin/work/reuters-out-seqdir-sparse/dictionary.file-0 -dt sequencefile -b 100 -n 20

./bin/mahout fkmeans -i ./examples/bin/work/reuters-out-seqdir-sparse/tfidf-vectors/ \
  -c ./examples/bin/work/clusters -o ./examples/bin/work/reuters-fkmeans -x 10 -ow -m 2 -k 3
./bin/mahout clusterdump -s examples/bin/work/reuters-fkmeans/clusters-10 \
  -d examples/bin/work/reuters-out-seqdir-sparse/dictionary.file-0 -dt sequencefile -b 100 -n 20
===

When you run it, it will show:
===

Exception in thread "main" java.lang.NumberFormatException: null
	at java.lang.Integer.parseInt(Integer.java:417)
	at java.lang.Integer.parseInt(Integer.java:499)
	at org.apache.mahout.clustering.fuzzykmeans.FuzzyKMeansDriver.run(FuzzyKMeansDriver.java:112)
	at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
	at org.apache.mahout.clustering.fuzzykmeans.FuzzyKMeansDriver.main(FuzzyKMeansDriver.java:63)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
	at java.lang.reflect.Method.invoke(Method.java:597)
	at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:68)
	at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:139)
	at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:184)

===

But when -k is not included, everything is fine. Am I doing something wrong?
'3' is indeed an Integer!

I'm using mahout-distribution-0.4/

Thanks.

-Alfred

Re: Bug in fkmeans?

Posted by Alfred Dimaunahan <al...@fbmsoftware.com>.
Hi Kate,

You're right, so it is indeed a bug. Thanks for confirming.

Thanks also for the tip about using K-Means and then Fuzzy K-Means. I'm
not sure yet of its purpose, but I'll look into it soon.

-Alfred

On Wed, Mar 16, 2011 at 9:00 PM, Kate Ericson <er...@cs.colostate.edu> wrote:

> Hi Alfred,
>
> I've run into this problem before as well.  If you dig into the source
> code a bit, you'll see that it's trying to parse the input multiple
> times, which is why it's getting the null when trying to randomly
> generate cluster centers.  I was in a rush, so I just generated the
> initial clusters with k-means, then passed them into fuzzy k-means.
> I'm pretty sure that the dev version (you can grab it with svn) has
> the problem fixed.
>
> -Kate
>

Re: Bug in fkmeans?

Posted by Kate Ericson <er...@cs.colostate.edu>.
Hi Alfred,

I've run into this problem before as well.  If you dig into the source
code a bit, you'll see that it's trying to parse the input multiple
times, which is why it's getting the null when trying to randomly
generate cluster centers.  I was in a rush, so I just generated the
initial clusters with k-means, then passed them into fuzzy k-means.
I'm pretty sure that the dev version (you can grab it with svn) has
the problem fixed.

-Kate
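
A note on the message in the trace: "NumberFormatException: null" means that Integer.parseInt was handed a null String, not that "3" failed to parse. That fits Kate's description of the option value no longer being there when the driver reads it again. The sketch below only illustrates that failure mode and a defensive lookup. The class name, the parseClusterCount helper, and the "--numClusters" key are hypothetical stand-ins, not Mahout's actual driver code.

```java
import java.util.HashMap;
import java.util.Map;

public class NullParseDemo {

    // Hypothetical stand-in for a driver's parsed options (not Mahout's API).
    // Returns null when the option is absent instead of letting parseInt throw.
    static Integer parseClusterCount(Map<String, String> options) {
        String raw = options.get("--numClusters");
        if (raw == null) {
            return null; // option was never stored, or was lost on a second parse
        }
        return Integer.valueOf(raw.trim());
    }

    // Shows that parsing a null String raises NumberFormatException.
    static boolean throwsOnNull() {
        String missing = null;
        try {
            Integer.parseInt(missing);
            return false;
        } catch (NumberFormatException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        Map<String, String> options = new HashMap<>();
        System.out.println("parseInt(null) throws: " + throwsOnNull());
        System.out.println("missing -k: " + parseClusterCount(options));

        options.put("--numClusters", "3");
        System.out.println("with -k 3: " + parseClusterCount(options));
    }
}
```

Checking for null before parsing turns an opaque stack trace into a condition the caller can report clearly, for example by saying which option was expected but not found.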
