Posted to user@mahout.apache.org by Delroy Cameron <de...@gmail.com> on 2010/05/21 22:36:56 UTC

k-means invocation exception still not resolved

so i keep getting an invocation exception when attempting to do k-means
clustering and it is really crucial that i get around this problem...

i am checking out the code from the alternate repo
svn co http://svn.apache.org/repos/asf/lucene/mahout/trunk

here is my command
hadoop jar /mahout/core/target/mahout-core-0.4-SNAPSHOT.job \
org.apache.mahout.clustering.kmeans.KMeansDriver \
-i trecdata-vectors/vectors/part-00000 \
-o trecdata-kmeans-clusters \
-c clusters \
-dm org.apache.mahout.common.distance.CosineDistanceMeasure \
-x 20 -cd 0.5 -k 26 -ow -r 8 -cl

here is my output
java.lang.RuntimeException: Error in configuring object
	at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:93)
	at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:64)
	at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:117)
	at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:354)
	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307)
	at org.apache.hadoop.mapred.Child.main(Child.java:170)
Caused by: java.lang.reflect.InvocationTargetException
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
	at java.lang.reflect.Method.invoke(Method.java:597)
	at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:88)
	... 5 more
Caused by: java.lang.RuntimeException: Error in configuring object
	at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:93)
	at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:64)
	at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:117)
	at org.apache.hadoop.mapred.MapRunner.configure(MapRunner.java:34)
	... 10 more
Caused by: java.lang.reflect.InvocationTargetException
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
	at java.lang.reflect.Method.invoke(Method.java:597)
	at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:88)
	... 13 more
Caused by: java.lang.IllegalStateException: Cluster is empty!
	at org.apache.mahout.clustering.kmeans.KMeansMapper.configure(KMeansMapper.java:74)
	... 18 more


-----
--cheers
Delroy
-- 
View this message in context: http://lucene.472066.n3.nabble.com/k-means-invocation-exception-still-not-resolved-tp835261p835261.html
Sent from the Mahout User List mailing list archive at Nabble.com.

Re: k-means invocation exception still not resolved

Posted by Delroy Cameron <de...@gmail.com>.
yeah Jeff, 
the synthetic dataset gave the same error on the third iteration....the
first two were ok.
i ran it twice just to be sure

$ hadoop dfs -put /data/synthetic_control.data testdata

$ hadoop jar examples/target/mahout-examples-0.4-SNAPSHOT.job \
org.apache.mahout.clustering.syntheticcontrol.kmeans.Job


-----
--cheers
Delroy
-- 
View this message in context: http://lucene.472066.n3.nabble.com/k-means-invocation-exception-still-not-resolved-tp835261p835658.html

Re: k-means invocation exception still not resolved

Posted by Jeff Eastman <jd...@windwardsolutions.com>.
Sweet^2! The TFIDF converter puts *something* into <output>/vectors/ but 
it ain't the right format for input to clustering. I think that's 
confusing too, BTW.
Jeff

On 5/21/10 5:15 PM, Delroy Cameron wrote:
> Jeff thanks a bunch...a simple change and i have the output...
> the input vectors for the TFIDF option are
> in /trecdata-vectors/tfidf/vectors
> not in trecdata-vectors/vectors/part-00000
>
> -----
> --cheers
> Delroy
>    


Re: k-means invocation exception still not resolved

Posted by Delroy Cameron <de...@gmail.com>.
Jeff thanks a bunch...a simple change and i have the output...
the input vectors for the TFIDF option are 
in /trecdata-vectors/tfidf/vectors 
not in trecdata-vectors/vectors/part-00000

-----
--cheers
Delroy
-- 
View this message in context: http://lucene.472066.n3.nabble.com/k-means-invocation-exception-still-not-resolved-tp835261p835679.html

Re: k-means invocation exception still not resolved

Posted by Jeff Eastman <jd...@windwardsolutions.com>.
Ok, your earlier post was about synthetic control and this clearly isn't 
that. When you run seq2sparse with the TFIDF option, the output vectors 
are actually put into <output>/tfidf/vectors/, not <output> or even 
<output>/vectors/. I suggest you look at examples/bin/build-reuters.sh. 
When you do, you will see that the output file spec of seq2sparse was:

  -o ./examples/bin/work/reuters-out-seqdir-sparse

... and notice that the input file spec of kmeans follows the above pattern:

-i ./examples/bin/work/reuters-out-seqdir-sparse/tfidf/vectors/
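
A minimal sketch of the path convention described above, using the directory names from Delroy's commands. The kmeans_input_dir helper is hypothetical (it is not part of Mahout), and it only covers the TFIDF layout reported in this thread for the 0.4-SNAPSHOT trunk; other weightings or versions may lay files out differently. It prints the k-means command instead of running it, so the -i path can be checked first.

```shell
# Hypothetical helper: map a seq2sparse output dir to the k-means input dir.
# Assumes the -wt TFIDF layout described in this thread: <output>/tfidf/vectors
kmeans_input_dir() {
  # Strip one trailing slash so the result does not contain "//".
  printf '%s/tfidf/vectors\n' "${1%/}"
}

SEQ2SPARSE_OUT=trecdata-vectors   # the -o value that was given to seq2sparse
KMEANS_IN=$(kmeans_input_dir "$SEQ2SPARSE_OUT")

# Dry run: print the command so the -i path can be inspected before submitting.
echo hadoop jar /mahout/core/target/mahout-core-0.4-SNAPSHOT.job \
  org.apache.mahout.clustering.kmeans.KMeansDriver \
  -i "$KMEANS_IN" \
  -c clusters \
  -o trecdata-kmeans-clusters \
  -dm org.apache.mahout.common.distance.CosineDistanceMeasure \
  -x 20 -cd 0.5 -k 26 -ow -r 8 -cl
```

Dropping the leading echo would submit the job; the remaining flags are copied unchanged from the command earlier in the thread.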



On 5/21/10 4:38 PM, Delroy Cameron wrote:
> yeah sorry Jeff,
> i neglected to say that i am trying to cluster a set of 1400 text documents
> from a directory and i'm not using the synthetic dataset. here are the
> commands i used to create the vectors
> the input data i.e. data/trecdata is a directory of raw text files
>
> i'll run the clustering on the synthetic dataset to see if there is
> something wrong with the input vectors.
>
> ./mahout seqdirectory
> -i /data/trecdata
> -o /data/trecdata-seqfiles
> -c ascii
> -chunk 64
> -prefix TREC
>
> and then to create the sparse matrix
> ./mahout seq2sparse
> -s 2
> -a org.apache.lucene.analysis.standard.StandardAnalyzer
> -chunk 100
> -i /home/w007dhc/data/trecdata-seqfiles/chunk-0
> -o /home/w007dhc/data/trecdata-vectors
> -md 1 -x 75 -wt TFIDF -n 0 -w
>
>
>
> -----
> --cheers
> Delroy
>    


Re: k-means invocation exception still not resolved

Posted by Delroy Cameron <de...@gmail.com>.
yeah sorry Jeff, 
i neglected to say that i am trying to cluster a set of 1400 text documents
from a directory and i'm not using the synthetic dataset. here are the
commands i used to create the vectors
the input data i.e. data/trecdata is a directory of raw text files

i'll run the clustering on the synthetic dataset to see if there is
something wrong with the input vectors.

./mahout seqdirectory \
-i /data/trecdata \
-o /data/trecdata-seqfiles \
-c ascii \
-chunk 64 \
-prefix TREC

and then to create the sparse matrix
./mahout seq2sparse \
-s 2 \
-a org.apache.lucene.analysis.standard.StandardAnalyzer \
-chunk 100 \
-i /home/w007dhc/data/trecdata-seqfiles/chunk-0 \
-o /home/w007dhc/data/trecdata-vectors \
-md 1 -x 75 -wt TFIDF -n 0 -w



-----
--cheers
Delroy
-- 
View this message in context: http://lucene.472066.n3.nabble.com/k-means-invocation-exception-still-not-resolved-tp835261p835632.html

Re: k-means invocation exception still not resolved

Posted by Jeff Eastman <jd...@windwardsolutions.com>.
What's the format of your input vector sequence file? It should be 
<key=Writable; value=VectorWritable> and the key is ignored. From the 
exception it looks like your input data might not be right. I'm pretty 
sure you aren't running synthetic control, since it just ran for me on 
an EC2 cluster:

$HADOOP_HOME/bin/hadoop jar \
$MAHOUT_HOME/examples/target/mahout-examples-$MAHOUT_VERSION.job \
org.apache.mahout.clustering.syntheticcontrol.kmeans.Job
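
One rough way to check the value class asked about above, sketched under assumptions: a Hadoop SequenceFile records its key and value class names as plain strings in the file header, so scanning the first bytes for org.apache.mahout.math.VectorWritable gives a quick hint. The has_vector_values helper and the 512-byte window are assumptions for illustration, not a Mahout or Hadoop tool, and the check does not tell key from value.

```shell
# Hypothetical helper: read the start of a SequenceFile from stdin and report
# whether its header mentions VectorWritable.
# Assumption: the class names sit within the first 512 bytes of the file.
has_vector_values() {
  # -a treats the binary header as text; -q only sets the exit status.
  if head -c 512 | grep -a -q 'org.apache.mahout.math.VectorWritable'; then
    echo yes
  else
    echo no
  fi
}

# Possible usage against HDFS (part file name taken from the thread):
#   hadoop dfs -cat trecdata-vectors/vectors/part-00000 | has_vector_values
```

A "no" on a given part file would be consistent with a ClassCastException such as IntWritable cannot be cast to VectorWritable, but it is only a hint and not a full validation of the file.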

On 5/21/10 4:08 PM, Delroy Cameron wrote:
> hey Jeff,
>
> 1) i'm not sure i discern the changes in your command below. in any case i
> copied and pasted it directly and ran it and it also gave the same exception
> as previously
>
> 2) i listed the contents on hadoop resulting from the clustering. here is my
> output. i interrupted the clustering after the first iteration because the
> exception occurs upon  each iteration..i'm sure there is a way to look at
> the vectors to verify that it is not the source of the problem
>
> $ hadoop dfs -ls /user/delroy/
> Found 3 items
> drwxr-xr-x   - delroy delroy          0 2010-05-21 10:04
> /user/delroy/clusters
> drwxr-xr-x   - delroy delroy          0 2010-05-08 04:39
> /user/delroy/trecdata-kmeans-vectors
> drwxr-xr-x   - delroy delroy          0 2010-05-21 07:38
> /user/delroy/trecdata-vectors
>
> $ hadoop dfs -ls /user/delroy/trecdata-kmeans-vectors
> Found 5 items
> -rw-r--r--   2 delroy delroy    1522195 2010-05-08 04:39
> /user/delroy/trecdata-kmeans-vectors/dictionary.file-0
> drwxr-xr-x   - delroy delroy          0 2010-05-08 04:39
> /user/delroy/trecdata-kmeans-vectors/tfidf
> drwxr-xr-x   - delroy delroy          0 2010-05-08 04:39
> /user/delroy/trecdata-kmeans-vectors/tokenized-documents
> drwxr-xr-x   - delroy delroy          0 2010-05-08 04:39
> /user/delroy/trecdata-kmeans-vectors/vectors
> drwxr-xr-x   - delroy delroy          0 2010-05-08 04:39
> /user/delroy/trecdata-kmeans-vectors/wordcount
>
> also i ran the command by specifying only the directory containing the
> vectors i.e.
>
> $ hadoop jar mahout/core/target/mahout-core-0.4-SNAPSHOT.job \
> org.apache.mahout.clustering.kmeans.KMeansDriver \
> -i trecdata-vectors \
> -c clusters \
> -o trecdata-kmeans-clusters \
> -dm org.apache.mahout.common.distance.CosineDistanceMeasure \
> -x 20 -cd 0.5 -k 26 -ow -r 8 -cl
>
> and i got the following exception below.
>
> 10/05/21 19:02:41 INFO common.HadoopUtil: Deleting clusters
> 10/05/21 19:02:41 INFO util.NativeCodeLoader: Loaded the native-hadoop
> library
> 10/05/21 19:02:41 INFO zlib.ZlibFactory: Successfully loaded & initialized
> native-zlib library
> 10/05/21 19:02:41 INFO compress.CodecPool: Got brand-new compressor
> Exception in thread "main" java.lang.ClassCastException:
> org.apache.hadoop.io.IntWritable cannot be cast to
> org.apache.mahout.math.VectorWritable
>          at
> org.apache.mahout.clustering.kmeans.RandomSeedGenerator.buildRandom(RandomSeedGenerator.java:84)
>          at
> org.apache.mahout.clustering.kmeans.KMeansDriver.main(KMeansDriver.java:99)
>          at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>          at
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>          at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>          at java.lang.reflect.Method.invoke(Method.java:597)
>          at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
>
>
>
> -----
> --cheers
> Delroy
>    


Re: k-means invocation exception still not resolved

Posted by Delroy Cameron <de...@gmail.com>.
hey Jeff, 

1) i'm not sure i discern the changes in your command below. in any case i
copied and pasted it directly and ran it and it also gave the same exception
as previously

2) i listed the contents on hadoop resulting from the clustering. here is my
output. i interrupted the clustering after the first iteration because the
exception occurs upon  each iteration..i'm sure there is a way to look at
the vectors to verify that it is not the source of the problem

$ hadoop dfs -ls /user/delroy/
Found 3 items
drwxr-xr-x   - delroy delroy          0 2010-05-21 10:04 /user/delroy/clusters
drwxr-xr-x   - delroy delroy          0 2010-05-08 04:39 /user/delroy/trecdata-kmeans-vectors
drwxr-xr-x   - delroy delroy          0 2010-05-21 07:38 /user/delroy/trecdata-vectors

$ hadoop dfs -ls /user/delroy/trecdata-kmeans-vectors
Found 5 items
-rw-r--r--   2 delroy delroy    1522195 2010-05-08 04:39 /user/delroy/trecdata-kmeans-vectors/dictionary.file-0
drwxr-xr-x   - delroy delroy          0 2010-05-08 04:39 /user/delroy/trecdata-kmeans-vectors/tfidf
drwxr-xr-x   - delroy delroy          0 2010-05-08 04:39 /user/delroy/trecdata-kmeans-vectors/tokenized-documents
drwxr-xr-x   - delroy delroy          0 2010-05-08 04:39 /user/delroy/trecdata-kmeans-vectors/vectors
drwxr-xr-x   - delroy delroy          0 2010-05-08 04:39 /user/delroy/trecdata-kmeans-vectors/wordcount

also i ran the command by specifying only the directory containing the
vectors i.e. 

$ hadoop jar mahout/core/target/mahout-core-0.4-SNAPSHOT.job \
org.apache.mahout.clustering.kmeans.KMeansDriver \
-i trecdata-vectors \
-c clusters \
-o trecdata-kmeans-clusters \
-dm org.apache.mahout.common.distance.CosineDistanceMeasure \
-x 20 -cd 0.5 -k 26 -ow -r 8 -cl

and i got the following exception below.

10/05/21 19:02:41 INFO common.HadoopUtil: Deleting clusters
10/05/21 19:02:41 INFO util.NativeCodeLoader: Loaded the native-hadoop library
10/05/21 19:02:41 INFO zlib.ZlibFactory: Successfully loaded & initialized native-zlib library
10/05/21 19:02:41 INFO compress.CodecPool: Got brand-new compressor
Exception in thread "main" java.lang.ClassCastException: org.apache.hadoop.io.IntWritable cannot be cast to org.apache.mahout.math.VectorWritable
        at org.apache.mahout.clustering.kmeans.RandomSeedGenerator.buildRandom(RandomSeedGenerator.java:84)
        at org.apache.mahout.clustering.kmeans.KMeansDriver.main(KMeansDriver.java:99)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.apache.hadoop.util.RunJar.main(RunJar.java:156)



-----
--cheers
Delroy
-- 
View this message in context: http://lucene.472066.n3.nabble.com/k-means-invocation-exception-still-not-resolved-tp835261p835572.html

Re: k-means invocation exception still not resolved

Posted by Jeff Eastman <jd...@windwardsolutions.com>.
Hmm, no, both forms work for me running k-means on Reuters. Run this and 
tell me what you see.

./bin/mahout clusterdump -s clusters -b 100 -n 20

On 5/21/10 3:39 PM, Jeff Eastman wrote:
> Hi Delroy,
>
> Looking at your command line, the -i argument needs to be the 
> directory containing all the input data, not the input file itself. It 
> could be that the code that scans the input data to sample -k clusters 
> is failing too quietly. What is in clusters when you are done?
>
> Try running:
>
> hadoop jar /mahout/core/target/mahout-core-0.4-SNAPSHOT.job 
> org.apache.mahout.clustering.kmeans.KMeansDriver \
> -i trecdata-vectors/vectors \
> -c clusters \
> -o trecdata-kmeans-clusters \
> -dm org.apache.mahout.common.distance.CosineDistanceMeasure \
> -x 20 -cd 0.5 -k 26 -ow -r 8 -cl
>
>
>
> On 5/21/10 1:36 PM, Delroy Cameron wrote:
>> so i keep getting an invocation exception when attempting to do 
>> k-means
>> clustering and it is really crucial that i get around this problem...
>>
>> i am checking out the code from the alternate repo
>> svn co http://svn.apache.org/repos/asf/lucene/mahout/trunk
>>
>> here is my command
>> hadoop jar /mahout/core/target/mahout-core-0.4-SNAPSHOT.job
>> org.apache.mahout.clustering.kmeans.KMeansDriver -i
>> trecdata-vectors/vectors/part-00000 -o trecdata-kmeans-clusters -c 
>> clusters
>> -dm org.apache.mahout.common.distance.CosineDistanceMeasure -x 20 -cd 
>> 0.5 -k
>> 26 -ow -r 8 -cl
>>
>> here is my output
>> java.lang.RuntimeException: Error in configuring object
>>     at
>> org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:93) 
>>
>>     at 
>> org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:64)
>>     at
>> org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:117) 
>>
>>     at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:354)
>>     at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307)
>>     at org.apache.hadoop.mapred.Child.main(Child.java:170)
>> Caused by: java.lang.reflect.InvocationTargetException
>>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>     at
>> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) 
>>
>>     at
>> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) 
>>
>>     at java.lang.reflect.Method.invoke(Method.java:597)
>>     at
>> org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:88) 
>>
>>     ... 5 more
>> Caused by: java.lang.RuntimeException: Error in configuring object
>>     at
>> org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:93) 
>>
>>     at 
>> org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:64)
>>     at
>> org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:117) 
>>
>>     at org.apache.hadoop.mapred.MapRunner.configure(MapRunner.java:34)
>>     ... 10 more
>> Caused by: java.lang.reflect.InvocationTargetException
>>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>     at
>> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) 
>>
>>     at
>> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) 
>>
>>     at java.lang.reflect.Method.invoke(Method.java:597)
>>     at
>> org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:88) 
>>
>>     ... 13 more
>> Caused by: java.lang.IllegalStateException: Cluster is empty!
>>     at
>> org.apache.mahout.clustering.kmeans.KMeansMapper.configure(KMeansMapper.java:74) 
>>
>>     ... 18 more
>>
>>
>> -----
>> --cheers
>> Delroy
>
>


Re: k-means invocation exception still not resolved

Posted by Jeff Eastman <jd...@windwardsolutions.com>.
Hi Delroy,

Looking at your command line, the -i argument needs to be the directory 
containing all the input data, not the input file itself. It could be 
that the code that scans the input data to sample -k clusters is failing 
too quietly. What is in clusters when you are done?

Try running:

hadoop jar /mahout/core/target/mahout-core-0.4-SNAPSHOT.job org.apache.mahout.clustering.kmeans.KMeansDriver \
-i trecdata-vectors/vectors \
-c clusters \
-o trecdata-kmeans-clusters \
-dm org.apache.mahout.common.distance.CosineDistanceMeasure \
-x 20 -cd 0.5 -k 26 -ow -r 8 -cl



On 5/21/10 1:36 PM, Delroy Cameron wrote:
> so i keep getting an invocation exception when attempting to do k-means
> clustering and it is really crucial that i get around this problem...
>
> i am checking out the code from the alternate repo
> svn co http://svn.apache.org/repos/asf/lucene/mahout/trunk
>
> here is my command
> hadoop jar /mahout/core/target/mahout-core-0.4-SNAPSHOT.job
> org.apache.mahout.clustering.kmeans.KMeansDriver -i
> trecdata-vectors/vectors/part-00000 -o trecdata-kmeans-clusters -c clusters
> -dm org.apache.mahout.common.distance.CosineDistanceMeasure -x 20 -cd 0.5 -k
> 26 -ow -r 8 -cl
>
> here is my output
> java.lang.RuntimeException: Error in configuring object
> 	at
> org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:93)
> 	at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:64)
> 	at
> org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:117)
> 	at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:354)
> 	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307)
> 	at org.apache.hadoop.mapred.Child.main(Child.java:170)
> Caused by: java.lang.reflect.InvocationTargetException
> 	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> 	at
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> 	at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> 	at java.lang.reflect.Method.invoke(Method.java:597)
> 	at
> org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:88)
> 	... 5 more
> Caused by: java.lang.RuntimeException: Error in configuring object
> 	at
> org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:93)
> 	at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:64)
> 	at
> org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:117)
> 	at org.apache.hadoop.mapred.MapRunner.configure(MapRunner.java:34)
> 	... 10 more
> Caused by: java.lang.reflect.InvocationTargetException
> 	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> 	at
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> 	at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> 	at java.lang.reflect.Method.invoke(Method.java:597)
> 	at
> org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:88)
> 	... 13 more
> Caused by: java.lang.IllegalStateException: Cluster is empty!
> 	at
> org.apache.mahout.clustering.kmeans.KMeansMapper.configure(KMeansMapper.java:74)
> 	... 18 more
>
>
> -----
> --cheers
> Delroy
>