Posted to user@mahout.apache.org by Ken Williams <zo...@hotmail.com> on 2011/04/13 13:48:40 UTC

20NewsGroups Error: Illegal Capacity: -40

Hi All,

I'm having trouble getting the 20 Newsgroups example
(https://cwiki.apache.org/confluence/display/MAHOUT/Twenty+Newsgroups
and https://cwiki.apache.org/MAHOUT/twenty-newsgroups.html) to run.

I've downloaded the data and tried to train the Naive Bayes classifier,
but when I ran the 'trainclassifier' command I got this error message:

hadoop@kdevlinux:/usr/local/mahout$ mahout trainclassifier -i examples/bin/work/20news-bydate/bayes-train-input -o examples/bin/work/20news-bydate/bayes-model -type bayes -ng 1 -source hdfs
Running on hadoop, using HADOOP_HOME=/usr/local/hadoop
No HADOOP_CONF_DIR set, using /usr/local/hadoop/src/conf
11/04/13 09:16:29 WARN driver.MahoutDriver: Unable to add class: org.apache.mahout.utils.eval.InMemoryFactorizationEvaluator
11/04/13 09:16:29 WARN driver.MahoutDriver: Unable to add class: org.apache.mahout.utils.eval.ParallelFactorizationEvaluator
11/04/13 09:16:29 WARN driver.MahoutDriver: Unable to add class: org.apache.mahout.utils.eval.DatasetSplitter
11/04/13 09:16:29 INFO bayes.TrainClassifier: Training Bayes Classifier
11/04/13 09:16:29 INFO bayes.BayesDriver: Reading features...
11/04/13 09:16:30 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
11/04/13 09:16:31 INFO mapred.FileInputFormat: Total input paths to process : 20
Exception in thread "main" java.lang.IllegalArgumentException: Illegal Capacity: -40
    at java.util.ArrayList.<init>(ArrayList.java:110)
    at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:216)
    at org.apache.hadoop.mapred.JobClient.writeOldSplits(JobClient.java:810)
    at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:781)
    at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:730)
    at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1249)
    at org.apache.mahout.classifier.bayes.mapreduce.common.BayesFeatureDriver.runJob(BayesFeatureDriver.java:63)
    at org.apache.mahout.classifier.bayes.mapreduce.bayes.BayesDriver.runJob(BayesDriver.java:47)
    at org.apache.mahout.classifier.bayes.TrainClassifier.trainNaiveBayes(TrainClassifier.java:54)
    at org.apache.mahout.classifier.bayes.TrainClassifier.main(TrainClassifier.java:162)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:68)
    at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:139)
    at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:187)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:156)


I thought I might have entered the command incorrectly, but then I found the
'build-20news-bayes.sh' shell script, and running it produces the same
exception.

Until now I've been running Hadoop 0.20.2 smoothly on a 4-node cluster. All
nodes are Debian machines using the sun-java6-* packages, and I'm running
Mahout trunk, checked out of the svn repository today
(svn co http://svn.apache.org/repos/asf/mahout/trunk).

All the <newsgroup>.txt files seem to have been created and uploaded 
to HDFS correctly ('hadoop dfs -lsr examples/bin/work'). 

I'm not sure what to try next. Any help would be very welcome.

Ken 




Re: 20NewsGroups Error: Illegal Capacity: -40

Posted by Ted Dunning <te...@gmail.com>.
I filed https://issues.apache.org/jira/browse/MAHOUT-669 for this.

Anyone who would like to is welcome to file a patch that fixes one or more
of the scripts.

On Wed, Apr 13, 2011 at 9:34 AM, Ken Williams <zo...@hotmail.com> wrote:

> Ted Dunning <ted.dunning <at> gmail.com> writes:
>
> >
> > This may be a bit of regression.
>
> Thanks for the reply.
>
> Just out of interest, I also reckon your
> 'build-cluster-syntheticcontrol.sh'
> script should be a bash script (#!/bin/bash) rather than a standard
> shell (#!/bin/sh) script.
>
>
> $ trunk/examples/bin/build-cluster-syntheticcontrol.sh
> trunk/examples/bin/build-cluster-syntheticcontrol.sh: 28: Syntax error: "("
> unexpected (expecting "fi")
> $
>
>
> Regards,
>
>     Ken
>
>
> >
> > On Wed, Apr 13, 2011 at 4:48 AM, Ken Williams <zoo9000 <at> hotmail.com> wrote:
> >
> > > I'm not sure what to try next. Any help would be very welcome.
> > >
> >
>
>
>
>
>

Re: 20NewsGroups Error: Illegal Capacity: -40

Posted by Ted Dunning <te...@gmail.com>.
Very good idea.

On Wed, Apr 13, 2011 at 9:49 AM, Frank Scholten <sc...@gmail.com> wrote:

> This sh error also occurred for the reuters script but has been fixed.
> Maybe good to update all scripts to bash?
>
> On Apr 13, 2011, at 18:34, Ken Williams <zo...@hotmail.com> wrote:
>
> > Ted Dunning <ted.dunning <at> gmail.com> writes:
> >
> >>
> >> This may be a bit of regression.
> >
> > Thanks for the reply.
> >
> > Just out of interest, I also reckon your 'build-cluster-syntheticcontrol.sh'
> > script should be a bash script (#!/bin/bash) rather than a standard
> > shell (#!/bin/sh) script.
> >
> >
> > $ trunk/examples/bin/build-cluster-syntheticcontrol.sh
> > trunk/examples/bin/build-cluster-syntheticcontrol.sh: 28: Syntax error: "("
> > unexpected (expecting "fi")
> > $
> >
> >
> > Regards,
> >
> >     Ken
> >
> >
> >>
> >> On Wed, Apr 13, 2011 at 4:48 AM, Ken Williams <zoo9000 <at> hotmail.com> wrote:
> >>
> >>> I'm not sure what to try next. Any help would be very welcome.
> >>>
> >>
> >
> >
> >
> >
>

Re: 20NewsGroups Error: Illegal Capacity: -40

Posted by Frank Scholten <sc...@gmail.com>.
This sh error also occurred with the reuters script, but that has been fixed. Might it be good to update all the scripts to bash?
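
A rough sketch of how one might find the scripts that still declare a plain sh shebang. The function name list_sh_scripts is made up for illustration. It only checks the first line of each *.sh file, so it lists candidates for review; it does not detect whether a script actually uses bash-only syntax.

```shell
# Print every *.sh file in the given directory whose first line
# is a #!/bin/sh shebang (with or without trailing options).
# list_sh_scripts is a hypothetical helper, not part of Mahout.
list_sh_scripts() {
  for f in "$1"/*.sh; do
    # Skip the unexpanded pattern when the directory has no *.sh files.
    [ -f "$f" ] || continue
    case "$(head -n 1 "$f")" in
      '#!/bin/sh'|'#!/bin/sh '*) echo "$f" ;;
    esac
  done
}
```

Usage, with the directory taken from the transcript in this thread: list_sh_scripts trunk/examples/bin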

On Apr 13, 2011, at 18:34, Ken Williams <zo...@hotmail.com> wrote:

> Ted Dunning <ted.dunning <at> gmail.com> writes:
> 
>> 
>> This may be a bit of regression.
> 
> Thanks for the reply.
> 
> Just out of interest, I also reckon your 'build-cluster-syntheticcontrol.sh' 
> script should be a bash script (#!/bin/bash) rather than a standard
> shell (#!/bin/sh) script.
> 
> 
> $ trunk/examples/bin/build-cluster-syntheticcontrol.sh 
> trunk/examples/bin/build-cluster-syntheticcontrol.sh: 28: Syntax error: "("
> unexpected (expecting "fi")
> $ 
> 
> 
> Regards,
> 
>     Ken
> 
> 
>> 
>> On Wed, Apr 13, 2011 at 4:48 AM, Ken Williams <zoo9000 <at> hotmail.com> wrote:
>> 
>>> I'm not sure what to try next. Any help would be very welcome.
>>> 
>> 
> 
> 
> 
> 

Re: 20NewsGroups Error: Illegal Capacity: -40

Posted by Ken Williams <zo...@hotmail.com>.
Ted Dunning <ted.dunning <at> gmail.com> writes:

> 
> This may be a bit of regression.

Thanks for the reply.

Just out of interest, I also reckon your 'build-cluster-syntheticcontrol.sh' 
script should be a bash script (#!/bin/bash) rather than a standard
shell (#!/bin/sh) script.


$ trunk/examples/bin/build-cluster-syntheticcontrol.sh 
trunk/examples/bin/build-cluster-syntheticcontrol.sh: 28: Syntax error: "(" unexpected (expecting "fi")
$ 
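
In case it helps, here is a minimal sketch of what I assume is going on. On Debian, /bin/sh is usually dash, which has no array syntax, so a bash-style array assignment inside an "if" block would produce this kind of parse error. The array name and contents below are placeholders and are not copied from the real script.

```shell
# Write a tiny demo script that uses a bash array inside an "if" block.
# The array contents are placeholders, not the real script's values.
demo=$(mktemp)
cat > "$demo" <<'EOF'
if [ "$1" = "" ]; then
  algorithms=( kmeans canopy dirichlet )
  echo "first choice: ${algorithms[0]}"
fi
EOF

# bash understands the array syntax and runs the demo script.
bash_out=$(bash "$demo")
echo "$bash_out"

# "sh -n" only parses the file. A strictly POSIX shell such as dash
# rejects the "(" here; if sh is bash on your machine it is accepted.
if sh -n "$demo" 2>/dev/null; then
  sh_verdict="accepted"
else
  sh_verdict="rejected"
fi
echo "sh -n: $sh_verdict"
rm -f "$demo"
```

The second result depends on which shell /bin/sh points to, which is why the same script can work on one machine and fail on another.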


Regards,

     Ken


> 
> On Wed, Apr 13, 2011 at 4:48 AM, Ken Williams <zoo9000 <at> hotmail.com> wrote:
> 
> > I'm not sure what to try next. Any help would be very welcome.
> >
> 





Re: 20NewsGroups Error: Illegal Capacity: -40

Posted by Ted Dunning <te...@gmail.com>.
This may be a bit of a regression.

On Wed, Apr 13, 2011 at 4:48 AM, Ken Williams <zo...@hotmail.com> wrote:

> I'm not sure what to try next. Any help would be very welcome.
>