Posted to dev@mahout.apache.org by Robin Anil <ro...@gmail.com> on 2008/11/12 13:11:37 UTC

Too Many open files in BayesFileFormatter

I am getting this error while processing the *industry* dataset from
http://www.cs.cmu.edu/~TextLearning/datasets.html

I took the leaf-node classes (105) and put them in the top directory (around
10K files in total), then ran the command below. The file close is not
happening in writeFile(). Should I file another JIRA issue for it, or add it
under Mahout-60/92/93?



robin:~/lucene/mahout/trunk/core/work$ hadoop jar \
    ../../examples/build/apache-mahout-examples-0.1-dev.job \
    org.apache.mahout.classifier.bayes.PrepareTwentyNewsgroups \
    -p industry -o industry-collapse \
    -a org.apache.lucene.analysis.standard.StandardAnalyzer -c UTF-8
java.lang.RuntimeException: java.io.FileNotFoundException: industry/oil.and.gas.operations.industry/http_^^www.tmrc.com^ (Too many open files)
        at org.apache.mahout.classifier.BayesFileFormatter$FileProcessor.accept(BayesFileFormatter.java:174)
        at java.io.File.listFiles(File.java:1134)
        at org.apache.mahout.classifier.BayesFileFormatter.collapse(BayesFileFormatter.java:75)
        at org.apache.mahout.classifier.bayes.PrepareTwentyNewsgroups.main(PrepareTwentyNewsgroups.java:86)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.apache.hadoop.util.RunJar.main(RunJar.java:155)
        at org.apache.hadoop.mapred.JobShell.run(JobShell.java:54)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
        at org.apache.hadoop.mapred.JobShell.main(JobShell.java:68)
Caused by: java.io.FileNotFoundException: industry/oil.and.gas.operations.industry/http_^^www.tmrc.com^ (Too many open files)
        at java.io.FileInputStream.open(Native Method)
        at java.io.FileInputStream.<init>(FileInputStream.java:106)
        at org.apache.mahout.classifier.BayesFileFormatter$FileProcessor.accept(BayesFileFormatter.java:162)
        ... 12 more

Re: Too Many open files in BayesFileFormatter

Posted by Sean Owen <sr...@gmail.com>.
Just submitted, try it from SVN now if you like.

On Wed, Nov 12, 2008 at 12:21 PM, Robin Anil <ro...@gmail.com> wrote:
> Cool, go ahead.

Re: Too Many open files in BayesFileFormatter

Posted by Robin Anil <ro...@gmail.com>.
Cool, go ahead.

Robin
On Wed, Nov 12, 2008 at 5:49 PM, Sean Owen <sr...@gmail.com> wrote:

> Or just fix it -- meaning I can take care of this in a few minutes if
> that's alright?
>
> Yeah, in general some more care is needed with resource management and
> closing resources when done.
>
> On Wed, Nov 12, 2008 at 12:11 PM, Robin Anil <ro...@gmail.com> wrote:
> > I am getting this error while processing the *industry* dataset from
> > http://www.cs.cmu.edu/~TextLearning/datasets.html
> >
> > I took the leaf-node classes (105) and put them in the top directory
> > (around 10K files in total), then ran the command below. The file close
> > is not happening in writeFile(). Should I file another JIRA issue for it,
> > or add it under Mahout-60/92/93?
>

Re: Too Many open files in BayesFileFormatter

Posted by Sean Owen <sr...@gmail.com>.
Or just fix it -- meaning I can take care of this in a few minutes if
that's alright?

Yeah, in general some more care is needed with resource management and
closing resources when done.
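
For anyone hitting the same error, here is a minimal sketch of the pattern:
open one file at a time and close it before moving to the next, so the number
of open descriptors stays constant however many files the directory holds.
This is not the BayesFileFormatter code or the fix that was committed. The
class and method names (CloseEachFile, countLines, countLinesInDirectory) are
made up for illustration, and try-with-resources assumes Java 7 or later; on
older JVMs the same effect needs an explicit close() in a finally block.

```java
import java.io.BufferedReader;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.Charset;

// Hypothetical illustration class; not part of Mahout.
public class CloseEachFile {

    /** Counts the lines in one file, closing the stream even if reading fails. */
    static int countLines(File file, Charset charset) throws IOException {
        // Both resources are closed automatically, in reverse order, when the
        // try block exits, whether normally or through an exception.
        try (FileInputStream in = new FileInputStream(file);
             BufferedReader reader =
                 new BufferedReader(new InputStreamReader(in, charset))) {
            int lines = 0;
            while (reader.readLine() != null) {
                lines++;
            }
            return lines;
        }
    }

    /** Sums the line counts of the regular files directly inside a directory. */
    static int countLinesInDirectory(File dir, Charset charset) throws IOException {
        File[] files = dir.listFiles();
        if (files == null) {
            throw new IOException("Not a readable directory: " + dir);
        }
        int total = 0;
        for (File file : files) {
            if (file.isFile()) {
                // Each file is fully closed before the next one is opened.
                total += countLines(file, charset);
            }
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        File dir = new File(args.length > 0 ? args[0] : ".");
        System.out.println(countLinesInDirectory(dir, Charset.forName("UTF-8")));
    }
}
```

Running it against a directory of roughly 10K files, like the one described
above, should complete without exhausting the per-process file limit, since
at most one file is open at any moment.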

On Wed, Nov 12, 2008 at 12:11 PM, Robin Anil <ro...@gmail.com> wrote:
> I am getting this error while processing the *industry* dataset from
> http://www.cs.cmu.edu/~TextLearning/datasets.html
>
> I took the leaf-node classes (105) and put them in the top directory (around
> 10K files in total), then ran the command below. The file close is not
> happening in writeFile(). Should I file another JIRA issue for it, or add it
> under Mahout-60/92/93?