Posted to user@mahout.apache.org by Adam Hammer <ad...@gmail.com> on 2010/04/05 15:59:51 UTC

"Not a file" issue with TwentyNewsGroups

Hello all,

I am just starting out with Mahout, and to get my feet wet I am running
through the TwentyNewsGroups example.  I have successfully configured a
single node Hadoop system as well as a pseudo-distributed Hadoop system on
two separate machines.  On both environments, I have gone through the guide
successfully to put all the news inputs into the folder 20news-input.  I am
able to successfully ls and cat the files in the directory.

However, when I go to run the TrainClassifier, I am getting the following
message:

10/04/05 09:48:33 INFO bayes.TrainClassifier: Training Complementary Bayes Classifier
10/04/05 09:48:33 INFO cbayes.CBayesDriver: Reading features...
10/04/05 09:48:33 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
10/04/05 09:48:33 INFO mapred.FileInputFormat: Total input paths to process : 19
Exception in thread "main" java.io.IOException: Not a file: hdfs://localhost:9000/user/bob/20news-input/comp.graphics
    at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:206)
    at org.apache.hadoop.mapred.JobClient.writeOldSplits(JobClient.java:810)
    at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:781)
    at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:730)
    at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1249)
    at org.apache.mahout.classifier.bayes.mapreduce.common.BayesFeatureDriver.runJob(BayesFeatureDriver.java:75)
    at org.apache.mahout.classifier.bayes.mapreduce.cbayes.CBayesDriver.runJob(CBayesDriver.java:61)
    at org.apache.mahout.classifier.bayes.TrainClassifier.trainCNaiveBayes(TrainClassifier.java:56)
    at org.apache.mahout.classifier.bayes.TrainClassifier.main(TrainClassifier.java:128)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:156)

I get this error on both the single node system I have set up, as well as the
separate dual-node system.  As I said before, I am able to cat and ls that
directory and the files in it perfectly fine.  Any thoughts?

Thanks!

Re: "Not a file" issue with TwentyNewsGroups

Posted by Ted Dunning <te...@gmail.com>.
Have you verified that the file actually exists in HDFS?
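
Editor's note: the exception text "Not a file: .../20news-input/comp.graphics" is what FileInputFormat.getSplits reports when an input path turns out to be a directory, so it is worth checking not only that the entry exists but also what kind of entry it is. On HDFS, `hadoop fs -ls 20news-input` marks directories with a leading `d` in the permissions column. The sketch below shows the same idea on a local directory so that it runs without a cluster. The function name `non_file_entries`, the demo layout, and the post name `37261` are hypothetical and are not part of Hadoop or Mahout.

```shell
#!/bin/sh
# Hypothetical helper (not a Hadoop or Mahout command): print the name of
# every entry directly under DIR that is not a regular file.
non_file_entries() {
    dir=$1
    for entry in "$dir"/*; do
        [ -e "$entry" ] || continue      # the glob matched nothing
        [ -f "$entry" ] || basename "$entry"
    done
}

# Demo: a throwaway local directory that mimics an unconverted input folder,
# with one subdirectory for a newsgroup next to one flat file.
demo=$(mktemp -d)
mkdir "$demo/comp.graphics"
echo "example post" > "$demo/comp.graphics/37261"
echo "example line" > "$demo/alt.atheism.txt"

# Lists comp.graphics, because it is a directory rather than a file.
non_file_entries "$demo"

rm -rf "$demo"
```

Any name this prints is an entry that a job expecting plain files in its input directory would reject.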

On Fri, Apr 9, 2010 at 10:46 AM, adam35413 <ad...@gmail.com> wrote:

>
> I did not do any conversion, correct.  I followed this guide and the
> instructions for running the data on a hadoop cluster:
> http://cwiki.apache.org/confluence/display/MAHOUT/TwentyNewsgroups
>
> I used the following command:
> $HADOOP_HOME/bin/hadoop jar
> $MAHOUT_HOME/examples/target/mahout-examples-0.4-SNAPSHOT.job
> org.apache.mahout.classifier.bayes.TrainClassifier -i 20news-input -o
> newsmodel -ng 3 -type bayes -source hdfs
>
>
> --
> View this message in context:
> http://n3.nabble.com/Not-a-file-issue-with-TwentyNewsGroups-tp698023p708859.html
> Sent from the Mahout User List mailing list archive at Nabble.com.
>

Re: "Not a file" issue with TwentyNewsGroups

Posted by adam35413 <ad...@gmail.com>.
I did not do any conversion, correct.  I followed this guide and the
instructions for running the data on a hadoop cluster:
http://cwiki.apache.org/confluence/display/MAHOUT/TwentyNewsgroups 

I used the following command:
$HADOOP_HOME/bin/hadoop jar \
  $MAHOUT_HOME/examples/target/mahout-examples-0.4-SNAPSHOT.job \
  org.apache.mahout.classifier.bayes.TrainClassifier \
  -i 20news-input -o newsmodel -ng 3 -type bayes -source hdfs


-- 
View this message in context: http://n3.nabble.com/Not-a-file-issue-with-TwentyNewsGroups-tp698023p708859.html
Sent from the Mahout User List mailing list archive at Nabble.com.

Re: "Not a file" issue with TwentyNewsGroups

Posted by adam35413 <ad...@gmail.com>.
I found this command:

mvn -e exec:java \
  -Dexec.mainClass=org.apache.mahout.classifier.bayes.PrepareTwentyNewsgroups \
  -Dexec.args="-p 20news-18828 -o 20news-input -a org.apache.lucene.analysis.standard.StandardAnalyzer -c UTF-8"

After several trial-and-error attempts, I got it to run.  Thanks!
-- 
View this message in context: http://n3.nabble.com/Not-a-file-issue-with-TwentyNewsGroups-tp698023p708922.html
Sent from the Mahout User List mailing list archive at Nabble.com.
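
Editor's note: as a rough illustration of why a preparation step makes the "Not a file" error go away, the sketch below collapses a directory-per-newsgroup tree into one flat file per newsgroup, so the input folder contains only regular files. It is not PrepareTwentyNewsgroups: it does no tokenization with a Lucene analyzer, and the output layout (newsgroup name, a tab, then the post text on one line) is an assumption made for the example, not a statement of Mahout's actual format. The function name `flatten_newsgroups` and the demo data are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper (not part of Mahout): collapse SRC/<newsgroup>/<post>
# into DEST/<newsgroup>.txt, one line per post, prefixed with the newsgroup
# name and a tab. The output format is an assumption for illustration only.
flatten_newsgroups() {
    src=$1
    dest=$2
    mkdir -p "$dest"
    for group_dir in "$src"/*/; do
        [ -d "$group_dir" ] || continue          # no subdirectories found
        group=$(basename "$group_dir")
        : > "$dest/$group.txt"                   # start with an empty file
        for post in "$group_dir"*; do
            [ -f "$post" ] || continue
            # Join the post's lines with spaces so it fits on a single line.
            body=$(tr '\n' ' ' < "$post" | sed 's/ *$//')
            printf '%s\t%s\n' "$group" "$body" >> "$dest/$group.txt"
        done
    done
}

# Demo on throwaway local directories.
src=$(mktemp -d)
dest=$(mktemp -d)
mkdir "$src/comp.graphics"
printf 'first line\nsecond line\n' > "$src/comp.graphics/1"

flatten_newsgroups "$src" "$dest"
ls "$dest"                        # only regular files remain in the output
cat "$dest/comp.graphics.txt"

rm -rf "$src" "$dest"
```

After a step like this the input folder holds only regular files, which is what the training job needs in order to compute its input splits.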

Re: "Not a file" issue with TwentyNewsGroups

Posted by Robin Anil <ro...@gmail.com>.
I am assuming that you *didn't* convert the 20newsgroups data into the required
format, which resulted in this error. Is my guess right?

Robin

On Wed, Apr 7, 2010 at 3:29 AM, Grant Ingersoll <gs...@apache.org> wrote:

> What are the commands you are running?

Re: "Not a file" issue with TwentyNewsGroups

Posted by Grant Ingersoll <gs...@apache.org>.
What are the commands you are running?
