Posted to dev@mahout.apache.org by Grant Ingersoll <gs...@apache.org> on 2008/11/01 03:07:03 UTC
Re: BayesFeatureMapper
I've got a proposed fix. I'll put up a patch tomorrow, assuming
testing works out.
On Oct 31, 2008, at 6:17 PM, Grant Ingersoll wrote:
> See MAHOUT-92.
>
>
> On Oct 31, 2008, at 5:19 PM, Grant Ingersoll wrote:
>
>> Hi Robin,
>>
>> I'm trying to get the Bayes stuff working on the 20 Newsgroups data
>> per the instructions on MAHOUT-20. It seems like the
>> BayesFeatureMapper isn't really doing anything. Sean put in a
>> "TODO" comment on line 72, and it pretty much shows that nothing
>> is ever added to the word_list.
>>
>> When I go to run this, I get:
>> 08/10/31 17:18:09 INFO bayes.BayesDriver: Calculating Tf-Idf...
>> 08/10/31 17:18:09 INFO common.BayesTfIdfDriver: Counts of documents in Each Label
>> 08/10/31 17:18:09 INFO common.BayesTfIdfDriver:
>> {rec.motorcycles=994.0, comp.windows.x=980.0,
>> talk.politics.guns=910.0, talk.politics.mideast=940.0,
>> talk.religion.misc=628.0, rec.sport.baseball=994.0,
>> rec.autos=990.0, rec.sport.hockey=999.0,
>> comp.sys.mac.hardware=961.0, comp.sys.ibm.pc.hardware=982.0,
>> sci.space=987.0, talk.politics.misc=775.0, sci.electronics=981.0,
>> comp.graphics=973.0, sci.crypt=991.0, sci.med=990.0,
>> soc.religion.christian=997.0, alt.atheism=799.0,
>> misc.forsale=972.0, comp.os.ms-windows.misc=985.0}
>> 08/10/31 17:18:09 INFO jvm.JvmMetrics: Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
>> 08/10/31 17:18:09 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
>> 08/10/31 17:18:10 WARN mapred.JobClient: No job jar file set. User classes may not be found. See JobConf(Class) or JobConf#setJar(String).
>> Exception in thread "main" org.apache.hadoop.mapred.InvalidInputException: Input path does not exist: file:/Volumes/User/grantingersoll/projects/lucene/mahout/output/bayes/trainer-termDocCount
>> Input path does not exist: file:/Volumes/User/grantingersoll/projects/lucene/mahout/output/bayes/trainer-wordFreq
>> Input path does not exist: file:/Volumes/User/grantingersoll/projects/lucene/mahout/output/bayes/trainer-featureCount
>>     at org.apache.hadoop.mapred.FileInputFormat.listStatus(FileInputFormat.java:179)
>>     at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:210)
>>     at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:742)
>>     at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1026)
>>     at org.apache.mahout.classifier.bayes.common.BayesTfIdfDriver.runJob(BayesTfIdfDriver.java:112)
>>     at org.apache.mahout.classifier.bayes.BayesDriver.runJob(BayesDriver.java:76)
>>     at org.apache.mahout.classifier.bayes.BayesDriver.main(BayesDriver.java:54)
>>
>> I'm pretty sure we need to add something to the word list from the
>> input value. Right?
>>
>> -Grant
>
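
For illustration only, the step described in the quoted message (adding words from the input value to the word list) might look roughly like the sketch below. This is not the MAHOUT-92 patch and not Mahout's actual BayesFeatureMapper code; the class name WordListSketch and its two methods are hypothetical, and the tokenization rule (lower-casing and splitting on anything that is not a letter or digit) is an assumption. It uses only the JDK so it can be tried without Hadoop.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper, not part of Mahout: shows the general idea of
// filling a word list from a mapper's input value and counting terms.
class WordListSketch {

    // Lower-case the input value and split it on runs of characters
    // that are neither letters nor digits; empty tokens are skipped.
    static List<String> tokenize(String value) {
        List<String> words = new ArrayList<>();
        for (String token : value.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!token.isEmpty()) {
                words.add(token);
            }
        }
        return words;
    }

    // Count how often each word occurs, keeping first-seen order.
    static Map<String, Integer> termFrequencies(List<String> words) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String word : words) {
            counts.merge(word, 1, Integer::sum);
        }
        return counts;
    }
}
```

In a real mapper, the per-word counts would presumably be emitted keyed by label and word so that later jobs find the term and feature counts they expect as input; how the actual fix does this is not stated in this thread.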