Posted to dev@mahout.apache.org by Grant Ingersoll <gs...@apache.org> on 2008/11/01 03:07:03 UTC

Re: BayesFeatureMapper

I've got a proposed fix. I'll put up a patch tomorrow, assuming testing works out.

On Oct 31, 2008, at 6:17 PM, Grant Ingersoll wrote:

> See MAHOUT-92.
>
>
> On Oct 31, 2008, at 5:19 PM, Grant Ingersoll wrote:
>
>> Hi Robin,
>>
>> I'm trying to get the Bayes stuff working on the 20 Newsgroups data, per the instructions on MAHOUT-20. It seems like the BayesFeatureMapper isn't really doing anything. Sean put in a "TODO" comment on line 72, and it pretty much shows that nothing is getting into word_list.
>>
>> When I go to run this, I get:
>> 08/10/31 17:18:09 INFO bayes.BayesDriver: Calculating Tf-Idf...
>> 08/10/31 17:18:09 INFO common.BayesTfIdfDriver: Counts of documents in Each Label
>> 08/10/31 17:18:09 INFO common.BayesTfIdfDriver: {rec.motorcycles=994.0, comp.windows.x=980.0, talk.politics.guns=910.0, talk.politics.mideast=940.0, talk.religion.misc=628.0, rec.sport.baseball=994.0, rec.autos=990.0, rec.sport.hockey=999.0, comp.sys.mac.hardware=961.0, comp.sys.ibm.pc.hardware=982.0, sci.space=987.0, talk.politics.misc=775.0, sci.electronics=981.0, comp.graphics=973.0, sci.crypt=991.0, sci.med=990.0, soc.religion.christian=997.0, alt.atheism=799.0, misc.forsale=972.0, comp.os.ms-windows.misc=985.0}
>> 08/10/31 17:18:09 INFO jvm.JvmMetrics: Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
>> 08/10/31 17:18:09 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
>> 08/10/31 17:18:10 WARN mapred.JobClient: No job jar file set.  User classes may not be found. See JobConf(Class) or JobConf#setJar(String).
>> Exception in thread "main" org.apache.hadoop.mapred.InvalidInputException: Input path does not exist: file:/Volumes/User/grantingersoll/projects/lucene/mahout/output/bayes/trainer-termDocCount
>> Input path does not exist: file:/Volumes/User/grantingersoll/projects/lucene/mahout/output/bayes/trainer-wordFreq
>> Input path does not exist: file:/Volumes/User/grantingersoll/projects/lucene/mahout/output/bayes/trainer-featureCount
>> 	at org.apache.hadoop.mapred.FileInputFormat.listStatus(FileInputFormat.java:179)
>> 	at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:210)
>> 	at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:742)
>> 	at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1026)
>> 	at org.apache.mahout.classifier.bayes.common.BayesTfIdfDriver.runJob(BayesTfIdfDriver.java:112)
>> 	at org.apache.mahout.classifier.bayes.BayesDriver.runJob(BayesDriver.java:76)
>> 	at org.apache.mahout.classifier.bayes.BayesDriver.main(BayesDriver.java:54)
>>
>> I'm pretty sure we need to populate the word list from the input value. Right?
>>
>> -Grant
>
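For illustration only, here is a minimal Java sketch of the step the thread suggests is missing: filling the word list from the input value before any counting happens. This is not Mahout's BayesFeatureMapper and uses no Hadoop APIs; the class and method names (WordListSketch, buildWordList, label, termFrequencies) are hypothetical, and it assumes each input value is a label, a tab, then whitespace-separated tokens.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical sketch, NOT Mahout's BayesFeatureMapper.
 * Assumes each input value looks like "label<TAB>token token token ...".
 */
public class WordListSketch {

  /** Builds the word list from everything after the first tab (assumed format). */
  public static List<String> buildWordList(String value) {
    List<String> wordList = new ArrayList<>();
    int tab = value.indexOf('\t');
    String text = tab >= 0 ? value.substring(tab + 1) : value;
    for (String token : text.trim().split("\\s+")) {
      // split() yields a single "" for empty text, so skip empty tokens
      if (!token.isEmpty()) {
        wordList.add(token);
      }
    }
    return wordList;
  }

  /** Returns the text before the first tab, treated as the label (assumption). */
  public static String label(String value) {
    int tab = value.indexOf('\t');
    return tab >= 0 ? value.substring(0, tab) : "";
  }

  /** Counts how often each word occurs in one document's word list. */
  public static Map<String, Integer> termFrequencies(List<String> wordList) {
    Map<String, Integer> tf = new HashMap<>();
    for (String word : wordList) {
      tf.merge(word, 1, Integer::sum);
    }
    return tf;
  }

  public static void main(String[] args) {
    String value = "sci.space\tthe moon orbits the earth";
    List<String> words = buildWordList(value);
    System.out.println(label(value) + " " + words.size() + " "
        + termFrequencies(words).get("the"));
  }
}
```

Running main prints `sci.space 5 2`. If the word list is never populated, the per-document counts are empty, which would be consistent with the downstream trainer-* outputs never being written.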