Posted to user@mahout.apache.org by "rm2903@columbia.edu" <rm...@columbia.edu> on 2011/05/10 12:44:55 UTC

Using .mvc file to train a classifier

Hi,
I am a Mahout beginner.
I used Weka to generate a .arff file from the training data, then ran:
 ./bin/mahout arff.vector --input ../X4.classifier.arff --output raghavan_test_output/ --dictOut label_bindings

I found the label_bindings file to be empty. I couldn't understand the
reason for this, and I am also wondering what the contents of the
label_bindings file should be.

A .mvc file was generated in the output directory I mentioned.

Can I use this file to train a classifier with the command shown below?
./bin/mahout trainclassifier --input raghavan_test_output/X4.classifier.arff.mvc --output X4model -type cbayes

I am pasting below the logs I got while executing the command above:

INFO: Bayes Feature Mapper: Document Label:
????????5S???s??8K????[???M?q?^J??Z??<???G??Ij??
                                                                                           
k?j?O??
u_????(q?6j??h?O?#O???k^?????^?5???F}A>8???Y??9???c???????i???q?,??*?G?E???W?-?9??
May 10, 2011 6:43:15 AM org.apache.hadoop.mapred.JobClient
monitorAndPrintJob
INFO:  map 100% reduce 0%
May 10, 2011 6:43:17 AM org.apache.hadoop.mapred.LocalJobRunner$Job
statusUpdate
INFO: Bayes Feature Mapper: Document Label:
????????5S???s??8K????[???M?q?^J??Z??<???G??Ij??
                                                                                           
k?j?O??
u_????(q?6j??h?O?#O???k^?????^?5???F}A>8???Y??9???c???????i???q?,??*?G?E???W?-?9??
May 10, 2011 6:43:20 AM org.apache.hadoop.mapred.LocalJobRunner$Job
statusUpdate
INFO: Bayes Feature Mapper: Document Label:
????????5S???s??8K????[???M?q?^J??Z??<???G??Ij??
                                                                                           
k?j?O??
u_????(q?6j??h?O?#O???k^?????^?5???F}A>8???Y??9???c???????i???q?,??*?G?E???W?-?9??

thanks,
Raghavan

--
View this message in context: http://lucene.472066.n3.nabble.com/Using-mvc-file-to-train-a-classifier-tp2922588p2922588.html
Sent from the Mahout User List mailing list archive at Nabble.com.

Re: Using .mvc file to train a classifier

Posted by Daniel McEnnis <dm...@gmail.com>.
Dear Raghavan,

To the best of my knowledge, the Naive Bayes classifier does not support
data in the Weka format. It must be fed tokenized text or Wikipedia
XML.

https://cwiki.apache.org/confluence/display/MAHOUT/Twenty+Newsgroups
https://cwiki.apache.org/confluence/display/MAHOUT/Wikipedia+Bayes+Example
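
As a rough illustration of what "tokenized text" could look like: as far
as I recall, the Twenty Newsgroups example prepares one document per line,
with the label first, then a tab, then space-separated tokens. The Python
sketch below assumes that layout. The function names (to_bayes_line,
write_training_file) and the very simple tokenizer are made up for
illustration and are not part of Mahout, so please check the result
against the example pages above before relying on it.

```python
import re


def to_bayes_line(label, text):
    """Format one document as 'label<TAB>token token ...'.

    The layout is an assumption based on the Twenty Newsgroups example,
    not a documented Mahout contract.
    """
    # Lowercase the text and keep only runs of ASCII letters and digits.
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    # Replace whitespace in the label so the label stays a single field.
    clean_label = re.sub(r"\s+", "_", label.strip())
    return clean_label + "\t" + " ".join(tokens)


def write_training_file(records, path):
    """Write (label, text) pairs to a plain-text file, one document per line."""
    with open(path, "w", encoding="utf-8") as out:
        for label, text in records:
            out.write(to_bayes_line(label, text) + "\n")


if __name__ == "__main__":
    # Hypothetical sample data, only to show the shape of the output.
    sample = [
        ("sports", "The match ended 2-1."),
        ("tech", "New CPU released!"),
    ]
    for label, text in sample:
        print(to_bayes_line(label, text))
```

A file written this way is plain text, which is the kind of input the
trainer appears to expect, unlike the binary .mvc output.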

Hope this helps,

Daniel.

Re: Using .mvc file to train a classifier

Posted by Ted Dunning <te...@gmail.com>.
Can you say a bit more about what you are trying to do at a higher level?

Also, the Bayes classifier is very picky about its input format.
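
The garbled "Document Label" values in the log suggest the trainer was
reading binary data as if it were text. Assuming the .mvc output of
arff.vector is a Hadoop SequenceFile (those start with the three bytes
"SEQ"), a small check like the one below can tell such a file apart from
plain text. This is only a sketch: looks_like_sequence_file is a
hypothetical helper, not a Mahout or Hadoop tool, and it works on local
files only.

```python
import os
import tempfile


def looks_like_sequence_file(path):
    """Return True if the file starts with the SequenceFile magic bytes 'SEQ'."""
    with open(path, "rb") as f:
        return f.read(3) == b"SEQ"


if __name__ == "__main__":
    # Build two throwaway files to demonstrate the check.
    workdir = tempfile.mkdtemp()

    binary_path = os.path.join(workdir, "example.mvc")
    with open(binary_path, "wb") as f:
        f.write(b"SEQ\x06" + b"\x00" * 16)  # fake header, not a valid file

    text_path = os.path.join(workdir, "example.txt")
    with open(text_path, "wb") as f:
        f.write(b"sports\tthe match ended\n")

    for path in (binary_path, text_path):
        kind = "binary SequenceFile" if looks_like_sequence_file(path) else "not a SequenceFile"
        print(os.path.basename(path), "->", kind)
```

If the check reports a SequenceFile, feeding that path to a trainer that
expects plain text lines would explain labels that look like random bytes.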
