Posted to user@mahout.apache.org by Sam Cunningham <sa...@yahoo.com> on 2011/10/29 05:07:01 UTC

20news example - Why target variables in test files?

I have a text classification project, so I am working through the examples
in the Mahout in Action book. The 20news example works fine for me, but there
is one thing I don't understand: why do we include the target variables in
the test data files (target variable, tab, text content)? I understand that
we need to provide target variables in the training files in order to train
the classifier, but why include them in the test files? Isn't Mahout supposed
to determine them using the model created from training?

To test that, I renamed the folders under 20news-bydate-test to 1, 2, 3, ...,
20. Then I ran prepare20newsgroups to generate the files required for the
naive Bayes classifier. The new files used the renamed folder names (1, 2, 3,
..., 20) as target variables. When I ran testclassifier after training the
classifier, I received the following error. Why? Please help me understand.

Also, is there Java source code for the 20 newsgroups Bayes classification
(instead of the command line)?

Exception in thread "main" java.lang.IllegalArgumentException: Label not found: 20
        at com.google.common.base.Preconditions.checkArgument(Preconditions.java:88)
        at org.apache.mahout.classifier.ConfusionMatrix.getCount(ConfusionMatrix.java:93)
        at org.apache.mahout.classifier.ConfusionMatrix.incrementCount(ConfusionMatrix.java:113)
        at org.apache.mahout.classifier.ConfusionMatrix.incrementCount(ConfusionMatrix.java:117)
        at org.apache.mahout.classifier.ConfusionMatrix.addInstance(ConfusionMatrix.java:85)
        at org.apache.mahout.classifier.ResultAnalyzer.addInstance(ResultAnalyzer.java:67)
        at org.apache.mahout.classifier.bayes.TestClassifier.classifySequential(TestClassifier.java:252)
        at org.apache.mahout.classifier.bayes.TestClassifier.main(TestClassifier.java:185)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:68)
        at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:139)
        at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:187)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
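
Reading the trace, the failure seems to come from a label lookup while the test results are being tallied: each test document's own label is apparently checked against a set of labels that is already known, presumably the ones seen in training. The following is a minimal sketch of that kind of check, assuming the tally is keyed by the training labels. The class LabelCheckSketch and its methods are hypothetical stand-ins for illustration, not Mahout's ConfusionMatrix source.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a confusion-matrix style tally; NOT Mahout's code.
public class LabelCheckSketch {
    // Maps each known label (assumed to come from training) to a row/column index.
    private final Map<String, Integer> labelIndex = new HashMap<>();
    private final int[][] counts;

    public LabelCheckSketch(List<String> knownLabels) {
        for (String label : knownLabels) {
            labelIndex.put(label, labelIndex.size());
        }
        counts = new int[labelIndex.size()][labelIndex.size()];
    }

    // Records one test document: its label from the test file and the predicted label.
    public void addInstance(String correctLabel, String classifiedLabel) {
        counts[indexOf(correctLabel)][indexOf(classifiedLabel)]++;
    }

    public int getCount(String correctLabel, String classifiedLabel) {
        return counts[indexOf(correctLabel)][indexOf(classifiedLabel)];
    }

    // Rejects any label that was not among the known labels.
    private int indexOf(String label) {
        Integer index = labelIndex.get(label);
        if (index == null) {
            throw new IllegalArgumentException("Label not found: " + label);
        }
        return index;
    }

    public static void main(String[] args) {
        LabelCheckSketch tally =
            new LabelCheckSketch(Arrays.asList("alt.atheism", "sci.space"));
        tally.addInstance("sci.space", "sci.space"); // label known: counted
        try {
            tally.addInstance("20", "sci.space");    // renamed test label: unknown
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Under that assumption, the labels in the test files are not used to make the prediction. They are only compared with the predicted label to score the classifier, so a test label that never appeared in training cannot be tallied.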

--
View this message in context: http://lucene.472066.n3.nabble.com/20news-example-Why-target-variables-in-test-files-tp3462773p3462773.html
Sent from the Mahout User List mailing list archive at Nabble.com.