Posted to general@lucene.apache.org by Sam Cunningham <sa...@yahoo.com> on 2011/10/29 04:58:36 UTC

Mahout - 20news example

I have a text classification project, so I am going through the examples
provided in the Mahout in Action book. The 20news example works fine for me.
However, I don't understand something: why do we include the target
variables in the test data files (target variable - tab - text content)? I
understand that we need to provide target variables in order to train the
classifier, but I don't understand why we include them in the test files.
Isn't Mahout supposed to determine them by using the model created from
training? Just to test that, I renamed the folders under
20news-bydate-test to 1, 2, 3, ... 20. Then I ran prepare20newsgroups to
generate the files required for the naive Bayes classifier. The new files
used the renamed folder names (1, 2, 3, ... 20) as target variables.
When I ran testclassifier after training the classifier, I received the
following error. Why? Please help me understand. Also, is there Java
source code for the 20newsgroup Bayes classification (instead of the
command line)?

Exception in thread "main" java.lang.IllegalArgumentException: Label not found: 20
	at com.google.common.base.Preconditions.checkArgument(Preconditions.java:88)
	at org.apache.mahout.classifier.ConfusionMatrix.getCount(ConfusionMatrix.java:93)
	at org.apache.mahout.classifier.ConfusionMatrix.incrementCount(ConfusionMatrix.java:113)
	at org.apache.mahout.classifier.ConfusionMatrix.incrementCount(ConfusionMatrix.java:117)
	at org.apache.mahout.classifier.ConfusionMatrix.addInstance(ConfusionMatrix.java:85)
	at org.apache.mahout.classifier.ResultAnalyzer.addInstance(ResultAnalyzer.java:67)
	at org.apache.mahout.classifier.bayes.TestClassifier.classifySequential(TestClassifier.java:252)
	at org.apache.mahout.classifier.bayes.TestClassifier.main(TestClassifier.java:185)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
	at java.lang.reflect.Method.invoke(Method.java:597)
	at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:68)
	at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:139)
	at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:187)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
	at java.lang.reflect.Method.invoke(Method.java:597)
	at org.apache.hadoop.util.RunJar.main(RunJar.java:156)


--
View this message in context: http://lucene.472066.n3.nabble.com/Mahout-20news-example-tp3462754p3462754.html
Sent from the Lucene - General mailing list archive at Nabble.com.

Re: Mahout - 20news example

Posted by Ted Dunning <te...@gmail.com>.
The reason for including the target variable in the test file is so that the
classifier can be run and its output compared to the correct answer.
Otherwise, all you could get is the output of the classifier, and you would
have to run an entirely separate program to find out which answers were
correct and which were not. Having the classification and verification
happen together is just easier.
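
To make the idea concrete, here is a minimal sketch of how evaluation against
labeled test data can work, and of one plausible reason for the "Label not
found: 20" error. The class TinyConfusionMatrix and its methods are
hypothetical stand-ins written for this illustration; this is not Mahout's
ConfusionMatrix and does not use any Mahout API. The assumption behind it is
that the matrix only knows the labels the model was trained with, so a test
file whose target variable was renamed to "20" refers to a label that cannot
be looked up. The stack trace is consistent with that reading (the exception
comes from a label lookup in ConfusionMatrix.getCount), but the sketch does
not prove it.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical, simplified confusion matrix for illustration; not Mahout's class. */
public class TinyConfusionMatrix {
    // Maps each known label to its row/column position in the counts table.
    private final Map<String, Integer> index = new LinkedHashMap<>();
    private final int[][] counts;

    /** The label set is fixed up front, as it would be if taken from a trained model. */
    public TinyConfusionMatrix(List<String> trainedLabels) {
        for (String label : trainedLabels) {
            index.putIfAbsent(label, index.size());
        }
        counts = new int[index.size()][index.size()];
    }

    // Looks up a label and rejects any label that was not present at training time.
    private int indexOf(String label) {
        Integer i = index.get(label);
        if (i == null) {
            throw new IllegalArgumentException("Label not found: " + label);
        }
        return i;
    }

    /** Records one test document: its true label (from the test file) and the prediction. */
    public void addInstance(String correctLabel, String predictedLabel) {
        counts[indexOf(correctLabel)][indexOf(predictedLabel)]++;
    }

    public int getCount(String correctLabel, String predictedLabel) {
        return counts[indexOf(correctLabel)][indexOf(predictedLabel)];
    }

    /** Fraction of recorded documents whose prediction matched the true label. */
    public double accuracy() {
        int correct = 0;
        int total = 0;
        for (int r = 0; r < counts.length; r++) {
            for (int c = 0; c < counts.length; c++) {
                total += counts[r][c];
                if (r == c) {
                    correct += counts[r][c];
                }
            }
        }
        return total == 0 ? 0.0 : (double) correct / total;
    }

    public static void main(String[] args) {
        TinyConfusionMatrix matrix =
            new TinyConfusionMatrix(Arrays.asList("alt.atheism", "sci.space"));

        // Without the true label in the test data there would be nothing to compare against.
        matrix.addInstance("alt.atheism", "alt.atheism"); // correct prediction
        matrix.addInstance("sci.space", "alt.atheism");   // wrong prediction
        System.out.println("accuracy = " + matrix.accuracy());

        // A test label that was never seen in training cannot be scored.
        try {
            matrix.addInstance("20", "sci.space");
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Under that assumption, the classifier still predicts a label for every test
document; only the scoring step needs the true labels, and those labels have
to use the same names as the training data.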

