Posted to user@mahout.apache.org by Andrea Leistra <al...@gmail.com> on 2012/10/18 23:04:00 UTC

Problems with testing Naive Bayes for small number of test cases in one category

I'm working on a naive Bayes classifier for data in which a few
categories are much less common than the rest. In the latest run,
no instances of one of these rare categories ended up in the test
set. As a result, testnb failed with the following error (actual
name of the label elided):

Exception in thread "main" java.lang.IllegalArgumentException: Label not found: LabelXYZ
	at com.google.common.base.Preconditions.checkArgument(Preconditions.java:88)
	at org.apache.mahout.classifier.ConfusionMatrix.getCount(ConfusionMatrix.java:102)
	at org.apache.mahout.classifier.ConfusionMatrix.incrementCount(ConfusionMatrix.java:122)
	at org.apache.mahout.classifier.ConfusionMatrix.incrementCount(ConfusionMatrix.java:126)
	at org.apache.mahout.classifier.ConfusionMatrix.addInstance(ConfusionMatrix.java:94)
	at org.apache.mahout.classifier.ResultAnalyzer.addInstance(ResultAnalyzer.java:71)
	at org.apache.mahout.classifier.naivebayes.test.TestNaiveBayesDriver.analyzeResults(TestNaiveBayesDriver.java:158)
	at org.apache.mahout.classifier.naivebayes.test.TestNaiveBayesDriver.run(TestNaiveBayesDriver.java:124)
	at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
	at org.apache.mahout.classifier.naivebayes.test.TestNaiveBayesDriver.main(TestNaiveBayesDriver.java:65)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
	at java.lang.reflect.Method.invoke(Method.java:597)
	at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:68)
	at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:139)
	at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:195)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
	at java.lang.reflect.Method.invoke(Method.java:597)
	at org.apache.hadoop.util.RunJar.main(RunJar.java:156)

I see why this is happening, but I'm not sure it makes sense for the
test to fail entirely rather than just fill that column in the
confusion matrix with zeroes.  Before I dive into the ConfusionMatrix
code to deal with this, is there a reason I'm missing for this
behavior?
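
To make the alternative concrete, here is a rough sketch of the lenient
behavior I have in mind. It is a hypothetical stand-alone class, not
Mahout's ConfusionMatrix: the name LenientConfusionMatrix and its methods
are made up for illustration, and it assumes the matrix is seeded only
with the labels that happen to appear in the test set. A label seen for
the first time is simply added, and any cell that was never incremented
reads as zero.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical illustration only; this is NOT Mahout's ConfusionMatrix.
class LenientConfusionMatrix {
    // Every label seen so far, in first-seen order.
    private final Set<String> labels = new LinkedHashSet<>();
    // Sparse counts: actual label -> (predicted label -> count).
    private final Map<String, Map<String, Integer>> counts = new HashMap<>();

    LenientConfusionMatrix(Collection<String> initialLabels) {
        labels.addAll(initialLabels);
    }

    // Records one classified instance; unknown labels are registered, not rejected.
    void addInstance(String actual, String predicted) {
        labels.add(actual);
        labels.add(predicted);
        counts.computeIfAbsent(actual, k -> new HashMap<>())
              .merge(predicted, 1, Integer::sum);
    }

    // Returns the count for a cell, or 0 if that cell was never incremented.
    int getCount(String actual, String predicted) {
        Map<String, Integer> row = counts.get(actual);
        return row == null ? 0 : row.getOrDefault(predicted, 0);
    }

    Set<String> getLabels() {
        return Collections.unmodifiableSet(labels);
    }

    public static void main(String[] args) {
        // Seed with the labels present in the (assumed) test set only.
        LenientConfusionMatrix m =
            new LenientConfusionMatrix(Arrays.asList("LabelA", "LabelB"));
        m.addInstance("LabelA", "LabelA");
        // The classifier predicts a label that never occurs in the test set.
        m.addInstance("LabelB", "LabelXYZ");
        for (String actual : m.getLabels()) {
            for (String predicted : m.getLabels()) {
                System.out.println(actual + " -> " + predicted + ": "
                    + m.getCount(actual, predicted));
            }
        }
    }
}
```

With something like this, a category that is missing from the test set
would just show up as an all-zero row instead of aborting the whole run.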

-- 
Andrea Leistra
aleistra@gmail.com