Posted to user@mahout.apache.org by Lance Norskog <go...@gmail.com> on 2011/02/01 09:25:30 UTC

Problem with Wikipedia demo

Yup, the Wikipedia Bayes classifier demo doesn't work. This is with a
recent checkout, minus the new changes to Vector.normalize(). Would
that matter?

I tested with chunk-0001.xml as training data, chunk-0002.xml as
test data, and Countries.txt as the categories file.

When I test against the same data I trained on (chunk-0001.xml), I get this:
 =======================================================
Summary
-------------------------------------------------------
Correctly Classified Instances          :          5       83.3333%
Incorrectly Classified Instances        :          1       16.6667%
Total Classified Instances              :          6

=======================================================
Confusion Matrix
-------------------------------------------------------
a      b      c         <--Classified as
0      1      0          |  1           a     = algeria
0      5      0          |  5           b     = united_states
0      0      0          |  0           c     = unknown
Default Category: unknown: 2
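For reference, the Summary percentages follow directly from the confusion matrix: correct classifications are the diagonal entries (0 + 5 + 0 = 5) out of 6 instances in total. The small sketch below recomputes them from the counts shown above. It is an illustration only, not Mahout's ResultAnalyzer code; the class and method names are hypothetical.

```java
// Hypothetical helper, NOT Mahout code: recompute the "Summary" block from the
// confusion matrix printed above. Rows are the true labels and columns the
// classifier's output, in the order a = algeria, b = united_states, c = unknown.
public class ConfusionSummary {

    // Counts copied from the confusion matrix in the output above.
    static final int[][] MATRIX = {
        {0, 1, 0},  // algeria: 1 instance, classified as united_states
        {0, 5, 0},  // united_states: 5 instances, all classified correctly
        {0, 0, 0},  // unknown: no instances
    };

    // Correct classifications lie on the diagonal.
    static int correct(int[][] m) {
        int sum = 0;
        for (int i = 0; i < m.length; i++) {
            sum += m[i][i];
        }
        return sum;
    }

    // Every cell counts as one classified instance.
    static int total(int[][] m) {
        int sum = 0;
        for (int[] row : m) {
            for (int count : row) {
                sum += count;
            }
        }
        return sum;
    }

    public static void main(String[] args) {
        int correct = correct(MATRIX);
        int total = total(MATRIX);
        int incorrect = total - correct;
        System.out.printf("Correctly Classified Instances   : %d %.4f%%%n", correct, 100.0 * correct / total);
        System.out.printf("Incorrectly Classified Instances : %d %.4f%%%n", incorrect, 100.0 * incorrect / total);
        System.out.printf("Total Classified Instances       : %d%n", total);
    }
}
```

Running it prints 5 (83.3333%), 1 (16.6667%) and 6, matching the Summary above.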


When I test the same model on a different input set (chunk-0002.xml), I get this:

Running on hadoop, using HADOOP_HOME=/lucid/lance/open/hadoop-0.20.2
No HADOOP_CONF_DIR set, using /lucid/lance/open/hadoop-0.20.2/conf
11/01/31 23:49:16 INFO bayes.TestClassifier: Loading model from: {basePath=../datasets/wikipedia_train1/, classifierType=bayes, alpha_i=1.0, dataSource=hdfs, gramSize=1, verbose=false, encoding=UTF-8, defaultCat=unknown, testDirPath=../datasets/wikipedia_input2}
11/01/31 23:49:16 INFO bayes.TestClassifier: Testing Bayes Classifier
11/01/31 23:49:16 INFO io.SequenceFileModelReader: file:/lucid/lance/open/datasets/wikipedia_train1/trainer-weights/Sigma_j/part-00000
11/01/31 23:49:16 INFO io.SequenceFileModelReader: file:/lucid/lance/open/datasets/wikipedia_train1/trainer-weights/Sigma_k/part-00000
11/01/31 23:49:16 INFO io.SequenceFileModelReader: file:/lucid/lance/open/datasets/wikipedia_train1/trainer-weights/Sigma_kSigma_j/part-00000
11/01/31 23:49:16 INFO io.SequenceFileModelReader: 24.58041375116976
11/01/31 23:49:16 INFO io.SequenceFileModelReader: file:/lucid/lance/open/datasets/wikipedia_train1/trainer-thetaNormalizer/part-00000
11/01/31 23:49:16 INFO io.SequenceFileModelReader: file:/lucid/lance/open/datasets/wikipedia_train1/trainer-tfIdf/trainer-tfIdf/part-00000
11/01/31 23:49:17 INFO datastore.InMemoryBayesDatastore: algeria -30159.939567094563 78146.01904096449 -0.38594339081156
11/01/31 23:49:17 INFO datastore.InMemoryBayesDatastore: united_states -78146.01904096449 78146.01904096449 -1.0
Exception in thread "main" java.lang.NullPointerException
  at org.apache.mahout.classifier.ConfusionMatrix.getCount(ConfusionMatrix.java:99)
  at org.apache.mahout.classifier.ConfusionMatrix.incrementCount(ConfusionMatrix.java:115)
  at org.apache.mahout.classifier.ConfusionMatrix.incrementCount(ConfusionMatrix.java:119)
  at org.apache.mahout.classifier.ConfusionMatrix.addInstance(ConfusionMatrix.java:87)
  at org.apache.mahout.classifier.ResultAnalyzer.addInstance(ResultAnalyzer.java:69)
  at org.apache.mahout.classifier.bayes.TestClassifier.classifySequential(TestClassifier.java:266)
  at org.apache.mahout.classifier.bayes.TestClassifier.main(TestClassifier.java:186)
  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
  at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
  at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
  at java.lang.reflect.Method.invoke(Method.java:597)
  at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:68)
  at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:139)
  at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:187)
  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
  at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
  at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
  at java.lang.reflect.Method.invoke(Method.java:597)
  at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
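The exception is thrown inside ConfusionMatrix.getCount(), called from incrementCount(). One common way Java code fails like this, offered here as an assumption and not as a reading of the actual Mahout source, is looking up a label the matrix was not built with in a Map<String, Integer> and auto-unboxing the null result to an int. The class below (LabelMatrix) and the label "france" are hypothetical; "france" stands in for any category that appears in the test data but is absent from the model's label set.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration, NOT Mahout's ConfusionMatrix: a confusion matrix
// that maps label names to row/column indices. Map.get() returns null for a
// label that was never registered, and unboxing that null into an int throws
// a NullPointerException inside getCount().
public class LabelMatrix {

    private final Map<String, Integer> labelIndex = new HashMap<>();
    private final int[][] counts;

    public LabelMatrix(String... labels) {
        for (int i = 0; i < labels.length; i++) {
            labelIndex.put(labels[i], i);
        }
        counts = new int[labels.length][labels.length];
    }

    public int getCount(String correctLabel, String classifiedLabel) {
        int row = labelIndex.get(correctLabel);     // NPE here if correctLabel is unknown
        int col = labelIndex.get(classifiedLabel);  // ...or here if classifiedLabel is unknown
        return counts[row][col];
    }

    // Mirrors the call order in the stack trace: incrementCount -> getCount.
    public void incrementCount(String correctLabel, String classifiedLabel) {
        int current = getCount(correctLabel, classifiedLabel);
        counts[labelIndex.get(correctLabel)][labelIndex.get(classifiedLabel)] = current + 1;
    }

    public static void main(String[] args) {
        LabelMatrix m = new LabelMatrix("algeria", "united_states", "unknown");
        m.incrementCount("united_states", "united_states");
        System.out.println(m.getCount("united_states", "united_states"));
        try {
            m.incrementCount("france", "unknown");  // label not in the matrix
        } catch (NullPointerException e) {
            System.out.println("NPE for a label the matrix was not built with");
        }
    }
}
```

If this is the mechanism, checking whether the test set's categories match the labels the model was trained on would be a reasonable first step.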

-- 
Lance Norskog
goksron@gmail.com