Posted to dev@mahout.apache.org by Dipti Mathur <di...@gmail.com> on 2011/05/11 15:42:18 UTC

Used my own data for the 20NewsGroup example. TestClassifier giving incorrect output

Hi All,

I used the 20NewsGroup example to train a model on my own data. However, when
I test the classifier (the test data is the same as the training data, just
for simplicity's sake for now), I get the following incorrect output. Any ideas?

dipti@dipti-laptop:~$ mahout/trunk/bin/mahout testclassifier -m
ruralsearch/bayes-model/ -d ruralsearch/test-input/ -type bayes -ng 1
-source hdfs -method sequential
Running on hadoop, using HADOOP_HOME=/usr/lib/hadoop-0.20.2/
HADOOP_CONF_DIR=/usr/lib/hadoop-0.20.2/conf
11/05/11 19:02:35 INFO bayes.TestClassifier: Loading model from:
{basePath=ruralsearch/bayes-model/, classifierType=bayes, alpha_i=1.0,
dataSource=hdfs, gramSize=1, verbose=false, encoding=UTF-8,
defaultCat=unknown, testDirPath=ruralsearch/test-input/}
11/05/11 19:02:35 INFO bayes.TestClassifier: Testing Bayes Classifier
11/05/11 19:02:36 INFO io.SequenceFileModelReader: 135467.11329474236
11/05/11 19:02:37 INFO datastore.InMemoryBayesDatastore: realestate
-103464.88819958708 168594.15797711344 -0.6136920130627087
11/05/11 19:02:37 INFO datastore.InMemoryBayesDatastore: automobiles
-168594.15797711344 168594.15797711344 -1.0
11/05/11 19:02:37 INFO bayes.TestClassifier:
=======================================================
Summary
-------------------------------------------------------
Correctly Classified Instances          :          0         �%
Incorrectly Classified Instances        :          0         �%
Total Classified Instances              :          0

=======================================================
Confusion Matrix
-------------------------------------------------------
a     b     c     <--Classified as
0     0     0     |  0     a     = realestate
0     0     0     |  0     b     = automobiles
0     0     0     |  0     c     = unknown
Default Category: unknown: 2


11/05/11 19:02:37 INFO driver.MahoutDriver: Program took 2309 ms

Regards,
Dipti Mathur

Re: Used my own data for the 20NewsGroup example. TestClassifier giving incorrect output

Posted by Grant Ingersoll <gs...@apache.org>.
What steps did you take before this?

(For future reference, this is a good question to ask on user@mahout.apache.org)

--------------------------
Grant Ingersoll
http://www.lucidimagination.com/

Search the Lucene ecosystem docs using Solr/Lucene:
http://www.lucidimagination.com/search


Re: Used my own data for the 20NewsGroup example. TestClassifier giving incorrect output

Posted by Daniel McEnnis <dm...@gmail.com>.
Dipti,

Double-check that your test data is in category\ttokenized text
format, i.e. the category label, a tab character, then the tokenized
text (the output of the testclassifier data builder rather than the
classifier data builder).

Daniel.
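
One way to act on this suggestion is to scan a local copy of a test-input file and flag lines that do not look like "category<TAB>tokenized text". The sketch below is only an illustration, not part of Mahout: the function names are hypothetical, and it assumes one document per line, UTF-8 encoding (as shown in the log above), and that the file has already been copied out of HDFS.

```python
import sys


def check_line(line: str) -> bool:
    """Return True if the line looks like 'category<TAB>token token ...'."""
    # Split on the first tab only; the text part may itself contain tabs.
    label, sep, text = line.rstrip("\n").partition("\t")
    return bool(sep) and bool(label.strip()) and bool(text.strip())


def find_bad_lines(lines):
    """Return 1-based numbers of non-empty lines that do not match the format."""
    return [
        number
        for number, line in enumerate(lines, start=1)
        if line.strip() and not check_line(line)
    ]


if __name__ == "__main__":
    # Hypothetical usage: python check_format.py <local test-input file> ...
    for name in sys.argv[1:]:
        with open(name, encoding="utf-8") as handle:
            bad = find_bad_lines(handle)
        # Show at most the first ten offending line numbers per file.
        print(f"{name}: {len(bad)} malformed line(s)", bad[:10])
```

If every line is flagged, the file was probably produced without the leading category label, which would be consistent with a run that reports zero classified instances.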
