Posted to dev@mahout.apache.org by "Oleg Kalnichevski (JIRA)" <ji...@apache.org> on 2010/12/11 21:27:01 UTC

[jira] Created: (MAHOUT-562) Results produced by Complementary Bayes Classifier seem odd

Results produced by Complementary Bayes Classifier seem odd
-----------------------------------------------------------

                 Key: MAHOUT-562
                 URL: https://issues.apache.org/jira/browse/MAHOUT-562
             Project: Mahout
          Issue Type: Bug
          Components: Classification
    Affects Versions: 0.4
            Reporter: Oleg Kalnichevski


The 20newsgroups example produces the expected results (a 95% correctness rate) with the Naive Bayes algorithm. After switching the algorithm to Complementary Bayes, with all other parameters unchanged, the rate of correctly classified documents drops to 5%. This seems odd to me.

I admit I know next to nothing about Bayes' theorem, and my expectations may be totally off.

---
Dec 11, 2010 8:47:47 PM org.apache.mahout.classifier.bayes.TestClassifier classifySequential
INFO: Loading model from: {basePath=/home/oleg/data/mahout/20news-bayes-model, classifierType=cbayes, alpha_i=1, dataSource=hdfs, gramSize=1, verbose=false, encoding=UTF-8, defaultCat=unknown, testDirPath=/home/oleg/data/mahout/20news-bayes-train-input}
Dec 11, 2010 8:47:47 PM org.apache.mahout.classifier.bayes.TestClassifier classifySequential
INFO: Testing Complementary Bayes Classifier
...
INFO: =======================================================
Summary
-------------------------------------------------------
Correctly Classified Instances          :        578	    5.1087%
Incorrectly Classified Instances        :      10736	   94.8913%
Total Classified Instances              :      11314

=======================================================
Confusion Matrix
-------------------------------------------------------
a    	b    	c    	d    	e    	f    	g    	h    	i    	j    	k    	l    	m    	n    	o    	p    	q    	r    	s    	t    	<--Classified as
0    	0    	0    	0    	0    	0    	0    	0    	0    	0    	0    	0    	0    	597  	0    	0    	0    	0    	0    	0    	 |  597   	a     = rec.sport.baseball
0    	0    	0    	0    	0    	0    	0    	0    	0    	0    	0    	0    	0    	595  	0    	0    	0    	0    	0    	0    	 |  595   	b     = sci.crypt
0    	0    	0    	0    	0    	0    	0    	0    	0    	0    	0    	0    	0    	600  	0    	0    	0    	0    	0    	0    	 |  600   	c     = rec.sport.hockey
0    	0    	0    	0    	0    	0    	0    	0    	0    	0    	0    	0    	0    	546  	0    	0    	0    	0    	0    	0    	 |  546   	d     = talk.politics.guns
0    	0    	0    	0    	0    	0    	0    	0    	0    	0    	0    	0    	0    	599  	0    	0    	0    	0    	0    	0    	 |  599   	e     = soc.religion.christian
0    	0    	0    	0    	0    	0    	0    	0    	0    	0    	0    	0    	0    	591  	0    	0    	0    	0    	0    	0    	 |  591   	f     = sci.electronics
0    	0    	0    	0    	0    	0    	0    	0    	0    	0    	0    	0    	0    	591  	0    	0    	0    	0    	0    	0    	 |  591   	g     = comp.os.ms-windows.misc
0    	0    	0    	0    	0    	0    	0    	0    	0    	0    	0    	0    	0    	585  	0    	0    	0    	0    	0    	0    	 |  585   	h     = misc.forsale
0    	0    	0    	0    	0    	0    	0    	0    	0    	0    	0    	0    	0    	377  	0    	0    	0    	0    	0    	0    	 |  377   	i     = talk.religion.misc
0    	0    	0    	0    	0    	0    	0    	0    	0    	0    	0    	0    	0    	480  	0    	0    	0    	0    	0    	0    	 |  480   	j     = alt.atheism
0    	0    	0    	0    	0    	0    	0    	0    	0    	0    	0    	0    	0    	593  	0    	0    	0    	0    	0    	0    	 |  593   	k     = comp.windows.x
0    	0    	0    	0    	0    	0    	0    	0    	0    	0    	0    	0    	0    	564  	0    	0    	0    	0    	0    	0    	 |  564   	l     = talk.politics.mideast
0    	0    	0    	0    	0    	0    	0    	0    	0    	0    	0    	0    	0    	590  	0    	0    	0    	0    	0    	0    	 |  590   	m     = comp.sys.ibm.pc.hardware
0    	0    	0    	0    	0    	0    	0    	0    	0    	0    	0    	0    	0    	578  	0    	0    	0    	0    	0    	0    	 |  578   	n     = comp.sys.mac.hardware
0    	0    	0    	0    	0    	0    	0    	0    	0    	0    	0    	0    	0    	593  	0    	0    	0    	0    	0    	0    	 |  593   	o     = sci.space
0    	0    	0    	0    	0    	0    	0    	0    	0    	0    	0    	0    	0    	598  	0    	0    	0    	0    	0    	0    	 |  598   	p     = rec.motorcycles
0    	0    	0    	0    	0    	0    	0    	0    	0    	0    	0    	0    	0    	594  	0    	0    	0    	0    	0    	0    	 |  594   	q     = rec.autos
0    	0    	0    	0    	0    	0    	0    	0    	0    	0    	0    	0    	0    	584  	0    	0    	0    	0    	0    	0    	 |  584   	r     = comp.graphics
0    	0    	0    	0    	0    	0    	0    	0    	0    	0    	0    	0    	0    	465  	0    	0    	0    	0    	0    	0    	 |  465   	s     = talk.politics.misc
0    	0    	0    	0    	0    	0    	0    	0    	0    	0    	0    	0    	0    	594  	0    	0    	0    	0    	0    	0    	 |  594   	t     = sci.med
Default Category: unknown: 20
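
As a quick sanity check on the summary, here is a minimal sketch that recomputes the correctness rate from the confusion matrix. Every test document ends up in column n (comp.sys.mac.hardware), so the only correct classifications are the documents that really belong to that category. The variable names are illustrative and not part of Mahout; the counts are copied from the per-row totals in the log.

```python
# Per-category document counts, copied from the row totals of the
# confusion matrix in the log (actual label -> number of test documents).
row_totals = {
    "rec.sport.baseball": 597,
    "sci.crypt": 595,
    "rec.sport.hockey": 600,
    "talk.politics.guns": 546,
    "soc.religion.christian": 599,
    "sci.electronics": 591,
    "comp.os.ms-windows.misc": 591,
    "misc.forsale": 585,
    "talk.religion.misc": 377,
    "alt.atheism": 480,
    "comp.windows.x": 593,
    "talk.politics.mideast": 564,
    "comp.sys.ibm.pc.hardware": 590,
    "comp.sys.mac.hardware": 578,
    "sci.space": 593,
    "rec.motorcycles": 598,
    "rec.autos": 594,
    "comp.graphics": 584,
    "talk.politics.misc": 465,
    "sci.med": 594,
}

# Column n is the only non-zero column: everything was classified as this label.
predicted_label = "comp.sys.mac.hardware"

total = sum(row_totals.values())
# Only documents whose actual label equals the single predicted label count as correct.
correct = row_totals[predicted_label]
accuracy = correct / total

print(f"{correct} / {total} = {accuracy:.4%}")  # prints "578 / 11314 = 5.1087%"
```

With 20 categories of roughly equal size, a classifier that always answers with the same label scores about 1 in 20, which is where the 5% figure comes from.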

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Resolved: (MAHOUT-562) Results produced by Complementary Bayes Classifier seem odd

Posted by "Oleg Kalnichevski (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAHOUT-562?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Oleg Kalnichevski resolved MAHOUT-562.
--------------------------------------

    Resolution: Invalid

Apparently I used the wrong model, one produced with the 'bayes' algorithm type. My bad. Apologies for the noise.

Oleg

