Posted to dev@mahout.apache.org by "Oleg Kalnichevski (JIRA)" <ji...@apache.org> on 2010/12/11 21:27:01 UTC
[jira] Created: (MAHOUT-562) Results produced by Complementary Bayes Classifier seem odd
Results produced by Complementary Bayes Classifier seem odd
-----------------------------------------------------------
Key: MAHOUT-562
URL: https://issues.apache.org/jira/browse/MAHOUT-562
Project: Mahout
Issue Type: Bug
Components: Classification
Affects Versions: 0.4
Reporter: Oleg Kalnichevski
The 20newsgroups example produces the expected results (a 95% correctness rate) when using the Naive Bayes algorithm. When the algorithm is switched to Complementary Bayes while all other parameters remain the same, the rate of correctly classified documents drops to 5%. This seems odd to me.
I admit I know next to nothing about Bayes' theorem, and possibly my expectations are totally off.
---
Dec 11, 2010 8:47:47 PM org.apache.mahout.classifier.bayes.TestClassifier classifySequential
INFO: Loading model from: {basePath=/home/oleg/data/mahout/20news-bayes-model, classifierType=cbayes, alpha_i=1, dataSource=hdfs, gramSize=1, verbose=false, encoding=UTF-8, defaultCat=unknown, testDirPath=/home/oleg/data/mahout/20news-bayes-train-input}
Dec 11, 2010 8:47:47 PM org.apache.mahout.classifier.bayes.TestClassifier classifySequential
INFO: Testing Complementary Bayes Classifier
...
INFO: =======================================================
Summary
-------------------------------------------------------
Correctly Classified Instances : 578 5.1087%
Incorrectly Classified Instances : 10736 94.8913%
Total Classified Instances : 11314
=======================================================
Confusion Matrix
-------------------------------------------------------
a b c d e f g h i j k l m n o p q r s t <--Classified as
0 0 0 0 0 0 0 0 0 0 0 0 0 597 0 0 0 0 0 0 | 597 a = rec.sport.baseball
0 0 0 0 0 0 0 0 0 0 0 0 0 595 0 0 0 0 0 0 | 595 b = sci.crypt
0 0 0 0 0 0 0 0 0 0 0 0 0 600 0 0 0 0 0 0 | 600 c = rec.sport.hockey
0 0 0 0 0 0 0 0 0 0 0 0 0 546 0 0 0 0 0 0 | 546 d = talk.politics.guns
0 0 0 0 0 0 0 0 0 0 0 0 0 599 0 0 0 0 0 0 | 599 e = soc.religion.christian
0 0 0 0 0 0 0 0 0 0 0 0 0 591 0 0 0 0 0 0 | 591 f = sci.electronics
0 0 0 0 0 0 0 0 0 0 0 0 0 591 0 0 0 0 0 0 | 591 g = comp.os.ms-windows.misc
0 0 0 0 0 0 0 0 0 0 0 0 0 585 0 0 0 0 0 0 | 585 h = misc.forsale
0 0 0 0 0 0 0 0 0 0 0 0 0 377 0 0 0 0 0 0 | 377 i = talk.religion.misc
0 0 0 0 0 0 0 0 0 0 0 0 0 480 0 0 0 0 0 0 | 480 j = alt.atheism
0 0 0 0 0 0 0 0 0 0 0 0 0 593 0 0 0 0 0 0 | 593 k = comp.windows.x
0 0 0 0 0 0 0 0 0 0 0 0 0 564 0 0 0 0 0 0 | 564 l = talk.politics.mideast
0 0 0 0 0 0 0 0 0 0 0 0 0 590 0 0 0 0 0 0 | 590 m = comp.sys.ibm.pc.hardware
0 0 0 0 0 0 0 0 0 0 0 0 0 578 0 0 0 0 0 0 | 578 n = comp.sys.mac.hardware
0 0 0 0 0 0 0 0 0 0 0 0 0 593 0 0 0 0 0 0 | 593 o = sci.space
0 0 0 0 0 0 0 0 0 0 0 0 0 598 0 0 0 0 0 0 | 598 p = rec.motorcycles
0 0 0 0 0 0 0 0 0 0 0 0 0 594 0 0 0 0 0 0 | 594 q = rec.autos
0 0 0 0 0 0 0 0 0 0 0 0 0 584 0 0 0 0 0 0 | 584 r = comp.graphics
0 0 0 0 0 0 0 0 0 0 0 0 0 465 0 0 0 0 0 0 | 465 s = talk.politics.misc
0 0 0 0 0 0 0 0 0 0 0 0 0 594 0 0 0 0 0 0 | 594 t = sci.med
Default Category: unknown: 20
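As a quick sanity check on the numbers above, the summary figures can be recomputed from the confusion matrix. The sketch below is illustrative Python and is not part of Mahout; the variable names are made up here, and the row totals are copied by hand from the matrix. Because every document lands in column n, the only correct predictions are the documents whose true class is n.

```python
# Row totals copied from the confusion matrix above, in row order a..t.
row_totals = [597, 595, 600, 546, 599, 591, 591, 585, 377, 480,
              593, 564, 590, 578, 593, 598, 594, 584, 465, 594]

# Zero-based index of column "n" (comp.sys.mac.hardware), the only
# column that received any predictions.
predicted_column = 13

total = sum(row_totals)
# Only row n sits on the diagonal of the single populated column.
correct = row_totals[predicted_column]
accuracy = 100.0 * correct / total

print(f"{correct} of {total} correct: {accuracy:.4f}%")
# prints "578 of 11314 correct: 5.1087%"
```

A result this close to 1 in 20 on a 20-class problem, with every prediction in one column, suggests the classifier is not using the input at all rather than merely performing poorly.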
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
[jira] Resolved: (MAHOUT-562) Results produced by Complementary Bayes Classifier seem odd
Posted by "Oleg Kalnichevski (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/MAHOUT-562?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Oleg Kalnichevski resolved MAHOUT-562.
--------------------------------------
Resolution: Invalid
Apparently I tested with the wrong model, one that had been produced with the 'bayes' algorithm type. My bad. Apologies for the noise.
Oleg