Posted to dev@mahout.apache.org by "Andrew Palumbo (JIRA)" <ji...@apache.org> on 2014/04/02 03:54:16 UTC
[jira] [Comment Edited] (MAHOUT-1369) Why is theta normalization for naive bayes classification commented out?
[ https://issues.apache.org/jira/browse/MAHOUT-1369?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13957242#comment-13957242 ]
Andrew Palumbo edited comment on MAHOUT-1369 at 4/2/14 1:52 AM:
----------------------------------------------------------------
Going back and looking at the Mahout 0.5 and 0.6 releases, it looks like there were major changes to the Naive Bayes implementation between 0.6 and 0.7; NB seems to have been completely refactored/rewritten. In versions before 0.7, the TF-IDF transformations are done internally by NB. From 0.7 on, the algorithm is relaxed and the transformations are done externally (e.g., via seq2sparse). It looks like the weight (theta) normalization was never properly implemented after that move. It should be a relatively easy fix, and it will allow for all four flavors of the NB algorithm from the Rennie paper.
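For context on what the commented-out lines were meant to do, here is a minimal Java sketch, not Mahout's code, of Naive Bayes weights with optional theta (weight) normalization in the style of Rennie et al.: each label's log-weights are divided by the sum of their absolute values. It assumes the "four flavors" are standard and complementary NB, each with or without that normalization, and that the per-label feature counts were already TF-IDF weighted externally (e.g., by seq2sparse). The class and method names (`ThetaNormalizationSketch`, `standardWeights`, `complementaryWeights`, `thetaNormalize`) are hypothetical.

```java
import java.util.Arrays;

/**
 * Hypothetical sketch (not Mahout's API): NB weights with optional theta normalization.
 * counts[c][i] = (already TF-IDF weighted) total weight of feature i in label c.
 */
class ThetaNormalizationSketch {

    /** Standard NB: w[c][i] = log((N_ci + alpha) / (N_c + alpha * V)). */
    static double[][] standardWeights(double[][] counts, double alpha) {
        int labels = counts.length, features = counts[0].length;
        double[][] w = new double[labels][features];
        for (int c = 0; c < labels; c++) {
            double labelTotal = Arrays.stream(counts[c]).sum();
            for (int i = 0; i < features; i++) {
                w[c][i] = Math.log((counts[c][i] + alpha) / (labelTotal + alpha * features));
            }
        }
        return w;
    }

    /** Complementary NB: estimate from all labels except c, negated so higher = better. */
    static double[][] complementaryWeights(double[][] counts, double alpha) {
        int labels = counts.length, features = counts[0].length;
        double[] featureTotals = new double[features];
        double grandTotal = 0.0;
        for (double[] row : counts) {
            for (int i = 0; i < features; i++) {
                featureTotals[i] += row[i];
                grandTotal += row[i];
            }
        }
        double[][] w = new double[labels][features];
        for (int c = 0; c < labels; c++) {
            double labelTotal = Arrays.stream(counts[c]).sum();
            double den = grandTotal - labelTotal + alpha * features;
            for (int i = 0; i < features; i++) {
                w[c][i] = -Math.log((featureTotals[i] - counts[c][i] + alpha) / den);
            }
        }
        return w;
    }

    /** Theta normalization: divide each label's weights by the sum of their absolute values. */
    static double[][] thetaNormalize(double[][] w) {
        double[][] out = new double[w.length][];
        for (int c = 0; c < w.length; c++) {
            double norm = 0.0;
            for (double v : w[c]) {
                norm += Math.abs(v);
            }
            out[c] = new double[w[c].length];
            for (int i = 0; i < w[c].length; i++) {
                out[c][i] = w[c][i] / norm;
            }
        }
        return out;
    }

    /** Score of a document's term-weight vector against one label (dot product). */
    static double score(double[] doc, double[] labelWeights) {
        double s = 0.0;
        for (int i = 0; i < doc.length; i++) {
            s += doc[i] * labelWeights[i];
        }
        return s;
    }

    /** Index of the highest-scoring label. */
    static int classify(double[] doc, double[][] w) {
        int best = 0;
        for (int c = 1; c < w.length; c++) {
            if (score(doc, w[c]) > score(doc, w[best])) {
                best = c;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        double[][] counts = {{5, 1, 0}, {0, 2, 6}}; // toy data: two labels, three features
        double[] doc = {1, 0, 2};
        for (boolean complementary : new boolean[] {false, true}) {
            double[][] w = complementary ? complementaryWeights(counts, 1.0) : standardWeights(counts, 1.0);
            for (boolean normalized : new boolean[] {false, true}) {
                double[][] model = normalized ? thetaNormalize(w) : w;
                System.out.printf("complementary=%b normalized=%b -> label %d%n",
                        complementary, normalized, classify(doc, model));
            }
        }
    }
}
```

The loop in `main` runs all four combinations on the same toy data. Normalization only rescales each label's weight vector, so it matters most when some labels have many more (or longer) training documents than others, which is the skew Rennie et al. describe.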
Since this is a question JIRA, if there are no objections I'll create three new NB-related JIRAs:
1. Update the website to the current NB specs (the current page describes pre-0.7 NB)
2. Fix the theta-normalization problem
3. Address the error reported on the dev list last week by Chandler Burgess: testnb failing in sequential mode
> Why is theta normalization for naive bayes classification commented out?
> ------------------------------------------------------------------------
>
> Key: MAHOUT-1369
> URL: https://issues.apache.org/jira/browse/MAHOUT-1369
> Project: Mahout
> Issue Type: Question
> Components: Classification
> Affects Versions: 0.7, 0.8, 0.9
> Environment: mahout 0.8
> Reporter: utku yaman
> Priority: Minor
> Labels: features
> Fix For: 1.0
>
>
> Lines 155-158 of TrainNaiveBayesJob and lines 86-93 of BayesUtils are commented out, and these lines perform theta normalization for Bayes.
> What is the problem with the code, and is there a plan to fix these methods?
--
This message was sent by Atlassian JIRA
(v6.2#6252)