Posted to dev@mahout.apache.org by "Andrew Palumbo (JIRA)" <ji...@apache.org> on 2014/04/02 03:54:16 UTC

[jira] [Comment Edited] (MAHOUT-1369) Why is theta normalization for naive bayes classification commented out?

    [ https://issues.apache.org/jira/browse/MAHOUT-1369?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13957242#comment-13957242 ] 

Andrew Palumbo edited comment on MAHOUT-1369 at 4/2/14 1:52 AM:
----------------------------------------------------------------

Going back and looking at the Mahout 0.5 and 0.6 releases, it looks like there were some major changes to the Naive Bayes implementation between 0.6 and 0.7: NB seems to have been completely refactored/rewritten. In the pre-0.7 versions, TF-IDF transformations are done internally by NB. From 0.7 on, the algorithm no longer does this itself, and the transformations are done externally (e.g. via seq2sparse). It looks like the weight (theta) normalization was never properly implemented after that move. It should be a relatively easy fix and will allow for all four flavors of the NB algorithm from the Rennie paper.
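For reference, the following is a minimal sketch of Complement Naive Bayes weight (theta) normalization as described by Rennie et al. (2003). It does not reproduce Mahout's TrainNaiveBayesJob or BayesUtils code. The function names (complement_weights, normalize_theta, classify) are hypothetical, a single smoothing constant alpha is assumed for every feature, and the input vectors are assumed to be already TF-IDF transformed externally, as seq2sparse would do.

```python
import math


def complement_weights(docs, labels, alpha=1.0):
    """Unnormalized complement weights w[c][i] = log(theta_ci).

    docs   : list of term-weight vectors (already TF-IDF transformed; assumption)
    labels : one class label per document
    alpha  : smoothing added to every feature (hypothetical single constant)
    """
    n_features = len(docs[0])
    weights = {}
    for c in sorted(set(labels)):
        # Sum term weights over the documents NOT in class c (the complement).
        comp = [0.0] * n_features
        for d, y in zip(docs, labels):
            if y != c:
                for i, x in enumerate(d):
                    comp[i] += x
        total = sum(comp) + alpha * n_features
        weights[c] = [math.log((comp[i] + alpha) / total) for i in range(n_features)]
    return weights


def normalize_theta(weights):
    """Weight (theta) normalization: w_ci <- w_ci / sum_k |w_ck|, per class."""
    normalized = {}
    for c, ws in weights.items():
        norm = sum(abs(w) for w in ws)
        normalized[c] = [w / norm for w in ws]
    return normalized


def classify(weights, doc):
    """Complement NB picks the class whose complement fits the doc worst (argmin)."""
    return min(weights, key=lambda c: sum(t * w for t, w in zip(doc, weights[c])))


# Toy usage with made-up term weights.
docs = [[3, 0, 1], [2, 0, 0], [0, 4, 1], [0, 3, 0]]
labels = ["a", "a", "b", "b"]
theta = normalize_theta(complement_weights(docs, labels))
print(classify(theta, [2, 0, 0]))  # a
```

Without the normalize_theta step, classes with more skewed complement counts get systematically larger weight magnitudes, which is the bias the normalization is meant to remove.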

Since this is a question JIRA, if there are no objections I'll create three new JIRAs related to NB:

          1.  Update the website to the current NB specs (the current page describes pre-0.7 NB)
          2.  Fix the theta normalization problem
          3.  Address the error reported on the dev list last week by Chandler Burgess re: testnb failing in sequential mode





> Why is theta normalization for naive bayes classification commented out?
> ------------------------------------------------------------------------
>
>                 Key: MAHOUT-1369
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-1369
>             Project: Mahout
>          Issue Type: Question
>          Components: Classification
>    Affects Versions: 0.7, 0.8, 0.9
>         Environment: mahout 0.8
>            Reporter: utku yaman
>            Priority: Minor
>              Labels: features
>             Fix For: 1.0
>
>
> TrainNaiveBayesJob lines 155-158
> and
> BayesUtils lines 86-93
> are commented out, and these lines are for theta normalization for Bayes.
> What is the problem with the code, and is there a plan for correcting these methods?



--
This message was sent by Atlassian JIRA
(v6.2#6252)