Posted to dev@mahout.apache.org by "Andrew Palumbo (JIRA)" <ji...@apache.org> on 2014/03/31 00:08:15 UTC

[jira] [Commented] (MAHOUT-1369) Why is theta normalization for naive bayes classification commented out?

    [ https://issues.apache.org/jira/browse/MAHOUT-1369?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13954850#comment-13954850 ] 

Andrew Palumbo commented on MAHOUT-1369:
----------------------------------------

From what I can see after looking into this a bit more today, the original paper (Rennie et al.) focuses on 4 Naive Bayes models: Multinomial Naive Bayes (MNB), Complement Naive Bayes (CNB), Weight-normalized Complement Naive Bayes (WCNB) and Transformed Weight-normalized Complement Naive Bayes (TWCNB).  The current Mahout NB implementation so far seems to cover only:

MNB (trainnb/testnb ...)
CNB (trainnb/testnb -c ...)

Theta normalization is only called for WCNB and TWCNB, so its being commented out doesn't affect MNB or CNB.
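To make the distinction concrete, here is a minimal sketch of the complement weights and the optional weight normalization step as I read them in Rennie et al. It is plain Python for illustration only, not Mahout's code; the function names, the per-label count layout and the smoothing value alpha_i=1.0 are my own assumptions, and it does not model the thetaSummer job.

```python
import math

def complement_weights(counts, alpha_i=1.0, normalize=False):
    """counts maps label -> list of per-term counts summed over that label's docs.

    Hypothetical helper for illustration; not part of Mahout.
    """
    labels = sorted(counts)
    n_terms = len(next(iter(counts.values())))
    weights = {}
    for c in labels:
        # Term counts summed over every class except c (the "complement").
        comp = [sum(counts[o][i] for o in labels if o != c) for i in range(n_terms)]
        total = sum(comp) + alpha_i * n_terms  # alpha is the sum of the alpha_i
        w = [math.log((comp[i] + alpha_i) / total) for i in range(n_terms)]
        if normalize:
            # WCNB step: divide each weight by the sum of absolute weights of its class.
            norm = sum(abs(x) for x in w)
            w = [x / norm for x in w]
        weights[c] = w
    return weights

def classify(weights, doc):
    # Complement models pick the class with the smallest complement score.
    return min(weights, key=lambda c: sum(f * w for f, w in zip(doc, weights[c])))
```

With normalize=False this corresponds to CNB, and with normalize=True to WCNB; the only difference is the per-class rescaling, which is why skipping it leaves CNB untouched.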

It seems that the call to the thetaSummer job is commented out because the weight normalization/transformation implementation is incomplete.  As far as I can tell, the MNB and CNB classifiers are calculating their weights correctly.

If the goal is to stick to the Rennie implementations, I think that once the thetaSummer job (or whatever turns out to be the problem with weight normalization/transformation) is corrected/completed, it should only be called when a separate option is supplied, something like:

trainnb/testnb -wcnb (WCNB) 
trainnb/testnb -twcnb (TWCNB)
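As a rough illustration of that gating, the sketch below uses Python's argparse. The -wcnb and -twcnb flags are only the proposal above and do not exist in Mahout today; the parser name and the needs_theta_normalization helper are hypothetical.

```python
import argparse

def build_parser():
    # Hypothetical stand-in for the trainnb option parsing; not Mahout's CLI code.
    parser = argparse.ArgumentParser(prog="trainnb-sketch")
    group = parser.add_mutually_exclusive_group()
    group.add_argument("-c", dest="model", action="store_const", const="CNB")
    group.add_argument("-wcnb", dest="model", action="store_const", const="WCNB")
    group.add_argument("-twcnb", dest="model", action="store_const", const="TWCNB")
    parser.set_defaults(model="MNB")  # no flag means plain multinomial NB
    return parser

def needs_theta_normalization(model):
    # Only the weight-normalized variants would trigger the normalization job.
    return model in ("WCNB", "TWCNB")
```

The point of the design is that existing MNB and CNB invocations keep their current behavior, and the normalization step runs only when one of the new options is passed.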

I also just noticed that the Mahout website says that the TWCNB implementation is what's being called in Mahout's complementary naive Bayes:

     https://mahout.apache.org/users/classification/bayesian.html

However, I believe that the CNB implementation is what's really being called here.

I think that there is more going on here as well: the weight summer may need to be called in a different order. I will continue to look into this over the course of this week.

    

> Why is theta normalization for naive bayes classification commented out?
> ------------------------------------------------------------------------
>
>                 Key: MAHOUT-1369
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-1369
>             Project: Mahout
>          Issue Type: Question
>          Components: Classification
>    Affects Versions: 0.7, 0.8, 0.9
>         Environment: mahout 0.8
>            Reporter: utku yaman
>            Priority: Minor
>              Labels: features
>             Fix For: 1.0
>
>
> TrainNaiveBayesJob lines 155:158
> and
> BayesUtils lines 86:93
> are commented out, and these lines are for theta normalization for Bayes.
> What is the problem with the code, and is there a plan for correcting these methods?


