Posted to user@mahout.apache.org by qiaoresearcher <qi...@gmail.com> on 2014/02/21 05:02:03 UTC

complementary naive bayes classifier

Does Mahout have a complementary naive Bayes implementation available?
I checked the Mahout source code, and it seems the author has not finished it
yet. As shown in the following, the thetaSummer job is never submitted.

public final class TrainNaiveBayesJob extends AbstractJob {

  ....

  thetaSummer.getConfiguration().setBoolean(ThetaMapper.TRAIN_COMPLEMENTARY, trainComplementary);
  /* TODO(robinanil): Enable this when thetanormalization works.
  succeeded = thetaSummer.waitForCompletion(true);
  if (!succeeded) {
    return -1;
  }*/

  .....

}

Any comments will be appreciated.

Re: complementary naive bayes classifier

Posted by Suneel Marthi <su...@yahoo.com>.
Answering myself here.

Looking at the code and reading the relevant sections of the paper (see Section 3.1 in the Rennie paper), it seems to me that the implementation of theta normalization is in place. Now it's just a matter of testing and validating the output.
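
For readers following along, here is a minimal sketch of the weight normalization idea as I understand it from the Rennie paper: each class's log weights are divided by the sum of their absolute values, so that no class dominates just because its weights are larger in magnitude. The class name ThetaNormalizationSketch and the method normalize are hypothetical and the numbers are made up; this is not Mahout's ThetaMapper code and may differ from what the thetaSummer job actually computes.

```java
import java.util.Arrays;

public final class ThetaNormalizationSketch {

  /**
   * Returns a copy of the per-class weights in which every row is divided
   * by the sum of the absolute values of that row.
   * weights[c][i] is the (log) weight of term i for class c.
   */
  static double[][] normalize(double[][] weights) {
    double[][] result = new double[weights.length][];
    for (int c = 0; c < weights.length; c++) {
      double norm = 0.0;
      for (double w : weights[c]) {
        norm += Math.abs(w);
      }
      result[c] = new double[weights[c].length];
      for (int i = 0; i < weights[c].length; i++) {
        // An all-zero row is left as zeros to avoid dividing by zero.
        result[c][i] = norm == 0.0 ? 0.0 : weights[c][i] / norm;
      }
    }
    return result;
  }

  public static void main(String[] args) {
    // Illustrative log weights for two classes over three terms (made-up values).
    double[][] logWeights = {
      {Math.log(0.5), Math.log(0.25), Math.log(0.25)},
      {Math.log(0.1), Math.log(0.1), Math.log(0.8)}
    };
    for (double[] row : normalize(logWeights)) {
      System.out.println(Arrays.toString(row));
    }
  }
}
```

After normalization, the absolute values in each row sum to 1, which is one simple property to check when validating the output.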









Re: complementary naive bayes classifier

Posted by Suneel Marthi <su...@yahoo.com>.
Complementary Naive Bayes classification is intended for unbalanced datasets and is available in Mahout. See the relevant section in the Rennie paper on this subject - http://people.csail.mit.edu/jrennie/papers/icml03-nb.pdf

The code for Theta Normalization seems complete, so I'm not sure why it's still commented out (it has been that way since Mahout 0.7).

Its behavior still needs to be verified, though.
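
To illustrate why the complement approach helps with unbalanced data, here is a minimal sketch based on my reading of the Rennie paper: the weights for a class are estimated from the term counts of all the other classes, and a document is assigned to the class whose complement fits it worst. The class ComplementNaiveBayesSketch, its methods, the smoothing value and the toy counts are all hypothetical; this is not Mahout's implementation.

```java
public final class ComplementNaiveBayesSketch {

  /**
   * counts[c][i] is the count of term i in training documents labeled c.
   * Returns weights[c][i] = log((complementCount + alpha) / (complementTotal + alpha * vocabSize)),
   * where the complement counts come from every label except c.
   */
  static double[][] complementWeights(double[][] counts, double alpha) {
    int numLabels = counts.length;
    int vocabSize = counts[0].length;
    double[] termTotals = new double[vocabSize];
    double grandTotal = 0.0;
    for (double[] row : counts) {
      for (int i = 0; i < vocabSize; i++) {
        termTotals[i] += row[i];
        grandTotal += row[i];
      }
    }
    double[][] weights = new double[numLabels][vocabSize];
    for (int c = 0; c < numLabels; c++) {
      double labelTotal = 0.0;
      for (double v : counts[c]) {
        labelTotal += v;
      }
      double complementTotal = grandTotal - labelTotal;
      for (int i = 0; i < vocabSize; i++) {
        double complementCount = termTotals[i] - counts[c][i];
        weights[c][i] = Math.log((complementCount + alpha) / (complementTotal + alpha * vocabSize));
      }
    }
    return weights;
  }

  /** Returns the label with the smallest weighted sum, i.e. the worst match to its complement. */
  static int classify(double[][] weights, double[] termFrequencies) {
    int best = -1;
    double bestScore = Double.POSITIVE_INFINITY;
    for (int c = 0; c < weights.length; c++) {
      double score = 0.0;
      for (int i = 0; i < termFrequencies.length; i++) {
        score += termFrequencies[i] * weights[c][i];
      }
      if (score < bestScore) {
        bestScore = score;
        best = c;
      }
    }
    return best;
  }

  public static void main(String[] args) {
    // Toy, unbalanced training counts: label 0 has 100 term occurrences, label 1 only 10.
    double[][] counts = {{90, 10}, {1, 9}};
    double[][] weights = complementWeights(counts, 1.0);
    System.out.println("label for {0, 3}: " + classify(weights, new double[] {0, 3}));
    System.out.println("label for {3, 0}: " + classify(weights, new double[] {3, 0}));
  }
}
```

Because each class's weights are estimated from the rest of the data, a small class is scored against statistics drawn from far more training text than it has itself, which is the motivation the paper gives for using the complement on skewed data.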







Re: complementary naive bayes classifier

Posted by qiaoresearcher <qi...@gmail.com>.
Suneel and Andrew,

Many thanks for the clarification. I did include the -c option when
training the naive Bayes classifier. I will debug the code later to
find out more details.

A general question: what options are available in Mahout when we have
very imbalanced data sets?

Regards,




Re: complementary naive bayes classifier

Posted by Suneel Marthi <su...@yahoo.com>.
Complementary Naive Bayes does exist in Mahout (invoked with the -c option when running BayesDriver).

The code for the ThetaSummer job does exist. That it is still commented out (it has been that way since Mahout 0.7) could be due either to an oversight or to Theta Normalization not having been tested thoroughly.

There's a JIRA already open for this, see MAHOUT-1369. Robin Anil, could you explain whether this code can be uncommented or whether it's still not functional?

For whoever would like to work on this, it would be great to add code comments (presently missing from this code) and also to refer to the original paper (see below).

For reference, the implementation of the Mahout Naive Bayes (and complementary Naive Bayes) classifiers is based on the Rennie paper on this subject - http://people.csail.mit.edu/jrennie/papers/icml03-nb.pdf










Re: complementary naive bayes classifier

Posted by Andrew Musselman <an...@gmail.com>.
It's an option when you run the examples as I recall.  Search in examples/bin and you can trace it out.
