Posted to dev@mahout.apache.org by Chandler Burgess <cb...@icontrolesi.com> on 2014/03/27 20:52:36 UTC

MAHOUT-1369 - Why does theta normalization for naive bayes classification commented out?

Hello all,

It seems Robin Anil hasn't responded, and no one is sure of the status on this. What needs to be done on this, and/or what can I do to help? I'm no ML expert, but I do have the paper and should be able to verify/fix the implementation. I'm REALLY interested in using the CNB classifier, since it seems well suited to the problem I'm trying to tackle, before I give up and use something else.

I've done tests and see no difference when -c is passed on the command line for training or testing. I also wrote a program to print the scores using StandardNaiveBayesClassifier and ComplementaryNaiveBayesClassifier in a binary classification problem and see no difference between the scores, so it seems complementary naïve bayes is completely disabled.

Thanks,
Chandler Burgess

Re: MAHOUT-1369 - Why does theta normalization for naive bayes classification commented out?

Posted by Sebastian Schelter <ss...@apache.org>.
Great. The details on how to submit a patch are here:

https://mahout.apache.org/developers/how-to-contribute.html

--sebastian

On 03/28/2014 09:29 PM, Chandler Burgess wrote:
> Forgot to include in the last mail. Again, I do have the Rennie paper, which I'll dig into and see if I can fix it sometime in the near future. I'll also look at the problem with the -seq flag to testnb.
> All the guidelines for submitting patches are on JIRA or the mahout.apache.org pages, correct?
>
> Chandler
>
> -----Original Message-----
> From: Chandler Burgess [mailto:cburgess@icontrolesi.com]
> Sent: Friday, March 28, 2014 3:16 PM
> To: dev@mahout.apache.org
> Subject: RE: MAHOUT-1369 - Why does theta normalization for naive bayes classification commented out?
>
> Well, maybe someone can correct me but this seems disappointing. I uncommented the code in NaiveBayesModel, BayesUtil and TrainNaiveBayesJob, added some trace statements in ComplementaryThetaMapper and ComplementaryNaiveBayesClassifier to verify they were being called, and then ran some tests using trainnb/testnb. There was not a single difference in the classifications when train/testcomplementary was specified vs standard naïve bayes.
>
> Also, running testnb with the -seq flag doesn't appear to work.
>
> -----Original Message-----
> From: Chandler Burgess [mailto:cburgess@icontrolesi.com]
> Sent: Thursday, March 27, 2014 5:17 PM
> To: dev@mahout.apache.org
> Subject: RE: MAHOUT-1369 - Why does theta normalization for naive bayes classification commented out?
>
> The program I wrote didn't use a model that was trained with Cbayes. After looking at the scorers in SNB and CNB, I figured they would give different results even on a model not trained with CNB. That could very well be ignorance on my part as to the math.
>
> However, I did some command line tests using -c on both training and testing and didn't see any difference in the testnb output.
> ________________________________________
> From: Suneel Marthi <su...@yahoo.com>
> Sent: Thursday, March 27, 2014 5:12 PM
> To: dev@mahout.apache.org
> Cc: ssc@apache.org
> Subject: Re: MAHOUT-1369 - Why does theta normalization for naive bayes classification commented out?
>
> Just checking: you are testing Cbayes on a model that's already been trained using Cbayes, correct?
>
> Also, the JIRA I mentioned earlier was fixed for 0.9, so you should be good. No code changes have been made to naive Bayes since 0.9.
>
>
> Sent from my iPhone
>
>> On Mar 27, 2014, at 6:01 PM, Chandler Burgess <cb...@icontrolesi.com> wrote:
>>
>> Ok, I'll uncomment those lines and see. I also have plenty of test data available too (I'm doing document classification with unbalanced classes), so I'll see if it improves there as well.
>>
>> Also, I'll try to make some time in the next week and go over the algorithm in detail compared with the paper as an extra check.
>>
>> Thanks,
>> Chandler
>> ________________________________________
>> From: Sebastian Schelter <ss...@apache.org>
>> Sent: Thursday, March 27, 2014 4:01 PM
>> To: dev@mahout.apache.org
>> Subject: Re: MAHOUT-1369 - Why does theta normalization for naive bayes classification commented out?
>>
>> Hi Chandler,
>>
>> I think a good way to go would be to reenable theta normalization and
>> run the classification examples that we already have to see how it
>> affects the result (and make sure it improves the result).
>>
>> Would be great to have this fixed. I'm also planning to port NB to our
>> Spark DSL very soon (should be just a few lines of code).
>>
>> --sebastian
>>
>>
>>> On 03/27/2014 09:07 PM, Suneel Marthi wrote:
>>> Which Mahout version are you running? While it's true that ThetaNormalizer is still disabled today, MAHOUT-1389 fixes a bug wherein Complementary NB wasn't being called when invoked.
>>>
>>> Please test with Mahout 0.9 or trunk.
>>>
>>>
>>>
>>>
>>> On Thursday, March 27, 2014 3:53 PM, Chandler Burgess <cb...@icontrolesi.com> wrote:
>>>
>>> Hello all,
>>>
>>> It seems Robin Anil hasn't responded, and no one is sure of the status on this. What needs to be done on this, and/or what can I do to help? I'm no ML expert, but I do have the paper and should be able to verify/fix the implementation. I'm REALLY interested in using the CNB classifier, since it seems well suited to the problem I'm trying to tackle, before I give up and use something else.
>>>
>>> I've done tests and see no difference when -c is passed on the command line for training or testing. I also wrote a program to print the scores using StandardNaiveBayesClassifier and ComplementaryNaiveBayesClassifier in a binary classification problem and see no difference between the scores, so it seems complementary naïve bayes is completely disabled.
>>>
>>> Thanks,
>>> Chandler Burgess
>>


RE: MAHOUT-1369 - Why does theta normalization for naive bayes classification commented out?

Posted by Chandler Burgess <cb...@icontrolesi.com>.
Forgot to include in the last mail. Again, I do have the Rennie paper, which I'll dig into and see if I can fix it sometime in the near future. I'll also look at the problem with the -seq flag to testnb.
All the guidelines for submitting patches are on JIRA or the mahout.apache.org pages, correct?

Chandler

-----Original Message-----
From: Chandler Burgess [mailto:cburgess@icontrolesi.com] 
Sent: Friday, March 28, 2014 3:16 PM
To: dev@mahout.apache.org
Subject: RE: MAHOUT-1369 - Why does theta normalization for naive bayes classification commented out?

Well, maybe someone can correct me but this seems disappointing. I uncommented the code in NaiveBayesModel, BayesUtil and TrainNaiveBayesJob, added some trace statements in ComplementaryThetaMapper and ComplementaryNaiveBayesClassifier to verify they were being called, and then ran some tests using trainnb/testnb. There was not a single difference in the classifications when train/testcomplementary was specified vs standard naïve bayes.

Also, running testnb with the -seq flag doesn't appear to work.


RE: MAHOUT-1369 - Why does theta normalization for naive bayes classification commented out?

Posted by Andrew Palumbo <ap...@outlook.com>.
Hi Chandler,
I've been looking at the code and the Rennie paper a bit over the past couple of days. I haven't had too much time with it, but I have seen some of the problems. I may be wrong, and please correct me if I am, but I want to say that for the binary classification problem, Multinomial Naive Bayes (MNB) and Complementary Naive Bayes (CNB) should be essentially the same. In the 2-class problem, where this implementation of MNB predicts based on the probability of a document belonging to its class, CNB predicts based on the probability that it does NOT belong to the ONLY other class. Without theta normalization implemented, I don't think that CNB and MNB will yield different classifications (for 2-class problems).

I tested this out on 20-Newsgroups and can see that CNB and MNB give different classification results on all 20 classes, but the same results when restricted to just 2 of the 20.

I haven't figured out what's going on with the theta normalization yet, but it seems to me that it should be implemented as a different algorithm (WCNB in the paper) or with an option to enable it within CNB.

Andy  
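
The two-class argument above can be checked with a small sketch. The Python below is not Mahout's code. It is a toy reimplementation that assumes Laplace smoothing, ignores class priors, and uses hypothetical helper names (class_term_counts, mnb_weights, cnb_weights, normalize_weights, classify). With two classes, the complement of one class is exactly the other class, so the CNB weights for a class are the negated MNB weights of the other class, and both classifiers pick the same label. The normalize_weights helper follows the weight normalization described in the Rennie paper (each class's weights divided by the sum of their absolute values); whether Mahout's theta normalization is meant to compute exactly this is an assumption.

```python
import math


def class_term_counts(docs, labels, num_classes):
    """Sum the term-frequency vectors of the training docs per class."""
    vocab_size = len(docs[0])
    counts = [[0] * vocab_size for _ in range(num_classes)]
    for doc, label in zip(docs, labels):
        for term, freq in enumerate(doc):
            counts[label][term] += freq
    return counts


def smoothed_log_probs(row, alpha):
    """Log of Laplace-smoothed term probabilities for one count vector."""
    total = sum(row) + alpha * len(row)
    return [math.log((n + alpha) / total) for n in row]


def mnb_weights(counts, alpha=1.0):
    """MNB: weights come from each class's own term counts."""
    return [smoothed_log_probs(row, alpha) for row in counts]


def cnb_weights(counts, alpha=1.0):
    """CNB: weights come from the counts of all OTHER classes, negated,
    so a term that is common outside class c counts against class c."""
    weights = []
    for c in range(len(counts)):
        complement = [sum(col) - col[c] for col in zip(*counts)]
        weights.append([-w for w in smoothed_log_probs(complement, alpha)])
    return weights


def normalize_weights(weights):
    """Weight normalization as in the Rennie paper (assumed, not Mahout's
    code): divide each class's weights by the sum of their absolute values."""
    normalized = []
    for row in weights:
        scale = sum(abs(w) for w in row)
        normalized.append([w / scale for w in row])
    return normalized


def classify(weights, doc):
    """Return the index of the class with the highest linear score."""
    scores = [sum(f * w for f, w in zip(doc, row)) for row in weights]
    return max(range(len(scores)), key=scores.__getitem__)


# Toy two-class corpus over a three-term vocabulary (made-up data).
docs = [[3, 1, 0], [2, 0, 1], [0, 1, 3], [1, 0, 2]]
labels = [0, 0, 1, 1]
counts = class_term_counts(docs, labels, num_classes=2)
for doc in ([2, 0, 0], [0, 0, 2], [1, 3, 1]):
    print(doc, classify(mnb_weights(counts), doc), classify(cnb_weights(counts), doc))
```

Passing normalize_weights(cnb_weights(counts)) to classify gives the weight-normalized variant. In that case the two classes are scaled by different factors, so agreement with MNB is no longer guaranteed even with two classes.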

> Date: Fri, 28 Mar 2014 14:29:54 -0700
> From: suneel_marthi@yahoo.com
> Subject: Re: MAHOUT-1369 - Why does theta normalization for naive bayes classification commented out?
> To: dev@mahout.apache.org
> 
> .. and please create a JIRA for this, it definitely seems like an issue.
> 
> Nevertheless, it's time to verify and validate this impl, given that the original author has not responded.
> 
> 
> 
> On , Suneel Marthi <su...@yahoo.com> wrote:
>  
> I was alluding to TrainNaiveBayesJob, which is MR only. You are right, TestNaiveBayesDriver has both MR and sequential.
> Looking at the code for MR vs. sequential in TestNaiveBayes, they both seem to be calling the respective Standard/Complementary Naive Bayes classifiers.
> 
> I guess we need to look at the CNB calculations more closely and see if it's doing the right thing.
> 
> On Friday, March 28, 2014 5:09 PM, Chandler Burgess <cb...@icontrolesi.com> wrote:
>  
> Ok, then I should remove it? There's about 2 dozen lines of code in TestNaiveBayesDriver for running sequentially.
> 
> 
> -----Original Message-----
> From: Suneel Marthi [mailto:suneel_marthi@yahoo.com] 
> Sent: Friday, March 28, 2014 3:51 PM
> To: dev@mahout.apache.org
> Subject: Re: MAHOUT-1369 - Why does theta normalization for naive bayes classification commented out?
> 
> Bayes doesn't have a non-MapReduce impl, so the -seq flag wouldn't work.

Re: MAHOUT-1369 - Why does theta normalization for naive bayes classification commented out?

Posted by Suneel Marthi <su...@yahoo.com>.
.. and please create a JIRA for this, it definitely seems like an issue.

Nevertheless, it's time to verify and validate this impl, given that the original author has not responded.



On , Suneel Marthi <su...@yahoo.com> wrote:
 
I was alluding to TrainNaiveBayesJob, which is MR only. You are right, TestNaiveBayesDriver has both MR and sequential.
Looking at the code for MR vs. sequential in TestNaiveBayes, they both seem to be calling the respective Standard/Complementary Naive Bayes classifiers.

I guess we need to look at the CNB calculations more closely and see if it's doing the right thing.

On Friday, March 28, 2014 5:09 PM, Chandler Burgess <cb...@icontrolesi.com> wrote:
 
Ok, then I should remove it? There's about 2 dozen lines of code in TestNaiveBayesDriver for running sequentially.


-----Original Message-----
From: Suneel Marthi [mailto:suneel_marthi@yahoo.com] 
Sent: Friday, March 28, 2014 3:51 PM
To: dev@mahout.apache.org
Subject: Re: MAHOUT-1369 - Why does theta normalization for naive bayes classification commented out?

Bayes doesn't have a non-MapReduce impl, so the -seq flag wouldn't work.


Re: MAHOUT-1369 - Why does theta normalization for naive bayes classification commented out?

Posted by Suneel Marthi <su...@yahoo.com>.
I was alluding to TrainNaiveBayesJob, which is MR only. You are right, TestNaiveBayesDriver has both MR and sequential.
Looking at the code for MR vs. sequential in TestNaiveBayes, they both seem to be calling the respective Standard/Complementary Naive Bayes classifiers.

I guess we need to look at the CNB calculations more closely and see if it's doing the right thing.

On Friday, March 28, 2014 5:09 PM, Chandler Burgess <cb...@icontrolesi.com> wrote:
 
Ok, then I should remove it? There's about 2 dozen lines of code in TestNaiveBayesDriver for running sequentially.


-----Original Message-----
From: Suneel Marthi [mailto:suneel_marthi@yahoo.com] 
Sent: Friday, March 28, 2014 3:51 PM
To: dev@mahout.apache.org
Subject: Re: MAHOUT-1369 - Why does theta normalization for naive bayes classification commented out?

Bayes doesn't have a non-MapReduce impl, so the -seq flag wouldn't work.


RE: MAHOUT-1369 - Why does theta normalization for naive bayes classification commented out?

Posted by Chandler Burgess <cb...@icontrolesi.com>.
Ok, then I should remove it? There's about 2 dozen lines of code in TestNaiveBayesDriver for running sequentially.

-----Original Message-----
From: Suneel Marthi [mailto:suneel_marthi@yahoo.com] 
Sent: Friday, March 28, 2014 3:51 PM
To: dev@mahout.apache.org
Subject: Re: MAHOUT-1369 - Why does theta normalization for naive bayes classification commented out?

Bayes doesn't have a non-MapReduce impl, so the -seq flag wouldn't work.


Re: MAHOUT-1369 - Why does theta normalization for naive bayes classification commented out?

Posted by Suneel Marthi <su...@yahoo.com>.
Bayes doesn't have a non-MapReduce implementation, so the -seq flag wouldn't work.

RE: MAHOUT-1369 - Why does theta normalization for naive bayes classification commented out?

Posted by Chandler Burgess <cb...@icontrolesi.com>.
Well, maybe someone can correct me but this seems disappointing. I uncommented the code in NaiveBayesModel, BayesUtil and TrainNaiveBayesJob, added some trace statements in ComplementaryThetaMapper and ComplementaryNaiveBayesClassifier to verify they were being called, and then ran some tests using trainnb/testnb. There was not a single difference in the classifications when train/testcomplementary was specified vs standard naïve bayes.

Also, running testnb with the -seq flag doesn't appear to work.

RE: MAHOUT-1369 - Why does theta normalization for naive bayes classification commented out?

Posted by Chandler Burgess <cb...@icontrolesi.com>.
The program I wrote didn't use a model that was trained with Cbayes. After looking at the scorers in SNB and CNB, I figured they would give different results even on a model not trained with CNB. That could very well be ignorance on my part as to the math. 

However, I did some command line tests using -c on both training and testing and didn't see any difference in the testnb output.
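
One thing worth checking for the two-class test mentioned at the start of the thread: when there are only two labels, the complement of one label is simply the other label, so an un-normalized complementary scorer should pick the same label as the standard scorer even though the raw scores differ. The sketch below is a toy illustration of that point only. It is not Mahout's code; the class name, the counts and the smoothing value are made up, and class priors are assumed equal and left out.

```java
import java.util.Arrays;

public class BinaryNbSketch {
    // Hypothetical toy counts: COUNTS[c][i] = occurrences of term i in class c.
    static final double[][] COUNTS = {{8, 1, 1}, {2, 5, 3}};
    // Laplace smoothing, assumed to be the same for both scorers.
    static final double ALPHA = 1.0;

    // Log-likelihood of a term-frequency vector under a smoothed multinomial.
    static double logLikelihood(double[] counts, double[] doc) {
        double total = Arrays.stream(counts).sum() + ALPHA * counts.length;
        double sum = 0.0;
        for (int i = 0; i < doc.length; i++) {
            sum += doc[i] * Math.log((counts[i] + ALPHA) / total);
        }
        return sum;
    }

    // Standard NB: score a label by the likelihood under its own counts.
    public static double standardScore(int label, double[] doc) {
        return logLikelihood(COUNTS[label], doc);
    }

    // Complementary NB (no weight normalization): score a label by the
    // negated likelihood under the pooled counts of every other label.
    public static double complementScore(int label, double[] doc) {
        double[] pooled = new double[doc.length];
        for (int c = 0; c < COUNTS.length; c++) {
            if (c == label) {
                continue;
            }
            for (int i = 0; i < pooled.length; i++) {
                pooled[i] += COUNTS[c][i];
            }
        }
        return -logLikelihood(pooled, doc);
    }

    // Both predictors pick the label with the higher score (ties go to 0).
    public static int predictStandard(double[] doc) {
        return standardScore(0, doc) >= standardScore(1, doc) ? 0 : 1;
    }

    public static int predictComplement(double[] doc) {
        return complementScore(0, doc) >= complementScore(1, doc) ? 0 : 1;
    }

    public static void main(String[] args) {
        double[][] docs = {{3, 0, 1}, {0, 2, 2}, {1, 1, 1}};
        for (double[] doc : docs) {
            System.out.println(Arrays.toString(doc)
                + " standard=" + predictStandard(doc)
                + " complement=" + predictComplement(doc));
        }
    }
}
```

In this toy setup the complement score for label 0 is just the negated standard score for label 1, so the two predictors cannot disagree. If that carries over to the real classifiers, a two-class test would not show a difference in the classifications, and a test with three or more classes would be a better check of whether CNB is actually being used.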

Re: MAHOUT-1369 - Why does theta normalization for naive bayes classification commented out?

Posted by Suneel Marthi <su...@yahoo.com>.
Just checking: you are testing Cbayes on a model that's already been trained using Cbayes, correct?

Also, the JIRA I mentioned earlier was fixed for 0.9, so you should be good. No code changes have been made to naive Bayes since 0.9.


RE: MAHOUT-1369 - Why does theta normalization for naive bayes classification commented out?

Posted by Chandler Burgess <cb...@icontrolesi.com>.
Ok, I'll uncomment those lines and see. I also have plenty of test data available too (I'm doing document classification with unbalanced classes), so I'll see if it improves there as well.

Also, I'll try to make some time in the next week and go over the algorithm in detail compared with the paper as an extra check.

Thanks,
Chandler

Re: MAHOUT-1369 - Why does theta normalization for naive bayes classification commented out?

Posted by Sebastian Schelter <ss...@apache.org>.
Hi Chandler,

I think a good way to go would be to reenable theta normalization and 
run the classification examples that we already have to see how it 
affects the result (and make sure it improves the result).
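
For reference, the weight normalization step as described in the Rennie et al. paper can be written as below. This is a sketch of the paper's formulation as I understand it, not a description of what the commented-out Mahout code does; whether the two match is exactly what needs to be verified. Here N_{\tilde{c} i} is the count of term i in all classes other than c, N_{\tilde{c}} is the total term count in those classes, \alpha_i and \alpha = \sum_i \alpha_i are smoothing parameters, and t_i is the count of term i in the document being classified.

```latex
\hat{\theta}_{\tilde{c} i} = \frac{N_{\tilde{c} i} + \alpha_i}{N_{\tilde{c}} + \alpha},
\qquad
w_{ci} = \log \hat{\theta}_{\tilde{c} i},
\qquad
w_{ci} \leftarrow \frac{w_{ci}}{\sum_k \lvert w_{ck} \rvert},
\qquad
\ell(t) = \arg\min_c \sum_i t_i \, w_{ci}
```

Because each class's weights are divided by a different constant, this step can change which class wins, so results with and without it should be compared on the existing classification examples.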

Would be great to have this fixed. I'm also planning to port NB to our 
Spark DSL very soon (should be just a few lines of code).

--sebastian


RE: MAHOUT-1369 - Why does theta normalization for naive bayes classification commented out?

Posted by Chandler Burgess <cb...@icontrolesi.com>.
Hi Suneel,

I'm using 0.9. I did not train using Complementary NB, but was only using it for testing. I'm not really familiar with the math, but I can see that CNBClassifier scores differently than SNBClassifier, so I thought I would see something, but the scores and results from testnb didn't change.

I'll get the trunk version and see if it fixes that part at least.

Re: MAHOUT-1369 - Why does theta normalization for naive bayes classification commented out?

Posted by Suneel Marthi <su...@yahoo.com>.
Which Mahout version are you running? While it's true that ThetaNormalizer is still disabled today, MAHOUT-1389 fixes a bug wherein Complementary NB wasn't being called when invoked.

Please test with Mahout 0.9 or trunk.



