Posted to issues@opennlp.apache.org by "Cohan Sujay Carlos (JIRA)" <ji...@apache.org> on 2015/12/28 17:23:49 UTC

[jira] [Updated] (OPENNLP-777) Naive Bayesian Classifier

     [ https://issues.apache.org/jira/browse/OPENNLP-777?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Cohan Sujay Carlos updated OPENNLP-777:
---------------------------------------
    Attachment: naive-bayes-classifier-2-adding-fixes-requested-by-joern-on-20-oct-2015.patch

[~joern] and [~teofili],

I am submitting herewith another patch with the fixes requested by Joern.

In this patch, I have made the following changes:

a)  Removed the DocumentCategorizerNB file.
b)  Rewrote the test case for the above to operate on DocumentCategorizerME (passing suitable parameters to train() so that it exercises the NB classifier instead of the ME classifier).
c)  Changed NaiveBayesModel in ml.naivebayes to remove the flag to disable smoothing (I've removed the flag since there is no use-case where the NB classifier would be used without smoothing).
d)  Changed the NaiveBayesCorrectnessTest to reflect the above.
e)  Made small changes to Tommaso's test case "NaiveBayesModelReadWriteTest" because it was causing the tests to fail when run with the Maven Eclipse plugin on Windows.  I changed the location of the temp file so that the tests no longer fail on Windows.  Could [~teofili] run this test case on Unix to verify that it still works there?
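
For reference on item e), one portable way to place a test's temp file is to let the JVM pick the platform's temp directory rather than hard-coding a path. The sketch below is only an illustration of that idea and is not necessarily what the patch does; the class name TempModelFile, the method name, and the ".bin" suffix are hypothetical.

```java
import java.io.File;
import java.io.IOException;
import java.io.UncheckedIOException;

/** Hypothetical helper, not part of OpenNLP or of the attached patch. */
class TempModelFile {

    /**
     * Creates an empty file in the directory named by the java.io.tmpdir
     * system property, which resolves correctly on both Windows and Unix.
     * File.createTempFile requires the prefix to be at least 3 characters.
     */
    static File create(String prefix) {
        try {
            File file = File.createTempFile(prefix, ".bin");
            // Ask the JVM to remove the file on normal exit, so that
            // repeated test runs do not leave files behind.
            file.deleteOnExit();
            return file;
        } catch (IOException e) {
            // Rethrow unchecked to keep test code free of checked exceptions.
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        File file = create("nb-model");
        System.out.println("Temp file created at: " + file.getAbsolutePath());
    }
}
```

A read/write test could then serialize the model to the returned file and read it back, without assuming anything about the directory layout of the machine it runs on.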

(This patch is to be applied to trunk.)

In making these changes, I may have undone the formatting fixes that [~teofili] applied to the above files.

I will need [~joern] or [~teofili] to check this patch in for me if all is well.

> Naive Bayesian Classifier
> -------------------------
>
>                 Key: OPENNLP-777
>                 URL: https://issues.apache.org/jira/browse/OPENNLP-777
>             Project: OpenNLP
>          Issue Type: New Feature
>          Components: Machine Learning
>         Environment: J2SE 1.5 and above
>            Reporter: Cohan Sujay Carlos
>            Assignee: Tommaso Teofili
>            Priority: Minor
>              Labels: NBClassifier, bayes, bayesian, classifier, multinomial, naive, patch
>         Attachments: D1TopicClassifierTrainingDemoNB.java, D1TopicClassifierUsageDemoNB.java, NaiveBayesCorrectnessTest.java, naive-bayes-classifier-2-adding-fixes-requested-by-joern-on-20-oct-2015.patch, naive-bayesian-classifier-for-opennlp-1.6.0-rc6-with-test-cases.patch, prep-attach-test-case-for-naive-bayesian-classifier-for-opennlp-1.6.0-rc6.patch, topics.train
>
>   Original Estimate: 504h
>  Remaining Estimate: 504h
>
> I thought it would be nice to have a Naive Bayesian classifier in OpenNLP (it lacks one at present).
> Implementation details:  We have a production-hardened piece of Java code for a multinomial Naive Bayesian classifier (with default Laplace smoothing) that we'd like to contribute.  The code is Java 1.5 compatible.  I'd have to write an adapter to make the interface compatible with the ME classifier in OpenNLP.  I expect the patch to be available 1 to 3 weeks from now.
> Below is the email trail of a discussion in the dev mailing list around this dated May 19th, 2015.
> <snip>
> Tommaso Teofili via opennlp.apache.org 
> to dev 
> Hi Cohan,
> I think that'd be a very valuable contribution, as NB is one of the
> foundation algorithms, often used as basis for comparisons.
> It would be good if you could create a Jira issue and provide more details
> about the implementation and, eventually, a patch.
> Thanks and regards,
> Tommaso
> </snip>
> 2015-05-19 9:57 GMT+02:00 Cohan Sujay Carlos 
> > I have a question for the OpenNLP project team.
> >
> > I was wondering if there is a Naive Bayesian classifier implementation in
> > OpenNLP that I've not come across, or if there are plans to implement one.
> >
> > If it is the latter, I should love to contribute an implementation.
> >
> > There is an ME classifier already available in OpenNLP, of course, but I
> > felt that there was an unmet need for a Naive Bayesian (NB) classifier
> > implementation to be offered as well.
> >
> > An NB classifier could be bootstrapped up with partially labelled training
> > data as explained in the Nigam, McCallum, et al paper of 2000 "Text
> > Classification from Labeled and Unlabeled Documents using EM".
> >
> > So, if there isn't an NB code base out there already, I'd be happy to
> > contribute a very solid implementation that we've used in production for a
> > good 5 years.
> >
> > I'd have to adapt it to load the same training data format as the ME
> > classifier, but I guess that shouldn't be very difficult to do.
> >
> > I was wondering if there was some interest in adding an NB implementation
> > and I'd love to know who could I coordinate with if there is?
> >
> > Cohan Sujay Carlos
> > CEO, Aiaioo Labs, India
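
As a rough illustration of the multinomial Naive Bayesian classifier with Laplace smoothing mentioned in the issue description, a minimal sketch follows. It is not the code in the attached patches and does not use the OpenNLP API: the class TinyNaiveBayes and its methods are hypothetical, and the sketch assumes Java 8 for brevity, whereas the contributed code is stated to be Java 1.5 compatible. It also shows why smoothing matters: without the added 1 in the numerator, a single unseen word would drive a category's probability to zero.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical minimal multinomial Naive Bayes with add-one (Laplace) smoothing. */
class TinyNaiveBayes {
    private final Map<String, Integer> docsPerLabel = new HashMap<>();
    private final Map<String, Map<String, Integer>> wordCounts = new HashMap<>();
    private final Map<String, Integer> tokensPerLabel = new HashMap<>();
    private final Set<String> vocabulary = new HashSet<>();
    private int totalDocs = 0;

    /** Adds one labelled, already tokenized document to the counts. */
    void train(String label, String[] tokens) {
        totalDocs++;
        docsPerLabel.merge(label, 1, Integer::sum);
        Map<String, Integer> counts =
            wordCounts.computeIfAbsent(label, k -> new HashMap<>());
        for (String token : tokens) {
            counts.merge(token, 1, Integer::sum);
            tokensPerLabel.merge(label, 1, Integer::sum);
            vocabulary.add(token);
        }
    }

    /**
     * Returns log P(label) + sum of log P(token | label) for a label seen in
     * training. Each token probability is (count + 1) / (labelTokens + |V|),
     * so an unseen token lowers the score but never makes it minus infinity.
     */
    double logScore(String label, String[] tokens) {
        double score = Math.log((double) docsPerLabel.get(label) / totalDocs);
        Map<String, Integer> counts = wordCounts.get(label);
        int labelTokens = tokensPerLabel.getOrDefault(label, 0);
        for (String token : tokens) {
            int count = counts.getOrDefault(token, 0);
            score += Math.log((count + 1.0) / (labelTokens + vocabulary.size()));
        }
        return score;
    }

    /** Returns the trained label with the highest log score, or null if untrained. */
    String classify(String[] tokens) {
        String best = null;
        double bestScore = Double.NEGATIVE_INFINITY;
        for (String label : docsPerLabel.keySet()) {
            double score = logScore(label, tokens);
            if (score > bestScore) {
                bestScore = score;
                best = label;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        TinyNaiveBayes nb = new TinyNaiveBayes();
        nb.train("sports", new String[] {"ball", "goal", "team"});
        nb.train("politics", new String[] {"vote", "election", "party"});
        // "referendum" was never seen in training; smoothing keeps the score finite.
        System.out.println(nb.classify(new String[] {"vote", "referendum"}));
    }
}
```

Working in log space avoids numeric underflow when many small token probabilities are multiplied together for longer documents.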



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)