You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@opennlp.apache.org by kottmann <gi...@git.apache.org> on 2017/06/27 08:23:20 UTC

[GitHub] opennlp pull request #238: Revert merging of sentiment work, no consent to m...

GitHub user kottmann opened a pull request:

    https://github.com/apache/opennlp/pull/238

    Revert merging of sentiment work, no consent to merge it

    Thank you for contributing to Apache OpenNLP.
    
    In order to streamline the review of the contribution we ask you
    to ensure the following steps have been taken:
    
    ### For all changes:
    - [ ] Is there a JIRA ticket associated with this PR? Is it referenced 
         in the commit message?
    
    - [ ] Does your PR title start with OPENNLP-XXXX where XXXX is the JIRA number you are trying to resolve? Pay particular attention to the hyphen "-" character.
    
    - [ ] Has your PR been rebased against the latest commit within the target branch (typically master)?
    
    - [ ] Is your initial contribution a single, squashed commit?
    
    ### For code changes:
    - [ ] Have you ensured that the full suite of tests is executed via mvn clean install at the root opennlp folder?
    - [ ] Have you written or updated unit tests to verify your changes?
    - [ ] If adding new dependencies to the code, are these dependencies licensed in a way that is compatible for inclusion under [ASF 2.0](http://www.apache.org/legal/resolved.html#category-a)? 
    - [ ] If applicable, have you updated the LICENSE file, including the main LICENSE file in opennlp folder?
    - [ ] If applicable, have you updated the NOTICE file, including the main NOTICE file found in opennlp folder?
    
    ### For documentation related changes:
    - [ ] Have you ensured that format looks appropriate for the output in which it is rendered?
    
    ### Note:
    Please ensure that once the PR is submitted, you check travis-ci for build issues and submit an update to your PR as soon as possible.


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/kottmann/opennlp revert_sentiment

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/opennlp/pull/238.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #238
    
----
commit 123222eb34724bae793e9d6d22e202c0aee0aa45
Author: Jörn Kottmann <jo...@apache.org>
Date:   2017-06-27T08:19:19Z

    Revert merging of sentiment work, no consent to merge it

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] opennlp pull request #238: Revert merging of sentiment work, no consent to m...

Posted by kottmann <gi...@git.apache.org>.
Github user kottmann closed the pull request at:

    https://github.com/apache/opennlp/pull/238


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

Re: [GitHub] opennlp pull request #238: Revert merging of sentiment work, no consent to m...

Posted by Chris Mattmann <ma...@apache.org>.
Absolutely reproducibility is important and I agree. 

Thanks,
Chris




On 6/29/17, 10:38 AM, "Joern Kottmann" <ko...@gmail.com> wrote:

    One more thing, in case we check in models for unit tests we need to
    be able to train them again, we might not support those models forever
    and then it would be bad if we can't use the tests anymore or need to
    repair them by hand.
    
    Jörn
    
    On Thu, Jun 29, 2017 at 7:18 PM, Joern Kottmann <ko...@gmail.com> wrote:
    > For 2. I would like to suggest that we implement doccat format support
    > to train on that data.
    >
    > 3. it would be best so think about how we want to test the doccat
    > component, today we don't have any tests which use lots of data to
    > evaluate it.
    > Probably the sentitment data could solve this for us and a train and
    > evaluate test could be included in the eval tests.
    >
    > +1 to revert and then do these steps after the 1.8.1 release.
    >
    > I can apply my PR myself if nobody objects.
    >
    > Jörn
    >
    > On Thu, Jun 29, 2017 at 7:10 PM, Chris Mattmann <ma...@apache.org> wrote:
    >> Hi Rodrigo,
    >>
    >> This is very useful feedback that I wish we would have had a long time ago.
    >>
    >> I will look into it and see if I can reproduce the CLI error. I did a full build and mvn
    >> install (which I though would run tests?) before commiting and as I posted in JIRA
    >> the tests passed for me? So I will have to look into that.
    >>
    >> That said, given your feedback that SentimentME and the Sentiment Component
    >> doesn’t offer much over Document Classifier I agree with you, but wasn’t super
    >> familiar with the Document Classifier API. That said, if we can get the same functionality
    >> by just using Document Classifier why don’t we:
    >>
    >> 1. Remove the SentimentME and associated code (except for the unit tests)
    >> 2. Use the sample datasets from NetFlix & Stanford Treebank sentiment and
    >> build models using Document Classifier API.
    >> 3. Rename and keep the unit tests that test against Netflix and Stanford tree bank.
    >>
    >> That way we get basic sentiment analysis (that is working for us internally at JPL decently),
    >> for Apache OpenNLP, and then if we want to build something better than a Document
    >> Classification approach to sentiment we can do so.
    >>
    >> Thoughts?
    >>
    >> Thanks for your useful feedback. If everyone agrees this is a plan I can back out the code
    >> using Joern’s revert, and then try and execute 1-3 above in a branch first. Thanks.
    >>
    >> Cheers,
    >> Chris
    >>
    >>
    >>
    >> On 6/29/17, 10:03 AM, "Rodrigo Agerri" <ra...@apache.org> wrote:
    >>
    >>     Hi Chris,
    >>
    >>     I have been interested in the new sentiment component for a while,
    >>     although truth to be told, I did not follow that closely. I have today
    >>     looked at it and test it with some of the corpora you have mentioned.
    >>     In order to do that, I checkout master to work with from this commit
    >>     onwards
    >>
    >>     https://github.com/apache/opennlp/commit/56321aab51a470cd2004b76fb1f5330881b943c1
    >>
    >>     1. I tried to run it from the CLI. The Sentiment component did not
    >>     appear to be available.
    >>     2. I added the SentimentTrainer and Evaluator to the cmdline.CLI (no
    >>     SentimentTool is implemented to tag with a trained model).
    >>     3. After that, the CLI tests did not pass. So, the CLI is currently
    >>     non functional, unless I did something wrong, always possible, of
    >>     course. See if you can reproduce that error.
    >>
    >>     I therefore did the tests via API. I implemented a little test for
    >>     training, evaluating and tagging here:
    >>
    >>     https://github.com/ixa-ehu/ixa-pipe-doc/tree/test
    >>
    >>     I run the training on the large movies review from Stanford for binary
    >>     polarity classification
    >>
    >>     http://ai.stanford.edu/~amaas/data/sentiment/
    >>
    >>     and on the two little samples multiclass files added in resources and
    >>     mentioned in the previous email, using the first one for training and
    >>     the second one for testing (maxent 100 iterations, cutoff 5).
    >>
    >>     2. Stanford results: 0.84264
    >>     3. sample multiclass: 0.73
    >>
    >>     Given that this is a standard document classification task, I decided
    >>     to train the doccat component from the CLI:
    >>
    >>     1. Stanford results: 0.84264 (BOW features by default).
    >>     2. sample multiclass: 0.73
    >>
    >>     I then looked at the code of the sentiment component and saw that it
    >>     is basically a document classifier working with bag of words features.
    >>     No added functionality. So, my conclusions are:
    >>
    >>     1. The CLI needs to be fixed.
    >>     2. The Sentiment component, as it is, provides the same functionality
    >>     as the document classifier.
    >>
    >>     I would therefore reconsider this commit until those two issues are
    >>     addressed. Just my opinion.
    >>
    >>     Best regards,
    >>
    >>     Rodrigo
    >>
    >>     On Thu, Jun 29, 2017 at 5:30 PM, Chris Mattmann <ma...@apache.org> wrote:
    >>     >
    >>     > Hey Joern,
    >>     >
    >>     > Sure, you can find the model data links here, along with our evaluation of them.
    >>     >
    >>     > http://irds.usc.edu/SentimentAnalysisParser/datasets.html
    >>     >
    >>     > There are other evaluations here:
    >>     >
    >>     > http://irds.usc.edu/SentimentAnalysisParser/models.html
    >>     >
    >>     > The HT provider review I cannot contribute at this time and I question its broad
    >>     > applicability since it’s related to human trafficking. In addition we are still working
    >>     > on publishing our analysis & evaluation of it which is why I removed it from the
    >>     > commit.
    >>     >
    >>     > Cheers,
    >>     > Chris
    >>     >
    >>     >
    >>     >
    >>     >
    >>     >
    >>     > On 6/29/17, 7:36 AM, "Joern Kottmann" <ko...@gmail.com> wrote:
    >>     >
    >>     >     Which data sets did you use to evaluate this?
    >>     >     I was looking for a bit more than a sample file to train it.
    >>     >
    >>     >     I noticed that you checked in stanford and netflix models.
    >>     >
    >>     >     The stanford data set is probably this one:
    >>     >     http://ai.stanford.edu/~amaas/data/sentiment/
    >>     >
    >>     >     Do you have a link for the netflix data?
    >>     >
    >>     >     Jörn
    >>     >
    >>     >     On Thu, Jun 29, 2017 at 4:00 PM, Chris Mattmann <ma...@apache.org> wrote:
    >>     >     > Absolutely you can find it here:
    >>     >     >
    >>     >     > opennlp-tools/src/test/resources/opennlp/tools/sentiment/sample_train_categ (for categorical /multi-class)
    >>     >     > opennlp-tools/src/test/resources/opennlp/tools/sentiment/sample_train_categ2 (for categorical/multi-class)
    >>     >     >
    >>     >     > We can also do similar files where instead of multi-class, we just use pos/neg as the label.
    >>     >     >
    >>     >     > Cheers,
    >>     >     > Chris
    >>     >     >
    >>     >     >
    >>     >     >
    >>     >     >
    >>     >     >
    >>     >     > On 6/29/17, 2:35 AM, "Joern Kottmann" <ko...@gmail.com> wrote:
    >>     >     >
    >>     >     >     Hello Chris,
    >>     >     >
    >>     >     >     could you please point me to files I can use to train the sentiment
    >>     >     >     component? I am currently looking again through the code and would
    >>     >     >     like to train it myself.
    >>     >     >
    >>     >     >     Jörn
    >>     >     >
    >>     >     >     On Tue, Jun 27, 2017 at 4:59 PM, Dan Russ <da...@gmail.com> wrote:
    >>     >     >     > Hi All,
    >>     >     >     >    First, let me take a share of blame for the comment Chris mentioned.  I believe I said something like the pull request was X revision behind and Y revisions ahead.  It was not meant to be rude, it was meant to say it is hard to review code when it is so different from the current code base. I am very excited that sentiment analysis is going to be added to OpenNLP, but I have not had time to play with it. If I were to say “great job” before I have add a chance to look at it, it would be flattery not honest praise.
    >>     >     >     >
    >>     >     >     >   Let’s clean up the merge.  I agree with Chris that scalability and perfection should not be our initial goal.  Let’s get something, and we can decide how to optimize later (even if it require a complete rewrite).  Perfection is the enemy of the good.
    >>     >     >     >
    >>     >     >     >   Finally, because of Chris’ comments it is hard to thank Ana and Chris without sounding insincere.  But I’ll try, thank you Chris and Ana.  I hope we can get beyond this and that Chris and Ana will continue to improve the performance of the sentiment analysis tool and happily remain part of the OpenNLP family.  It is also a good time to toss a big thank you to all of the committers, users, and PMC member.  I use OpenNLP almost everyday.  Your work is extremely valuable to me.
    >>     >     >     >
    >>     >     >     > Thank you,
    >>     >     >     > Daniel
    >>     >     >     >
    >>     >     >     >> On Jun 27, 2017, at 10:25 AM, Chris Mattmann <ma...@apache.org> wrote:
    >>     >     >     >>
    >>     >     >     >> Hi everyone,
    >>     >     >     >>
    >>     >     >     >> I spoke with Joern in Slack. Some of his concerns are:
    >>     >     >     >>
    >>     >     >     >> 1. This was done with a Merge commit and apparently they squash and rebase.
    >>     >     >     >> [would be helpful to see some pointer on this for documentation, thus far I
    >>     >     >     >> haven’t found any]
    >>     >     >     >> 2. Apparently we literally need to ask others for +1 votes and record them
    >>     >     >     >> before committing? I thought since Ana and I are committers aren were +1,
    >>     >     >     >> and since Joern had been providing feedback (the last of which was to add
    >>     >     >     >> tests, which we did) that he would be +1 as well (I guess he is not, and I guess
    >>     >     >     >> formally we need to do a +1 vote even still)
    >>     >     >     >> 3. There was concern about scalability of the code.
    >>     >     >     >> 4. There are thoughts that the code was not perfect yet (even though it works
    >>     >     >     >> fine in the MEMEX project for Ana and I)
    >>     >     >     >>
    >>     >     >     >> So, Joern has opened up a revert PR.
    >>     >     >     >>
    >>     >     >     >> I suppose I should state I find this process extremely heavyweight and unwelcoming.
    >>     >     >     >> To me, there should be a modicum of trust for committers, but I feel like even as a
    >>     >     >     >> committer, I am operating as a “contributor” to the project. Committer means that
    >>     >     >     >> there is trust to modify the source code base. Of the issues above, the only one I see
    >>     >     >     >> as a moderate snafu was #1, and frankly if there are some instructions that show me
    >>     >     >     >> how to do squashing and rebasing *first* I will try to do that in the future since I am
    >>     >     >     >> not a GIt expert.
    >>     >     >     >>
    >>     >     >     >> That said, I must state I feel pretty put off by Apache OpenNLP. This originated as a GSoC
    >>     >     >     >> effort, and we have worked pretty consistently on this over the last year. We used a
    >>     >     >     >> separate GitHub project to get started, kept Joern involved as another mentor, even
    >>     >     >     >> provided access and commit writes to that GitHub repository for a long time, so this
    >>     >     >     >> code was developed in the open. Joern even created a branch in ApacheOpenNLP in the code and I suppose
    >>     >     >     >> I should have gone and worked on that branch first since master is apparently so
    >>     >     >     >> pristine that even an Apache veteran like me can’t get something in to it without
    >>     >     >     >> making a whole bunch of (what are IMO minor issues, and what are IMO heavyweight
    >>     >     >     >> “community” issues).
    >>     >     >     >>
    >>     >     >     >> I am concerned from a community point of view that the first comment wasn’t “Great
    >>     >     >     >> job Chris, you got Sentiment Analysis into Apache, *but* I have these concerns 1-4 above”.
    >>     >     >     >> It was “The PR was merged wrong in ways 1-4 and I’m going to revert it.”
    >>     >     >     >>
    >>     >     >     >> That’s pretty off-putting to someone who is semi-new like me and like Ana.
    >>     >     >     >>
    >>     >     >     >> Anyways, go ahead and revert it. Sorry to have caused any issues.
    >>     >     >     >>
    >>     >     >     >> Chris
    >>     >     >     >>
    >>     >     >     >>
    >>     >     >     >>
    >>     >     >     >> On 6/27/17, 7:06 AM, "Chris Mattmann" <ma...@apache.org> wrote:
    >>     >     >     >>
    >>     >     >     >>    Hi Joern,
    >>     >     >     >>
    >>     >     >     >>    I’m confused. Why did you revert my commit?
    >>     >     >     >>
    >>     >     >     >>    Every one of those check points you put on the PR was checked?
    >>     >     >     >>    We have been discussing this for months, you have seen the
    >>     >     >     >>    code for months, Ana and I have worked diligently on the code
    >>     >     >     >>    in plain view of everyone.
    >>     >     >     >>
    >>     >     >     >>    Please explain.
    >>     >     >     >>
    >>     >     >     >>    Chris
    >>     >     >     >>
    >>     >     >     >>
    >>     >     >     >>
    >>     >     >     >>
    >>     >     >     >>    On 6/27/17, 1:23 AM, "kottmann" <gi...@git.apache.org> wrote:
    >>     >     >     >>
    >>     >     >     >>        GitHub user kottmann opened a pull request:
    >>     >     >     >>
    >>     >     >     >>            https://github.com/apache/opennlp/pull/238
    >>     >     >     >>
    >>     >     >     >>            Revert merging of sentiment work, no consent to merge it
    >>     >     >     >>
    >>     >     >     >>            Thank you for contributing to Apache OpenNLP.
    >>     >     >     >>
    >>     >     >     >>            In order to streamline the review of the contribution we ask you
    >>     >     >     >>            to ensure the following steps have been taken:
    >>     >     >     >>
    >>     >     >     >>            ### For all changes:
    >>     >     >     >>            - [ ] Is there a JIRA ticket associated with this PR? Is it referenced
    >>     >     >     >>                 in the commit message?
    >>     >     >     >>
    >>     >     >     >>            - [ ] Does your PR title start with OPENNLP-XXXX where XXXX is the JIRA number you are trying to resolve? Pay particular attention to the hyphen "-" character.
    >>     >     >     >>
    >>     >     >     >>            - [ ] Has your PR been rebased against the latest commit within the target branch (typically master)?
    >>     >     >     >>
    >>     >     >     >>            - [ ] Is your initial contribution a single, squashed commit?
    >>     >     >     >>
    >>     >     >     >>            ### For code changes:
    >>     >     >     >>            - [ ] Have you ensured that the full suite of tests is executed via mvn clean install at the root opennlp folder?
    >>     >     >     >>            - [ ] Have you written or updated unit tests to verify your changes?
    >>     >     >     >>            - [ ] If adding new dependencies to the code, are these dependencies licensed in a way that is compatible for inclusion under [ASF 2.0](http://www.apache.org/legal/resolved.html#category-a)?
    >>     >     >     >>            - [ ] If applicable, have you updated the LICENSE file, including the main LICENSE file in opennlp folder?
    >>     >     >     >>            - [ ] If applicable, have you updated the NOTICE file, including the main NOTICE file found in opennlp folder?
    >>     >     >     >>
    >>     >     >     >>            ### For documentation related changes:
    >>     >     >     >>            - [ ] Have you ensured that format looks appropriate for the output in which it is rendered?
    >>     >     >     >>
    >>     >     >     >>            ### Note:
    >>     >     >     >>            Please ensure that once the PR is submitted, you check travis-ci for build issues and submit an update to your PR as soon as possible.
    >>     >     >     >>
    >>     >     >     >>
    >>     >     >     >>        You can merge this pull request into a Git repository by running:
    >>     >     >     >>
    >>     >     >     >>            $ git pull https://github.com/kottmann/opennlp revert_sentiment
    >>     >     >     >>
    >>     >     >     >>        Alternatively you can review and apply these changes as the patch at:
    >>     >     >     >>
    >>     >     >     >>            https://github.com/apache/opennlp/pull/238.patch
    >>     >     >     >>
    >>     >     >     >>        To close this pull request, make a commit to your master/trunk branch
    >>     >     >     >>        with (at least) the following in the commit message:
    >>     >     >     >>
    >>     >     >     >>            This closes #238
    >>     >     >     >>
    >>     >     >     >>        ----
    >>     >     >     >>        commit 123222eb34724bae793e9d6d22e202c0aee0aa45
    >>     >     >     >>        Author: Jörn Kottmann <jo...@apache.org>
    >>     >     >     >>        Date:   2017-06-27T08:19:19Z
    >>     >     >     >>
    >>     >     >     >>            Revert merging of sentiment work, no consent to merge it
    >>     >     >     >>
    >>     >     >     >>        ----
    >>     >     >     >>
    >>     >     >     >>
    >>     >     >     >>        ---
    >>     >     >     >>        If your project is set up for it, you can reply to this email and have your
    >>     >     >     >>        reply appear on GitHub as well. If your project does not have this feature
    >>     >     >     >>        enabled and wishes so, or if the feature is enabled but not working, please
    >>     >     >     >>        contact infrastructure at infrastructure@apache.org or file a JIRA ticket
    >>     >     >     >>        with INFRA.
    >>     >     >     >>        ---
    >>     >     >     >>
    >>     >     >     >>
    >>     >     >     >>
    >>     >     >     >>
    >>     >     >     >>
    >>     >     >     >>
    >>     >     >     >
    >>     >     >
    >>     >     >
    >>     >     >
    >>     >
    >>     >
    >>     >
    >>
    >>
    >>
    



Re: [GitHub] opennlp pull request #238: Revert merging of sentiment work, no consent to m...

Posted by Joern Kottmann <ko...@gmail.com>.
One more thing, in case we check in models for unit tests we need to
be able to train them again, we might not support those models forever
and then it would be bad if we can't use the tests anymore or need to
repair them by hand.

Jörn

On Thu, Jun 29, 2017 at 7:18 PM, Joern Kottmann <ko...@gmail.com> wrote:
> For 2. I would like to suggest that we implement doccat format support
> to train on that data.
>
> 3. it would be best so think about how we want to test the doccat
> component, today we don't have any tests which use lots of data to
> evaluate it.
> Probably the sentitment data could solve this for us and a train and
> evaluate test could be included in the eval tests.
>
> +1 to revert and then do these steps after the 1.8.1 release.
>
> I can apply my PR myself if nobody objects.
>
> Jörn
>
> On Thu, Jun 29, 2017 at 7:10 PM, Chris Mattmann <ma...@apache.org> wrote:
>> Hi Rodrigo,
>>
>> This is very useful feedback that I wish we would have had a long time ago.
>>
>> I will look into it and see if I can reproduce the CLI error. I did a full build and mvn
>> install (which I though would run tests?) before commiting and as I posted in JIRA
>> the tests passed for me? So I will have to look into that.
>>
>> That said, given your feedback that SentimentME and the Sentiment Component
>> doesn’t offer much over Document Classifier I agree with you, but wasn’t super
>> familiar with the Document Classifier API. That said, if we can get the same functionality
>> by just using Document Classifier why don’t we:
>>
>> 1. Remove the SentimentME and associated code (except for the unit tests)
>> 2. Use the sample datasets from NetFlix & Stanford Treebank sentiment and
>> build models using Document Classifier API.
>> 3. Rename and keep the unit tests that test against Netflix and Stanford tree bank.
>>
>> That way we get basic sentiment analysis (that is working for us internally at JPL decently),
>> for Apache OpenNLP, and then if we want to build something better than a Document
>> Classification approach to sentiment we can do so.
>>
>> Thoughts?
>>
>> Thanks for your useful feedback. If everyone agrees this is a plan I can back out the code
>> using Joern’s revert, and then try and execute 1-3 above in a branch first. Thanks.
>>
>> Cheers,
>> Chris
>>
>>
>>
>> On 6/29/17, 10:03 AM, "Rodrigo Agerri" <ra...@apache.org> wrote:
>>
>>     Hi Chris,
>>
>>     I have been interested in the new sentiment component for a while,
>>     although truth to be told, I did not follow that closely. I have today
>>     looked at it and test it with some of the corpora you have mentioned.
>>     In order to do that, I checkout master to work with from this commit
>>     onwards
>>
>>     https://github.com/apache/opennlp/commit/56321aab51a470cd2004b76fb1f5330881b943c1
>>
>>     1. I tried to run it from the CLI. The Sentiment component did not
>>     appear to be available.
>>     2. I added the SentimentTrainer and Evaluator to the cmdline.CLI (no
>>     SentimentTool is implemented to tag with a trained model).
>>     3. After that, the CLI tests did not pass. So, the CLI is currently
>>     non functional, unless I did something wrong, always possible, of
>>     course. See if you can reproduce that error.
>>
>>     I therefore did the tests via API. I implemented a little test for
>>     training, evaluating and tagging here:
>>
>>     https://github.com/ixa-ehu/ixa-pipe-doc/tree/test
>>
>>     I run the training on the large movies review from Stanford for binary
>>     polarity classification
>>
>>     http://ai.stanford.edu/~amaas/data/sentiment/
>>
>>     and on the two little samples multiclass files added in resources and
>>     mentioned in the previous email, using the first one for training and
>>     the second one for testing (maxent 100 iterations, cutoff 5).
>>
>>     2. Stanford results: 0.84264
>>     3. sample multiclass: 0.73
>>
>>     Given that this is a standard document classification task, I decided
>>     to train the doccat component from the CLI:
>>
>>     1. Stanford results: 0.84264 (BOW features by default).
>>     2. sample multiclass: 0.73
>>
>>     I then looked at the code of the sentiment component and saw that it
>>     is basically a document classifier working with bag of words features.
>>     No added functionality. So, my conclusions are:
>>
>>     1. The CLI needs to be fixed.
>>     2. The Sentiment component, as it is, provides the same functionality
>>     as the document classifier.
>>
>>     I would therefore reconsider this commit until those two issues are
>>     addressed. Just my opinion.
>>
>>     Best regards,
>>
>>     Rodrigo
>>
>>     On Thu, Jun 29, 2017 at 5:30 PM, Chris Mattmann <ma...@apache.org> wrote:
>>     >
>>     > Hey Joern,
>>     >
>>     > Sure, you can find the model data links here, along with our evaluation of them.
>>     >
>>     > http://irds.usc.edu/SentimentAnalysisParser/datasets.html
>>     >
>>     > There are other evaluations here:
>>     >
>>     > http://irds.usc.edu/SentimentAnalysisParser/models.html
>>     >
>>     > The HT provider review I cannot contribute at this time and I question its broad
>>     > applicability since it’s related to human trafficking. In addition we are still working
>>     > on publishing our analysis & evaluation of it which is why I removed it from the
>>     > commit.
>>     >
>>     > Cheers,
>>     > Chris
>>     >
>>     >
>>     >
>>     >
>>     >
>>     > On 6/29/17, 7:36 AM, "Joern Kottmann" <ko...@gmail.com> wrote:
>>     >
>>     >     Which data sets did you use to evaluate this?
>>     >     I was looking for a bit more than a sample file to train it.
>>     >
>>     >     I noticed that you checked in stanford and netflix models.
>>     >
>>     >     The stanford data set is probably this one:
>>     >     http://ai.stanford.edu/~amaas/data/sentiment/
>>     >
>>     >     Do you have a link for the netflix data?
>>     >
>>     >     Jörn
>>     >
>>     >     On Thu, Jun 29, 2017 at 4:00 PM, Chris Mattmann <ma...@apache.org> wrote:
>>     >     > Absolutely you can find it here:
>>     >     >
>>     >     > opennlp-tools/src/test/resources/opennlp/tools/sentiment/sample_train_categ (for categorical /multi-class)
>>     >     > opennlp-tools/src/test/resources/opennlp/tools/sentiment/sample_train_categ2 (for categorical/multi-class)
>>     >     >
>>     >     > We can also do similar files where instead of multi-class, we just use pos/neg as the label.
>>     >     >
>>     >     > Cheers,
>>     >     > Chris
>>     >     >
>>     >     >
>>     >     >
>>     >     >
>>     >     >
>>     >     > On 6/29/17, 2:35 AM, "Joern Kottmann" <ko...@gmail.com> wrote:
>>     >     >
>>     >     >     Hello Chris,
>>     >     >
>>     >     >     could you please point me to files I can use to train the sentiment
>>     >     >     component? I am currently looking again through the code and would
>>     >     >     like to train it myself.
>>     >     >
>>     >     >     Jörn
>>     >     >
>>     >     >     On Tue, Jun 27, 2017 at 4:59 PM, Dan Russ <da...@gmail.com> wrote:
>>     >     >     > Hi All,
>>     >     >     >    First, let me take a share of blame for the comment Chris mentioned.  I believe I said something like the pull request was X revision behind and Y revisions ahead.  It was not meant to be rude, it was meant to say it is hard to review code when it is so different from the current code base. I am very excited that sentiment analysis is going to be added to OpenNLP, but I have not had time to play with it. If I were to say “great job” before I have add a chance to look at it, it would be flattery not honest praise.
>>     >     >     >
>>     >     >     >   Let’s clean up the merge.  I agree with Chris that scalability and perfection should not be our initial goal.  Let’s get something, and we can decide how to optimize later (even if it require a complete rewrite).  Perfection is the enemy of the good.
>>     >     >     >
>>     >     >     >   Finally, because of Chris’ comments it is hard to thank Ana and Chris without sounding insincere.  But I’ll try, thank you Chris and Ana.  I hope we can get beyond this and that Chris and Ana will continue to improve the performance of the sentiment analysis tool and happily remain part of the OpenNLP family.  It is also a good time to toss a big thank you to all of the committers, users, and PMC member.  I use OpenNLP almost everyday.  Your work is extremely valuable to me.
>>     >     >     >
>>     >     >     > Thank you,
>>     >     >     > Daniel
>>     >     >     >
>>     >     >     >> On Jun 27, 2017, at 10:25 AM, Chris Mattmann <ma...@apache.org> wrote:
>>     >     >     >>
>>     >     >     >> Hi everyone,
>>     >     >     >>
>>     >     >     >> I spoke with Joern in Slack. Some of his concerns are:
>>     >     >     >>
>>     >     >     >> 1. This was done with a Merge commit and apparently they squash and rebase.
>>     >     >     >> [would be helpful to see some pointer on this for documentation, thus far I
>>     >     >     >> haven’t found any]
>>     >     >     >> 2. Apparently we literally need to ask others for +1 votes and record them
>>     >     >     >> before committing? I thought since Ana and I are committers aren were +1,
>>     >     >     >> and since Joern had been providing feedback (the last of which was to add
>>     >     >     >> tests, which we did) that he would be +1 as well (I guess he is not, and I guess
>>     >     >     >> formally we need to do a +1 vote even still)
>>     >     >     >> 3. There was concern about scalability of the code.
>>     >     >     >> 4. There are thoughts that the code was not perfect yet (even though it works
>>     >     >     >> fine in the MEMEX project for Ana and I)
>>     >     >     >>
>>     >     >     >> So, Joern has opened up a revert PR.
>>     >     >     >>
>>     >     >     >> I suppose I should state I find this process extremely heavyweight and unwelcoming.
>>     >     >     >> To me, there should be a modicum of trust for committers, but I feel like even as a
>>     >     >     >> committer, I am operating as a “contributor” to the project. Committer means that
>>     >     >     >> there is trust to modify the source code base. Of the issues above, the only one I see
>>     >     >     >> as a moderate snafu was #1, and frankly if there are some instructions that show me
>>     >     >     >> how to do squashing and rebasing *first* I will try to do that in the future since I am
>>     >     >     >> not a GIt expert.
>>     >     >     >>
>>     >     >     >> That said, I must state I feel pretty put off by Apache OpenNLP. This originated as a GSoC
>>     >     >     >> effort, and we have worked pretty consistently on this over the last year. We used a
>>     >     >     >> separate GitHub project to get started, kept Joern involved as another mentor, even
>>     >     >     >> provided access and commit writes to that GitHub repository for a long time, so this
>>     >     >     >> code was developed in the open. Joern even created a branch in ApacheOpenNLP in the code and I suppose
>>     >     >     >> I should have gone and worked on that branch first since master is apparently so
>>     >     >     >> pristine that even an Apache veteran like me can’t get something in to it without
>>     >     >     >> making a whole bunch of (what are IMO minor issues, and what are IMO heavyweight
>>     >     >     >> “community” issues).
>>     >     >     >>
>>     >     >     >> I am concerned from a community point of view that the first comment wasn’t “Great
>>     >     >     >> job Chris, you got Sentiment Analysis into Apache, *but* I have these concerns 1-4 above”.
>>     >     >     >> It was “The PR was merged wrong in ways 1-4 and I’m going to revert it.”
>>     >     >     >>
>>     >     >     >> That’s pretty off-putting to someone who is semi-new like me and like Ana.
>>     >     >     >>
>>     >     >     >> Anyways, go ahead and revert it. Sorry to have caused any issues.
>>     >     >     >>
>>     >     >     >> Chris
>>     >     >     >>
>>     >     >     >>
>>     >     >     >>
>>     >     >     >> On 6/27/17, 7:06 AM, "Chris Mattmann" <ma...@apache.org> wrote:
>>     >     >     >>
>>     >     >     >>    Hi Joern,
>>     >     >     >>
>>     >     >     >>    I’m confused. Why did you revert my commit?
>>     >     >     >>
>>     >     >     >>    Every one of those check points you put on the PR was checked?
>>     >     >     >>    We have been discussing this for months, you have seen the
>>     >     >     >>    code for months, Ana and I have worked diligently on the code
>>     >     >     >>    in plain view of everyone.
>>     >     >     >>
>>     >     >     >>    Please explain.
>>     >     >     >>
>>     >     >     >>    Chris
>>     >     >     >>
>>     >     >     >>
>>     >     >     >>
>>     >     >     >>
>>     >     >     >>    On 6/27/17, 1:23 AM, "kottmann" <gi...@git.apache.org> wrote:
>>     >     >     >>
>>     >     >     >>        GitHub user kottmann opened a pull request:
>>     >     >     >>
>>     >     >     >>            https://github.com/apache/opennlp/pull/238
>>     >     >     >>
>>     >     >     >>            Revert merging of sentiment work, no consent to merge it
>>     >     >     >>
>>     >     >     >>            Thank you for contributing to Apache OpenNLP.
>>     >     >     >>
>>     >     >     >>            In order to streamline the review of the contribution we ask you
>>     >     >     >>            to ensure the following steps have been taken:
>>     >     >     >>
>>     >     >     >>            ### For all changes:
>>     >     >     >>            - [ ] Is there a JIRA ticket associated with this PR? Is it referenced
>>     >     >     >>                 in the commit message?
>>     >     >     >>
>>     >     >     >>            - [ ] Does your PR title start with OPENNLP-XXXX where XXXX is the JIRA number you are trying to resolve? Pay particular attention to the hyphen "-" character.
>>     >     >     >>
>>     >     >     >>            - [ ] Has your PR been rebased against the latest commit within the target branch (typically master)?
>>     >     >     >>
>>     >     >     >>            - [ ] Is your initial contribution a single, squashed commit?
>>     >     >     >>
>>     >     >     >>            ### For code changes:
>>     >     >     >>            - [ ] Have you ensured that the full suite of tests is executed via mvn clean install at the root opennlp folder?
>>     >     >     >>            - [ ] Have you written or updated unit tests to verify your changes?
>>     >     >     >>            - [ ] If adding new dependencies to the code, are these dependencies licensed in a way that is compatible for inclusion under [ASF 2.0](http://www.apache.org/legal/resolved.html#category-a)?
>>     >     >     >>            - [ ] If applicable, have you updated the LICENSE file, including the main LICENSE file in opennlp folder?
>>     >     >     >>            - [ ] If applicable, have you updated the NOTICE file, including the main NOTICE file found in opennlp folder?
>>     >     >     >>
>>     >     >     >>            ### For documentation related changes:
>>     >     >     >>            - [ ] Have you ensured that format looks appropriate for the output in which it is rendered?
>>     >     >     >>
>>     >     >     >>            ### Note:
>>     >     >     >>            Please ensure that once the PR is submitted, you check travis-ci for build issues and submit an update to your PR as soon as possible.
>>     >     >     >>
>>     >     >     >>
>>     >     >     >>        You can merge this pull request into a Git repository by running:
>>     >     >     >>
>>     >     >     >>            $ git pull https://github.com/kottmann/opennlp revert_sentiment
>>     >     >     >>
>>     >     >     >>        Alternatively you can review and apply these changes as the patch at:
>>     >     >     >>
>>     >     >     >>            https://github.com/apache/opennlp/pull/238.patch
>>     >     >     >>
>>     >     >     >>        To close this pull request, make a commit to your master/trunk branch
>>     >     >     >>        with (at least) the following in the commit message:
>>     >     >     >>
>>     >     >     >>            This closes #238
>>     >     >     >>
>>     >     >     >>        ----
>>     >     >     >>        commit 123222eb34724bae793e9d6d22e202c0aee0aa45
>>     >     >     >>        Author: Jörn Kottmann <jo...@apache.org>
>>     >     >     >>        Date:   2017-06-27T08:19:19Z
>>     >     >     >>
>>     >     >     >>            Revert merging of sentiment work, no consent to merge it
>>     >     >     >>
>>     >     >     >>        ----
>>     >     >     >>
>>     >     >     >>
>>     >     >     >>        ---
>>     >     >     >>        If your project is set up for it, you can reply to this email and have your
>>     >     >     >>        reply appear on GitHub as well. If your project does not have this feature
>>     >     >     >>        enabled and wishes so, or if the feature is enabled but not working, please
>>     >     >     >>        contact infrastructure at infrastructure@apache.org or file a JIRA ticket
>>     >     >     >>        with INFRA.
>>     >     >     >>        ---
>>     >     >     >>
>>     >     >     >>
>>     >     >     >>
>>     >     >     >>
>>     >     >     >>
>>     >     >     >>
>>     >     >     >
>>     >     >
>>     >     >
>>     >     >
>>     >
>>     >
>>     >
>>
>>
>>

Re: [GitHub] opennlp pull request #238: Revert merging of sentiment work, no consent to m...

Posted by Chris Mattmann <ma...@apache.org>.
Thanks bro



On 6/29/17, 10:52 AM, "Suneel Marthi" <sm...@apache.org> wrote:

    http://opennlp.apache.org/docs/1.8.0/manual/opennlp.html#tools.doccat
    
    On Thu, Jun 29, 2017 at 1:51 PM, Chris Mattmann <ma...@apache.org> wrote:
    
    > Thanks I will investigate the below thanks Joern. Can someone send me some
    > pointers
    > to the Doc Cat API that I can find? Thanks.
    >
    >
    >
    >
    > On 6/29/17, 10:18 AM, "Joern Kottmann" <ko...@gmail.com> wrote:
    >
    >     For 2. I would like to suggest that we implement doccat format support
    >     to train on that data.
    >
    >     3. it would be best so think about how we want to test the doccat
    >     component, today we don't have any tests which use lots of data to
    >     evaluate it.
    >     Probably the sentitment data could solve this for us and a train and
    >     evaluate test could be included in the eval tests.
    >
    >     +1 to revert and then do these steps after the 1.8.1 release.
    >
    >     I can apply my PR myself if nobody objects.
    >
    >     Jörn
    >
    >     On Thu, Jun 29, 2017 at 7:10 PM, Chris Mattmann <ma...@apache.org>
    > wrote:
    >     > Hi Rodrigo,
    >     >
    >     > This is very useful feedback that I wish we would have had a long
    > time ago.
    >     >
    >     > I will look into it and see if I can reproduce the CLI error. I did
    > a full build and mvn
    >     > install (which I though would run tests?) before commiting and as I
    > posted in JIRA
    >     > the tests passed for me? So I will have to look into that.
    >     >
    >     > That said, given your feedback that SentimentME and the Sentiment
    > Component
    >     > doesn’t offer much over Document Classifier I agree with you, but
    > wasn’t super
    >     > familiar with the Document Classifier API. That said, if we can get
    > the same functionality
    >     > by just using Document Classifier why don’t we:
    >     >
    >     > 1. Remove the SentimentME and associated code (except for the unit
    > tests)
    >     > 2. Use the sample datasets from NetFlix & Stanford Treebank
    > sentiment and
    >     > build models using Document Classifier API.
    >     > 3. Rename and keep the unit tests that test against Netflix and
    > Stanford tree bank.
    >     >
    >     > That way we get basic sentiment analysis (that is working for us
    > internally at JPL decently),
    >     > for Apache OpenNLP, and then if we want to build something better
    > than a Document
    >     > Classification approach to sentiment we can do so.
    >     >
    >     > Thoughts?
    >     >
    >     > Thanks for your useful feedback. If everyone agrees this is a plan I
    > can back out the code
    >     > using Joern’s revert, and then try and execute 1-3 above in a branch
    > first. Thanks.
    >     >
    >     > Cheers,
    >     > Chris
    >     >
    >     >
    >     >
    >     > On 6/29/17, 10:03 AM, "Rodrigo Agerri" <ra...@apache.org> wrote:
    >     >
    >     >     Hi Chris,
    >     >
    >     >     I have been interested in the new sentiment component for a
    > while,
    >     >     although truth to be told, I did not follow that closely. I have
    > today
    >     >     looked at it and test it with some of the corpora you have
    > mentioned.
    >     >     In order to do that, I checkout master to work with from this
    > commit
    >     >     onwards
    >     >
    >     >     https://github.com/apache/opennlp/commit/
    > 56321aab51a470cd2004b76fb1f5330881b943c1
    >     >
    >     >     1. I tried to run it from the CLI. The Sentiment component did
    > not
    >     >     appear to be available.
    >     >     2. I added the SentimentTrainer and Evaluator to the cmdline.CLI
    > (no
    >     >     SentimentTool is implemented to tag with a trained model).
    >     >     3. After that, the CLI tests did not pass. So, the CLI is
    > currently
    >     >     non functional, unless I did something wrong, always possible, of
    >     >     course. See if you can reproduce that error.
    >     >
    >     >     I therefore did the tests via API. I implemented a little test
    > for
    >     >     training, evaluating and tagging here:
    >     >
    >     >     https://github.com/ixa-ehu/ixa-pipe-doc/tree/test
    >     >
    >     >     I run the training on the large movies review from Stanford for
    > binary
    >     >     polarity classification
    >     >
    >     >     http://ai.stanford.edu/~amaas/data/sentiment/
    >     >
    >     >     and on the two little samples multiclass files added in
    > resources and
    >     >     mentioned in the previous email, using the first one for
    > training and
    >     >     the second one for testing (maxent 100 iterations, cutoff 5).
    >     >
    >     >     2. Stanford results: 0.84264
    >     >     3. sample multiclass: 0.73
    >     >
    >     >     Given that this is a standard document classification task, I
    > decided
    >     >     to train the doccat component from the CLI:
    >     >
    >     >     1. Stanford results: 0.84264 (BOW features by default).
    >     >     2. sample multiclass: 0.73
    >     >
    >     >     I then looked at the code of the sentiment component and saw
    > that it
    >     >     is basically a document classifier working with bag of words
    > features.
    >     >     No added functionality. So, my conclusions are:
    >     >
    >     >     1. The CLI needs to be fixed.
    >     >     2. The Sentiment component, as it is, provides the same
    > functionality
    >     >     as the document classifier.
    >     >
    >     >     I would therefore reconsider this commit until those two issues
    > are
    >     >     addressed. Just my opinion.
    >     >
    >     >     Best regards,
    >     >
    >     >     Rodrigo
    >     >
    >     >     On Thu, Jun 29, 2017 at 5:30 PM, Chris Mattmann <
    > mattmann@apache.org> wrote:
    >     >     >
    >     >     > Hey Joern,
    >     >     >
    >     >     > Sure, you can find the model data links here, along with our
    > evaluation of them.
    >     >     >
    >     >     > http://irds.usc.edu/SentimentAnalysisParser/datasets.html
    >     >     >
    >     >     > There are other evaluations here:
    >     >     >
    >     >     > http://irds.usc.edu/SentimentAnalysisParser/models.html
    >     >     >
    >     >     > The HT provider review I cannot contribute at this time and I
    > question its broad
    >     >     > applicability since it’s related to human trafficking. In
    > addition we are still working
    >     >     > on publishing our analysis & evaluation of it which is why I
    > removed it from the
    >     >     > commit.
    >     >     >
    >     >     > Cheers,
    >     >     > Chris
    >     >     >
    >     >     >
    >     >     >
    >     >     >
    >     >     >
    >     >     > On 6/29/17, 7:36 AM, "Joern Kottmann" <ko...@gmail.com>
    > wrote:
    >     >     >
    >     >     >     Which data sets did you use to evaluate this?
    >     >     >     I was looking for a bit more than a sample file to train
    > it.
    >     >     >
    >     >     >     I noticed that you checked in stanford and netflix models.
    >     >     >
    >     >     >     The stanford data set is probably this one:
    >     >     >     http://ai.stanford.edu/~amaas/data/sentiment/
    >     >     >
    >     >     >     Do you have a link for the netflix data?
    >     >     >
    >     >     >     Jörn
    >     >     >
    >     >     >     On Thu, Jun 29, 2017 at 4:00 PM, Chris Mattmann <
    > mattmann@apache.org> wrote:
    >     >     >     > Absolutely you can find it here:
    >     >     >     >
    >     >     >     > opennlp-tools/src/test/resources/opennlp/tools/sentiment/sample_train_categ
    > (for categorical /multi-class)
    >     >     >     > opennlp-tools/src/test/resources/opennlp/tools/sentiment/sample_train_categ2
    > (for categorical/multi-class)
    >     >     >     >
    >     >     >     > We can also do similar files where instead of
    > multi-class, we just use pos/neg as the label.
    >     >     >     >
    >     >     >     > Cheers,
    >     >     >     > Chris
    >     >     >     >
    >     >     >     >
    >     >     >     >
    >     >     >     >
    >     >     >     >
    >     >     >     > On 6/29/17, 2:35 AM, "Joern Kottmann" <
    > kottmann@gmail.com> wrote:
    >     >     >     >
    >     >     >     >     Hello Chris,
    >     >     >     >
    >     >     >     >     could you please point me to files I can use to
    > train the sentiment
    >     >     >     >     component? I am currently looking again through the
    > code and would
    >     >     >     >     like to train it myself.
    >     >     >     >
    >     >     >     >     Jörn
    >     >     >     >
    >     >     >     >     On Tue, Jun 27, 2017 at 4:59 PM, Dan Russ <
    > danruss00@gmail.com> wrote:
    >     >     >     >     > Hi All,
    >     >     >     >     >    First, let me take a share of blame for the
    > comment Chris mentioned.  I believe I said something like the pull request
    > was X revision behind and Y revisions ahead.  It was not meant to be rude,
    > it was meant to say it is hard to review code when it is so different from
    > the current code base. I am very excited that sentiment analysis is going
    > to be added to OpenNLP, but I have not had time to play with it. If I were
    > to say “great job” before I have add a chance to look at it, it would be
    > flattery not honest praise.
    >     >     >     >     >
    >     >     >     >     >   Let’s clean up the merge.  I agree with Chris
    > that scalability and perfection should not be our initial goal.  Let’s get
    > something, and we can decide how to optimize later (even if it require a
    > complete rewrite).  Perfection is the enemy of the good.
    >     >     >     >     >
    >     >     >     >     >   Finally, because of Chris’ comments it is hard
    > to thank Ana and Chris without sounding insincere.  But I’ll try, thank you
    > Chris and Ana.  I hope we can get beyond this and that Chris and Ana will
    > continue to improve the performance of the sentiment analysis tool and
    > happily remain part of the OpenNLP family.  It is also a good time to toss
    > a big thank you to all of the committers, users, and PMC member.  I use
    > OpenNLP almost everyday.  Your work is extremely valuable to me.
    >     >     >     >     >
    >     >     >     >     > Thank you,
    >     >     >     >     > Daniel
    >     >     >     >     >
    >     >     >     >     >> On Jun 27, 2017, at 10:25 AM, Chris Mattmann <
    > mattmann@apache.org> wrote:
    >     >     >     >     >>
    >     >     >     >     >> Hi everyone,
    >     >     >     >     >>
    >     >     >     >     >> I spoke with Joern in Slack. Some of his concerns
    > are:
    >     >     >     >     >>
    >     >     >     >     >> 1. This was done with a Merge commit and
    > apparently they squash and rebase.
    >     >     >     >     >> [would be helpful to see some pointer on this for
    > documentation, thus far I
    >     >     >     >     >> haven’t found any]
    >     >     >     >     >> 2. Apparently we literally need to ask others for
    > +1 votes and record them
    >     >     >     >     >> before committing? I thought since Ana and I are
    > committers aren were +1,
    >     >     >     >     >> and since Joern had been providing feedback (the
    > last of which was to add
    >     >     >     >     >> tests, which we did) that he would be +1 as well
    > (I guess he is not, and I guess
    >     >     >     >     >> formally we need to do a +1 vote even still)
    >     >     >     >     >> 3. There was concern about scalability of the
    > code.
    >     >     >     >     >> 4. There are thoughts that the code was not
    > perfect yet (even though it works
    >     >     >     >     >> fine in the MEMEX project for Ana and I)
    >     >     >     >     >>
    >     >     >     >     >> So, Joern has opened up a revert PR.
    >     >     >     >     >>
    >     >     >     >     >> I suppose I should state I find this process
    > extremely heavyweight and unwelcoming.
    >     >     >     >     >> To me, there should be a modicum of trust for
    > committers, but I feel like even as a
    >     >     >     >     >> committer, I am operating as a “contributor” to
    > the project. Committer means that
    >     >     >     >     >> there is trust to modify the source code base. Of
    > the issues above, the only one I see
    >     >     >     >     >> as a moderate snafu was #1, and frankly if there
    > are some instructions that show me
    >     >     >     >     >> how to do squashing and rebasing *first* I will
    > try to do that in the future since I am
    >     >     >     >     >> not a GIt expert.
    >     >     >     >     >>
    >     >     >     >     >> That said, I must state I feel pretty put off by
    > Apache OpenNLP. This originated as a GSoC
    >     >     >     >     >> effort, and we have worked pretty consistently on
    > this over the last year. We used a
    >     >     >     >     >> separate GitHub project to get started, kept
    > Joern involved as another mentor, even
    >     >     >     >     >> provided access and commit writes to that GitHub
    > repository for a long time, so this
    >     >     >     >     >> code was developed in the open. Joern even
    > created a branch in ApacheOpenNLP in the code and I suppose
    >     >     >     >     >> I should have gone and worked on that branch
    > first since master is apparently so
    >     >     >     >     >> pristine that even an Apache veteran like me
    > can’t get something in to it without
    >     >     >     >     >> making a whole bunch of (what are IMO minor
    > issues, and what are IMO heavyweight
    >     >     >     >     >> “community” issues).
    >     >     >     >     >>
    >     >     >     >     >> I am concerned from a community point of view
    > that the first comment wasn’t “Great
    >     >     >     >     >> job Chris, you got Sentiment Analysis into
    > Apache, *but* I have these concerns 1-4 above”.
    >     >     >     >     >> It was “The PR was merged wrong in ways 1-4 and
    > I’m going to revert it.”
    >     >     >     >     >>
    >     >     >     >     >> That’s pretty off-putting to someone who is
    > semi-new like me and like Ana.
    >     >     >     >     >>
    >     >     >     >     >> Anyways, go ahead and revert it. Sorry to have
    > caused any issues.
    >     >     >     >     >>
    >     >     >     >     >> Chris
    >     >     >     >     >>
    >     >     >     >     >>
    >     >     >     >     >>
    >     >     >     >     >> On 6/27/17, 7:06 AM, "Chris Mattmann" <
    > mattmann@apache.org> wrote:
    >     >     >     >     >>
    >     >     >     >     >>    Hi Joern,
    >     >     >     >     >>
    >     >     >     >     >>    I’m confused. Why did you revert my commit?
    >     >     >     >     >>
    >     >     >     >     >>    Every one of those check points you put on the
    > PR was checked?
    >     >     >     >     >>    We have been discussing this for months, you
    > have seen the
    >     >     >     >     >>    code for months, Ana and I have worked
    > diligently on the code
    >     >     >     >     >>    in plain view of everyone.
    >     >     >     >     >>
    >     >     >     >     >>    Please explain.
    >     >     >     >     >>
    >     >     >     >     >>    Chris
    >     >     >     >     >>
    >     >     >     >     >>
    >     >     >     >     >>
    >     >     >     >     >>
    >     >     >     >     >>    On 6/27/17, 1:23 AM, "kottmann" <
    > git@git.apache.org> wrote:
    >     >     >     >     >>
    >     >     >     >     >>        GitHub user kottmann opened a pull request:
    >     >     >     >     >>
    >     >     >     >     >>            https://github.com/apache/
    > opennlp/pull/238
    >     >     >     >     >>
    >     >     >     >     >>            Revert merging of sentiment work, no
    > consent to merge it
    >     >     >     >     >>
    >     >     >     >     >>            Thank you for contributing to Apache
    > OpenNLP.
    >     >     >     >     >>
    >     >     >     >     >>            In order to streamline the review of
    > the contribution we ask you
    >     >     >     >     >>            to ensure the following steps have
    > been taken:
    >     >     >     >     >>
    >     >     >     >     >>            ### For all changes:
    >     >     >     >     >>            - [ ] Is there a JIRA ticket
    > associated with this PR? Is it referenced
    >     >     >     >     >>                 in the commit message?
    >     >     >     >     >>
    >     >     >     >     >>            - [ ] Does your PR title start with
    > OPENNLP-XXXX where XXXX is the JIRA number you are trying to resolve? Pay
    > particular attention to the hyphen "-" character.
    >     >     >     >     >>
    >     >     >     >     >>            - [ ] Has your PR been rebased against
    > the latest commit within the target branch (typically master)?
    >     >     >     >     >>
    >     >     >     >     >>            - [ ] Is your initial contribution a
    > single, squashed commit?
    >     >     >     >     >>
    >     >     >     >     >>            ### For code changes:
    >     >     >     >     >>            - [ ] Have you ensured that the full
    > suite of tests is executed via mvn clean install at the root opennlp folder?
    >     >     >     >     >>            - [ ] Have you written or updated unit
    > tests to verify your changes?
    >     >     >     >     >>            - [ ] If adding new dependencies to
    > the code, are these dependencies licensed in a way that is compatible for
    > inclusion under [ASF 2.0](http://www.apache.org/
    > legal/resolved.html#category-a)?
    >     >     >     >     >>            - [ ] If applicable, have you updated
    > the LICENSE file, including the main LICENSE file in opennlp folder?
    >     >     >     >     >>            - [ ] If applicable, have you updated
    > the NOTICE file, including the main NOTICE file found in opennlp folder?
    >     >     >     >     >>
    >     >     >     >     >>            ### For documentation related changes:
    >     >     >     >     >>            - [ ] Have you ensured that format
    > looks appropriate for the output in which it is rendered?
    >     >     >     >     >>
    >     >     >     >     >>            ### Note:
    >     >     >     >     >>            Please ensure that once the PR is
    > submitted, you check travis-ci for build issues and submit an update to
    > your PR as soon as possible.
    >     >     >     >     >>
    >     >     >     >     >>
    >     >     >     >     >>        You can merge this pull request into a Git
    > repository by running:
    >     >     >     >     >>
    >     >     >     >     >>            $ git pull
    > https://github.com/kottmann/opennlp revert_sentiment
    >     >     >     >     >>
    >     >     >     >     >>        Alternatively you can review and apply
    > these changes as the patch at:
    >     >     >     >     >>
    >     >     >     >     >>            https://github.com/apache/
    > opennlp/pull/238.patch
    >     >     >     >     >>
    >     >     >     >     >>        To close this pull request, make a commit
    > to your master/trunk branch
    >     >     >     >     >>        with (at least) the following in the
    > commit message:
    >     >     >     >     >>
    >     >     >     >     >>            This closes #238
    >     >     >     >     >>
    >     >     >     >     >>        ----
    >     >     >     >     >>        commit 123222eb34724bae793e9d6d22e202
    > c0aee0aa45
    >     >     >     >     >>        Author: Jörn Kottmann <jo...@apache.org>
    >     >     >     >     >>        Date:   2017-06-27T08:19:19Z
    >     >     >     >     >>
    >     >     >     >     >>            Revert merging of sentiment work, no
    > consent to merge it
    >     >     >     >     >>
    >     >     >     >     >>        ----
    >     >     >     >     >>
    >     >     >     >     >>
    >     >     >     >     >>        ---
    >     >     >     >     >>        If your project is set up for it, you can
    > reply to this email and have your
    >     >     >     >     >>        reply appear on GitHub as well. If your
    > project does not have this feature
    >     >     >     >     >>        enabled and wishes so, or if the feature
    > is enabled but not working, please
    >     >     >     >     >>        contact infrastructure at
    > infrastructure@apache.org or file a JIRA ticket
    >     >     >     >     >>        with INFRA.
    >     >     >     >     >>        ---
    >     >     >     >     >>
    >     >     >     >     >>
    >     >     >     >     >>
    >     >     >     >     >>
    >     >     >     >     >>
    >     >     >     >     >>
    >     >     >     >     >
    >     >     >     >
    >     >     >     >
    >     >     >     >
    >     >     >
    >     >     >
    >     >     >
    >     >
    >     >
    >     >
    >
    >
    >
    >
    



Re: [GitHub] opennlp pull request #238: Revert merging of sentiment work, no consent to m...

Posted by Suneel Marthi <sm...@apache.org>.
http://opennlp.apache.org/docs/1.8.0/manual/opennlp.html#tools.doccat

On Thu, Jun 29, 2017 at 1:51 PM, Chris Mattmann <ma...@apache.org> wrote:

> Thanks I will investigate the below thanks Joern. Can someone send me some
> pointers
> to the Doc Cat API that I can find? Thanks.
>
>
>
>
> On 6/29/17, 10:18 AM, "Joern Kottmann" <ko...@gmail.com> wrote:
>
>     For 2. I would like to suggest that we implement doccat format support
>     to train on that data.
>
>     3. it would be best so think about how we want to test the doccat
>     component, today we don't have any tests which use lots of data to
>     evaluate it.
>     Probably the sentitment data could solve this for us and a train and
>     evaluate test could be included in the eval tests.
>
>     +1 to revert and then do these steps after the 1.8.1 release.
>
>     I can apply my PR myself if nobody objects.
>
>     Jörn
>
>     On Thu, Jun 29, 2017 at 7:10 PM, Chris Mattmann <ma...@apache.org>
> wrote:
>     > Hi Rodrigo,
>     >
>     > This is very useful feedback that I wish we would have had a long
> time ago.
>     >
>     > I will look into it and see if I can reproduce the CLI error. I did
> a full build and mvn
>     > install (which I though would run tests?) before commiting and as I
> posted in JIRA
>     > the tests passed for me? So I will have to look into that.
>     >
>     > That said, given your feedback that SentimentME and the Sentiment
> Component
>     > doesn’t offer much over Document Classifier I agree with you, but
> wasn’t super
>     > familiar with the Document Classifier API. That said, if we can get
> the same functionality
>     > by just using Document Classifier why don’t we:
>     >
>     > 1. Remove the SentimentME and associated code (except for the unit
> tests)
>     > 2. Use the sample datasets from NetFlix & Stanford Treebank
> sentiment and
>     > build models using Document Classifier API.
>     > 3. Rename and keep the unit tests that test against Netflix and
> Stanford tree bank.
>     >
>     > That way we get basic sentiment analysis (that is working for us
> internally at JPL decently),
>     > for Apache OpenNLP, and then if we want to build something better
> than a Document
>     > Classification approach to sentiment we can do so.
>     >
>     > Thoughts?
>     >
>     > Thanks for your useful feedback. If everyone agrees this is a plan I
> can back out the code
>     > using Joern’s revert, and then try and execute 1-3 above in a branch
> first. Thanks.
>     >
>     > Cheers,
>     > Chris
>     >
>     >
>     >
>     > On 6/29/17, 10:03 AM, "Rodrigo Agerri" <ra...@apache.org> wrote:
>     >
>     >     Hi Chris,
>     >
>     >     I have been interested in the new sentiment component for a
> while,
>     >     although truth to be told, I did not follow that closely. I have
> today
>     >     looked at it and test it with some of the corpora you have
> mentioned.
>     >     In order to do that, I checkout master to work with from this
> commit
>     >     onwards
>     >
>     >     https://github.com/apache/opennlp/commit/
> 56321aab51a470cd2004b76fb1f5330881b943c1
>     >
>     >     1. I tried to run it from the CLI. The Sentiment component did
> not
>     >     appear to be available.
>     >     2. I added the SentimentTrainer and Evaluator to the cmdline.CLI
> (no
>     >     SentimentTool is implemented to tag with a trained model).
>     >     3. After that, the CLI tests did not pass. So, the CLI is
> currently
>     >     non functional, unless I did something wrong, always possible, of
>     >     course. See if you can reproduce that error.
>     >
>     >     I therefore did the tests via API. I implemented a little test
> for
>     >     training, evaluating and tagging here:
>     >
>     >     https://github.com/ixa-ehu/ixa-pipe-doc/tree/test
>     >
>     >     I run the training on the large movies review from Stanford for
> binary
>     >     polarity classification
>     >
>     >     http://ai.stanford.edu/~amaas/data/sentiment/
>     >
>     >     and on the two little samples multiclass files added in
> resources and
>     >     mentioned in the previous email, using the first one for
> training and
>     >     the second one for testing (maxent 100 iterations, cutoff 5).
>     >
>     >     2. Stanford results: 0.84264
>     >     3. sample multiclass: 0.73
>     >
>     >     Given that this is a standard document classification task, I
> decided
>     >     to train the doccat component from the CLI:
>     >
>     >     1. Stanford results: 0.84264 (BOW features by default).
>     >     2. sample multiclass: 0.73
>     >
>     >     I then looked at the code of the sentiment component and saw
> that it
>     >     is basically a document classifier working with bag of words
> features.
>     >     No added functionality. So, my conclusions are:
>     >
>     >     1. The CLI needs to be fixed.
>     >     2. The Sentiment component, as it is, provides the same
> functionality
>     >     as the document classifier.
>     >
>     >     I would therefore reconsider this commit until those two issues
> are
>     >     addressed. Just my opinion.
>     >
>     >     Best regards,
>     >
>     >     Rodrigo
>     >
>     >     On Thu, Jun 29, 2017 at 5:30 PM, Chris Mattmann <
> mattmann@apache.org> wrote:
>     >     >
>     >     > Hey Joern,
>     >     >
>     >     > Sure, you can find the model data links here, along with our
> evaluation of them.
>     >     >
>     >     > http://irds.usc.edu/SentimentAnalysisParser/datasets.html
>     >     >
>     >     > There are other evaluations here:
>     >     >
>     >     > http://irds.usc.edu/SentimentAnalysisParser/models.html
>     >     >
>     >     > The HT provider review I cannot contribute at this time and I
> question its broad
>     >     > applicability since it’s related to human trafficking. In
> addition we are still working
>     >     > on publishing our analysis & evaluation of it which is why I
> removed it from the
>     >     > commit.
>     >     >
>     >     > Cheers,
>     >     > Chris
>     >     >
>     >     >
>     >     >
>     >     >
>     >     >
>     >     > On 6/29/17, 7:36 AM, "Joern Kottmann" <ko...@gmail.com>
> wrote:
>     >     >
>     >     >     Which data sets did you use to evaluate this?
>     >     >     I was looking for a bit more than a sample file to train
> it.
>     >     >
>     >     >     I noticed that you checked in stanford and netflix models.
>     >     >
>     >     >     The stanford data set is probably this one:
>     >     >     http://ai.stanford.edu/~amaas/data/sentiment/
>     >     >
>     >     >     Do you have a link for the netflix data?
>     >     >
>     >     >     Jörn
>     >     >
>     >     >     On Thu, Jun 29, 2017 at 4:00 PM, Chris Mattmann <
> mattmann@apache.org> wrote:
>     >     >     > Absolutely you can find it here:
>     >     >     >
>     >     >     > opennlp-tools/src/test/resources/opennlp/tools/sentiment/sample_train_categ
> (for categorical /multi-class)
>     >     >     > opennlp-tools/src/test/resources/opennlp/tools/sentiment/sample_train_categ2
> (for categorical/multi-class)
>     >     >     >
>     >     >     > We can also do similar files where instead of
> multi-class, we just use pos/neg as the label.
>     >     >     >
>     >     >     > Cheers,
>     >     >     > Chris
>     >     >     >
>     >     >     >
>     >     >     >
>     >     >     >
>     >     >     >
>     >     >     > On 6/29/17, 2:35 AM, "Joern Kottmann" <
> kottmann@gmail.com> wrote:
>     >     >     >
>     >     >     >     Hello Chris,
>     >     >     >
>     >     >     >     could you please point me to files I can use to
> train the sentiment
>     >     >     >     component? I am currently looking again through the
> code and would
>     >     >     >     like to train it myself.
>     >     >     >
>     >     >     >     Jörn
>     >     >     >
>     >     >     >     On Tue, Jun 27, 2017 at 4:59 PM, Dan Russ <
> danruss00@gmail.com> wrote:
>     >     >     >     > Hi All,
>     >     >     >     >    First, let me take a share of blame for the
> comment Chris mentioned.  I believe I said something like the pull request
> was X revision behind and Y revisions ahead.  It was not meant to be rude,
> it was meant to say it is hard to review code when it is so different from
> the current code base. I am very excited that sentiment analysis is going
> to be added to OpenNLP, but I have not had time to play with it. If I were
> to say “great job” before I have add a chance to look at it, it would be
> flattery not honest praise.
>     >     >     >     >
>     >     >     >     >   Let’s clean up the merge.  I agree with Chris
> that scalability and perfection should not be our initial goal.  Let’s get
> something, and we can decide how to optimize later (even if it require a
> complete rewrite).  Perfection is the enemy of the good.
>     >     >     >     >
>     >     >     >     >   Finally, because of Chris’ comments it is hard
> to thank Ana and Chris without sounding insincere.  But I’ll try, thank you
> Chris and Ana.  I hope we can get beyond this and that Chris and Ana will
> continue to improve the performance of the sentiment analysis tool and
> happily remain part of the OpenNLP family.  It is also a good time to toss
> a big thank you to all of the committers, users, and PMC member.  I use
> OpenNLP almost everyday.  Your work is extremely valuable to me.
>     >     >     >     >
>     >     >     >     > Thank you,
>     >     >     >     > Daniel
>     >     >     >     >
>     >     >     >     >> On Jun 27, 2017, at 10:25 AM, Chris Mattmann <
> mattmann@apache.org> wrote:
>     >     >     >     >>
>     >     >     >     >> Hi everyone,
>     >     >     >     >>
>     >     >     >     >> I spoke with Joern in Slack. Some of his concerns
> are:
>     >     >     >     >>
>     >     >     >     >> 1. This was done with a Merge commit and
> apparently they squash and rebase.
>     >     >     >     >> [would be helpful to see some pointer on this for
> documentation, thus far I
>     >     >     >     >> haven’t found any]
>     >     >     >     >> 2. Apparently we literally need to ask others for
> +1 votes and record them
>     >     >     >     >> before committing? I thought since Ana and I are
> committers aren were +1,
>     >     >     >     >> and since Joern had been providing feedback (the
> last of which was to add
>     >     >     >     >> tests, which we did) that he would be +1 as well
> (I guess he is not, and I guess
>     >     >     >     >> formally we need to do a +1 vote even still)
>     >     >     >     >> 3. There was concern about scalability of the
> code.
>     >     >     >     >> 4. There are thoughts that the code was not
> perfect yet (even though it works
>     >     >     >     >> fine in the MEMEX project for Ana and I)
>     >     >     >     >>
>     >     >     >     >> So, Joern has opened up a revert PR.
>     >     >     >     >>
>     >     >     >     >> I suppose I should state I find this process
> extremely heavyweight and unwelcoming.
>     >     >     >     >> To me, there should be a modicum of trust for
> committers, but I feel like even as a
>     >     >     >     >> committer, I am operating as a “contributor” to
> the project. Committer means that
>     >     >     >     >> there is trust to modify the source code base. Of
> the issues above, the only one I see
>     >     >     >     >> as a moderate snafu was #1, and frankly if there
> are some instructions that show me
>     >     >     >     >> how to do squashing and rebasing *first* I will
> try to do that in the future since I am
>     >     >     >     >> not a GIt expert.
>     >     >     >     >>
>     >     >     >     >> That said, I must state I feel pretty put off by
> Apache OpenNLP. This originated as a GSoC
>     >     >     >     >> effort, and we have worked pretty consistently on
> this over the last year. We used a
>     >     >     >     >> separate GitHub project to get started, kept
> Joern involved as another mentor, even
>     >     >     >     >> provided access and commit writes to that GitHub
> repository for a long time, so this
>     >     >     >     >> code was developed in the open. Joern even
> created a branch in ApacheOpenNLP in the code and I suppose
>     >     >     >     >> I should have gone and worked on that branch
> first since master is apparently so
>     >     >     >     >> pristine that even an Apache veteran like me
> can’t get something in to it without
>     >     >     >     >> making a whole bunch of (what are IMO minor
> issues, and what are IMO heavyweight
>     >     >     >     >> “community” issues).
>     >     >     >     >>
>     >     >     >     >> I am concerned from a community point of view
> that the first comment wasn’t “Great
>     >     >     >     >> job Chris, you got Sentiment Analysis into
> Apache, *but* I have these concerns 1-4 above”.
>     >     >     >     >> It was “The PR was merged wrong in ways 1-4 and
> I’m going to revert it.”
>     >     >     >     >>
>     >     >     >     >> That’s pretty off-putting to someone who is
> semi-new like me and like Ana.
>     >     >     >     >>
>     >     >     >     >> Anyways, go ahead and revert it. Sorry to have
> caused any issues.
>     >     >     >     >>
>     >     >     >     >> Chris
>     >     >     >     >>
>     >     >     >     >>
>     >     >     >     >>
>     >     >     >     >> On 6/27/17, 7:06 AM, "Chris Mattmann" <
> mattmann@apache.org> wrote:
>     >     >     >     >>
>     >     >     >     >>    Hi Joern,
>     >     >     >     >>
>     >     >     >     >>    I’m confused. Why did you revert my commit?
>     >     >     >     >>
>     >     >     >     >>    Every one of those check points you put on the
> PR was checked?
>     >     >     >     >>    We have been discussing this for months, you
> have seen the
>     >     >     >     >>    code for months, Ana and I have worked
> diligently on the code
>     >     >     >     >>    in plain view of everyone.
>     >     >     >     >>
>     >     >     >     >>    Please explain.
>     >     >     >     >>
>     >     >     >     >>    Chris
>     >     >     >     >>
>     >     >     >     >>
>     >     >     >     >>
>     >     >     >     >>
>     >     >     >     >>    On 6/27/17, 1:23 AM, "kottmann" <
> git@git.apache.org> wrote:
>     >     >     >     >>
>     >     >     >     >>        GitHub user kottmann opened a pull request:
>     >     >     >     >>
>     >     >     >     >>            https://github.com/apache/
> opennlp/pull/238
>     >     >     >     >>
>     >     >     >     >>            Revert merging of sentiment work, no
> consent to merge it
>     >     >     >     >>
>     >     >     >     >>            Thank you for contributing to Apache
> OpenNLP.
>     >     >     >     >>
>     >     >     >     >>            In order to streamline the review of
> the contribution we ask you
>     >     >     >     >>            to ensure the following steps have
> been taken:
>     >     >     >     >>
>     >     >     >     >>            ### For all changes:
>     >     >     >     >>            - [ ] Is there a JIRA ticket
> associated with this PR? Is it referenced
>     >     >     >     >>                 in the commit message?
>     >     >     >     >>
>     >     >     >     >>            - [ ] Does your PR title start with
> OPENNLP-XXXX where XXXX is the JIRA number you are trying to resolve? Pay
> particular attention to the hyphen "-" character.
>     >     >     >     >>
>     >     >     >     >>            - [ ] Has your PR been rebased against
> the latest commit within the target branch (typically master)?
>     >     >     >     >>
>     >     >     >     >>            - [ ] Is your initial contribution a
> single, squashed commit?
>     >     >     >     >>
>     >     >     >     >>            ### For code changes:
>     >     >     >     >>            - [ ] Have you ensured that the full
> suite of tests is executed via mvn clean install at the root opennlp folder?
>     >     >     >     >>            - [ ] Have you written or updated unit
> tests to verify your changes?
>     >     >     >     >>            - [ ] If adding new dependencies to
> the code, are these dependencies licensed in a way that is compatible for
> inclusion under [ASF 2.0](http://www.apache.org/
> legal/resolved.html#category-a)?
>     >     >     >     >>            - [ ] If applicable, have you updated
> the LICENSE file, including the main LICENSE file in opennlp folder?
>     >     >     >     >>            - [ ] If applicable, have you updated
> the NOTICE file, including the main NOTICE file found in opennlp folder?
>     >     >     >     >>
>     >     >     >     >>            ### For documentation related changes:
>     >     >     >     >>            - [ ] Have you ensured that format
> looks appropriate for the output in which it is rendered?
>     >     >     >     >>
>     >     >     >     >>            ### Note:
>     >     >     >     >>            Please ensure that once the PR is
> submitted, you check travis-ci for build issues and submit an update to
> your PR as soon as possible.
>     >     >     >     >>
>     >     >     >     >>
>     >     >     >     >>        You can merge this pull request into a Git
> repository by running:
>     >     >     >     >>
>     >     >     >     >>            $ git pull
> https://github.com/kottmann/opennlp revert_sentiment
>     >     >     >     >>
>     >     >     >     >>        Alternatively you can review and apply
> these changes as the patch at:
>     >     >     >     >>
>     >     >     >     >>            https://github.com/apache/
> opennlp/pull/238.patch
>     >     >     >     >>
>     >     >     >     >>        To close this pull request, make a commit
> to your master/trunk branch
>     >     >     >     >>        with (at least) the following in the
> commit message:
>     >     >     >     >>
>     >     >     >     >>            This closes #238
>     >     >     >     >>
>     >     >     >     >>        ----
>     >     >     >     >>        commit 123222eb34724bae793e9d6d22e202
> c0aee0aa45
>     >     >     >     >>        Author: Jörn Kottmann <jo...@apache.org>
>     >     >     >     >>        Date:   2017-06-27T08:19:19Z
>     >     >     >     >>
>     >     >     >     >>            Revert merging of sentiment work, no
> consent to merge it
>     >     >     >     >>
>     >     >     >     >>        ----
>     >     >     >     >>
>     >     >     >     >>
>     >     >     >     >>        ---
>     >     >     >     >>        If your project is set up for it, you can
> reply to this email and have your
>     >     >     >     >>        reply appear on GitHub as well. If your
> project does not have this feature
>     >     >     >     >>        enabled and wishes so, or if the feature
> is enabled but not working, please
>     >     >     >     >>        contact infrastructure at
> infrastructure@apache.org or file a JIRA ticket
>     >     >     >     >>        with INFRA.
>     >     >     >     >>        ---
>     >     >     >     >>
>     >     >     >     >>
>     >     >     >     >>
>     >     >     >     >>
>     >     >     >     >>
>     >     >     >     >>
>     >     >     >     >
>     >     >     >
>     >     >     >
>     >     >     >
>     >     >
>     >     >
>     >     >
>     >
>     >
>     >
>
>
>
>

Re: [GitHub] opennlp pull request #238: Revert merging of sentiment work, no consent to m...

Posted by Chris Mattmann <ma...@apache.org>.
Thanks I will investigate the below thanks Joern. Can someone send me some pointers
to the Doc Cat API that I can find? Thanks.




On 6/29/17, 10:18 AM, "Joern Kottmann" <ko...@gmail.com> wrote:

    For 2. I would like to suggest that we implement doccat format support
    to train on that data.
    
    3. it would be best so think about how we want to test the doccat
    component, today we don't have any tests which use lots of data to
    evaluate it.
    Probably the sentitment data could solve this for us and a train and
    evaluate test could be included in the eval tests.
    
    +1 to revert and then do these steps after the 1.8.1 release.
    
    I can apply my PR myself if nobody objects.
    
    Jörn
    
    On Thu, Jun 29, 2017 at 7:10 PM, Chris Mattmann <ma...@apache.org> wrote:
    > Hi Rodrigo,
    >
    > This is very useful feedback that I wish we would have had a long time ago.
    >
    > I will look into it and see if I can reproduce the CLI error. I did a full build and mvn
    > install (which I though would run tests?) before commiting and as I posted in JIRA
    > the tests passed for me? So I will have to look into that.
    >
    > That said, given your feedback that SentimentME and the Sentiment Component
    > doesn’t offer much over Document Classifier I agree with you, but wasn’t super
    > familiar with the Document Classifier API. That said, if we can get the same functionality
    > by just using Document Classifier why don’t we:
    >
    > 1. Remove the SentimentME and associated code (except for the unit tests)
    > 2. Use the sample datasets from NetFlix & Stanford Treebank sentiment and
    > build models using Document Classifier API.
    > 3. Rename and keep the unit tests that test against Netflix and Stanford tree bank.
    >
    > That way we get basic sentiment analysis (that is working for us internally at JPL decently),
    > for Apache OpenNLP, and then if we want to build something better than a Document
    > Classification approach to sentiment we can do so.
    >
    > Thoughts?
    >
    > Thanks for your useful feedback. If everyone agrees this is a plan I can back out the code
    > using Joern’s revert, and then try and execute 1-3 above in a branch first. Thanks.
    >
    > Cheers,
    > Chris
    >
    >
    >
    > On 6/29/17, 10:03 AM, "Rodrigo Agerri" <ra...@apache.org> wrote:
    >
    >     Hi Chris,
    >
    >     I have been interested in the new sentiment component for a while,
    >     although truth to be told, I did not follow that closely. I have today
    >     looked at it and test it with some of the corpora you have mentioned.
    >     In order to do that, I checkout master to work with from this commit
    >     onwards
    >
    >     https://github.com/apache/opennlp/commit/56321aab51a470cd2004b76fb1f5330881b943c1
    >
    >     1. I tried to run it from the CLI. The Sentiment component did not
    >     appear to be available.
    >     2. I added the SentimentTrainer and Evaluator to the cmdline.CLI (no
    >     SentimentTool is implemented to tag with a trained model).
    >     3. After that, the CLI tests did not pass. So, the CLI is currently
    >     non functional, unless I did something wrong, always possible, of
    >     course. See if you can reproduce that error.
    >
    >     I therefore did the tests via API. I implemented a little test for
    >     training, evaluating and tagging here:
    >
    >     https://github.com/ixa-ehu/ixa-pipe-doc/tree/test
    >
    >     I run the training on the large movies review from Stanford for binary
    >     polarity classification
    >
    >     http://ai.stanford.edu/~amaas/data/sentiment/
    >
    >     and on the two little samples multiclass files added in resources and
    >     mentioned in the previous email, using the first one for training and
    >     the second one for testing (maxent 100 iterations, cutoff 5).
    >
    >     2. Stanford results: 0.84264
    >     3. sample multiclass: 0.73
    >
    >     Given that this is a standard document classification task, I decided
    >     to train the doccat component from the CLI:
    >
    >     1. Stanford results: 0.84264 (BOW features by default).
    >     2. sample multiclass: 0.73
    >
    >     I then looked at the code of the sentiment component and saw that it
    >     is basically a document classifier working with bag of words features.
    >     No added functionality. So, my conclusions are:
    >
    >     1. The CLI needs to be fixed.
    >     2. The Sentiment component, as it is, provides the same functionality
    >     as the document classifier.
    >
    >     I would therefore reconsider this commit until those two issues are
    >     addressed. Just my opinion.
    >
    >     Best regards,
    >
    >     Rodrigo
    >
    >     On Thu, Jun 29, 2017 at 5:30 PM, Chris Mattmann <ma...@apache.org> wrote:
    >     >
    >     > Hey Joern,
    >     >
    >     > Sure, you can find the model data links here, along with our evaluation of them.
    >     >
    >     > http://irds.usc.edu/SentimentAnalysisParser/datasets.html
    >     >
    >     > There are other evaluations here:
    >     >
    >     > http://irds.usc.edu/SentimentAnalysisParser/models.html
    >     >
    >     > The HT provider review I cannot contribute at this time and I question its broad
    >     > applicability since it’s related to human trafficking. In addition we are still working
    >     > on publishing our analysis & evaluation of it which is why I removed it from the
    >     > commit.
    >     >
    >     > Cheers,
    >     > Chris
    >     >
    >     >
    >     >
    >     >
    >     >
    >     > On 6/29/17, 7:36 AM, "Joern Kottmann" <ko...@gmail.com> wrote:
    >     >
    >     >     Which data sets did you use to evaluate this?
    >     >     I was looking for a bit more than a sample file to train it.
    >     >
    >     >     I noticed that you checked in stanford and netflix models.
    >     >
    >     >     The stanford data set is probably this one:
    >     >     http://ai.stanford.edu/~amaas/data/sentiment/
    >     >
    >     >     Do you have a link for the netflix data?
    >     >
    >     >     Jörn
    >     >
    >     >     On Thu, Jun 29, 2017 at 4:00 PM, Chris Mattmann <ma...@apache.org> wrote:
    >     >     > Absolutely you can find it here:
    >     >     >
    >     >     > opennlp-tools/src/test/resources/opennlp/tools/sentiment/sample_train_categ (for categorical /multi-class)
    >     >     > opennlp-tools/src/test/resources/opennlp/tools/sentiment/sample_train_categ2 (for categorical/multi-class)
    >     >     >
    >     >     > We can also do similar files where instead of multi-class, we just use pos/neg as the label.
    >     >     >
    >     >     > Cheers,
    >     >     > Chris
    >     >     >
    >     >     >
    >     >     >
    >     >     >
    >     >     >
    >     >     > On 6/29/17, 2:35 AM, "Joern Kottmann" <ko...@gmail.com> wrote:
    >     >     >
    >     >     >     Hello Chris,
    >     >     >
    >     >     >     could you please point me to files I can use to train the sentiment
    >     >     >     component? I am currently looking again through the code and would
    >     >     >     like to train it myself.
    >     >     >
    >     >     >     Jörn
    >     >     >
    >     >     >     On Tue, Jun 27, 2017 at 4:59 PM, Dan Russ <da...@gmail.com> wrote:
    >     >     >     > Hi All,
    >     >     >     >    First, let me take a share of blame for the comment Chris mentioned.  I believe I said something like the pull request was X revision behind and Y revisions ahead.  It was not meant to be rude, it was meant to say it is hard to review code when it is so different from the current code base. I am very excited that sentiment analysis is going to be added to OpenNLP, but I have not had time to play with it. If I were to say “great job” before I have add a chance to look at it, it would be flattery not honest praise.
    >     >     >     >
    >     >     >     >   Let’s clean up the merge.  I agree with Chris that scalability and perfection should not be our initial goal.  Let’s get something, and we can decide how to optimize later (even if it require a complete rewrite).  Perfection is the enemy of the good.
    >     >     >     >
    >     >     >     >   Finally, because of Chris’ comments it is hard to thank Ana and Chris without sounding insincere.  But I’ll try, thank you Chris and Ana.  I hope we can get beyond this and that Chris and Ana will continue to improve the performance of the sentiment analysis tool and happily remain part of the OpenNLP family.  It is also a good time to toss a big thank you to all of the committers, users, and PMC member.  I use OpenNLP almost everyday.  Your work is extremely valuable to me.
    >     >     >     >
    >     >     >     > Thank you,
    >     >     >     > Daniel
    >     >     >     >
    >     >     >     >> On Jun 27, 2017, at 10:25 AM, Chris Mattmann <ma...@apache.org> wrote:
    >     >     >     >>
    >     >     >     >> Hi everyone,
    >     >     >     >>
    >     >     >     >> I spoke with Joern in Slack. Some of his concerns are:
    >     >     >     >>
    >     >     >     >> 1. This was done with a Merge commit and apparently they squash and rebase.
    >     >     >     >> [would be helpful to see some pointer on this for documentation, thus far I
    >     >     >     >> haven’t found any]
    >     >     >     >> 2. Apparently we literally need to ask others for +1 votes and record them
    >     >     >     >> before committing? I thought since Ana and I are committers aren were +1,
    >     >     >     >> and since Joern had been providing feedback (the last of which was to add
    >     >     >     >> tests, which we did) that he would be +1 as well (I guess he is not, and I guess
    >     >     >     >> formally we need to do a +1 vote even still)
    >     >     >     >> 3. There was concern about scalability of the code.
    >     >     >     >> 4. There are thoughts that the code was not perfect yet (even though it works
    >     >     >     >> fine in the MEMEX project for Ana and I)
    >     >     >     >>
    >     >     >     >> So, Joern has opened up a revert PR.
    >     >     >     >>
    >     >     >     >> I suppose I should state I find this process extremely heavyweight and unwelcoming.
    >     >     >     >> To me, there should be a modicum of trust for committers, but I feel like even as a
    >     >     >     >> committer, I am operating as a “contributor” to the project. Committer means that
    >     >     >     >> there is trust to modify the source code base. Of the issues above, the only one I see
    >     >     >     >> as a moderate snafu was #1, and frankly if there are some instructions that show me
    >     >     >     >> how to do squashing and rebasing *first* I will try to do that in the future since I am
    >     >     >     >> not a GIt expert.
    >     >     >     >>
    >     >     >     >> That said, I must state I feel pretty put off by Apache OpenNLP. This originated as a GSoC
    >     >     >     >> effort, and we have worked pretty consistently on this over the last year. We used a
    >     >     >     >> separate GitHub project to get started, kept Joern involved as another mentor, even
    >     >     >     >> provided access and commit writes to that GitHub repository for a long time, so this
    >     >     >     >> code was developed in the open. Joern even created a branch in ApacheOpenNLP in the code and I suppose
    >     >     >     >> I should have gone and worked on that branch first since master is apparently so
    >     >     >     >> pristine that even an Apache veteran like me can’t get something in to it without
    >     >     >     >> making a whole bunch of (what are IMO minor issues, and what are IMO heavyweight
    >     >     >     >> “community” issues).
    >     >     >     >>
    >     >     >     >> I am concerned from a community point of view that the first comment wasn’t “Great
    >     >     >     >> job Chris, you got Sentiment Analysis into Apache, *but* I have these concerns 1-4 above”.
    >     >     >     >> It was “The PR was merged wrong in ways 1-4 and I’m going to revert it.”
    >     >     >     >>
    >     >     >     >> That’s pretty off-putting to someone who is semi-new like me and like Ana.
    >     >     >     >>
    >     >     >     >> Anyways, go ahead and revert it. Sorry to have caused any issues.
    >     >     >     >>
    >     >     >     >> Chris
    >     >     >     >>
    >     >     >     >>
    >     >     >     >>
    >     >     >     >> On 6/27/17, 7:06 AM, "Chris Mattmann" <ma...@apache.org> wrote:
    >     >     >     >>
    >     >     >     >>    Hi Joern,
    >     >     >     >>
    >     >     >     >>    I’m confused. Why did you revert my commit?
    >     >     >     >>
    >     >     >     >>    Every one of those check points you put on the PR was checked?
    >     >     >     >>    We have been discussing this for months, you have seen the
    >     >     >     >>    code for months, Ana and I have worked diligently on the code
    >     >     >     >>    in plain view of everyone.
    >     >     >     >>
    >     >     >     >>    Please explain.
    >     >     >     >>
    >     >     >     >>    Chris
    >     >     >     >>
    >     >     >     >>
    >     >     >     >>
    >     >     >     >>
    >     >     >     >>    On 6/27/17, 1:23 AM, "kottmann" <gi...@git.apache.org> wrote:
    >     >     >     >>
    >     >     >     >>        GitHub user kottmann opened a pull request:
    >     >     >     >>
    >     >     >     >>            https://github.com/apache/opennlp/pull/238
    >     >     >     >>
    >     >     >     >>            Revert merging of sentiment work, no consent to merge it
    >     >     >     >>
    >     >     >     >>            Thank you for contributing to Apache OpenNLP.
    >     >     >     >>
    >     >     >     >>            In order to streamline the review of the contribution we ask you
    >     >     >     >>            to ensure the following steps have been taken:
    >     >     >     >>
    >     >     >     >>            ### For all changes:
    >     >     >     >>            - [ ] Is there a JIRA ticket associated with this PR? Is it referenced
    >     >     >     >>                 in the commit message?
    >     >     >     >>
    >     >     >     >>            - [ ] Does your PR title start with OPENNLP-XXXX where XXXX is the JIRA number you are trying to resolve? Pay particular attention to the hyphen "-" character.
    >     >     >     >>
    >     >     >     >>            - [ ] Has your PR been rebased against the latest commit within the target branch (typically master)?
    >     >     >     >>
    >     >     >     >>            - [ ] Is your initial contribution a single, squashed commit?
    >     >     >     >>
    >     >     >     >>            ### For code changes:
    >     >     >     >>            - [ ] Have you ensured that the full suite of tests is executed via mvn clean install at the root opennlp folder?
    >     >     >     >>            - [ ] Have you written or updated unit tests to verify your changes?
    >     >     >     >>            - [ ] If adding new dependencies to the code, are these dependencies licensed in a way that is compatible for inclusion under [ASF 2.0](http://www.apache.org/legal/resolved.html#category-a)?
    >     >     >     >>            - [ ] If applicable, have you updated the LICENSE file, including the main LICENSE file in opennlp folder?
    >     >     >     >>            - [ ] If applicable, have you updated the NOTICE file, including the main NOTICE file found in opennlp folder?
    >     >     >     >>
    >     >     >     >>            ### For documentation related changes:
    >     >     >     >>            - [ ] Have you ensured that format looks appropriate for the output in which it is rendered?
    >     >     >     >>
    >     >     >     >>            ### Note:
    >     >     >     >>            Please ensure that once the PR is submitted, you check travis-ci for build issues and submit an update to your PR as soon as possible.
    >     >     >     >>
    >     >     >     >>
    >     >     >     >>        You can merge this pull request into a Git repository by running:
    >     >     >     >>
    >     >     >     >>            $ git pull https://github.com/kottmann/opennlp revert_sentiment
    >     >     >     >>
    >     >     >     >>        Alternatively you can review and apply these changes as the patch at:
    >     >     >     >>
    >     >     >     >>            https://github.com/apache/opennlp/pull/238.patch
    >     >     >     >>
    >     >     >     >>        To close this pull request, make a commit to your master/trunk branch
    >     >     >     >>        with (at least) the following in the commit message:
    >     >     >     >>
    >     >     >     >>            This closes #238
    >     >     >     >>
    >     >     >     >>        ----
    >     >     >     >>        commit 123222eb34724bae793e9d6d22e202c0aee0aa45
    >     >     >     >>        Author: Jörn Kottmann <jo...@apache.org>
    >     >     >     >>        Date:   2017-06-27T08:19:19Z
    >     >     >     >>
    >     >     >     >>            Revert merging of sentiment work, no consent to merge it
    >     >     >     >>
    >     >     >     >>        ----
    >     >     >     >>
    >     >     >     >>
    >     >     >     >>        ---
    >     >     >     >>        If your project is set up for it, you can reply to this email and have your
    >     >     >     >>        reply appear on GitHub as well. If your project does not have this feature
    >     >     >     >>        enabled and wishes so, or if the feature is enabled but not working, please
    >     >     >     >>        contact infrastructure at infrastructure@apache.org or file a JIRA ticket
    >     >     >     >>        with INFRA.
    >     >     >     >>        ---
    >     >     >     >>
    >     >     >     >>
    >     >     >     >>
    >     >     >     >>
    >     >     >     >>
    >     >     >     >>
    >     >     >     >
    >     >     >
    >     >     >
    >     >     >
    >     >
    >     >
    >     >
    >
    >
    >
    



Re: [GitHub] opennlp pull request #238: Revert merging of sentiment work, no consent to m...

Posted by Joern Kottmann <ko...@gmail.com>.
For 2. I would like to suggest that we implement doccat format support
to train on that data.

3. it would be best so think about how we want to test the doccat
component, today we don't have any tests which use lots of data to
evaluate it.
Probably the sentitment data could solve this for us and a train and
evaluate test could be included in the eval tests.

+1 to revert and then do these steps after the 1.8.1 release.

I can apply my PR myself if nobody objects.

Jörn

On Thu, Jun 29, 2017 at 7:10 PM, Chris Mattmann <ma...@apache.org> wrote:
> Hi Rodrigo,
>
> This is very useful feedback that I wish we would have had a long time ago.
>
> I will look into it and see if I can reproduce the CLI error. I did a full build and mvn
> install (which I though would run tests?) before commiting and as I posted in JIRA
> the tests passed for me? So I will have to look into that.
>
> That said, given your feedback that SentimentME and the Sentiment Component
> doesn’t offer much over Document Classifier I agree with you, but wasn’t super
> familiar with the Document Classifier API. That said, if we can get the same functionality
> by just using Document Classifier why don’t we:
>
> 1. Remove the SentimentME and associated code (except for the unit tests)
> 2. Use the sample datasets from NetFlix & Stanford Treebank sentiment and
> build models using Document Classifier API.
> 3. Rename and keep the unit tests that test against Netflix and Stanford tree bank.
>
> That way we get basic sentiment analysis (that is working for us internally at JPL decently),
> for Apache OpenNLP, and then if we want to build something better than a Document
> Classification approach to sentiment we can do so.
>
> Thoughts?
>
> Thanks for your useful feedback. If everyone agrees this is a plan I can back out the code
> using Joern’s revert, and then try and execute 1-3 above in a branch first. Thanks.
>
> Cheers,
> Chris
>
>
>
> On 6/29/17, 10:03 AM, "Rodrigo Agerri" <ra...@apache.org> wrote:
>
>     Hi Chris,
>
>     I have been interested in the new sentiment component for a while,
>     although truth to be told, I did not follow that closely. I have today
>     looked at it and test it with some of the corpora you have mentioned.
>     In order to do that, I checkout master to work with from this commit
>     onwards
>
>     https://github.com/apache/opennlp/commit/56321aab51a470cd2004b76fb1f5330881b943c1
>
>     1. I tried to run it from the CLI. The Sentiment component did not
>     appear to be available.
>     2. I added the SentimentTrainer and Evaluator to the cmdline.CLI (no
>     SentimentTool is implemented to tag with a trained model).
>     3. After that, the CLI tests did not pass. So, the CLI is currently
>     non functional, unless I did something wrong, always possible, of
>     course. See if you can reproduce that error.
>
>     I therefore did the tests via API. I implemented a little test for
>     training, evaluating and tagging here:
>
>     https://github.com/ixa-ehu/ixa-pipe-doc/tree/test
>
>     I run the training on the large movies review from Stanford for binary
>     polarity classification
>
>     http://ai.stanford.edu/~amaas/data/sentiment/
>
>     and on the two little samples multiclass files added in resources and
>     mentioned in the previous email, using the first one for training and
>     the second one for testing (maxent 100 iterations, cutoff 5).
>
>     2. Stanford results: 0.84264
>     3. sample multiclass: 0.73
>
>     Given that this is a standard document classification task, I decided
>     to train the doccat component from the CLI:
>
>     1. Stanford results: 0.84264 (BOW features by default).
>     2. sample multiclass: 0.73
>
>     I then looked at the code of the sentiment component and saw that it
>     is basically a document classifier working with bag of words features.
>     No added functionality. So, my conclusions are:
>
>     1. The CLI needs to be fixed.
>     2. The Sentiment component, as it is, provides the same functionality
>     as the document classifier.
>
>     I would therefore reconsider this commit until those two issues are
>     addressed. Just my opinion.
>
>     Best regards,
>
>     Rodrigo
>
>     On Thu, Jun 29, 2017 at 5:30 PM, Chris Mattmann <ma...@apache.org> wrote:
>     >
>     > Hey Joern,
>     >
>     > Sure, you can find the model data links here, along with our evaluation of them.
>     >
>     > http://irds.usc.edu/SentimentAnalysisParser/datasets.html
>     >
>     > There are other evaluations here:
>     >
>     > http://irds.usc.edu/SentimentAnalysisParser/models.html
>     >
>     > The HT provider review I cannot contribute at this time and I question its broad
>     > applicability since it’s related to human trafficking. In addition we are still working
>     > on publishing our analysis & evaluation of it which is why I removed it from the
>     > commit.
>     >
>     > Cheers,
>     > Chris
>     >
>     >
>     >
>     >
>     >
>     > On 6/29/17, 7:36 AM, "Joern Kottmann" <ko...@gmail.com> wrote:
>     >
>     >     Which data sets did you use to evaluate this?
>     >     I was looking for a bit more than a sample file to train it.
>     >
>     >     I noticed that you checked in stanford and netflix models.
>     >
>     >     The stanford data set is probably this one:
>     >     http://ai.stanford.edu/~amaas/data/sentiment/
>     >
>     >     Do you have a link for the netflix data?
>     >
>     >     Jörn
>     >
>     >     On Thu, Jun 29, 2017 at 4:00 PM, Chris Mattmann <ma...@apache.org> wrote:
>     >     > Absolutely you can find it here:
>     >     >
>     >     > opennlp-tools/src/test/resources/opennlp/tools/sentiment/sample_train_categ (for categorical /multi-class)
>     >     > opennlp-tools/src/test/resources/opennlp/tools/sentiment/sample_train_categ2 (for categorical/multi-class)
>     >     >
>     >     > We can also do similar files where instead of multi-class, we just use pos/neg as the label.
>     >     >
>     >     > Cheers,
>     >     > Chris
>     >     >
>     >     >
>     >     >
>     >     >
>     >     >
>     >     > On 6/29/17, 2:35 AM, "Joern Kottmann" <ko...@gmail.com> wrote:
>     >     >
>     >     >     Hello Chris,
>     >     >
>     >     >     could you please point me to files I can use to train the sentiment
>     >     >     component? I am currently looking again through the code and would
>     >     >     like to train it myself.
>     >     >
>     >     >     Jörn
>     >     >
>     >     >     On Tue, Jun 27, 2017 at 4:59 PM, Dan Russ <da...@gmail.com> wrote:
>     >     >     > Hi All,
>     >     >     >    First, let me take a share of blame for the comment Chris mentioned.  I believe I said something like the pull request was X revision behind and Y revisions ahead.  It was not meant to be rude, it was meant to say it is hard to review code when it is so different from the current code base. I am very excited that sentiment analysis is going to be added to OpenNLP, but I have not had time to play with it. If I were to say “great job” before I have add a chance to look at it, it would be flattery not honest praise.
>     >     >     >
>     >     >     >   Let’s clean up the merge.  I agree with Chris that scalability and perfection should not be our initial goal.  Let’s get something, and we can decide how to optimize later (even if it require a complete rewrite).  Perfection is the enemy of the good.
>     >     >     >
>     >     >     >   Finally, because of Chris’ comments it is hard to thank Ana and Chris without sounding insincere.  But I’ll try, thank you Chris and Ana.  I hope we can get beyond this and that Chris and Ana will continue to improve the performance of the sentiment analysis tool and happily remain part of the OpenNLP family.  It is also a good time to toss a big thank you to all of the committers, users, and PMC member.  I use OpenNLP almost everyday.  Your work is extremely valuable to me.
>     >     >     >
>     >     >     > Thank you,
>     >     >     > Daniel
>     >     >     >
>     >     >     >> On Jun 27, 2017, at 10:25 AM, Chris Mattmann <ma...@apache.org> wrote:
>     >     >     >>
>     >     >     >> Hi everyone,
>     >     >     >>
>     >     >     >> I spoke with Joern in Slack. Some of his concerns are:
>     >     >     >>
>     >     >     >> 1. This was done with a Merge commit and apparently they squash and rebase.
>     >     >     >> [would be helpful to see some pointer on this for documentation, thus far I
>     >     >     >> haven’t found any]
>     >     >     >> 2. Apparently we literally need to ask others for +1 votes and record them
>     >     >     >> before committing? I thought since Ana and I are committers aren were +1,
>     >     >     >> and since Joern had been providing feedback (the last of which was to add
>     >     >     >> tests, which we did) that he would be +1 as well (I guess he is not, and I guess
>     >     >     >> formally we need to do a +1 vote even still)
>     >     >     >> 3. There was concern about scalability of the code.
>     >     >     >> 4. There are thoughts that the code was not perfect yet (even though it works
>     >     >     >> fine in the MEMEX project for Ana and I)
>     >     >     >>
>     >     >     >> So, Joern has opened up a revert PR.
>     >     >     >>
>     >     >     >> I suppose I should state I find this process extremely heavyweight and unwelcoming.
>     >     >     >> To me, there should be a modicum of trust for committers, but I feel like even as a
>     >     >     >> committer, I am operating as a “contributor” to the project. Committer means that
>     >     >     >> there is trust to modify the source code base. Of the issues above, the only one I see
>     >     >     >> as a moderate snafu was #1, and frankly if there are some instructions that show me
>     >     >     >> how to do squashing and rebasing *first* I will try to do that in the future since I am
>     >     >     >> not a GIt expert.
>     >     >     >>
>     >     >     >> That said, I must state I feel pretty put off by Apache OpenNLP. This originated as a GSoC
>     >     >     >> effort, and we have worked pretty consistently on this over the last year. We used a
>     >     >     >> separate GitHub project to get started, kept Joern involved as another mentor, even
>     >     >     >> provided access and commit writes to that GitHub repository for a long time, so this
>     >     >     >> code was developed in the open. Joern even created a branch in ApacheOpenNLP in the code and I suppose
>     >     >     >> I should have gone and worked on that branch first since master is apparently so
>     >     >     >> pristine that even an Apache veteran like me can’t get something in to it without
>     >     >     >> making a whole bunch of (what are IMO minor issues, and what are IMO heavyweight
>     >     >     >> “community” issues).
>     >     >     >>
>     >     >     >> I am concerned from a community point of view that the first comment wasn’t “Great
>     >     >     >> job Chris, you got Sentiment Analysis into Apache, *but* I have these concerns 1-4 above”.
>     >     >     >> It was “The PR was merged wrong in ways 1-4 and I’m going to revert it.”
>     >     >     >>
>     >     >     >> That’s pretty off-putting to someone who is semi-new like me and like Ana.
>     >     >     >>
>     >     >     >> Anyways, go ahead and revert it. Sorry to have caused any issues.
>     >     >     >>
>     >     >     >> Chris
>     >     >     >>
>     >     >     >>
>     >     >     >>
>     >     >     >> On 6/27/17, 7:06 AM, "Chris Mattmann" <ma...@apache.org> wrote:
>     >     >     >>
>     >     >     >>    Hi Joern,
>     >     >     >>
>     >     >     >>    I’m confused. Why did you revert my commit?
>     >     >     >>
>     >     >     >>    Every one of those check points you put on the PR was checked?
>     >     >     >>    We have been discussing this for months, you have seen the
>     >     >     >>    code for months, Ana and I have worked diligently on the code
>     >     >     >>    in plain view of everyone.
>     >     >     >>
>     >     >     >>    Please explain.
>     >     >     >>
>     >     >     >>    Chris
>     >     >     >>
>     >     >     >>
>     >     >     >>
>     >     >     >>
>     >     >     >>    On 6/27/17, 1:23 AM, "kottmann" <gi...@git.apache.org> wrote:
>     >     >     >>
>     >     >     >>        GitHub user kottmann opened a pull request:
>     >     >     >>
>     >     >     >>            https://github.com/apache/opennlp/pull/238
>     >     >     >>
>     >     >     >>            Revert merging of sentiment work, no consent to merge it
>     >     >     >>
>     >     >     >>            Thank you for contributing to Apache OpenNLP.
>     >     >     >>
>     >     >     >>            In order to streamline the review of the contribution we ask you
>     >     >     >>            to ensure the following steps have been taken:
>     >     >     >>
>     >     >     >>            ### For all changes:
>     >     >     >>            - [ ] Is there a JIRA ticket associated with this PR? Is it referenced
>     >     >     >>                 in the commit message?
>     >     >     >>
>     >     >     >>            - [ ] Does your PR title start with OPENNLP-XXXX where XXXX is the JIRA number you are trying to resolve? Pay particular attention to the hyphen "-" character.
>     >     >     >>
>     >     >     >>            - [ ] Has your PR been rebased against the latest commit within the target branch (typically master)?
>     >     >     >>
>     >     >     >>            - [ ] Is your initial contribution a single, squashed commit?
>     >     >     >>
>     >     >     >>            ### For code changes:
>     >     >     >>            - [ ] Have you ensured that the full suite of tests is executed via mvn clean install at the root opennlp folder?
>     >     >     >>            - [ ] Have you written or updated unit tests to verify your changes?
>     >     >     >>            - [ ] If adding new dependencies to the code, are these dependencies licensed in a way that is compatible for inclusion under [ASF 2.0](http://www.apache.org/legal/resolved.html#category-a)?
>     >     >     >>            - [ ] If applicable, have you updated the LICENSE file, including the main LICENSE file in opennlp folder?
>     >     >     >>            - [ ] If applicable, have you updated the NOTICE file, including the main NOTICE file found in opennlp folder?
>     >     >     >>
>     >     >     >>            ### For documentation related changes:
>     >     >     >>            - [ ] Have you ensured that format looks appropriate for the output in which it is rendered?
>     >     >     >>
>     >     >     >>            ### Note:
>     >     >     >>            Please ensure that once the PR is submitted, you check travis-ci for build issues and submit an update to your PR as soon as possible.
>     >     >     >>
>     >     >     >>
>     >     >     >>        You can merge this pull request into a Git repository by running:
>     >     >     >>
>     >     >     >>            $ git pull https://github.com/kottmann/opennlp revert_sentiment
>     >     >     >>
>     >     >     >>        Alternatively you can review and apply these changes as the patch at:
>     >     >     >>
>     >     >     >>            https://github.com/apache/opennlp/pull/238.patch
>     >     >     >>
>     >     >     >>        To close this pull request, make a commit to your master/trunk branch
>     >     >     >>        with (at least) the following in the commit message:
>     >     >     >>
>     >     >     >>            This closes #238
>     >     >     >>
>     >     >     >>        ----
>     >     >     >>        commit 123222eb34724bae793e9d6d22e202c0aee0aa45
>     >     >     >>        Author: Jörn Kottmann <jo...@apache.org>
>     >     >     >>        Date:   2017-06-27T08:19:19Z
>     >     >     >>
>     >     >     >>            Revert merging of sentiment work, no consent to merge it
>     >     >     >>
>     >     >     >>        ----
>     >     >     >>
>     >     >     >>
>     >     >     >>        ---
>     >     >     >>        If your project is set up for it, you can reply to this email and have your
>     >     >     >>        reply appear on GitHub as well. If your project does not have this feature
>     >     >     >>        enabled and wishes so, or if the feature is enabled but not working, please
>     >     >     >>        contact infrastructure at infrastructure@apache.org or file a JIRA ticket
>     >     >     >>        with INFRA.
>     >     >     >>        ---
>     >     >     >>
>     >     >     >>
>     >     >     >>
>     >     >     >>
>     >     >     >>
>     >     >     >>
>     >     >     >
>     >     >
>     >     >
>     >     >
>     >
>     >
>     >
>
>
>

Re: [GitHub] opennlp pull request #238: Revert merging of sentiment work, no consent to m...

Posted by Suneel Marthi <sm...@apache.org>.
On Thu, Jun 29, 2017 at 1:10 PM, Chris Mattmann <ma...@apache.org> wrote:

> Hi Rodrigo,
>
> This is very useful feedback that I wish we would have had a long time ago.
>
> I will look into it and see if I can reproduce the CLI error. I did a full
> build and mvn
> install (which I though would run tests?) before commiting and as I posted
> in JIRA
> the tests passed for me? So I will have to look into that.
>
> That said, given your feedback that SentimentME and the Sentiment Component
> doesn’t offer much over Document Classifier I agree with you, but wasn’t
> super
> familiar with the Document Classifier API. That said, if we can get the
> same functionality
> by just using Document Classifier why don’t we:
>
> 1. Remove the SentimentME and associated code (except for the unit tests)
> 2. Use the sample datasets from NetFlix & Stanford Treebank sentiment and
> build models using Document Classifier API.
> 3. Rename and keep the unit tests that test against Netflix and Stanford
> tree bank.
>

+1 for the above 3 steps. Let's go ahead with Step 1 today - that way we
can start planning on cutting a 1.8.1 release candidate this weekend.



>
> That way we get basic sentiment analysis (that is working for us
> internally at JPL decently),
> for Apache OpenNLP, and then if we want to build something better than a
> Document
> Classification approach to sentiment we can do so.
>
> Thoughts?
>
> Thanks for your useful feedback. If everyone agrees this is a plan I can
> back out the code
> using Joern’s revert, and then try and execute 1-3 above in a branch
> first. Thanks.
>
> Cheers,
> Chris
>
>
>
> On 6/29/17, 10:03 AM, "Rodrigo Agerri" <ra...@apache.org> wrote:
>
>     Hi Chris,
>
>     I have been interested in the new sentiment component for a while,
>     although truth to be told, I did not follow that closely. I have today
>     looked at it and test it with some of the corpora you have mentioned.
>     In order to do that, I checkout master to work with from this commit
>     onwards
>
>     https://github.com/apache/opennlp/commit/
> 56321aab51a470cd2004b76fb1f5330881b943c1
>
>     1. I tried to run it from the CLI. The Sentiment component did not
>     appear to be available.
>     2. I added the SentimentTrainer and Evaluator to the cmdline.CLI (no
>     SentimentTool is implemented to tag with a trained model).
>     3. After that, the CLI tests did not pass. So, the CLI is currently
>     non functional, unless I did something wrong, always possible, of
>     course. See if you can reproduce that error.
>
>     I therefore did the tests via API. I implemented a little test for
>     training, evaluating and tagging here:
>
>     https://github.com/ixa-ehu/ixa-pipe-doc/tree/test
>
>     I run the training on the large movies review from Stanford for binary
>     polarity classification
>
>     http://ai.stanford.edu/~amaas/data/sentiment/
>
>     and on the two little samples multiclass files added in resources and
>     mentioned in the previous email, using the first one for training and
>     the second one for testing (maxent 100 iterations, cutoff 5).
>
>     2. Stanford results: 0.84264
>     3. sample multiclass: 0.73
>
>     Given that this is a standard document classification task, I decided
>     to train the doccat component from the CLI:
>
>     1. Stanford results: 0.84264 (BOW features by default).
>     2. sample multiclass: 0.73
>
>     I then looked at the code of the sentiment component and saw that it
>     is basically a document classifier working with bag of words features.
>     No added functionality. So, my conclusions are:
>
>     1. The CLI needs to be fixed.
>     2. The Sentiment component, as it is, provides the same functionality
>     as the document classifier.
>
>     I would therefore reconsider this commit until those two issues are
>     addressed. Just my opinion.
>
>     Best regards,
>
>     Rodrigo
>
>     On Thu, Jun 29, 2017 at 5:30 PM, Chris Mattmann <ma...@apache.org>
> wrote:
>     >
>     > Hey Joern,
>     >
>     > Sure, you can find the model data links here, along with our
> evaluation of them.
>     >
>     > http://irds.usc.edu/SentimentAnalysisParser/datasets.html
>     >
>     > There are other evaluations here:
>     >
>     > http://irds.usc.edu/SentimentAnalysisParser/models.html
>     >
>     > The HT provider review I cannot contribute at this time and I
> question its broad
>     > applicability since it’s related to human trafficking. In addition
> we are still working
>     > on publishing our analysis & evaluation of it which is why I removed
> it from the
>     > commit.
>     >
>     > Cheers,
>     > Chris
>     >
>     >
>     >
>     >
>     >
>     > On 6/29/17, 7:36 AM, "Joern Kottmann" <ko...@gmail.com> wrote:
>     >
>     >     Which data sets did you use to evaluate this?
>     >     I was looking for a bit more than a sample file to train it.
>     >
>     >     I noticed that you checked in stanford and netflix models.
>     >
>     >     The stanford data set is probably this one:
>     >     http://ai.stanford.edu/~amaas/data/sentiment/
>     >
>     >     Do you have a link for the netflix data?
>     >
>     >     Jörn
>     >
>     >     On Thu, Jun 29, 2017 at 4:00 PM, Chris Mattmann <
> mattmann@apache.org> wrote:
>     >     > Absolutely you can find it here:
>     >     >
>     >     > opennlp-tools/src/test/resources/opennlp/tools/sentiment/sample_train_categ
> (for categorical /multi-class)
>     >     > opennlp-tools/src/test/resources/opennlp/tools/sentiment/sample_train_categ2
> (for categorical/multi-class)
>     >     >
>     >     > We can also do similar files where instead of multi-class, we
> just use pos/neg as the label.
>     >     >
>     >     > Cheers,
>     >     > Chris
>     >     >
>     >     >
>     >     >
>     >     >
>     >     >
>     >     > On 6/29/17, 2:35 AM, "Joern Kottmann" <ko...@gmail.com>
> wrote:
>     >     >
>     >     >     Hello Chris,
>     >     >
>     >     >     could you please point me to files I can use to train the
> sentiment
>     >     >     component? I am currently looking again through the code
> and would
>     >     >     like to train it myself.
>     >     >
>     >     >     Jörn
>     >     >
>     >     >     On Tue, Jun 27, 2017 at 4:59 PM, Dan Russ <
> danruss00@gmail.com> wrote:
>     >     >     > Hi All,
>     >     >     >    First, let me take a share of blame for the comment
> Chris mentioned.  I believe I said something like the pull request was X
> revision behind and Y revisions ahead.  It was not meant to be rude, it was
> meant to say it is hard to review code when it is so different from the
> current code base. I am very excited that sentiment analysis is going to be
> added to OpenNLP, but I have not had time to play with it. If I were to say
> “great job” before I have add a chance to look at it, it would be flattery
> not honest praise.
>     >     >     >
>     >     >     >   Let’s clean up the merge.  I agree with Chris that
> scalability and perfection should not be our initial goal.  Let’s get
> something, and we can decide how to optimize later (even if it require a
> complete rewrite).  Perfection is the enemy of the good.
>     >     >     >
>     >     >     >   Finally, because of Chris’ comments it is hard to
> thank Ana and Chris without sounding insincere.  But I’ll try, thank you
> Chris and Ana.  I hope we can get beyond this and that Chris and Ana will
> continue to improve the performance of the sentiment analysis tool and
> happily remain part of the OpenNLP family.  It is also a good time to toss
> a big thank you to all of the committers, users, and PMC member.  I use
> OpenNLP almost everyday.  Your work is extremely valuable to me.
>     >     >     >
>     >     >     > Thank you,
>     >     >     > Daniel
>     >     >     >
>     >     >     >> On Jun 27, 2017, at 10:25 AM, Chris Mattmann <
> mattmann@apache.org> wrote:
>     >     >     >>
>     >     >     >> Hi everyone,
>     >     >     >>
>     >     >     >> I spoke with Joern in Slack. Some of his concerns are:
>     >     >     >>
>     >     >     >> 1. This was done with a Merge commit and apparently
> they squash and rebase.
>     >     >     >> [would be helpful to see some pointer on this for
> documentation, thus far I
>     >     >     >> haven’t found any]
>     >     >     >> 2. Apparently we literally need to ask others for +1
> votes and record them
>     >     >     >> before committing? I thought since Ana and I are
> committers aren were +1,
>     >     >     >> and since Joern had been providing feedback (the last
> of which was to add
>     >     >     >> tests, which we did) that he would be +1 as well (I
> guess he is not, and I guess
>     >     >     >> formally we need to do a +1 vote even still)
>     >     >     >> 3. There was concern about scalability of the code.
>     >     >     >> 4. There are thoughts that the code was not perfect yet
> (even though it works
>     >     >     >> fine in the MEMEX project for Ana and I)
>     >     >     >>
>     >     >     >> So, Joern has opened up a revert PR.
>     >     >     >>
>     >     >     >> I suppose I should state I find this process extremely
> heavyweight and unwelcoming.
>     >     >     >> To me, there should be a modicum of trust for
> committers, but I feel like even as a
>     >     >     >> committer, I am operating as a “contributor” to the
> project. Committer means that
>     >     >     >> there is trust to modify the source code base. Of the
> issues above, the only one I see
>     >     >     >> as a moderate snafu was #1, and frankly if there are
> some instructions that show me
>     >     >     >> how to do squashing and rebasing *first* I will try to
> do that in the future since I am
>     >     >     >> not a GIt expert.
>     >     >     >>
>     >     >     >> That said, I must state I feel pretty put off by Apache
> OpenNLP. This originated as a GSoC
>     >     >     >> effort, and we have worked pretty consistently on this
> over the last year. We used a
>     >     >     >> separate GitHub project to get started, kept Joern
> involved as another mentor, even
>     >     >     >> provided access and commit writes to that GitHub
> repository for a long time, so this
>     >     >     >> code was developed in the open. Joern even created a
> branch in ApacheOpenNLP in the code and I suppose
>     >     >     >> I should have gone and worked on that branch first
> since master is apparently so
>     >     >     >> pristine that even an Apache veteran like me can’t get
> something in to it without
>     >     >     >> making a whole bunch of (what are IMO minor issues, and
> what are IMO heavyweight
>     >     >     >> “community” issues).
>     >     >     >>
>     >     >     >> I am concerned from a community point of view that the
> first comment wasn’t “Great
>     >     >     >> job Chris, you got Sentiment Analysis into Apache,
> *but* I have these concerns 1-4 above”.
>     >     >     >> It was “The PR was merged wrong in ways 1-4 and I’m
> going to revert it.”
>     >     >     >>
>     >     >     >> That’s pretty off-putting to someone who is semi-new
> like me and like Ana.
>     >     >     >>
>     >     >     >> Anyways, go ahead and revert it. Sorry to have caused
> any issues.
>     >     >     >>
>     >     >     >> Chris
>     >     >     >>
>     >     >     >>
>     >     >     >>
>     >     >     >> On 6/27/17, 7:06 AM, "Chris Mattmann" <
> mattmann@apache.org> wrote:
>     >     >     >>
>     >     >     >>    Hi Joern,
>     >     >     >>
>     >     >     >>    I’m confused. Why did you revert my commit?
>     >     >     >>
>     >     >     >>    Every one of those check points you put on the PR
> was checked?
>     >     >     >>    We have been discussing this for months, you have
> seen the
>     >     >     >>    code for months, Ana and I have worked diligently on
> the code
>     >     >     >>    in plain view of everyone.
>     >     >     >>
>     >     >     >>    Please explain.
>     >     >     >>
>     >     >     >>    Chris
>     >     >     >>
>     >     >     >>
>     >     >     >>
>     >     >     >>
>     >     >     >>    On 6/27/17, 1:23 AM, "kottmann" <gi...@git.apache.org>
> wrote:
>     >     >     >>
>     >     >     >>        GitHub user kottmann opened a pull request:
>     >     >     >>
>     >     >     >>            https://github.com/apache/opennlp/pull/238
>     >     >     >>
>     >     >     >>            Revert merging of sentiment work, no consent
> to merge it
>     >     >     >>
>     >     >     >>            Thank you for contributing to Apache OpenNLP.
>     >     >     >>
>     >     >     >>            In order to streamline the review of the
> contribution we ask you
>     >     >     >>            to ensure the following steps have been
> taken:
>     >     >     >>
>     >     >     >>            ### For all changes:
>     >     >     >>            - [ ] Is there a JIRA ticket associated with
> this PR? Is it referenced
>     >     >     >>                 in the commit message?
>     >     >     >>
>     >     >     >>            - [ ] Does your PR title start with
> OPENNLP-XXXX where XXXX is the JIRA number you are trying to resolve? Pay
> particular attention to the hyphen "-" character.
>     >     >     >>
>     >     >     >>            - [ ] Has your PR been rebased against the
> latest commit within the target branch (typically master)?
>     >     >     >>
>     >     >     >>            - [ ] Is your initial contribution a single,
> squashed commit?
>     >     >     >>
>     >     >     >>            ### For code changes:
>     >     >     >>            - [ ] Have you ensured that the full suite
> of tests is executed via mvn clean install at the root opennlp folder?
>     >     >     >>            - [ ] Have you written or updated unit tests
> to verify your changes?
>     >     >     >>            - [ ] If adding new dependencies to the
> code, are these dependencies licensed in a way that is compatible for
> inclusion under [ASF 2.0](http://www.apache.org/
> legal/resolved.html#category-a)?
>     >     >     >>            - [ ] If applicable, have you updated the
> LICENSE file, including the main LICENSE file in opennlp folder?
>     >     >     >>            - [ ] If applicable, have you updated the
> NOTICE file, including the main NOTICE file found in opennlp folder?
>     >     >     >>
>     >     >     >>            ### For documentation related changes:
>     >     >     >>            - [ ] Have you ensured that format looks
> appropriate for the output in which it is rendered?
>     >     >     >>
>     >     >     >>            ### Note:
>     >     >     >>            Please ensure that once the PR is submitted,
> you check travis-ci for build issues and submit an update to your PR as
> soon as possible.
>     >     >     >>
>     >     >     >>
>     >     >     >>        You can merge this pull request into a Git
> repository by running:
>     >     >     >>
>     >     >     >>            $ git pull https://github.com/kottmann/
> opennlp revert_sentiment
>     >     >     >>
>     >     >     >>        Alternatively you can review and apply these
> changes as the patch at:
>     >     >     >>
>     >     >     >>            https://github.com/apache/
> opennlp/pull/238.patch
>     >     >     >>
>     >     >     >>        To close this pull request, make a commit to
> your master/trunk branch
>     >     >     >>        with (at least) the following in the commit
> message:
>     >     >     >>
>     >     >     >>            This closes #238
>     >     >     >>
>     >     >     >>        ----
>     >     >     >>        commit 123222eb34724bae793e9d6d22e202c0aee0aa45
>     >     >     >>        Author: Jörn Kottmann <jo...@apache.org>
>     >     >     >>        Date:   2017-06-27T08:19:19Z
>     >     >     >>
>     >     >     >>            Revert merging of sentiment work, no consent
> to merge it
>     >     >     >>
>     >     >     >>        ----
>     >     >     >>
>     >     >     >>
>     >     >     >>        ---
>     >     >     >>        If your project is set up for it, you can reply
> to this email and have your
>     >     >     >>        reply appear on GitHub as well. If your project
> does not have this feature
>     >     >     >>        enabled and wishes so, or if the feature is
> enabled but not working, please
>     >     >     >>        contact infrastructure at
> infrastructure@apache.org or file a JIRA ticket
>     >     >     >>        with INFRA.
>     >     >     >>        ---
>     >     >     >>
>     >     >     >>
>     >     >     >>
>     >     >     >>
>     >     >     >>
>     >     >     >>
>     >     >     >
>     >     >
>     >     >
>     >     >
>     >
>     >
>     >
>
>
>
>

Re: [GitHub] opennlp pull request #238: Revert merging of sentiment work, no consent to m...

Posted by Chris Mattmann <ma...@apache.org>.
Thanks for your feedback Rodrigo!

Cheers,
Chris




On 6/29/17, 10:14 AM, "Rodrigo Agerri" <ro...@ehu.eus> wrote:

    Hi Chris,
    
    On Thu, Jun 29, 2017 at 7:10 PM, Chris Mattmann <ma...@apache.org> wrote:
    > Hi Rodrigo,
    >
    > This is very useful feedback that I wish we would have had a long time ago.
    >
    > I will look into it and see if I can reproduce the CLI error. I did a full build and mvn
    > install (which I though would run tests?) before commiting and as I posted in JIRA
    > the tests passed for me? So I will have to look into that.
    
    You need to add the tools to the cmdline.CLI, otherwise the tests are
    not triggered for the sentiment component.
    
    https://github.com/apache/opennlp/blob/master/opennlp-tools/src/main/java/opennlp/tools/cmdline/CLI.java
    
    
    >
    > That said, given your feedback that SentimentME and the Sentiment Component
    > doesn’t offer much over Document Classifier I agree with you, but wasn’t super
    > familiar with the Document Classifier API. That said, if we can get the same functionality
    > by just using Document Classifier why don’t we:
    >
    > 1. Remove the SentimentME and associated code (except for the unit tests)
    > 2. Use the sample datasets from NetFlix & Stanford Treebank sentiment and
    > build models using Document Classifier API.
    > 3. Rename and keep the unit tests that test against Netflix and Stanford tree bank.
    > That way we get basic sentiment analysis (that is working for us internally at JPL decently),
    > for Apache OpenNLP, and then if we want to build something better than a Document
    > Classification approach to sentiment we can do so.
    >
    > Thoughts?
    
    +1 to implement these three steps.
    
    
    >
    > Thanks for your useful feedback. If everyone agrees this is a plan I can back out the code
    > using Joern’s revert, and then try and execute 1-3 above in a branch first. Thanks.
    
    +1
    
    Cheers,
    
    R
    



Re: [GitHub] opennlp pull request #238: Revert merging of sentiment work, no consent to m...

Posted by Rodrigo Agerri <ro...@ehu.eus>.
Hi Chris,

On Thu, Jun 29, 2017 at 7:10 PM, Chris Mattmann <ma...@apache.org> wrote:
> Hi Rodrigo,
>
> This is very useful feedback that I wish we would have had a long time ago.
>
> I will look into it and see if I can reproduce the CLI error. I did a full build and mvn
> install (which I though would run tests?) before commiting and as I posted in JIRA
> the tests passed for me? So I will have to look into that.

You need to add the tools to the cmdline.CLI, otherwise the tests are
not triggered for the sentiment component.

https://github.com/apache/opennlp/blob/master/opennlp-tools/src/main/java/opennlp/tools/cmdline/CLI.java


>
> That said, given your feedback that SentimentME and the Sentiment Component
> doesn’t offer much over Document Classifier I agree with you, but wasn’t super
> familiar with the Document Classifier API. That said, if we can get the same functionality
> by just using Document Classifier why don’t we:
>
> 1. Remove the SentimentME and associated code (except for the unit tests)
> 2. Use the sample datasets from NetFlix & Stanford Treebank sentiment and
> build models using Document Classifier API.
> 3. Rename and keep the unit tests that test against Netflix and Stanford tree bank.
> That way we get basic sentiment analysis (that is working for us internally at JPL decently),
> for Apache OpenNLP, and then if we want to build something better than a Document
> Classification approach to sentiment we can do so.
>
> Thoughts?

+1 to implement these three steps.


>
> Thanks for your useful feedback. If everyone agrees this is a plan I can back out the code
> using Joern’s revert, and then try and execute 1-3 above in a branch first. Thanks.

+1

Cheers,

R

Re: [GitHub] opennlp pull request #238: Revert merging of sentiment work, no consent to m...

Posted by Chris Mattmann <ma...@apache.org>.
Hi Rodrigo,

This is very useful feedback that I wish we would have had a long time ago.

I will look into it and see if I can reproduce the CLI error. I did a full build and mvn 
install (which I though would run tests?) before commiting and as I posted in JIRA 
the tests passed for me? So I will have to look into that.

That said, given your feedback that SentimentME and the Sentiment Component 
doesn’t offer much over Document Classifier I agree with you, but wasn’t super 
familiar with the Document Classifier API. That said, if we can get the same functionality
by just using Document Classifier why don’t we:

1. Remove the SentimentME and associated code (except for the unit tests)
2. Use the sample datasets from NetFlix & Stanford Treebank sentiment and
build models using Document Classifier API.
3. Rename and keep the unit tests that test against Netflix and Stanford tree bank.

That way we get basic sentiment analysis (that is working for us internally at JPL decently),
for Apache OpenNLP, and then if we want to build something better than a Document 
Classification approach to sentiment we can do so.

Thoughts?

Thanks for your useful feedback. If everyone agrees this is a plan I can back out the code
using Joern’s revert, and then try and execute 1-3 above in a branch first. Thanks.

Cheers,
Chris



On 6/29/17, 10:03 AM, "Rodrigo Agerri" <ra...@apache.org> wrote:

    Hi Chris,
    
    I have been interested in the new sentiment component for a while,
    although truth to be told, I did not follow that closely. I have today
    looked at it and test it with some of the corpora you have mentioned.
    In order to do that, I checkout master to work with from this commit
    onwards
    
    https://github.com/apache/opennlp/commit/56321aab51a470cd2004b76fb1f5330881b943c1
    
    1. I tried to run it from the CLI. The Sentiment component did not
    appear to be available.
    2. I added the SentimentTrainer and Evaluator to the cmdline.CLI (no
    SentimentTool is implemented to tag with a trained model).
    3. After that, the CLI tests did not pass. So, the CLI is currently
    non functional, unless I did something wrong, always possible, of
    course. See if you can reproduce that error.
    
    I therefore did the tests via API. I implemented a little test for
    training, evaluating and tagging here:
    
    https://github.com/ixa-ehu/ixa-pipe-doc/tree/test
    
    I run the training on the large movies review from Stanford for binary
    polarity classification
    
    http://ai.stanford.edu/~amaas/data/sentiment/
    
    and on the two little samples multiclass files added in resources and
    mentioned in the previous email, using the first one for training and
    the second one for testing (maxent 100 iterations, cutoff 5).
    
    2. Stanford results: 0.84264
    3. sample multiclass: 0.73
    
    Given that this is a standard document classification task, I decided
    to train the doccat component from the CLI:
    
    1. Stanford results: 0.84264 (BOW features by default).
    2. sample multiclass: 0.73
    
    I then looked at the code of the sentiment component and saw that it
    is basically a document classifier working with bag of words features.
    No added functionality. So, my conclusions are:
    
    1. The CLI needs to be fixed.
    2. The Sentiment component, as it is, provides the same functionality
    as the document classifier.
    
    I would therefore reconsider this commit until those two issues are
    addressed. Just my opinion.
    
    Best regards,
    
    Rodrigo
    
    On Thu, Jun 29, 2017 at 5:30 PM, Chris Mattmann <ma...@apache.org> wrote:
    >
    > Hey Joern,
    >
    > Sure, you can find the model data links here, along with our evaluation of them.
    >
    > http://irds.usc.edu/SentimentAnalysisParser/datasets.html
    >
    > There are other evaluations here:
    >
    > http://irds.usc.edu/SentimentAnalysisParser/models.html
    >
    > The HT provider review I cannot contribute at this time and I question its broad
    > applicability since it’s related to human trafficking. In addition we are still working
    > on publishing our analysis & evaluation of it which is why I removed it from the
    > commit.
    >
    > Cheers,
    > Chris
    >
    >
    >
    >
    >
    > On 6/29/17, 7:36 AM, "Joern Kottmann" <ko...@gmail.com> wrote:
    >
    >     Which data sets did you use to evaluate this?
    >     I was looking for a bit more than a sample file to train it.
    >
    >     I noticed that you checked in stanford and netflix models.
    >
    >     The stanford data set is probably this one:
    >     http://ai.stanford.edu/~amaas/data/sentiment/
    >
    >     Do you have a link for the netflix data?
    >
    >     Jörn
    >
    >     On Thu, Jun 29, 2017 at 4:00 PM, Chris Mattmann <ma...@apache.org> wrote:
    >     > Absolutely you can find it here:
    >     >
    >     > opennlp-tools/src/test/resources/opennlp/tools/sentiment/sample_train_categ (for categorical /multi-class)
    >     > opennlp-tools/src/test/resources/opennlp/tools/sentiment/sample_train_categ2 (for categorical/multi-class)
    >     >
    >     > We can also do similar files where instead of multi-class, we just use pos/neg as the label.
    >     >
    >     > Cheers,
    >     > Chris
    >     >
    >     >
    >     >
    >     >
    >     >
    >     > On 6/29/17, 2:35 AM, "Joern Kottmann" <ko...@gmail.com> wrote:
    >     >
    >     >     Hello Chris,
    >     >
    >     >     could you please point me to files I can use to train the sentiment
    >     >     component? I am currently looking again through the code and would
    >     >     like to train it myself.
    >     >
    >     >     Jörn
    >     >
    >     >     On Tue, Jun 27, 2017 at 4:59 PM, Dan Russ <da...@gmail.com> wrote:
    >     >     > Hi All,
    >     >     >    First, let me take a share of blame for the comment Chris mentioned.  I believe I said something like the pull request was X revision behind and Y revisions ahead.  It was not meant to be rude, it was meant to say it is hard to review code when it is so different from the current code base. I am very excited that sentiment analysis is going to be added to OpenNLP, but I have not had time to play with it. If I were to say “great job” before I have add a chance to look at it, it would be flattery not honest praise.
    >     >     >
    >     >     >   Let’s clean up the merge.  I agree with Chris that scalability and perfection should not be our initial goal.  Let’s get something, and we can decide how to optimize later (even if it require a complete rewrite).  Perfection is the enemy of the good.
    >     >     >
    >     >     >   Finally, because of Chris’ comments it is hard to thank Ana and Chris without sounding insincere.  But I’ll try, thank you Chris and Ana.  I hope we can get beyond this and that Chris and Ana will continue to improve the performance of the sentiment analysis tool and happily remain part of the OpenNLP family.  It is also a good time to toss a big thank you to all of the committers, users, and PMC member.  I use OpenNLP almost everyday.  Your work is extremely valuable to me.
    >     >     >
    >     >     > Thank you,
    >     >     > Daniel
    >     >     >
    >     >     >> On Jun 27, 2017, at 10:25 AM, Chris Mattmann <ma...@apache.org> wrote:
    >     >     >>
    >     >     >> Hi everyone,
    >     >     >>
    >     >     >> I spoke with Joern in Slack. Some of his concerns are:
    >     >     >>
    >     >     >> 1. This was done with a Merge commit and apparently they squash and rebase.
    >     >     >> [would be helpful to see some pointer on this for documentation, thus far I
    >     >     >> haven’t found any]
    >     >     >> 2. Apparently we literally need to ask others for +1 votes and record them
    >     >     >> before committing? I thought since Ana and I are committers aren were +1,
    >     >     >> and since Joern had been providing feedback (the last of which was to add
    >     >     >> tests, which we did) that he would be +1 as well (I guess he is not, and I guess
    >     >     >> formally we need to do a +1 vote even still)
    >     >     >> 3. There was concern about scalability of the code.
    >     >     >> 4. There are thoughts that the code was not perfect yet (even though it works
    >     >     >> fine in the MEMEX project for Ana and I)
    >     >     >>
    >     >     >> So, Joern has opened up a revert PR.
    >     >     >>
    >     >     >> I suppose I should state I find this process extremely heavyweight and unwelcoming.
    >     >     >> To me, there should be a modicum of trust for committers, but I feel like even as a
    >     >     >> committer, I am operating as a “contributor” to the project. Committer means that
    >     >     >> there is trust to modify the source code base. Of the issues above, the only one I see
    >     >     >> as a moderate snafu was #1, and frankly if there are some instructions that show me
    >     >     >> how to do squashing and rebasing *first* I will try to do that in the future since I am
    >     >     >> not a GIt expert.
    >     >     >>
    >     >     >> That said, I must state I feel pretty put off by Apache OpenNLP. This originated as a GSoC
    >     >     >> effort, and we have worked pretty consistently on this over the last year. We used a
    >     >     >> separate GitHub project to get started, kept Joern involved as another mentor, even
    >     >     >> provided access and commit writes to that GitHub repository for a long time, so this
    >     >     >> code was developed in the open. Joern even created a branch in ApacheOpenNLP in the code and I suppose
    >     >     >> I should have gone and worked on that branch first since master is apparently so
    >     >     >> pristine that even an Apache veteran like me can’t get something in to it without
    >     >     >> making a whole bunch of (what are IMO minor issues, and what are IMO heavyweight
    >     >     >> “community” issues).
    >     >     >>
    >     >     >> I am concerned from a community point of view that the first comment wasn’t “Great
    >     >     >> job Chris, you got Sentiment Analysis into Apache, *but* I have these concerns 1-4 above”.
    >     >     >> It was “The PR was merged wrong in ways 1-4 and I’m going to revert it.”
    >     >     >>
    >     >     >> That’s pretty off-putting to someone who is semi-new like me and like Ana.
    >     >     >>
    >     >     >> Anyways, go ahead and revert it. Sorry to have caused any issues.
    >     >     >>
    >     >     >> Chris
    >     >     >>
    >     >     >>
    >     >     >>
    >     >     >> On 6/27/17, 7:06 AM, "Chris Mattmann" <ma...@apache.org> wrote:
    >     >     >>
    >     >     >>    Hi Joern,
    >     >     >>
    >     >     >>    I’m confused. Why did you revert my commit?
    >     >     >>
    >     >     >>    Every one of those check points you put on the PR was checked?
    >     >     >>    We have been discussing this for months, you have seen the
    >     >     >>    code for months, Ana and I have worked diligently on the code
    >     >     >>    in plain view of everyone.
    >     >     >>
    >     >     >>    Please explain.
    >     >     >>
    >     >     >>    Chris
    >     >     >>
    >     >     >>
    >     >     >>
    >     >     >>
    >     >     >>    On 6/27/17, 1:23 AM, "kottmann" <gi...@git.apache.org> wrote:
    >     >     >>
    >     >     >>        GitHub user kottmann opened a pull request:
    >     >     >>
    >     >     >>            https://github.com/apache/opennlp/pull/238
    >     >     >>
    >     >     >>            Revert merging of sentiment work, no consent to merge it
    >     >     >>
    >     >     >>            Thank you for contributing to Apache OpenNLP.
    >     >     >>
    >     >     >>            In order to streamline the review of the contribution we ask you
    >     >     >>            to ensure the following steps have been taken:
    >     >     >>
    >     >     >>            ### For all changes:
    >     >     >>            - [ ] Is there a JIRA ticket associated with this PR? Is it referenced
    >     >     >>                 in the commit message?
    >     >     >>
    >     >     >>            - [ ] Does your PR title start with OPENNLP-XXXX where XXXX is the JIRA number you are trying to resolve? Pay particular attention to the hyphen "-" character.
    >     >     >>
    >     >     >>            - [ ] Has your PR been rebased against the latest commit within the target branch (typically master)?
    >     >     >>
    >     >     >>            - [ ] Is your initial contribution a single, squashed commit?
    >     >     >>
    >     >     >>            ### For code changes:
    >     >     >>            - [ ] Have you ensured that the full suite of tests is executed via mvn clean install at the root opennlp folder?
    >     >     >>            - [ ] Have you written or updated unit tests to verify your changes?
    >     >     >>            - [ ] If adding new dependencies to the code, are these dependencies licensed in a way that is compatible for inclusion under [ASF 2.0](http://www.apache.org/legal/resolved.html#category-a)?
    >     >     >>            - [ ] If applicable, have you updated the LICENSE file, including the main LICENSE file in opennlp folder?
    >     >     >>            - [ ] If applicable, have you updated the NOTICE file, including the main NOTICE file found in opennlp folder?
    >     >     >>
    >     >     >>            ### For documentation related changes:
    >     >     >>            - [ ] Have you ensured that format looks appropriate for the output in which it is rendered?
    >     >     >>
    >     >     >>            ### Note:
    >     >     >>            Please ensure that once the PR is submitted, you check travis-ci for build issues and submit an update to your PR as soon as possible.
    >     >     >>
    >     >     >>
    >     >     >>        You can merge this pull request into a Git repository by running:
    >     >     >>
    >     >     >>            $ git pull https://github.com/kottmann/opennlp revert_sentiment
    >     >     >>
    >     >     >>        Alternatively you can review and apply these changes as the patch at:
    >     >     >>
    >     >     >>            https://github.com/apache/opennlp/pull/238.patch
    >     >     >>
    >     >     >>        To close this pull request, make a commit to your master/trunk branch
    >     >     >>        with (at least) the following in the commit message:
    >     >     >>
    >     >     >>            This closes #238
    >     >     >>
    >     >     >>        ----
    >     >     >>        commit 123222eb34724bae793e9d6d22e202c0aee0aa45
    >     >     >>        Author: Jörn Kottmann <jo...@apache.org>
    >     >     >>        Date:   2017-06-27T08:19:19Z
    >     >     >>
    >     >     >>            Revert merging of sentiment work, no consent to merge it
    >     >     >>
    >     >     >>        ----
    >     >     >>
    >     >     >>
    >     >     >>        ---
    >     >     >>        If your project is set up for it, you can reply to this email and have your
    >     >     >>        reply appear on GitHub as well. If your project does not have this feature
    >     >     >>        enabled and wishes so, or if the feature is enabled but not working, please
    >     >     >>        contact infrastructure at infrastructure@apache.org or file a JIRA ticket
    >     >     >>        with INFRA.
    >     >     >>        ---
    >     >     >>
    >     >     >>
    >     >     >>
    >     >     >>
    >     >     >>
    >     >     >>
    >     >     >
    >     >
    >     >
    >     >
    >
    >
    >
    



Re: [GitHub] opennlp pull request #238: Revert merging of sentiment work, no consent to m...

Posted by Rodrigo Agerri <ra...@apache.org>.
Hi Chris,

I have been interested in the new sentiment component for a while,
although truth to be told, I did not follow that closely. I have today
looked at it and test it with some of the corpora you have mentioned.
In order to do that, I checkout master to work with from this commit
onwards

https://github.com/apache/opennlp/commit/56321aab51a470cd2004b76fb1f5330881b943c1

1. I tried to run it from the CLI. The Sentiment component did not
appear to be available.
2. I added the SentimentTrainer and Evaluator to the cmdline.CLI (no
SentimentTool is implemented to tag with a trained model).
3. After that, the CLI tests did not pass. So, the CLI is currently
non functional, unless I did something wrong, always possible, of
course. See if you can reproduce that error.

I therefore did the tests via API. I implemented a little test for
training, evaluating and tagging here:

https://github.com/ixa-ehu/ixa-pipe-doc/tree/test

I run the training on the large movies review from Stanford for binary
polarity classification

http://ai.stanford.edu/~amaas/data/sentiment/

and on the two little samples multiclass files added in resources and
mentioned in the previous email, using the first one for training and
the second one for testing (maxent 100 iterations, cutoff 5).

2. Stanford results: 0.84264
3. sample multiclass: 0.73

Given that this is a standard document classification task, I decided
to train the doccat component from the CLI:

1. Stanford results: 0.84264 (BOW features by default).
2. sample multiclass: 0.73

I then looked at the code of the sentiment component and saw that it
is basically a document classifier working with bag of words features.
No added functionality. So, my conclusions are:

1. The CLI needs to be fixed.
2. The Sentiment component, as it is, provides the same functionality
as the document classifier.

I would therefore reconsider this commit until those two issues are
addressed. Just my opinion.

Best regards,

Rodrigo

On Thu, Jun 29, 2017 at 5:30 PM, Chris Mattmann <ma...@apache.org> wrote:
>
> Hey Joern,
>
> Sure, you can find the model data links here, along with our evaluation of them.
>
> http://irds.usc.edu/SentimentAnalysisParser/datasets.html
>
> There are other evaluations here:
>
> http://irds.usc.edu/SentimentAnalysisParser/models.html
>
> The HT provider review I cannot contribute at this time and I question its broad
> applicability since it’s related to human trafficking. In addition we are still working
> on publishing our analysis & evaluation of it which is why I removed it from the
> commit.
>
> Cheers,
> Chris
>
>
>
>
>
> On 6/29/17, 7:36 AM, "Joern Kottmann" <ko...@gmail.com> wrote:
>
>     Which data sets did you use to evaluate this?
>     I was looking for a bit more than a sample file to train it.
>
>     I noticed that you checked in stanford and netflix models.
>
>     The stanford data set is probably this one:
>     http://ai.stanford.edu/~amaas/data/sentiment/
>
>     Do you have a link for the netflix data?
>
>     Jörn
>
>     On Thu, Jun 29, 2017 at 4:00 PM, Chris Mattmann <ma...@apache.org> wrote:
>     > Absolutely you can find it here:
>     >
>     > opennlp-tools/src/test/resources/opennlp/tools/sentiment/sample_train_categ (for categorical /multi-class)
>     > opennlp-tools/src/test/resources/opennlp/tools/sentiment/sample_train_categ2 (for categorical/multi-class)
>     >
>     > We can also do similar files where instead of multi-class, we just use pos/neg as the label.
>     >
>     > Cheers,
>     > Chris
>     >
>     >
>     >
>     >
>     >
>     > On 6/29/17, 2:35 AM, "Joern Kottmann" <ko...@gmail.com> wrote:
>     >
>     >     Hello Chris,
>     >
>     >     could you please point me to files I can use to train the sentiment
>     >     component? I am currently looking again through the code and would
>     >     like to train it myself.
>     >
>     >     Jörn
>     >
>     >     On Tue, Jun 27, 2017 at 4:59 PM, Dan Russ <da...@gmail.com> wrote:
>     >     > Hi All,
>     >     >    First, let me take a share of blame for the comment Chris mentioned.  I believe I said something like the pull request was X revision behind and Y revisions ahead.  It was not meant to be rude, it was meant to say it is hard to review code when it is so different from the current code base. I am very excited that sentiment analysis is going to be added to OpenNLP, but I have not had time to play with it. If I were to say “great job” before I have add a chance to look at it, it would be flattery not honest praise.
>     >     >
>     >     >   Let’s clean up the merge.  I agree with Chris that scalability and perfection should not be our initial goal.  Let’s get something, and we can decide how to optimize later (even if it require a complete rewrite).  Perfection is the enemy of the good.
>     >     >
>     >     >   Finally, because of Chris’ comments it is hard to thank Ana and Chris without sounding insincere.  But I’ll try, thank you Chris and Ana.  I hope we can get beyond this and that Chris and Ana will continue to improve the performance of the sentiment analysis tool and happily remain part of the OpenNLP family.  It is also a good time to toss a big thank you to all of the committers, users, and PMC member.  I use OpenNLP almost everyday.  Your work is extremely valuable to me.
>     >     >
>     >     > Thank you,
>     >     > Daniel
>     >     >
>     >     >> On Jun 27, 2017, at 10:25 AM, Chris Mattmann <ma...@apache.org> wrote:
>     >     >>
>     >     >> Hi everyone,
>     >     >>
>     >     >> I spoke with Joern in Slack. Some of his concerns are:
>     >     >>
>     >     >> 1. This was done with a Merge commit and apparently they squash and rebase.
>     >     >> [would be helpful to see some pointer on this for documentation, thus far I
>     >     >> haven’t found any]
>     >     >> 2. Apparently we literally need to ask others for +1 votes and record them
>     >     >> before committing? I thought since Ana and I are committers aren were +1,
>     >     >> and since Joern had been providing feedback (the last of which was to add
>     >     >> tests, which we did) that he would be +1 as well (I guess he is not, and I guess
>     >     >> formally we need to do a +1 vote even still)
>     >     >> 3. There was concern about scalability of the code.
>     >     >> 4. There are thoughts that the code was not perfect yet (even though it works
>     >     >> fine in the MEMEX project for Ana and I)
>     >     >>
>     >     >> So, Joern has opened up a revert PR.
>     >     >>
>     >     >> I suppose I should state I find this process extremely heavyweight and unwelcoming.
>     >     >> To me, there should be a modicum of trust for committers, but I feel like even as a
>     >     >> committer, I am operating as a “contributor” to the project. Committer means that
>     >     >> there is trust to modify the source code base. Of the issues above, the only one I see
>     >     >> as a moderate snafu was #1, and frankly if there are some instructions that show me
>     >     >> how to do squashing and rebasing *first* I will try to do that in the future since I am
>     >     >> not a GIt expert.
>     >     >>
>     >     >> That said, I must state I feel pretty put off by Apache OpenNLP. This originated as a GSoC
>     >     >> effort, and we have worked pretty consistently on this over the last year. We used a
>     >     >> separate GitHub project to get started, kept Joern involved as another mentor, even
>     >     >> provided access and commit writes to that GitHub repository for a long time, so this
>     >     >> code was developed in the open. Joern even created a branch in ApacheOpenNLP in the code and I suppose
>     >     >> I should have gone and worked on that branch first since master is apparently so
>     >     >> pristine that even an Apache veteran like me can’t get something in to it without
>     >     >> making a whole bunch of (what are IMO minor issues, and what are IMO heavyweight
>     >     >> “community” issues).
>     >     >>
>     >     >> I am concerned from a community point of view that the first comment wasn’t “Great
>     >     >> job Chris, you got Sentiment Analysis into Apache, *but* I have these concerns 1-4 above”.
>     >     >> It was “The PR was merged wrong in ways 1-4 and I’m going to revert it.”
>     >     >>
>     >     >> That’s pretty off-putting to someone who is semi-new like me and like Ana.
>     >     >>
>     >     >> Anyways, go ahead and revert it. Sorry to have caused any issues.
>     >     >>
>     >     >> Chris
>     >     >>
>     >     >>
>     >     >>
>     >     >> On 6/27/17, 7:06 AM, "Chris Mattmann" <ma...@apache.org> wrote:
>     >     >>
>     >     >>    Hi Joern,
>     >     >>
>     >     >>    I’m confused. Why did you revert my commit?
>     >     >>
>     >     >>    Every one of those check points you put on the PR was checked?
>     >     >>    We have been discussing this for months, you have seen the
>     >     >>    code for months, Ana and I have worked diligently on the code
>     >     >>    in plain view of everyone.
>     >     >>
>     >     >>    Please explain.
>     >     >>
>     >     >>    Chris
>     >     >>
>     >     >>
>     >     >>
>     >     >>
>     >     >>    On 6/27/17, 1:23 AM, "kottmann" <gi...@git.apache.org> wrote:
>     >     >>
>     >     >>        GitHub user kottmann opened a pull request:
>     >     >>
>     >     >>            https://github.com/apache/opennlp/pull/238
>     >     >>
>     >     >>            Revert merging of sentiment work, no consent to merge it
>     >     >>
>     >     >>            Thank you for contributing to Apache OpenNLP.
>     >     >>
>     >     >>            In order to streamline the review of the contribution we ask you
>     >     >>            to ensure the following steps have been taken:
>     >     >>
>     >     >>            ### For all changes:
>     >     >>            - [ ] Is there a JIRA ticket associated with this PR? Is it referenced
>     >     >>                 in the commit message?
>     >     >>
>     >     >>            - [ ] Does your PR title start with OPENNLP-XXXX where XXXX is the JIRA number you are trying to resolve? Pay particular attention to the hyphen "-" character.
>     >     >>
>     >     >>            - [ ] Has your PR been rebased against the latest commit within the target branch (typically master)?
>     >     >>
>     >     >>            - [ ] Is your initial contribution a single, squashed commit?
>     >     >>
>     >     >>            ### For code changes:
>     >     >>            - [ ] Have you ensured that the full suite of tests is executed via mvn clean install at the root opennlp folder?
>     >     >>            - [ ] Have you written or updated unit tests to verify your changes?
>     >     >>            - [ ] If adding new dependencies to the code, are these dependencies licensed in a way that is compatible for inclusion under [ASF 2.0](http://www.apache.org/legal/resolved.html#category-a)?
>     >     >>            - [ ] If applicable, have you updated the LICENSE file, including the main LICENSE file in opennlp folder?
>     >     >>            - [ ] If applicable, have you updated the NOTICE file, including the main NOTICE file found in opennlp folder?
>     >     >>
>     >     >>            ### For documentation related changes:
>     >     >>            - [ ] Have you ensured that format looks appropriate for the output in which it is rendered?
>     >     >>
>     >     >>            ### Note:
>     >     >>            Please ensure that once the PR is submitted, you check travis-ci for build issues and submit an update to your PR as soon as possible.
>     >     >>
>     >     >>
>     >     >>        You can merge this pull request into a Git repository by running:
>     >     >>
>     >     >>            $ git pull https://github.com/kottmann/opennlp revert_sentiment
>     >     >>
>     >     >>        Alternatively you can review and apply these changes as the patch at:
>     >     >>
>     >     >>            https://github.com/apache/opennlp/pull/238.patch
>     >     >>
>     >     >>        To close this pull request, make a commit to your master/trunk branch
>     >     >>        with (at least) the following in the commit message:
>     >     >>
>     >     >>            This closes #238
>     >     >>
>     >     >>        ----
>     >     >>        commit 123222eb34724bae793e9d6d22e202c0aee0aa45
>     >     >>        Author: Jörn Kottmann <jo...@apache.org>
>     >     >>        Date:   2017-06-27T08:19:19Z
>     >     >>
>     >     >>            Revert merging of sentiment work, no consent to merge it
>     >     >>
>     >     >>        ----
>     >     >>
>     >     >>
>     >     >>        ---
>     >     >>        If your project is set up for it, you can reply to this email and have your
>     >     >>        reply appear on GitHub as well. If your project does not have this feature
>     >     >>        enabled and wishes so, or if the feature is enabled but not working, please
>     >     >>        contact infrastructure at infrastructure@apache.org or file a JIRA ticket
>     >     >>        with INFRA.
>     >     >>        ---
>     >     >>
>     >     >>
>     >     >>
>     >     >>
>     >     >>
>     >     >>
>     >     >
>     >
>     >
>     >
>
>
>

Re: [GitHub] opennlp pull request #238: Revert merging of sentiment work, no consent to m...

Posted by Chris Mattmann <ma...@apache.org>.
Hey Joern,

Sure, you can find the model data links here, along with our evaluation of them.

http://irds.usc.edu/SentimentAnalysisParser/datasets.html

There are other evaluations here:

http://irds.usc.edu/SentimentAnalysisParser/models.html

The HT provider review I cannot contribute at this time and I question its broad
applicability since it’s related to human trafficking. In addition we are still working
on publishing our analysis & evaluation of it which is why I removed it from the
commit.

Cheers,
Chris





On 6/29/17, 7:36 AM, "Joern Kottmann" <ko...@gmail.com> wrote:

    Which data sets did you use to evaluate this?
    I was looking for a bit more than a sample file to train it.
    
    I noticed that you checked in stanford and netflix models.
    
    The stanford data set is probably this one:
    http://ai.stanford.edu/~amaas/data/sentiment/
    
    Do you have a link for the netflix data?
    
    Jörn
    
    On Thu, Jun 29, 2017 at 4:00 PM, Chris Mattmann <ma...@apache.org> wrote:
    > Absolutely you can find it here:
    >
    > opennlp-tools/src/test/resources/opennlp/tools/sentiment/sample_train_categ (for categorical /multi-class)
    > opennlp-tools/src/test/resources/opennlp/tools/sentiment/sample_train_categ2 (for categorical/multi-class)
    >
    > We can also do similar files where instead of multi-class, we just use pos/neg as the label.
    >
    > Cheers,
    > Chris
    >
    >
    >
    >
    >
    > On 6/29/17, 2:35 AM, "Joern Kottmann" <ko...@gmail.com> wrote:
    >
    >     Hello Chris,
    >
    >     could you please point me to files I can use to train the sentiment
    >     component? I am currently looking again through the code and would
    >     like to train it myself.
    >
    >     Jörn
    >
    >     On Tue, Jun 27, 2017 at 4:59 PM, Dan Russ <da...@gmail.com> wrote:
    >     > Hi All,
    >     >    First, let me take a share of blame for the comment Chris mentioned.  I believe I said something like the pull request was X revision behind and Y revisions ahead.  It was not meant to be rude, it was meant to say it is hard to review code when it is so different from the current code base. I am very excited that sentiment analysis is going to be added to OpenNLP, but I have not had time to play with it. If I were to say “great job” before I have add a chance to look at it, it would be flattery not honest praise.
    >     >
    >     >   Let’s clean up the merge.  I agree with Chris that scalability and perfection should not be our initial goal.  Let’s get something, and we can decide how to optimize later (even if it require a complete rewrite).  Perfection is the enemy of the good.
    >     >
    >     >   Finally, because of Chris’ comments it is hard to thank Ana and Chris without sounding insincere.  But I’ll try, thank you Chris and Ana.  I hope we can get beyond this and that Chris and Ana will continue to improve the performance of the sentiment analysis tool and happily remain part of the OpenNLP family.  It is also a good time to toss a big thank you to all of the committers, users, and PMC member.  I use OpenNLP almost everyday.  Your work is extremely valuable to me.
    >     >
    >     > Thank you,
    >     > Daniel
    >     >
    >     >> On Jun 27, 2017, at 10:25 AM, Chris Mattmann <ma...@apache.org> wrote:
    >     >>
    >     >> Hi everyone,
    >     >>
    >     >> I spoke with Joern in Slack. Some of his concerns are:
    >     >>
    >     >> 1. This was done with a Merge commit and apparently they squash and rebase.
    >     >> [would be helpful to see some pointer on this for documentation, thus far I
    >     >> haven’t found any]
    >     >> 2. Apparently we literally need to ask others for +1 votes and record them
    >     >> before committing? I thought since Ana and I are committers aren were +1,
    >     >> and since Joern had been providing feedback (the last of which was to add
    >     >> tests, which we did) that he would be +1 as well (I guess he is not, and I guess
    >     >> formally we need to do a +1 vote even still)
    >     >> 3. There was concern about scalability of the code.
    >     >> 4. There are thoughts that the code was not perfect yet (even though it works
    >     >> fine in the MEMEX project for Ana and I)
    >     >>
    >     >> So, Joern has opened up a revert PR.
    >     >>
    >     >> I suppose I should state I find this process extremely heavyweight and unwelcoming.
    >     >> To me, there should be a modicum of trust for committers, but I feel like even as a
    >     >> committer, I am operating as a “contributor” to the project. Committer means that
    >     >> there is trust to modify the source code base. Of the issues above, the only one I see
    >     >> as a moderate snafu was #1, and frankly if there are some instructions that show me
    >     >> how to do squashing and rebasing *first* I will try to do that in the future since I am
    >     >> not a GIt expert.
    >     >>
    >     >> That said, I must state I feel pretty put off by Apache OpenNLP. This originated as a GSoC
    >     >> effort, and we have worked pretty consistently on this over the last year. We used a
    >     >> separate GitHub project to get started, kept Joern involved as another mentor, even
    >     >> provided access and commit writes to that GitHub repository for a long time, so this
    >     >> code was developed in the open. Joern even created a branch in ApacheOpenNLP in the code and I suppose
    >     >> I should have gone and worked on that branch first since master is apparently so
    >     >> pristine that even an Apache veteran like me can’t get something in to it without
    >     >> making a whole bunch of (what are IMO minor issues, and what are IMO heavyweight
    >     >> “community” issues).
    >     >>
    >     >> I am concerned from a community point of view that the first comment wasn’t “Great
    >     >> job Chris, you got Sentiment Analysis into Apache, *but* I have these concerns 1-4 above”.
    >     >> It was “The PR was merged wrong in ways 1-4 and I’m going to revert it.”
    >     >>
    >     >> That’s pretty off-putting to someone who is semi-new like me and like Ana.
    >     >>
    >     >> Anyways, go ahead and revert it. Sorry to have caused any issues.
    >     >>
    >     >> Chris
    >     >>
    >     >>
    >     >>
    >     >> On 6/27/17, 7:06 AM, "Chris Mattmann" <ma...@apache.org> wrote:
    >     >>
    >     >>    Hi Joern,
    >     >>
    >     >>    I’m confused. Why did you revert my commit?
    >     >>
    >     >>    Every one of those check points you put on the PR was checked?
    >     >>    We have been discussing this for months, you have seen the
    >     >>    code for months, Ana and I have worked diligently on the code
    >     >>    in plain view of everyone.
    >     >>
    >     >>    Please explain.
    >     >>
    >     >>    Chris
    >     >>
    >     >>
    >     >>
    >     >>
    >     >>    On 6/27/17, 1:23 AM, "kottmann" <gi...@git.apache.org> wrote:
    >     >>
    >     >>        GitHub user kottmann opened a pull request:
    >     >>
    >     >>            https://github.com/apache/opennlp/pull/238
    >     >>
    >     >>            Revert merging of sentiment work, no consent to merge it
    >     >>
    >     >>            Thank you for contributing to Apache OpenNLP.
    >     >>
    >     >>            In order to streamline the review of the contribution we ask you
    >     >>            to ensure the following steps have been taken:
    >     >>
    >     >>            ### For all changes:
    >     >>            - [ ] Is there a JIRA ticket associated with this PR? Is it referenced
    >     >>                 in the commit message?
    >     >>
    >     >>            - [ ] Does your PR title start with OPENNLP-XXXX where XXXX is the JIRA number you are trying to resolve? Pay particular attention to the hyphen "-" character.
    >     >>
    >     >>            - [ ] Has your PR been rebased against the latest commit within the target branch (typically master)?
    >     >>
    >     >>            - [ ] Is your initial contribution a single, squashed commit?
    >     >>
    >     >>            ### For code changes:
    >     >>            - [ ] Have you ensured that the full suite of tests is executed via mvn clean install at the root opennlp folder?
    >     >>            - [ ] Have you written or updated unit tests to verify your changes?
    >     >>            - [ ] If adding new dependencies to the code, are these dependencies licensed in a way that is compatible for inclusion under [ASF 2.0](http://www.apache.org/legal/resolved.html#category-a)?
    >     >>            - [ ] If applicable, have you updated the LICENSE file, including the main LICENSE file in opennlp folder?
    >     >>            - [ ] If applicable, have you updated the NOTICE file, including the main NOTICE file found in opennlp folder?
    >     >>
    >     >>            ### For documentation related changes:
    >     >>            - [ ] Have you ensured that format looks appropriate for the output in which it is rendered?
    >     >>
    >     >>            ### Note:
    >     >>            Please ensure that once the PR is submitted, you check travis-ci for build issues and submit an update to your PR as soon as possible.
    >     >>
    >     >>
    >     >>        You can merge this pull request into a Git repository by running:
    >     >>
    >     >>            $ git pull https://github.com/kottmann/opennlp revert_sentiment
    >     >>
    >     >>        Alternatively you can review and apply these changes as the patch at:
    >     >>
    >     >>            https://github.com/apache/opennlp/pull/238.patch
    >     >>
    >     >>        To close this pull request, make a commit to your master/trunk branch
    >     >>        with (at least) the following in the commit message:
    >     >>
    >     >>            This closes #238
    >     >>
    >     >>        ----
    >     >>        commit 123222eb34724bae793e9d6d22e202c0aee0aa45
    >     >>        Author: Jörn Kottmann <jo...@apache.org>
    >     >>        Date:   2017-06-27T08:19:19Z
    >     >>
    >     >>            Revert merging of sentiment work, no consent to merge it
    >     >>
    >     >>        ----
    >     >>
    >     >>
    >     >>        ---
    >     >>        If your project is set up for it, you can reply to this email and have your
    >     >>        reply appear on GitHub as well. If your project does not have this feature
    >     >>        enabled and wishes so, or if the feature is enabled but not working, please
    >     >>        contact infrastructure at infrastructure@apache.org or file a JIRA ticket
    >     >>        with INFRA.
    >     >>        ---
    >     >>
    >     >>
    >     >>
    >     >>
    >     >>
    >     >>
    >     >
    >
    >
    >
    



Re: [GitHub] opennlp pull request #238: Revert merging of sentiment work, no consent to m...

Posted by Joern Kottmann <ko...@gmail.com>.
Which data sets did you use to evaluate this?
I was looking for a bit more than a sample file to train it.

I noticed that you checked in stanford and netflix models.

The stanford data set is probably this one:
http://ai.stanford.edu/~amaas/data/sentiment/

Do you have a link for the netflix data?

Jörn

On Thu, Jun 29, 2017 at 4:00 PM, Chris Mattmann <ma...@apache.org> wrote:
> Absolutely you can find it here:
>
> opennlp-tools/src/test/resources/opennlp/tools/sentiment/sample_train_categ (for categorical /multi-class)
> opennlp-tools/src/test/resources/opennlp/tools/sentiment/sample_train_categ2 (for categorical/multi-class)
>
> We can also do similar files where instead of multi-class, we just use pos/neg as the label.
>
> Cheers,
> Chris
>
>
>
>
>
> On 6/29/17, 2:35 AM, "Joern Kottmann" <ko...@gmail.com> wrote:
>
>     Hello Chris,
>
>     could you please point me to files I can use to train the sentiment
>     component? I am currently looking again through the code and would
>     like to train it myself.
>
>     Jörn
>
>     On Tue, Jun 27, 2017 at 4:59 PM, Dan Russ <da...@gmail.com> wrote:
>     > Hi All,
>     >    First, let me take a share of blame for the comment Chris mentioned.  I believe I said something like the pull request was X revision behind and Y revisions ahead.  It was not meant to be rude, it was meant to say it is hard to review code when it is so different from the current code base. I am very excited that sentiment analysis is going to be added to OpenNLP, but I have not had time to play with it. If I were to say “great job” before I have add a chance to look at it, it would be flattery not honest praise.
>     >
>     >   Let’s clean up the merge.  I agree with Chris that scalability and perfection should not be our initial goal.  Let’s get something, and we can decide how to optimize later (even if it require a complete rewrite).  Perfection is the enemy of the good.
>     >
>     >   Finally, because of Chris’ comments it is hard to thank Ana and Chris without sounding insincere.  But I’ll try, thank you Chris and Ana.  I hope we can get beyond this and that Chris and Ana will continue to improve the performance of the sentiment analysis tool and happily remain part of the OpenNLP family.  It is also a good time to toss a big thank you to all of the committers, users, and PMC member.  I use OpenNLP almost everyday.  Your work is extremely valuable to me.
>     >
>     > Thank you,
>     > Daniel
>     >
>     >> On Jun 27, 2017, at 10:25 AM, Chris Mattmann <ma...@apache.org> wrote:
>     >>
>     >> Hi everyone,
>     >>
>     >> I spoke with Joern in Slack. Some of his concerns are:
>     >>
>     >> 1. This was done with a Merge commit and apparently they squash and rebase.
>     >> [would be helpful to see some pointer on this for documentation, thus far I
>     >> haven’t found any]
>     >> 2. Apparently we literally need to ask others for +1 votes and record them
>     >> before committing? I thought since Ana and I are committers aren were +1,
>     >> and since Joern had been providing feedback (the last of which was to add
>     >> tests, which we did) that he would be +1 as well (I guess he is not, and I guess
>     >> formally we need to do a +1 vote even still)
>     >> 3. There was concern about scalability of the code.
>     >> 4. There are thoughts that the code was not perfect yet (even though it works
>     >> fine in the MEMEX project for Ana and I)
>     >>
>     >> So, Joern has opened up a revert PR.
>     >>
>     >> I suppose I should state I find this process extremely heavyweight and unwelcoming.
>     >> To me, there should be a modicum of trust for committers, but I feel like even as a
>     >> committer, I am operating as a “contributor” to the project. Committer means that
>     >> there is trust to modify the source code base. Of the issues above, the only one I see
>     >> as a moderate snafu was #1, and frankly if there are some instructions that show me
>     >> how to do squashing and rebasing *first* I will try to do that in the future since I am
>     >> not a GIt expert.
>     >>
>     >> That said, I must state I feel pretty put off by Apache OpenNLP. This originated as a GSoC
>     >> effort, and we have worked pretty consistently on this over the last year. We used a
>     >> separate GitHub project to get started, kept Joern involved as another mentor, even
>     >> provided access and commit writes to that GitHub repository for a long time, so this
>     >> code was developed in the open. Joern even created a branch in ApacheOpenNLP in the code and I suppose
>     >> I should have gone and worked on that branch first since master is apparently so
>     >> pristine that even an Apache veteran like me can’t get something in to it without
>     >> making a whole bunch of (what are IMO minor issues, and what are IMO heavyweight
>     >> “community” issues).
>     >>
>     >> I am concerned from a community point of view that the first comment wasn’t “Great
>     >> job Chris, you got Sentiment Analysis into Apache, *but* I have these concerns 1-4 above”.
>     >> It was “The PR was merged wrong in ways 1-4 and I’m going to revert it.”
>     >>
>     >> That’s pretty off-putting to someone who is semi-new like me and like Ana.
>     >>
>     >> Anyways, go ahead and revert it. Sorry to have caused any issues.
>     >>
>     >> Chris
>     >>
>     >>
>     >>
>     >> On 6/27/17, 7:06 AM, "Chris Mattmann" <ma...@apache.org> wrote:
>     >>
>     >>    Hi Joern,
>     >>
>     >>    I’m confused. Why did you revert my commit?
>     >>
>     >>    Every one of those check points you put on the PR was checked?
>     >>    We have been discussing this for months, you have seen the
>     >>    code for months, Ana and I have worked diligently on the code
>     >>    in plain view of everyone.
>     >>
>     >>    Please explain.
>     >>
>     >>    Chris
>     >>
>     >>
>     >>
>     >>
>     >>    On 6/27/17, 1:23 AM, "kottmann" <gi...@git.apache.org> wrote:
>     >>
>     >>        GitHub user kottmann opened a pull request:
>     >>
>     >>            https://github.com/apache/opennlp/pull/238
>     >>
>     >>            Revert merging of sentiment work, no consent to merge it
>     >>
>     >>            Thank you for contributing to Apache OpenNLP.
>     >>
>     >>            In order to streamline the review of the contribution we ask you
>     >>            to ensure the following steps have been taken:
>     >>
>     >>            ### For all changes:
>     >>            - [ ] Is there a JIRA ticket associated with this PR? Is it referenced
>     >>                 in the commit message?
>     >>
>     >>            - [ ] Does your PR title start with OPENNLP-XXXX where XXXX is the JIRA number you are trying to resolve? Pay particular attention to the hyphen "-" character.
>     >>
>     >>            - [ ] Has your PR been rebased against the latest commit within the target branch (typically master)?
>     >>
>     >>            - [ ] Is your initial contribution a single, squashed commit?
>     >>
>     >>            ### For code changes:
>     >>            - [ ] Have you ensured that the full suite of tests is executed via mvn clean install at the root opennlp folder?
>     >>            - [ ] Have you written or updated unit tests to verify your changes?
>     >>            - [ ] If adding new dependencies to the code, are these dependencies licensed in a way that is compatible for inclusion under [ASF 2.0](http://www.apache.org/legal/resolved.html#category-a)?
>     >>            - [ ] If applicable, have you updated the LICENSE file, including the main LICENSE file in opennlp folder?
>     >>            - [ ] If applicable, have you updated the NOTICE file, including the main NOTICE file found in opennlp folder?
>     >>
>     >>            ### For documentation related changes:
>     >>            - [ ] Have you ensured that format looks appropriate for the output in which it is rendered?
>     >>
>     >>            ### Note:
>     >>            Please ensure that once the PR is submitted, you check travis-ci for build issues and submit an update to your PR as soon as possible.
>     >>
>     >>
>     >>        You can merge this pull request into a Git repository by running:
>     >>
>     >>            $ git pull https://github.com/kottmann/opennlp revert_sentiment
>     >>
>     >>        Alternatively you can review and apply these changes as the patch at:
>     >>
>     >>            https://github.com/apache/opennlp/pull/238.patch
>     >>
>     >>        To close this pull request, make a commit to your master/trunk branch
>     >>        with (at least) the following in the commit message:
>     >>
>     >>            This closes #238
>     >>
>     >>        ----
>     >>        commit 123222eb34724bae793e9d6d22e202c0aee0aa45
>     >>        Author: Jörn Kottmann <jo...@apache.org>
>     >>        Date:   2017-06-27T08:19:19Z
>     >>
>     >>            Revert merging of sentiment work, no consent to merge it
>     >>
>     >>        ----
>     >>
>     >>
>     >>        ---
>     >>        If your project is set up for it, you can reply to this email and have your
>     >>        reply appear on GitHub as well. If your project does not have this feature
>     >>        enabled and wishes so, or if the feature is enabled but not working, please
>     >>        contact infrastructure at infrastructure@apache.org or file a JIRA ticket
>     >>        with INFRA.
>     >>        ---
>     >>
>     >>
>     >>
>     >>
>     >>
>     >>
>     >
>
>
>

Re: [GitHub] opennlp pull request #238: Revert merging of sentiment work, no consent to m...

Posted by Chris Mattmann <ma...@apache.org>.
Absolutely you can find it here:

opennlp-tools/src/test/resources/opennlp/tools/sentiment/sample_train_categ (for categorical /multi-class)
opennlp-tools/src/test/resources/opennlp/tools/sentiment/sample_train_categ2 (for categorical/multi-class)

We can also do similar files where instead of multi-class, we just use pos/neg as the label.

Cheers,
Chris





On 6/29/17, 2:35 AM, "Joern Kottmann" <ko...@gmail.com> wrote:

    Hello Chris,
    
    could you please point me to files I can use to train the sentiment
    component? I am currently looking again through the code and would
    like to train it myself.
    
    Jörn
    
    On Tue, Jun 27, 2017 at 4:59 PM, Dan Russ <da...@gmail.com> wrote:
    > Hi All,
    >    First, let me take a share of blame for the comment Chris mentioned.  I believe I said something like the pull request was X revision behind and Y revisions ahead.  It was not meant to be rude, it was meant to say it is hard to review code when it is so different from the current code base. I am very excited that sentiment analysis is going to be added to OpenNLP, but I have not had time to play with it. If I were to say “great job” before I have add a chance to look at it, it would be flattery not honest praise.
    >
    >   Let’s clean up the merge.  I agree with Chris that scalability and perfection should not be our initial goal.  Let’s get something, and we can decide how to optimize later (even if it require a complete rewrite).  Perfection is the enemy of the good.
    >
    >   Finally, because of Chris’ comments it is hard to thank Ana and Chris without sounding insincere.  But I’ll try, thank you Chris and Ana.  I hope we can get beyond this and that Chris and Ana will continue to improve the performance of the sentiment analysis tool and happily remain part of the OpenNLP family.  It is also a good time to toss a big thank you to all of the committers, users, and PMC member.  I use OpenNLP almost everyday.  Your work is extremely valuable to me.
    >
    > Thank you,
    > Daniel
    >
    >> On Jun 27, 2017, at 10:25 AM, Chris Mattmann <ma...@apache.org> wrote:
    >>
    >> Hi everyone,
    >>
    >> I spoke with Joern in Slack. Some of his concerns are:
    >>
    >> 1. This was done with a Merge commit and apparently they squash and rebase.
    >> [would be helpful to see some pointer on this for documentation, thus far I
    >> haven’t found any]
    >> 2. Apparently we literally need to ask others for +1 votes and record them
    >> before committing? I thought since Ana and I are committers aren were +1,
    >> and since Joern had been providing feedback (the last of which was to add
    >> tests, which we did) that he would be +1 as well (I guess he is not, and I guess
    >> formally we need to do a +1 vote even still)
    >> 3. There was concern about scalability of the code.
    >> 4. There are thoughts that the code was not perfect yet (even though it works
    >> fine in the MEMEX project for Ana and I)
    >>
    >> So, Joern has opened up a revert PR.
    >>
    >> I suppose I should state I find this process extremely heavyweight and unwelcoming.
    >> To me, there should be a modicum of trust for committers, but I feel like even as a
    >> committer, I am operating as a “contributor” to the project. Committer means that
    >> there is trust to modify the source code base. Of the issues above, the only one I see
    >> as a moderate snafu was #1, and frankly if there are some instructions that show me
    >> how to do squashing and rebasing *first* I will try to do that in the future since I am
    >> not a GIt expert.
    >>
    >> That said, I must state I feel pretty put off by Apache OpenNLP. This originated as a GSoC
    >> effort, and we have worked pretty consistently on this over the last year. We used a
    >> separate GitHub project to get started, kept Joern involved as another mentor, even
    >> provided access and commit writes to that GitHub repository for a long time, so this
    >> code was developed in the open. Joern even created a branch in ApacheOpenNLP in the code and I suppose
    >> I should have gone and worked on that branch first since master is apparently so
    >> pristine that even an Apache veteran like me can’t get something in to it without
    >> making a whole bunch of (what are IMO minor issues, and what are IMO heavyweight
    >> “community” issues).
    >>
    >> I am concerned from a community point of view that the first comment wasn’t “Great
    >> job Chris, you got Sentiment Analysis into Apache, *but* I have these concerns 1-4 above”.
    >> It was “The PR was merged wrong in ways 1-4 and I’m going to revert it.”
    >>
    >> That’s pretty off-putting to someone who is semi-new like me and like Ana.
    >>
    >> Anyways, go ahead and revert it. Sorry to have caused any issues.
    >>
    >> Chris
    >>
    >>
    >>
    >> On 6/27/17, 7:06 AM, "Chris Mattmann" <ma...@apache.org> wrote:
    >>
    >>    Hi Joern,
    >>
    >>    I’m confused. Why did you revert my commit?
    >>
    >>    Every one of those check points you put on the PR was checked?
    >>    We have been discussing this for months, you have seen the
    >>    code for months, Ana and I have worked diligently on the code
    >>    in plain view of everyone.
    >>
    >>    Please explain.
    >>
    >>    Chris
    >>
    >>
    >>
    >>
    >>    On 6/27/17, 1:23 AM, "kottmann" <gi...@git.apache.org> wrote:
    >>
    >>        GitHub user kottmann opened a pull request:
    >>
    >>            https://github.com/apache/opennlp/pull/238
    >>
    >>            Revert merging of sentiment work, no consent to merge it
    >>
    >>            Thank you for contributing to Apache OpenNLP.
    >>
    >>            In order to streamline the review of the contribution we ask you
    >>            to ensure the following steps have been taken:
    >>
    >>            ### For all changes:
    >>            - [ ] Is there a JIRA ticket associated with this PR? Is it referenced
    >>                 in the commit message?
    >>
    >>            - [ ] Does your PR title start with OPENNLP-XXXX where XXXX is the JIRA number you are trying to resolve? Pay particular attention to the hyphen "-" character.
    >>
    >>            - [ ] Has your PR been rebased against the latest commit within the target branch (typically master)?
    >>
    >>            - [ ] Is your initial contribution a single, squashed commit?
    >>
    >>            ### For code changes:
    >>            - [ ] Have you ensured that the full suite of tests is executed via mvn clean install at the root opennlp folder?
    >>            - [ ] Have you written or updated unit tests to verify your changes?
    >>            - [ ] If adding new dependencies to the code, are these dependencies licensed in a way that is compatible for inclusion under [ASF 2.0](http://www.apache.org/legal/resolved.html#category-a)?
    >>            - [ ] If applicable, have you updated the LICENSE file, including the main LICENSE file in opennlp folder?
    >>            - [ ] If applicable, have you updated the NOTICE file, including the main NOTICE file found in opennlp folder?
    >>
    >>            ### For documentation related changes:
    >>            - [ ] Have you ensured that format looks appropriate for the output in which it is rendered?
    >>
    >>            ### Note:
    >>            Please ensure that once the PR is submitted, you check travis-ci for build issues and submit an update to your PR as soon as possible.
    >>
    >>
    >>        You can merge this pull request into a Git repository by running:
    >>
    >>            $ git pull https://github.com/kottmann/opennlp revert_sentiment
    >>
    >>        Alternatively you can review and apply these changes as the patch at:
    >>
    >>            https://github.com/apache/opennlp/pull/238.patch
    >>
    >>        To close this pull request, make a commit to your master/trunk branch
    >>        with (at least) the following in the commit message:
    >>
    >>            This closes #238
    >>
    >>        ----
    >>        commit 123222eb34724bae793e9d6d22e202c0aee0aa45
    >>        Author: Jörn Kottmann <jo...@apache.org>
    >>        Date:   2017-06-27T08:19:19Z
    >>
    >>            Revert merging of sentiment work, no consent to merge it
    >>
    >>        ----
    >>
    >>
    >>        ---
    >>        If your project is set up for it, you can reply to this email and have your
    >>        reply appear on GitHub as well. If your project does not have this feature
    >>        enabled and wishes so, or if the feature is enabled but not working, please
    >>        contact infrastructure at infrastructure@apache.org or file a JIRA ticket
    >>        with INFRA.
    >>        ---
    >>
    >>
    >>
    >>
    >>
    >>
    >
    



Re: [GitHub] opennlp pull request #238: Revert merging of sentiment work, no consent to m...

Posted by Joern Kottmann <ko...@gmail.com>.
Hello Chris,

could you please point me to files I can use to train the sentiment
component? I am currently looking again through the code and would
like to train it myself.

Jörn

On Tue, Jun 27, 2017 at 4:59 PM, Dan Russ <da...@gmail.com> wrote:
> Hi All,
>    First, let me take a share of blame for the comment Chris mentioned.  I believe I said something like the pull request was X revision behind and Y revisions ahead.  It was not meant to be rude, it was meant to say it is hard to review code when it is so different from the current code base. I am very excited that sentiment analysis is going to be added to OpenNLP, but I have not had time to play with it. If I were to say “great job” before I have add a chance to look at it, it would be flattery not honest praise.
>
>   Let’s clean up the merge.  I agree with Chris that scalability and perfection should not be our initial goal.  Let’s get something, and we can decide how to optimize later (even if it require a complete rewrite).  Perfection is the enemy of the good.
>
>   Finally, because of Chris’ comments it is hard to thank Ana and Chris without sounding insincere.  But I’ll try, thank you Chris and Ana.  I hope we can get beyond this and that Chris and Ana will continue to improve the performance of the sentiment analysis tool and happily remain part of the OpenNLP family.  It is also a good time to toss a big thank you to all of the committers, users, and PMC member.  I use OpenNLP almost everyday.  Your work is extremely valuable to me.
>
> Thank you,
> Daniel
>
>> On Jun 27, 2017, at 10:25 AM, Chris Mattmann <ma...@apache.org> wrote:
>>
>> Hi everyone,
>>
>> I spoke with Joern in Slack. Some of his concerns are:
>>
>> 1. This was done with a Merge commit and apparently they squash and rebase.
>> [would be helpful to see some pointer on this for documentation, thus far I
>> haven’t found any]
>> 2. Apparently we literally need to ask others for +1 votes and record them
>> before committing? I thought since Ana and I are committers aren were +1,
>> and since Joern had been providing feedback (the last of which was to add
>> tests, which we did) that he would be +1 as well (I guess he is not, and I guess
>> formally we need to do a +1 vote even still)
>> 3. There was concern about scalability of the code.
>> 4. There are thoughts that the code was not perfect yet (even though it works
>> fine in the MEMEX project for Ana and I)
>>
>> So, Joern has opened up a revert PR.
>>
>> I suppose I should state I find this process extremely heavyweight and unwelcoming.
>> To me, there should be a modicum of trust for committers, but I feel like even as a
>> committer, I am operating as a “contributor” to the project. Committer means that
>> there is trust to modify the source code base. Of the issues above, the only one I see
>> as a moderate snafu was #1, and frankly if there are some instructions that show me
>> how to do squashing and rebasing *first* I will try to do that in the future since I am
>> not a GIt expert.
>>
>> That said, I must state I feel pretty put off by Apache OpenNLP. This originated as a GSoC
>> effort, and we have worked pretty consistently on this over the last year. We used a
>> separate GitHub project to get started, kept Joern involved as another mentor, even
>> provided access and commit writes to that GitHub repository for a long time, so this
>> code was developed in the open. Joern even created a branch in ApacheOpenNLP in the code and I suppose
>> I should have gone and worked on that branch first since master is apparently so
>> pristine that even an Apache veteran like me can’t get something in to it without
>> making a whole bunch of (what are IMO minor issues, and what are IMO heavyweight
>> “community” issues).
>>
>> I am concerned from a community point of view that the first comment wasn’t “Great
>> job Chris, you got Sentiment Analysis into Apache, *but* I have these concerns 1-4 above”.
>> It was “The PR was merged wrong in ways 1-4 and I’m going to revert it.”
>>
>> That’s pretty off-putting to someone who is semi-new like me and like Ana.
>>
>> Anyways, go ahead and revert it. Sorry to have caused any issues.
>>
>> Chris
>>
>>
>>
>> On 6/27/17, 7:06 AM, "Chris Mattmann" <ma...@apache.org> wrote:
>>
>>    Hi Joern,
>>
>>    I’m confused. Why did you revert my commit?
>>
>>    Every one of those check points you put on the PR was checked?
>>    We have been discussing this for months, you have seen the
>>    code for months, Ana and I have worked diligently on the code
>>    in plain view of everyone.
>>
>>    Please explain.
>>
>>    Chris
>>
>>
>>
>>
>>    On 6/27/17, 1:23 AM, "kottmann" <gi...@git.apache.org> wrote:
>>
>>        GitHub user kottmann opened a pull request:
>>
>>            https://github.com/apache/opennlp/pull/238
>>
>>            Revert merging of sentiment work, no consent to merge it
>>
>>            Thank you for contributing to Apache OpenNLP.
>>
>>            In order to streamline the review of the contribution we ask you
>>            to ensure the following steps have been taken:
>>
>>            ### For all changes:
>>            - [ ] Is there a JIRA ticket associated with this PR? Is it referenced
>>                 in the commit message?
>>
>>            - [ ] Does your PR title start with OPENNLP-XXXX where XXXX is the JIRA number you are trying to resolve? Pay particular attention to the hyphen "-" character.
>>
>>            - [ ] Has your PR been rebased against the latest commit within the target branch (typically master)?
>>
>>            - [ ] Is your initial contribution a single, squashed commit?
>>
>>            ### For code changes:
>>            - [ ] Have you ensured that the full suite of tests is executed via mvn clean install at the root opennlp folder?
>>            - [ ] Have you written or updated unit tests to verify your changes?
>>            - [ ] If adding new dependencies to the code, are these dependencies licensed in a way that is compatible for inclusion under [ASF 2.0](http://www.apache.org/legal/resolved.html#category-a)?
>>            - [ ] If applicable, have you updated the LICENSE file, including the main LICENSE file in opennlp folder?
>>            - [ ] If applicable, have you updated the NOTICE file, including the main NOTICE file found in opennlp folder?
>>
>>            ### For documentation related changes:
>>            - [ ] Have you ensured that format looks appropriate for the output in which it is rendered?
>>
>>            ### Note:
>>            Please ensure that once the PR is submitted, you check travis-ci for build issues and submit an update to your PR as soon as possible.
>>
>>
>>        You can merge this pull request into a Git repository by running:
>>
>>            $ git pull https://github.com/kottmann/opennlp revert_sentiment
>>
>>        Alternatively you can review and apply these changes as the patch at:
>>
>>            https://github.com/apache/opennlp/pull/238.patch
>>
>>        To close this pull request, make a commit to your master/trunk branch
>>        with (at least) the following in the commit message:
>>
>>            This closes #238
>>
>>        ----
>>        commit 123222eb34724bae793e9d6d22e202c0aee0aa45
>>        Author: Jörn Kottmann <jo...@apache.org>
>>        Date:   2017-06-27T08:19:19Z
>>
>>            Revert merging of sentiment work, no consent to merge it
>>
>>        ----
>>
>>
>>        ---
>>        If your project is set up for it, you can reply to this email and have your
>>        reply appear on GitHub as well. If your project does not have this feature
>>        enabled and wishes so, or if the feature is enabled but not working, please
>>        contact infrastructure at infrastructure@apache.org or file a JIRA ticket
>>        with INFRA.
>>        ---
>>
>>
>>
>>
>>
>>
>

Re: [GitHub] opennlp pull request #238: Revert merging of sentiment work, no consent to m...

Posted by Dan Russ <da...@gmail.com>.
Hi All,
   First, let me take a share of blame for the comment Chris mentioned.  I believe I said something like the pull request was X revision behind and Y revisions ahead.  It was not meant to be rude, it was meant to say it is hard to review code when it is so different from the current code base. I am very excited that sentiment analysis is going to be added to OpenNLP, but I have not had time to play with it. If I were to say “great job” before I have add a chance to look at it, it would be flattery not honest praise.

  Let’s clean up the merge.  I agree with Chris that scalability and perfection should not be our initial goal.  Let’s get something, and we can decide how to optimize later (even if it require a complete rewrite).  Perfection is the enemy of the good.

  Finally, because of Chris’ comments it is hard to thank Ana and Chris without sounding insincere.  But I’ll try, thank you Chris and Ana.  I hope we can get beyond this and that Chris and Ana will continue to improve the performance of the sentiment analysis tool and happily remain part of the OpenNLP family.  It is also a good time to toss a big thank you to all of the committers, users, and PMC member.  I use OpenNLP almost everyday.  Your work is extremely valuable to me.

Thank you,
Daniel

> On Jun 27, 2017, at 10:25 AM, Chris Mattmann <ma...@apache.org> wrote:
> 
> Hi everyone,
> 
> I spoke with Joern in Slack. Some of his concerns are:
> 
> 1. This was done with a Merge commit and apparently they squash and rebase. 
> [would be helpful to see some pointer on this for documentation, thus far I 
> haven’t found any]
> 2. Apparently we literally need to ask others for +1 votes and record them 
> before committing? I thought since Ana and I are committers aren were +1, 
> and since Joern had been providing feedback (the last of which was to add
> tests, which we did) that he would be +1 as well (I guess he is not, and I guess
> formally we need to do a +1 vote even still)
> 3. There was concern about scalability of the code.
> 4. There are thoughts that the code was not perfect yet (even though it works
> fine in the MEMEX project for Ana and I)
> 
> So, Joern has opened up a revert PR. 
> 
> I suppose I should state I find this process extremely heavyweight and unwelcoming.
> To me, there should be a modicum of trust for committers, but I feel like even as a 
> committer, I am operating as a “contributor” to the project. Committer means that
> there is trust to modify the source code base. Of the issues above, the only one I see
> as a moderate snafu was #1, and frankly if there are some instructions that show me
> how to do squashing and rebasing *first* I will try to do that in the future since I am
> not a GIt expert. 
> 
> That said, I must state I feel pretty put off by Apache OpenNLP. This originated as a GSoC 
> effort, and we have worked pretty consistently on this over the last year. We used a
> separate GitHub project to get started, kept Joern involved as another mentor, even
> provided access and commit writes to that GitHub repository for a long time, so this
> code was developed in the open. Joern even created a branch in ApacheOpenNLP in the code and I suppose
> I should have gone and worked on that branch first since master is apparently so 
> pristine that even an Apache veteran like me can’t get something in to it without 
> making a whole bunch of (what are IMO minor issues, and what are IMO heavyweight
> “community” issues). 
> 
> I am concerned from a community point of view that the first comment wasn’t “Great
> job Chris, you got Sentiment Analysis into Apache, *but* I have these concerns 1-4 above”.
> It was “The PR was merged wrong in ways 1-4 and I’m going to revert it.”
> 
> That’s pretty off-putting to someone who is semi-new like me and like Ana.
> 
> Anyways, go ahead and revert it. Sorry to have caused any issues. 
> 
> Chris
> 
> 
> 
> On 6/27/17, 7:06 AM, "Chris Mattmann" <ma...@apache.org> wrote:
> 
>    Hi Joern,
> 
>    I’m confused. Why did you revert my commit?
> 
>    Every one of those check points you put on the PR was checked?
>    We have been discussing this for months, you have seen the 
>    code for months, Ana and I have worked diligently on the code
>    in plain view of everyone.
> 
>    Please explain.
> 
>    Chris
> 
> 
> 
> 
>    On 6/27/17, 1:23 AM, "kottmann" <gi...@git.apache.org> wrote:
> 
>        GitHub user kottmann opened a pull request:
> 
>            https://github.com/apache/opennlp/pull/238
> 
>            Revert merging of sentiment work, no consent to merge it
> 
>            Thank you for contributing to Apache OpenNLP.
> 
>            In order to streamline the review of the contribution we ask you
>            to ensure the following steps have been taken:
> 
>            ### For all changes:
>            - [ ] Is there a JIRA ticket associated with this PR? Is it referenced 
>                 in the commit message?
> 
>            - [ ] Does your PR title start with OPENNLP-XXXX where XXXX is the JIRA number you are trying to resolve? Pay particular attention to the hyphen "-" character.
> 
>            - [ ] Has your PR been rebased against the latest commit within the target branch (typically master)?
> 
>            - [ ] Is your initial contribution a single, squashed commit?
> 
>            ### For code changes:
>            - [ ] Have you ensured that the full suite of tests is executed via mvn clean install at the root opennlp folder?
>            - [ ] Have you written or updated unit tests to verify your changes?
>            - [ ] If adding new dependencies to the code, are these dependencies licensed in a way that is compatible for inclusion under [ASF 2.0](http://www.apache.org/legal/resolved.html#category-a)? 
>            - [ ] If applicable, have you updated the LICENSE file, including the main LICENSE file in opennlp folder?
>            - [ ] If applicable, have you updated the NOTICE file, including the main NOTICE file found in opennlp folder?
> 
>            ### For documentation related changes:
>            - [ ] Have you ensured that format looks appropriate for the output in which it is rendered?
> 
>            ### Note:
>            Please ensure that once the PR is submitted, you check travis-ci for build issues and submit an update to your PR as soon as possible.
> 
> 
>        You can merge this pull request into a Git repository by running:
> 
>            $ git pull https://github.com/kottmann/opennlp revert_sentiment
> 
>        Alternatively you can review and apply these changes as the patch at:
> 
>            https://github.com/apache/opennlp/pull/238.patch
> 
>        To close this pull request, make a commit to your master/trunk branch
>        with (at least) the following in the commit message:
> 
>            This closes #238
> 
>        ----
>        commit 123222eb34724bae793e9d6d22e202c0aee0aa45
>        Author: Jörn Kottmann <jo...@apache.org>
>        Date:   2017-06-27T08:19:19Z
> 
>            Revert merging of sentiment work, no consent to merge it
> 
>        ----
> 
> 
>        ---
>        If your project is set up for it, you can reply to this email and have your
>        reply appear on GitHub as well. If your project does not have this feature
>        enabled and wishes so, or if the feature is enabled but not working, please
>        contact infrastructure at infrastructure@apache.org or file a JIRA ticket
>        with INFRA.
>        ---
> 
> 
> 
> 
> 
> 


Re: [GitHub] opennlp pull request #238: Revert merging of sentiment work, no consent to m...

Posted by Chris Mattmann <ma...@apache.org>.
Hi everyone,

I spoke with Joern in Slack. Some of his concerns are:

1. This was done with a Merge commit and apparently they squash and rebase. 
[would be helpful to see some pointer on this for documentation, thus far I 
haven’t found any]
2. Apparently we literally need to ask others for +1 votes and record them 
before committing? I thought since Ana and I are committers aren were +1, 
and since Joern had been providing feedback (the last of which was to add
tests, which we did) that he would be +1 as well (I guess he is not, and I guess
formally we need to do a +1 vote even still)
3. There was concern about scalability of the code.
4. There are thoughts that the code was not perfect yet (even though it works
fine in the MEMEX project for Ana and I)

So, Joern has opened up a revert PR. 

I suppose I should state I find this process extremely heavyweight and unwelcoming.
To me, there should be a modicum of trust for committers, but I feel like even as a 
committer, I am operating as a “contributor” to the project. Committer means that
there is trust to modify the source code base. Of the issues above, the only one I see
as a moderate snafu was #1, and frankly if there are some instructions that show me
how to do squashing and rebasing *first* I will try to do that in the future since I am
not a GIt expert. 

That said, I must state I feel pretty put off by Apache OpenNLP. This originated as a GSoC 
effort, and we have worked pretty consistently on this over the last year. We used a
separate GitHub project to get started, kept Joern involved as another mentor, even
provided access and commit writes to that GitHub repository for a long time, so this
code was developed in the open. Joern even created a branch in ApacheOpenNLP in the code and I suppose
I should have gone and worked on that branch first since master is apparently so 
pristine that even an Apache veteran like me can’t get something in to it without 
making a whole bunch of (what are IMO minor issues, and what are IMO heavyweight
“community” issues). 

I am concerned from a community point of view that the first comment wasn’t “Great
job Chris, you got Sentiment Analysis into Apache, *but* I have these concerns 1-4 above”.
It was “The PR was merged wrong in ways 1-4 and I’m going to revert it.”

That’s pretty off-putting to someone who is semi-new like me and like Ana.

Anyways, go ahead and revert it. Sorry to have caused any issues. 

Chris



On 6/27/17, 7:06 AM, "Chris Mattmann" <ma...@apache.org> wrote:

    Hi Joern,
    
    I’m confused. Why did you revert my commit?
    
    Every one of those check points you put on the PR was checked?
    We have been discussing this for months, you have seen the 
    code for months, Ana and I have worked diligently on the code
    in plain view of everyone.
    
    Please explain.
    
    Chris
    
    
    
    
    On 6/27/17, 1:23 AM, "kottmann" <gi...@git.apache.org> wrote:
    
        GitHub user kottmann opened a pull request:
        
            https://github.com/apache/opennlp/pull/238
        
            Revert merging of sentiment work, no consent to merge it
        
            Thank you for contributing to Apache OpenNLP.
            
            In order to streamline the review of the contribution we ask you
            to ensure the following steps have been taken:
            
            ### For all changes:
            - [ ] Is there a JIRA ticket associated with this PR? Is it referenced 
                 in the commit message?
            
            - [ ] Does your PR title start with OPENNLP-XXXX where XXXX is the JIRA number you are trying to resolve? Pay particular attention to the hyphen "-" character.
            
            - [ ] Has your PR been rebased against the latest commit within the target branch (typically master)?
            
            - [ ] Is your initial contribution a single, squashed commit?
            
            ### For code changes:
            - [ ] Have you ensured that the full suite of tests is executed via mvn clean install at the root opennlp folder?
            - [ ] Have you written or updated unit tests to verify your changes?
            - [ ] If adding new dependencies to the code, are these dependencies licensed in a way that is compatible for inclusion under [ASF 2.0](http://www.apache.org/legal/resolved.html#category-a)? 
            - [ ] If applicable, have you updated the LICENSE file, including the main LICENSE file in opennlp folder?
            - [ ] If applicable, have you updated the NOTICE file, including the main NOTICE file found in opennlp folder?
            
            ### For documentation related changes:
            - [ ] Have you ensured that format looks appropriate for the output in which it is rendered?
            
            ### Note:
            Please ensure that once the PR is submitted, you check travis-ci for build issues and submit an update to your PR as soon as possible.
        
        
        You can merge this pull request into a Git repository by running:
        
            $ git pull https://github.com/kottmann/opennlp revert_sentiment
        
        Alternatively you can review and apply these changes as the patch at:
        
            https://github.com/apache/opennlp/pull/238.patch
        
        To close this pull request, make a commit to your master/trunk branch
        with (at least) the following in the commit message:
        
            This closes #238
            
        ----
        commit 123222eb34724bae793e9d6d22e202c0aee0aa45
        Author: Jörn Kottmann <jo...@apache.org>
        Date:   2017-06-27T08:19:19Z
        
            Revert merging of sentiment work, no consent to merge it
        
        ----
        
        
        ---
        If your project is set up for it, you can reply to this email and have your
        reply appear on GitHub as well. If your project does not have this feature
        enabled and wishes so, or if the feature is enabled but not working, please
        contact infrastructure at infrastructure@apache.org or file a JIRA ticket
        with INFRA.
        ---
        
    
    
    



FW: [GitHub] opennlp pull request #238: Revert merging of sentiment work, no consent to m...

Posted by Chris Mattmann <ma...@apache.org>.
Hi Joern,

I’m confused. Why did you revert my commit?

Every one of those check points you put on the PR was checked?
We have been discussing this for months, you have seen the 
code for months, Ana and I have worked diligently on the code
in plain view of everyone.

Please explain.

Chris




On 6/27/17, 1:23 AM, "kottmann" <gi...@git.apache.org> wrote:

    GitHub user kottmann opened a pull request:
    
        https://github.com/apache/opennlp/pull/238
    
        Revert merging of sentiment work, no consent to merge it
    
        Thank you for contributing to Apache OpenNLP.
        
        In order to streamline the review of the contribution we ask you
        to ensure the following steps have been taken:
        
        ### For all changes:
        - [ ] Is there a JIRA ticket associated with this PR? Is it referenced 
             in the commit message?
        
        - [ ] Does your PR title start with OPENNLP-XXXX where XXXX is the JIRA number you are trying to resolve? Pay particular attention to the hyphen "-" character.
        
        - [ ] Has your PR been rebased against the latest commit within the target branch (typically master)?
        
        - [ ] Is your initial contribution a single, squashed commit?
        
        ### For code changes:
        - [ ] Have you ensured that the full suite of tests is executed via mvn clean install at the root opennlp folder?
        - [ ] Have you written or updated unit tests to verify your changes?
        - [ ] If adding new dependencies to the code, are these dependencies licensed in a way that is compatible for inclusion under [ASF 2.0](http://www.apache.org/legal/resolved.html#category-a)? 
        - [ ] If applicable, have you updated the LICENSE file, including the main LICENSE file in opennlp folder?
        - [ ] If applicable, have you updated the NOTICE file, including the main NOTICE file found in opennlp folder?
        
        ### For documentation related changes:
        - [ ] Have you ensured that format looks appropriate for the output in which it is rendered?
        
        ### Note:
        Please ensure that once the PR is submitted, you check travis-ci for build issues and submit an update to your PR as soon as possible.
    
    
    You can merge this pull request into a Git repository by running:
    
        $ git pull https://github.com/kottmann/opennlp revert_sentiment
    
    Alternatively you can review and apply these changes as the patch at:
    
        https://github.com/apache/opennlp/pull/238.patch
    
    To close this pull request, make a commit to your master/trunk branch
    with (at least) the following in the commit message:
    
        This closes #238
        
    ----
    commit 123222eb34724bae793e9d6d22e202c0aee0aa45
    Author: Jörn Kottmann <jo...@apache.org>
    Date:   2017-06-27T08:19:19Z
    
        Revert merging of sentiment work, no consent to merge it
    
    ----
    
    
    ---
    If your project is set up for it, you can reply to this email and have your
    reply appear on GitHub as well. If your project does not have this feature
    enabled and wishes so, or if the feature is enabled but not working, please
    contact infrastructure at infrastructure@apache.org or file a JIRA ticket
    with INFRA.
    ---