Posted to dev@opennlp.apache.org by Chris Mattmann <ma...@apache.org> on 2017/06/27 18:37:27 UTC
Some wisdom around Communities/Process/etc (was Re: [GitHub] opennlp pull request #238: Revert merging of sentiment work, no consent to m...)

Hi Daniel,

Thanks for your message. In actuality I was not really calling out
your specific comment (and FWIW, we addressed it by getting the PR
up to the same revision and not having 1000s of files modified, but
just ~32 or so). So I wasn’t looking for any flattery there per se.

Please take the below story with a grain of salt. It isn’t meant
to call out anyone in particular; it is meant to impart some knowledge
I have gained having been around Apache and open source since 2004
and a part of many successful projects within this Foundation. I
hope it’s useful to some in thinking about how Apache OpenNLP can
be more welcoming to not just contributors, but even committers,
and others in the project.

I want to tell everyone a story here, regarding barrier to entry.
Quick story. We had a lot of difficulty with this in Apache
Lucene/Solr, in the 2009 era, same with Apache Hadoop and Apache
Nutch. I am an emeritus PMC member for Lucene, and a current PMC member for Nutch.
Everyone thought back then that we were in the business of making
pristine software, following strict CM processes (worse than this
IMO), and ensuring software “quality”.  I have a PhD in Software
Engineering from USC so I understand a thing or two about software
quality. And in general about principles of software.  One principle
in software has been generalized to “better, faster, cheaper, *pick
any two*.” In short, what this is saying is that you can optimize for
two of these properties, but not all of them. This extends in general
to many software engineering “ilities” and properties. For example,
you can try and make sure everyone does the same thing in Git all
the time (a particular workflow), but in doing so, you may detract
from people who don’t have the time to learn Git as extensively,
but know it well enough to get by, and who had something great to
contribute. Back to the story, I’m going to focus on Nutch, but
this is indicative of Hadoop, Lucene/Solr and other projects.  So, we
had patches sit in JIRA for YEARS on that project, due to the
oft-cited problem of not having enough “committers” or “PMC” to
review the patches and +1 them before they are committed to our
“pristine” source tree. Well, that source tree grew old, was surpassed
by other projects, and went largely unused after Nutch 1.0 in 2009, because
we were so stringent about the way that we committed things and we
focused on process, and not people. I would surmise we had about
4-5 eventual committers & PMC members for Nutch whose patches sat
and sat for years (again I’m not kidding, YEARS) in JIRA b/c we
couldn’t find enough +1s for a stringent Review then Commit (RTC)
process. Eventually in 2011, we managed to elect a new PMC/committer
Julien N., whose first acts were to dislodge the 50+ patches he had
sitting in JIRA and to commit them, during the time when Nutch was
literally on life support at the ASF. I’ve spoken about this at
ApacheCon; you can look up my old talks, like this one:

https://s.apache.org/TxJq

If we had imposed on Julien the strict process that caused Nutch
to stagnate, we would have stifled his innovation and put the
final stake in the heart of Nutch.

Now, back to OpenNLP. The project is doing really well, making more
releases, and there has been a lot of innovation going on. I think
Slack is being used for coordination and so forth, and there is
work being done to ensure that decisions aren’t being made in Slack,
and I know folks have been reminded about that and are being careful
about that, which is fantastic. I consider myself to have a
beginning/moderate level of expertise in Apache OpenNLP. I don’t know
everything about it, but I am confident I understand the Sentiment
Analysis and model building, and training/cross validation etc. So if
I modified the core architectural components of the framework and
messed something up, I could see a revert reaction, or hopefully
someone stepping in and guiding me that that’s probably not something
I should do without X, Y or Z first, etc. But when it comes to adding
a new module that has been discussed for a year, that has a PR
associated with it, and where I mentioned several times in the PR
that I intend to commit this “soon” or that I will commit this after
“doing X tests, etc.”, receiving the reaction I got – that my code
should be reverted, that we did something wrong, etc. – just...
reminded me of the Hadoop, Lucene/Solr and Nutch days. I
think OpenNLP can be as cool and innovative as what’s going on now,
but be a bit more friendly to contributors, and make a few more
exceptions that don’t break things but aren’t as aesthetically
pleasing to a Git aficionado, but that add a new desired capability
like Sentiment Analysis. I just want the community to think about
that a little bit. As someone who wants the project and community
to succeed, and not have to go through the trouble that Hadoop,
Lucene/Solr and Nutch did, I’m just pointing this out as something
to think about.

Finally, I’ll state two very important Twitter quotes that really
matter to me as an ASF member. These are from former Director Hen
Yandell, and really words I live by:

“Projects begin by thinking they're in the software engineering
business; after a while they realize they're in the recruiting
business.” https://twitter.com/flamefew/statuses/36352411593351168
“More committers, consensus driving, flatter committer/PMC ratios;
all of these are tools in our #1 business of recruitment.”
https://twitter.com/flamefew/statuses/36352484263858176

Just think about it. We’re not here to be the best software engineers
or NLP people in the world. Some people are, but some people, and
I would couch myself in this group, realize that the ultimate goal
here is that we’re in the recruiting business. We want to recruit
people’s contributions that help the project go on and keep living
and keep being relevant. Do any of the PMC want to be here or expect
to be here in 10 years, still working on the same thing? If so,
I’ll shut up. Or in 10 years, do we expect to have the best chance
at recruiting the 5, 6, 7, or 10 people who we could capture of the
100s or 1000s that may end up interacting with the project during
that time?

Just some food for thought. Thanks for reading if you made it all
the way down.

Sincerely,
Chris


On 6/27/17, 7:59 AM, "Dan Russ" <da...@gmail.com> wrote:

    Hi All,
       First, let me take a share of blame for the comment Chris mentioned.  I believe I said something like the pull request was X revisions behind and Y revisions ahead.  It was not meant to be rude; it was meant to say it is hard to review code when it is so different from the current code base. I am very excited that sentiment analysis is going to be added to OpenNLP, but I have not had time to play with it. If I were to say “great job” before I have had a chance to look at it, it would be flattery, not honest praise.
    
      Let’s clean up the merge.  I agree with Chris that scalability and perfection should not be our initial goal.  Let’s get something, and we can decide how to optimize later (even if it requires a complete rewrite).  Perfection is the enemy of the good.
    
      Finally, because of Chris’ comments it is hard to thank Ana and Chris without sounding insincere.  But I’ll try: thank you, Chris and Ana.  I hope we can get beyond this and that Chris and Ana will continue to improve the performance of the sentiment analysis tool and happily remain part of the OpenNLP family.  It is also a good time to toss a big thank you to all of the committers, users, and PMC members.  I use OpenNLP almost every day.  Your work is extremely valuable to me.
    
    Thank you,
    Daniel
    
    > On Jun 27, 2017, at 10:25 AM, Chris Mattmann <ma...@apache.org> wrote:
    > 
    > Hi everyone,
    > 
    > I spoke with Joern in Slack. Some of his concerns are:
    > 
    > 1. This was done with a Merge commit and apparently they squash and rebase. 
    > [would be helpful to see some pointer on this for documentation, thus far I 
    > haven’t found any]
    > 2. Apparently we literally need to ask others for +1 votes and record them 
    > before committing? I thought since Ana and I are committers and were +1, 
    > and since Joern had been providing feedback (the last of which was to add
    > tests, which we did) that he would be +1 as well (I guess he is not, and I guess
    > formally we need to do a +1 vote even still)
    > 3. There was concern about scalability of the code.
    > 4. There are thoughts that the code was not perfect yet (even though it works
    > fine in the MEMEX project for Ana and me)
    > 
    > So, Joern has opened up a revert PR. 
    > 
    > I suppose I should state I find this process extremely heavyweight and unwelcoming.
    > To me, there should be a modicum of trust for committers, but I feel like even as a 
    > committer, I am operating as a “contributor” to the project. Committer means that
    > there is trust to modify the source code base. Of the issues above, the only one I see
    > as a moderate snafu was #1, and frankly if there are some instructions that show me
    > how to do squashing and rebasing *first* I will try to do that in the future since I am
    > not a Git expert. 
    > 
    > That said, I must state I feel pretty put off by Apache OpenNLP. This originated as a GSoC 
    > effort, and we have worked pretty consistently on this over the last year. We used a
    > separate GitHub project to get started, kept Joern involved as another mentor, even
    > provided access and commit rights to that GitHub repository for a long time, so this
    > code was developed in the open. Joern even created a branch in Apache OpenNLP for the code and I suppose
    > I should have gone and worked on that branch first since master is apparently so 
    > pristine that even an Apache veteran like me can’t get something in to it without 
    > making a whole bunch of (what are IMO minor issues, and what are IMO heavyweight
    > “community” issues). 
    > 
    > I am concerned from a community point of view that the first comment wasn’t “Great
    > job Chris, you got Sentiment Analysis into Apache, *but* I have these concerns 1-4 above”.
    > It was “The PR was merged wrong in ways 1-4 and I’m going to revert it.”
    > 
    > That’s pretty off-putting to someone who is semi-new like me and like Ana.
    > 
    > Anyways, go ahead and revert it. Sorry to have caused any issues. 
    > 
    > Chris
    > 
    > 
    > 
    > On 6/27/17, 7:06 AM, "Chris Mattmann" <ma...@apache.org> wrote:
    > 
    >    Hi Joern,
    > 
    >    I’m confused. Why did you revert my commit?
    > 
    >    Every one of those check points you put on the PR was checked?
    >    We have been discussing this for months, you have seen the 
    >    code for months, Ana and I have worked diligently on the code
    >    in plain view of everyone.
    > 
    >    Please explain.
    > 
    >    Chris
    > 
    > 
    > 
    > 
    >    On 6/27/17, 1:23 AM, "kottmann" <gi...@git.apache.org> wrote:
    > 
    >        GitHub user kottmann opened a pull request:
    > 
    >            https://github.com/apache/opennlp/pull/238
    > 
    >            Revert merging of sentiment work, no consent to merge it
    > 
    >            Thank you for contributing to Apache OpenNLP.
    > 
    >            In order to streamline the review of the contribution we ask you
    >            to ensure the following steps have been taken:
    > 
    >            ### For all changes:
    >            - [ ] Is there a JIRA ticket associated with this PR? Is it referenced 
    >                 in the commit message?
    > 
    >            - [ ] Does your PR title start with OPENNLP-XXXX where XXXX is the JIRA number you are trying to resolve? Pay particular attention to the hyphen "-" character.
    > 
    >            - [ ] Has your PR been rebased against the latest commit within the target branch (typically master)?
    > 
    >            - [ ] Is your initial contribution a single, squashed commit?
    > 
    >            ### For code changes:
    >            - [ ] Have you ensured that the full suite of tests is executed via mvn clean install at the root opennlp folder?
    >            - [ ] Have you written or updated unit tests to verify your changes?
    >            - [ ] If adding new dependencies to the code, are these dependencies licensed in a way that is compatible for inclusion under [ASF 2.0](http://www.apache.org/legal/resolved.html#category-a)? 
    >            - [ ] If applicable, have you updated the LICENSE file, including the main LICENSE file in opennlp folder?
    >            - [ ] If applicable, have you updated the NOTICE file, including the main NOTICE file found in opennlp folder?
    > 
    >            ### For documentation related changes:
    >            - [ ] Have you ensured that format looks appropriate for the output in which it is rendered?
    > 
    >            ### Note:
    >            Please ensure that once the PR is submitted, you check travis-ci for build issues and submit an update to your PR as soon as possible.
    > 
    > 
    >        You can merge this pull request into a Git repository by running:
    > 
    >            $ git pull https://github.com/kottmann/opennlp revert_sentiment
    > 
    >        Alternatively you can review and apply these changes as the patch at:
    > 
    >            https://github.com/apache/opennlp/pull/238.patch
    > 
    >        To close this pull request, make a commit to your master/trunk branch
    >        with (at least) the following in the commit message:
    > 
    >            This closes #238
    > 
    >        ----
    >        commit 123222eb34724bae793e9d6d22e202c0aee0aa45
    >        Author: Jörn Kottmann <jo...@apache.org>
    >        Date:   2017-06-27T08:19:19Z
    > 
    >            Revert merging of sentiment work, no consent to merge it
    > 
    >        ----
    > 
    > 
    >        ---
    >        If your project is set up for it, you can reply to this email and have your
    >        reply appear on GitHub as well. If your project does not have this feature
    >        enabled and wishes so, or if the feature is enabled but not working, please
    >        contact infrastructure at infrastructure@apache.org or file a JIRA ticket
    >        with INFRA.
    >        ---
    > 
    > 
    > 
    > 
    > 
    >