Posted to dev@spark.apache.org by Nicholas Chammas <ni...@gmail.com> on 2014/08/26 06:02:01 UTC

Handling stale PRs

Check this out:
https://github.com/apache/spark/pulls?q=is%3Aopen+is%3Apr+sort%3Aupdated-asc

We're hitting close to 300 open PRs. That link lists the least recently
updated ones first.

I think keeping the number of stale (i.e. not recently updated) PRs low is a
good thing to shoot for. It avoids leaving contributors hanging (which feels
bad for contributors) and reduces project clutter (which is a burden for
maintainers/committers).

What is our approach to tackling this problem?

I think communicating and enforcing a clear policy on how stale PRs are
handled might be a good way to reduce the number of stale PRs we have
without making contributors feel rejected.

I don't know what such a policy would look like, but it should be
enforceable and "lightweight"--i.e. it shouldn't feel like a hammer used to
reject people's work, but rather a necessary tool to keep the project's
contributions relevant and manageable.

Nick

Re: Handling stale PRs

Posted by Madhu <ma...@madhu.com>.
Nicholas Chammas wrote
> Dunno how many committers Discourse has, but it looks like they've managed
> their PRs well. I hope we can do as well in this regard as they have.

Discourse developers appear to eat their own dog food
<https://meta.discourse.org>.
Improved collaboration and a shared vision might be reasons for their
success.




--
Madhu
https://www.linkedin.com/in/msiddalingaiah

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@spark.apache.org
For additional commands, e-mail: dev-help@spark.apache.org


Re: Handling stale PRs

Posted by Nicholas Chammas <ni...@gmail.com>.
By the way, as a reference point, I just stumbled across the Discourse
GitHub project and their list of pull requests
<https://github.com/discourse/discourse/pulls> looks pretty neat.

~2,200 closed PRs, 6 open. The least recently updated PR was last touched 8
days ago. The project started ~1.5 years ago.

Dunno how many committers Discourse has, but it looks like they've managed
their PRs well. I hope we can do as well in this regard as they have.

Nick


On Tue, Aug 26, 2014 at 2:40 PM, Josh Rosen <ro...@gmail.com> wrote:

> Sure; App Engine supports cron and sending emails.  We can configure the
> app with Spark QA’s credentials in order to allow it to post comments on
> issues, etc.
>
> - Josh
>
> On August 26, 2014 at 11:38:08 AM, Nicholas Chammas (
> nicholas.chammas@gmail.com) wrote:
>
>  OK, that sounds pretty cool.
>
> Josh,
>
> Do you see this app as encompassing or supplanting the functionality I
> described as well?
>
> Nick
>
>
> On Tue, Aug 26, 2014 at 2:21 PM, Josh Rosen <ro...@gmail.com> wrote:
>
>>  Last weekend, I started hacking on a Google App Engine app for helping
>> with pull request review (screenshot: http://i.imgur.com/wwpZKYZ.png).
>>  Some of my basic goals (not all implemented yet):
>>
>>  - Users sign in using GitHub and can browse a list of pull requests,
>> including links to associated JIRAs, Jenkins statuses, a quick preview of
>> the last comment, etc.
>>
>>  - Pull requests are auto-classified based on which components they
>> modify (by looking at the diff).
>>
>>  - From the app’s own internal database of PRs, we can build dashboards
>> to find “abandoned” PRs, graph average time to first review, etc.
>>
>>  - Since we authenticate users with GitHub, we can enable administrative
>> functions via this dashboard (e.g. “assign this PR to me”, “vote to close
>> in the weekly auto-close commit”, etc.).
>>
>>  Right now, I’ve implemented GitHub OAuth support and code to update the
>> issues database using the GitHub API.  Because we have access to the full
>> API, it’s pretty easy to do fancy things like parsing the reason for
>> Jenkins failure, etc.  You could even imagine some fancy mashup tools to
>> pull up JIRAs and pull requests side-by-side in iframes.
>>
>> After I hack on this a bit more, I plan to release a public preview
>> version; if we find this tool useful, I’ll clean it up and open-source the
>> app so folks can contribute to it.
>>
>> - Josh
>>
>> On August 26, 2014 at 8:16:46 AM, Nicholas Chammas (
>> nicholas.chammas@gmail.com) wrote:
>>
>>  On Tue, Aug 26, 2014 at 2:02 AM, Patrick Wendell <pw...@gmail.com>
>> wrote:
>>
>> > I'd prefer if we took the approach of politely explaining why in the
>> > current form the patch isn't acceptable and closing it (potentially w/
>> tips
>> > on how to improve it or narrow the scope).
>>
>>
>> Amen to this. Aiming for such a culture would set Spark apart from other
>> projects in a great way.
>>
>> I've proposed several different solutions to ASF infra to streamline the
>> > process, but thus far they haven't been open to any of my ideas:
>>
>>
>> I've added myself as a watcher on those 2 INFRA issues. Sucks that the
>> only
>> solution on offer right now requires basically polluting the commit
>> history.
>>
>> Short of moving Spark's repo to a non-ASF-managed GitHub account, do you
>> think another bot could help us manage the number of stale PRs?
>>
>> I'm thinking a solution as follows might be very helpful:
>>
>> - Extend Spark QA / Jenkins to run on a weekly schedule and check for
>> stale PRs. Let's say a stale PR is an open one that hasn't been updated in
>> N months.
>> - Spark QA maintains a list of known committers on its side.
>> - During its weekly check of stale PRs, Spark QA takes the following
>> action:
>> - If the last person to comment on a PR was a committer, post to the
>> PR asking for an update from the contributor.
>> - If the last person to comment on a PR was a contributor, add the PR
>> to a list. Email this list of *hanging PRs* out to the dev list on a
>> weekly basis and ask committers to update them.
>> - If the last person to comment on a PR was Spark QA asking the
>> contributor to update it, then add the PR to a list. Email this
>> list of *abandoned
>> PRs* to the dev list for the record (or for closing, if that becomes
>> possible in the future).
>>
>> This doesn't solve the problem of not being able to close PRs, but it does
>> help make sure no PR is left hanging for long.
>>
>> What do you think? I'd be interested in implementing this solution if we
>> like it.
>>
>> Nick
>>
>>
>

Re: Handling stale PRs

Posted by Josh Rosen <ro...@gmail.com>.
Sure; App Engine supports cron and sending emails.  We can configure the app with Spark QA’s credentials in order to allow it to post comments on issues, etc.

- Josh

On August 26, 2014 at 11:38:08 AM, Nicholas Chammas (nicholas.chammas@gmail.com) wrote:

OK, that sounds pretty cool.

Josh,

Do you see this app as encompassing or supplanting the functionality I described as well?

Nick


On Tue, Aug 26, 2014 at 2:21 PM, Josh Rosen <ro...@gmail.com> wrote:
Last weekend, I started hacking on a Google App Engine app for helping with pull request review (screenshot: http://i.imgur.com/wwpZKYZ.png).  Some of my basic goals (not all implemented yet):

- Users sign in using GitHub and can browse a list of pull requests, including links to associated JIRAs, Jenkins statuses, a quick preview of the last comment, etc.

- Pull requests are auto-classified based on which components they modify (by looking at the diff).

- From the app’s own internal database of PRs, we can build dashboards to find “abandoned” PRs, graph average time to first review, etc.

- Since we authenticate users with GitHub, we can enable administrative functions via this dashboard (e.g. “assign this PR to me”, “vote to close in the weekly auto-close commit”, etc.).

Right now, I’ve implemented GitHub OAuth support and code to update the issues database using the GitHub API.  Because we have access to the full API, it’s pretty easy to do fancy things like parsing the reason for Jenkins failure, etc.  You could even imagine some fancy mashup tools to pull up JIRAs and pull requests side-by-side in iframes.

After I hack on this a bit more, I plan to release a public preview version; if we find this tool useful, I’ll clean it up and open-source the app so folks can contribute to it.

- Josh

On August 26, 2014 at 8:16:46 AM, Nicholas Chammas (nicholas.chammas@gmail.com) wrote:

On Tue, Aug 26, 2014 at 2:02 AM, Patrick Wendell <pw...@gmail.com> wrote:

> I'd prefer if we took the approach of politely explaining why in the
> current form the patch isn't acceptable and closing it (potentially w/ tips
> on how to improve it or narrow the scope).


Amen to this. Aiming for such a culture would set Spark apart from other
projects in a great way.

I've proposed several different solutions to ASF infra to streamline the
> process, but thus far they haven't been open to any of my ideas:


I've added myself as a watcher on those 2 INFRA issues. Sucks that the only
solution on offer right now requires basically polluting the commit history.

Short of moving Spark's repo to a non-ASF-managed GitHub account, do you
think another bot could help us manage the number of stale PRs?

I'm thinking a solution as follows might be very helpful:

   - Extend Spark QA / Jenkins to run on a weekly schedule and check for
   stale PRs. Let's say a stale PR is an open one that hasn't been updated in
   N months.
   - Spark QA maintains a list of known committers on its side.
   - During its weekly check of stale PRs, Spark QA takes the following
   action:
      - If the last person to comment on a PR was a committer, post to the
      PR asking for an update from the contributor.
      - If the last person to comment on a PR was a contributor, add the PR
      to a list. Email this list of *hanging PRs* out to the dev list on a
      weekly basis and ask committers to update them.
      - If the last person to comment on a PR was Spark QA asking the
      contributor to update it, then add the PR to a list. Email this list
      of *abandoned PRs* to the dev list for the record (or for closing, if
      that becomes possible in the future).

This doesn't solve the problem of not being able to close PRs, but it does
help make sure no PR is left hanging for long.

What do you think? I'd be interested in implementing this solution if we
like it.

Nick


Re: Handling stale PRs

Posted by Nicholas Chammas <ni...@gmail.com>.
OK, that sounds pretty cool.

Josh,

Do you see this app as encompassing or supplanting the functionality I
described as well?

Nick


On Tue, Aug 26, 2014 at 2:21 PM, Josh Rosen <ro...@gmail.com> wrote:

> Last weekend, I started hacking on a Google App Engine app for helping
> with pull request review (screenshot: http://i.imgur.com/wwpZKYZ.png).
>  Some of my basic goals (not all implemented yet):
>
> - Users sign in using GitHub and can browse a list of pull requests,
> including links to associated JIRAs, Jenkins statuses, a quick preview of
> the last comment, etc.
>
> - Pull requests are auto-classified based on which components they modify
> (by looking at the diff).
>
> - From the app’s own internal database of PRs, we can build dashboards to
> find “abandoned” PRs, graph average time to first review, etc.
>
> - Since we authenticate users with GitHub, we can enable administrative
> functions via this dashboard (e.g. “assign this PR to me”, “vote to close
> in the weekly auto-close commit”, etc.).
>
> Right now, I’ve implemented GitHub OAuth support and code to update the
> issues database using the GitHub API.  Because we have access to the full
> API, it’s pretty easy to do fancy things like parsing the reason for
> Jenkins failure, etc.  You could even imagine some fancy mashup tools to
> pull up JIRAs and pull requests side-by-side in iframes.
>
> After I hack on this a bit more, I plan to release a public preview
> version; if we find this tool useful, I’ll clean it up and open-source the
> app so folks can contribute to it.
>
> - Josh
>
> On August 26, 2014 at 8:16:46 AM, Nicholas Chammas (
> nicholas.chammas@gmail.com) wrote:
>
> On Tue, Aug 26, 2014 at 2:02 AM, Patrick Wendell <pw...@gmail.com>
> wrote:
>
> > I'd prefer if we took the approach of politely explaining why in the
> > current form the patch isn't acceptable and closing it (potentially w/
> tips
> > on how to improve it or narrow the scope).
>
>
> Amen to this. Aiming for such a culture would set Spark apart from other
> projects in a great way.
>
> I've proposed several different solutions to ASF infra to streamline the
> > process, but thus far they haven't been open to any of my ideas:
>
>
> I've added myself as a watcher on those 2 INFRA issues. Sucks that the
> only
> solution on offer right now requires basically polluting the commit
> history.
>
> Short of moving Spark's repo to a non-ASF-managed GitHub account, do you
> think another bot could help us manage the number of stale PRs?
>
> I'm thinking a solution as follows might be very helpful:
>
> - Extend Spark QA / Jenkins to run on a weekly schedule and check for
> stale PRs. Let's say a stale PR is an open one that hasn't been updated in
> N months.
> - Spark QA maintains a list of known committers on its side.
> - During its weekly check of stale PRs, Spark QA takes the following
> action:
> - If the last person to comment on a PR was a committer, post to the
> PR asking for an update from the contributor.
> - If the last person to comment on a PR was a contributor, add the PR
> to a list. Email this list of *hanging PRs* out to the dev list on a
> weekly basis and ask committers to update them.
> - If the last person to comment on a PR was Spark QA asking the
> contributor to update it, then add the PR to a list. Email this
> list of *abandoned
> PRs* to the dev list for the record (or for closing, if that becomes
> possible in the future).
>
> This doesn't solve the problem of not being able to close PRs, but it does
> help make sure no PR is left hanging for long.
>
> What do you think? I'd be interested in implementing this solution if we
> like it.
>
> Nick
>
>

Re: Handling stale PRs

Posted by Nicholas Chammas <ni...@gmail.com>.
Alright! That was quick. :)


On Wed, Aug 27, 2014 at 6:48 PM, Josh Rosen <ro...@gmail.com> wrote:

> I have a very simple dashboard running at http://spark-prs.appspot.com/.
>  Currently, this mirrors the functionality of Patrick’s github-shim, but it
> should be very easy to extend with other features.
>
> The source is at https://github.com/databricks/spark-pr-dashboard (pull
> requests and issues welcome!)
>
> On August 27, 2014 at 2:11:41 PM, Nicholas Chammas (
> nicholas.chammas@gmail.com) wrote:
>
>  On Tue, Aug 26, 2014 at 2:21 PM, Josh Rosen <ro...@gmail.com> wrote:
>
>>  Last weekend, I started hacking on a Google App Engine app for helping
>> with pull request review (screenshot: http://i.imgur.com/wwpZKYZ.png).
>>
>
> BTW Josh, how can we stay up-to-date on your work on this tool? A JIRA
> issue, perhaps?
>
> Nick
>
>

Re: Handling stale PRs

Posted by Josh Rosen <ro...@gmail.com>.
I have a very simple dashboard running at http://spark-prs.appspot.com/.  Currently, this mirrors the functionality of Patrick’s github-shim, but it should be very easy to extend with other features.

The source is at https://github.com/databricks/spark-pr-dashboard (pull requests and issues welcome!)
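
(For context, the GitHub v3 API query a dashboard like this builds on — open PRs, least recently updated first — can be sketched roughly as below. The `summarize` step and its field selection are an illustrative sketch, not code from spark-pr-dashboard.)

```python
import json
from urllib.request import urlopen

# Public GitHub v3 API endpoint for Spark's pull requests.
# Unauthenticated requests work but are rate-limited.
API = "https://api.github.com/repos/apache/spark/pulls"

def stale_prs_url(per_page=100):
    """Build the query for open PRs, least recently updated first."""
    return API + "?state=open&sort=updated&direction=asc&per_page=%d" % per_page

def summarize(prs):
    """Reduce raw API records to the fields a dashboard row might show."""
    return [{"number": p["number"],
             "title": p["title"],
             "author": p["user"]["login"],
             "updated_at": p["updated_at"]} for p in prs]

def fetch(per_page=100):
    """Perform the request (requires network access)."""
    with urlopen(stale_prs_url(per_page)) as resp:
        return summarize(json.load(resp))
```

The first page of results from `fetch()` is essentially the "least recently updated" view Nick linked at the top of the thread.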

On August 27, 2014 at 2:11:41 PM, Nicholas Chammas (nicholas.chammas@gmail.com) wrote:

On Tue, Aug 26, 2014 at 2:21 PM, Josh Rosen <ro...@gmail.com> wrote:
Last weekend, I started hacking on a Google App Engine app for helping with pull request review (screenshot: http://i.imgur.com/wwpZKYZ.png).

BTW Josh, how can we stay up-to-date on your work on this tool? A JIRA issue, perhaps?

Nick

Re: Handling stale PRs

Posted by Nishkam Ravi <nr...@cloudera.com>.
I see. Yeah, it would be interesting to know if any other project has
considered formalizing this notion. It may also enable assignment of
reviews (potentially automated using Josh's system) and maybe anonymity as
well? On the downside, it isn't easily implemented and would probably
come with certain undesired side effects.

Thanks,
Nishkam


On Wed, Aug 27, 2014 at 3:39 PM, Patrick Wendell <pw...@gmail.com> wrote:

> Hey Nishkam,
>
> To some extent we already have this process - many community members
> help review patches and some earn a reputation where committers will
> take an LGTM from them seriously. I'd be interested in seeing if any
> other projects recognize people who do this.
>
> - Patrick
>
> On Wed, Aug 27, 2014 at 2:36 PM, Nishkam Ravi <nr...@cloudera.com> wrote:
> > Wonder if it would make sense to introduce a notion of 'Reviewers' as an
> > intermediate tier to help distribute the load? While anyone can review
> and
> > comment on an open PR, reviewers would be able to say aye or nay subject
> to
> > confirmation by a committer?
> >
> > Thanks,
> > Nishkam
> >
> >
> > On Wed, Aug 27, 2014 at 2:11 PM, Nicholas Chammas
> > <ni...@gmail.com> wrote:
> >>
> >> On Tue, Aug 26, 2014 at 2:21 PM, Josh Rosen <ro...@gmail.com>
> wrote:
> >>
> >> > Last weekend, I started hacking on a Google App Engine app for helping
> >> > with pull request review (screenshot: http://i.imgur.com/wwpZKYZ.png
> ).
> >> >
> >>
> >> BTW Josh, how can we stay up-to-date on your work on this tool? A JIRA
> >> issue, perhaps?
> >>
> >> Nick
> >
> >
>

Re: Handling stale PRs

Posted by Patrick Wendell <pw...@gmail.com>.
Hey Nishkam,

To some extent we already have this process - many community members
help review patches and some earn a reputation where committers will
take an LGTM from them seriously. I'd be interested in seeing if any
other projects recognize people who do this.

- Patrick

On Wed, Aug 27, 2014 at 2:36 PM, Nishkam Ravi <nr...@cloudera.com> wrote:
> Wonder if it would make sense to introduce a notion of 'Reviewers' as an
> intermediate tier to help distribute the load? While anyone can review and
> comment on an open PR, reviewers would be able to say aye or nay subject to
> confirmation by a committer?
>
> Thanks,
> Nishkam
>
>
> On Wed, Aug 27, 2014 at 2:11 PM, Nicholas Chammas
> <ni...@gmail.com> wrote:
>>
>> On Tue, Aug 26, 2014 at 2:21 PM, Josh Rosen <ro...@gmail.com> wrote:
>>
>> > Last weekend, I started hacking on a Google App Engine app for helping
>> > with pull request review (screenshot: http://i.imgur.com/wwpZKYZ.png).
>> >
>>
>> BTW Josh, how can we stay up-to-date on your work on this tool? A JIRA
>> issue, perhaps?
>>
>> Nick
>
>



Re: Handling stale PRs

Posted by Nishkam Ravi <nr...@cloudera.com>.
Wonder if it would make sense to introduce a notion of 'Reviewers' as an
intermediate tier to help distribute the load? While anyone can review and
comment on an open PR, reviewers would be able to say aye or nay subject to
confirmation by a committer?

Thanks,
Nishkam


On Wed, Aug 27, 2014 at 2:11 PM, Nicholas Chammas <
nicholas.chammas@gmail.com> wrote:

> On Tue, Aug 26, 2014 at 2:21 PM, Josh Rosen <ro...@gmail.com> wrote:
>
> > Last weekend, I started hacking on a Google App Engine app for helping
> > with pull request review (screenshot: http://i.imgur.com/wwpZKYZ.png).
> >
>
> BTW Josh, how can we stay up-to-date on your work on this tool? A JIRA
> issue, perhaps?
>
> Nick
>

Re: Handling stale PRs

Posted by Nicholas Chammas <ni...@gmail.com>.
On Tue, Aug 26, 2014 at 2:21 PM, Josh Rosen <ro...@gmail.com> wrote:

> Last weekend, I started hacking on a Google App Engine app for helping
> with pull request review (screenshot: http://i.imgur.com/wwpZKYZ.png).
>

BTW Josh, how can we stay up-to-date on your work on this tool? A JIRA
issue, perhaps?

Nick

Re: Handling stale PRs

Posted by Josh Rosen <ro...@gmail.com>.
Last weekend, I started hacking on a Google App Engine app for helping with pull request review (screenshot: http://i.imgur.com/wwpZKYZ.png).  Some of my basic goals (not all implemented yet):

- Users sign in using GitHub and can browse a list of pull requests, including links to associated JIRAs, Jenkins statuses, a quick preview of the last comment, etc.

- Pull requests are auto-classified based on which components they modify (by looking at the diff).

- From the app’s own internal database of PRs, we can build dashboards to find “abandoned” PRs, graph average time to first review, etc.

- Since we authenticate users with GitHub, we can enable administrative functions via this dashboard (e.g. “assign this PR to me”, “vote to close in the weekly auto-close commit”, etc.).

Right now, I’ve implemented GitHub OAuth support and code to update the issues database using the GitHub API.  Because we have access to the full API, it’s pretty easy to do fancy things like parsing the reason for Jenkins failure, etc.  You could even imagine some fancy mashup tools to pull up JIRAs and pull requests side-by-side in iframes.

After I hack on this a bit more, I plan to release a public preview version; if we find this tool useful, I’ll clean it up and open-source the app so folks can contribute to it.

- Josh

On August 26, 2014 at 8:16:46 AM, Nicholas Chammas (nicholas.chammas@gmail.com) wrote:

On Tue, Aug 26, 2014 at 2:02 AM, Patrick Wendell <pw...@gmail.com> wrote:  

> I'd prefer if we took the approach of politely explaining why in the  
> current form the patch isn't acceptable and closing it (potentially w/ tips  
> on how to improve it or narrow the scope).  


Amen to this. Aiming for such a culture would set Spark apart from other  
projects in a great way.  

I've proposed several different solutions to ASF infra to streamline the  
> process, but thus far they haven't been open to any of my ideas:  


I've added myself as a watcher on those 2 INFRA issues. Sucks that the only  
solution on offer right now requires basically polluting the commit history.  

Short of moving Spark's repo to a non-ASF-managed GitHub account, do you  
think another bot could help us manage the number of stale PRs?  

I'm thinking a solution as follows might be very helpful:  

   - Extend Spark QA / Jenkins to run on a weekly schedule and check for
   stale PRs. Let's say a stale PR is an open one that hasn't been updated in
   N months.
   - Spark QA maintains a list of known committers on its side.
   - During its weekly check of stale PRs, Spark QA takes the following
   action:
      - If the last person to comment on a PR was a committer, post to the
      PR asking for an update from the contributor.
      - If the last person to comment on a PR was a contributor, add the PR
      to a list. Email this list of *hanging PRs* out to the dev list on a
      weekly basis and ask committers to update them.
      - If the last person to comment on a PR was Spark QA asking the
      contributor to update it, then add the PR to a list. Email this list
      of *abandoned PRs* to the dev list for the record (or for closing, if
      that becomes possible in the future).

This doesn't solve the problem of not being able to close PRs, but it does  
help make sure no PR is left hanging for long.  

What do you think? I'd be interested in implementing this solution if we  
like it.  

Nick  

Re: Handling stale PRs

Posted by Nicholas Chammas <ni...@gmail.com>.
On Tue, Aug 26, 2014 at 2:02 AM, Patrick Wendell <pw...@gmail.com> wrote:

> I'd prefer if we took the approach of politely explaining why in the
> current form the patch isn't acceptable and closing it (potentially w/ tips
> on how to improve it or narrow the scope).


Amen to this. Aiming for such a culture would set Spark apart from other
projects in a great way.

I've proposed several different solutions to ASF infra to streamline the
> process, but thus far they haven't been open to any of my ideas:


I've added myself as a watcher on those 2 INFRA issues. Sucks that the only
solution on offer right now requires basically polluting the commit history.

Short of moving Spark's repo to a non-ASF-managed GitHub account, do you
think another bot could help us manage the number of stale PRs?

I'm thinking a solution as follows might be very helpful:

   - Extend Spark QA / Jenkins to run on a weekly schedule and check for
   stale PRs. Let's say a stale PR is an open one that hasn't been updated in
   N months.
   - Spark QA maintains a list of known committers on its side.
   - During its weekly check of stale PRs, Spark QA takes the following
   action:
      - If the last person to comment on a PR was a committer, post to the
      PR asking for an update from the contributor.
      - If the last person to comment on a PR was a contributor, add the PR
      to a list. Email this list of *hanging PRs* out to the dev list on a
      weekly basis and ask committers to update them.
      - If the last person to comment on a PR was Spark QA asking the
      contributor to update it, then add the PR to a list. Email this list
      of *abandoned PRs* to the dev list for the record (or for closing, if
      that becomes possible in the future).
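
(The per-PR classification above could be sketched as follows. The committer set, bot login, and field names here are hypothetical placeholders for whatever Spark QA would actually track.)

```python
from datetime import datetime, timedelta

# Hypothetical committer list; the real bot would maintain its own.
COMMITTERS = {"pwendell", "joshrosen"}

def triage(pr, now, stale_after_days=90):
    """Classify one open PR per the scheme above.

    `pr` needs 'updated_at' (datetime) and 'last_commenter' (GitHub login).
    Returns None while the PR is fresh, otherwise one of:
      'ping-contributor' -- a committer spoke last; ask the contributor
      'hanging'          -- a contributor spoke last; committers owe a look
      'abandoned'        -- the bot pinged last and nobody answered
    """
    if now - pr["updated_at"] < timedelta(days=stale_after_days):
        return None
    last = pr["last_commenter"]
    if last == "SparkQA":            # the bot itself commented last
        return "abandoned"
    if last in COMMITTERS:
        return "ping-contributor"
    return "hanging"                 # goes into the weekly dev-list email
```

A weekly cron job would run `triage` over every open PR, comment on the 'ping-contributor' ones, and batch the 'hanging' and 'abandoned' ones into the two dev-list emails.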

This doesn't solve the problem of not being able to close PRs, but it does
help make sure no PR is left hanging for long.

What do you think? I'd be interested in implementing this solution if we
like it.

Nick

Re: Handling stale PRs

Posted by Nicholas Chammas <ni...@gmail.com>.
Things that help:

   - Be persistent. People are busy, so just ping them if there’s been no
   response for a couple of weeks. Hopefully, as the project continues to
   develop, this will become less necessary.
   - Only ping reviewers after test results are back from Jenkins. Make
   sure all the tests are clear before reaching out, unless you need help
   understanding why a test is failing.
   - Whenever possible, keep PRs small, small, small.
   - Get buy-in on the dev list before working on something, especially
   larger features, to make sure you are making something that people
   understand and that is in accordance with Spark’s design.

I’m just speaking as a random contributor here, so don’t take this advice
as gospel.

Nick

On Mon Dec 08 2014 at 3:08:02 PM Ganelin, Ilya <Il...@capitalone.com>
wrote:

> Thank you for pointing this out, Nick. I know that for myself and my
> colleague who are starting to contribute to Spark, it’s definitely
> discouraging to have fixes sitting in the pipeline. Could you recommend
> any other ways that we can facilitate getting these PRs accepted? Clean,
> well-tested code is an obvious one but I’d like to know if there are some
> non-obvious things we (as contributors) could do to make the committers’
> lives easier? Thanks!
>
> -Ilya
>
> On 12/8/14, 11:58 AM, "Nicholas Chammas" <ni...@gmail.com>
> wrote:
>
> >I recently came across this blog post, which reminded me of this thread.
> >
> >How to Discourage Open Source Contributions
> ><http://danluu.com/discourage-oss/>
> >
> >We are currently at 320+ open PRs, many of which haven't been updated in
> >over a month. We have quite a few PRs that haven't been touched in 3-5
> >months.
> >
> >*If you have the time and interest, please hop on over to the Spark PR
> >Dashboard <https://spark-prs.appspot.com/>, sort the PRs by
> >least-recently-updated, and update them where you can.*
> >
> >I share the blog author's opinion that letting PRs go stale discourages
> >contributions, especially from first-time contributors, and especially
> >more
> >so when the PR author is waiting on feedback from a committer or
> >contributor.
> >
> >I've been thinking about simple ways to make it easier for all of us to
> >chip in on controlling stale PRs in an incremental way. For starters,
> >would
> >it help if an automated email went out to the dev list once a week that a)
> >reported the number of stale PRs, and b) directly linked to the 5 least
> >recently updated PRs?
> >
> >Nick
> >
> >On Sat Aug 30 2014 at 3:41:39 AM Nicholas Chammas <
> >nicholas.chammas@gmail.com> wrote:
> >
> >> On Tue, Aug 26, 2014 at 2:02 AM, Patrick Wendell <pw...@gmail.com>
> >> wrote:
> >>
>>> it's actually procedurally difficult for us to close pull requests
> >>
> >>
> >> Just an FYI: Seems like the GitHub-sanctioned work-around to having
> >> issues-only permissions is to have a second, issues-only repository
> >> <https://help.github.com/articles/issues-only-access-permissions>. Not
> a
> >> very attractive work-around...
> >>
> >> Nick
> >>
>
> ________________________________________________________
>
> The information contained in this e-mail is confidential and/or
> proprietary to Capital One and/or its affiliates. The information
> transmitted herewith is intended only for use by the individual or entity
> to which it is addressed.  If the reader of this message is not the
> intended recipient, you are hereby notified that any review,
> retransmission, dissemination, distribution, copying or other use of, or
> taking of any action in reliance upon this information is strictly
> prohibited. If you have received this communication in error, please
> contact the sender and delete the material from your computer.
>
>

Re: Handling stale PRs

Posted by "Ganelin, Ilya" <Il...@capitalone.com>.
Thank you for pointing this out, Nick. I know that for myself and my
colleague who are starting to contribute to Spark, it’s definitely
discouraging to have fixes sitting in the pipeline. Could you recommend
any other ways that we can facilitate getting these PRs accepted? Clean,
well-tested code is an obvious one but I’d like to know if there are some
non-obvious things we (as contributors) could do to make the committers’
lives easier? Thanks!

-Ilya 

On 12/8/14, 11:58 AM, "Nicholas Chammas" <ni...@gmail.com>
wrote:

>I recently came across this blog post, which reminded me of this thread.
>
>How to Discourage Open Source Contributions
><http://danluu.com/discourage-oss/>
>
>We are currently at 320+ open PRs, many of which haven't been updated in
>over a month. We have quite a few PRs that haven't been touched in 3-5
>months.
>
>*If you have the time and interest, please hop on over to the Spark PR
>Dashboard <https://spark-prs.appspot.com/>, sort the PRs by
>least-recently-updated, and update them where you can.*
>
>I share the blog author's opinion that letting PRs go stale discourages
>contributions, especially from first-time contributors, and especially
>more
>so when the PR author is waiting on feedback from a committer or
>contributor.
>
>I've been thinking about simple ways to make it easier for all of us to
>chip in on controlling stale PRs in an incremental way. For starters,
>would
>it help if an automated email went out to the dev list once a week that a)
>reported the number of stale PRs, and b) directly linked to the 5 least
>recently updated PRs?
>
>Nick
>
>On Sat Aug 30 2014 at 3:41:39 AM Nicholas Chammas <
>nicholas.chammas@gmail.com> wrote:
>
>> On Tue, Aug 26, 2014 at 2:02 AM, Patrick Wendell <pw...@gmail.com>
>> wrote:
>>
>>> it's actually procedurally difficult for us to close pull requests
>>
>>
>> Just an FYI: Seems like the GitHub-sanctioned work-around to having
>> issues-only permissions is to have a second, issues-only repository
>> <https://help.github.com/articles/issues-only-access-permissions>. Not a
>> very attractive work-around...
>>
>> Nick
>>



---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@spark.apache.org
For additional commands, e-mail: dev-help@spark.apache.org


Re: Handling stale PRs

Posted by Nicholas Chammas <ni...@gmail.com>.
Shout-out to Michael and other Spark SQL contributors for really trimming
down the number of open/stale Spark SQL PRs
<https://spark-prs.appspot.com/#sql>.

As of right now, the least recently updated open Spark SQL PR goes back
only 11 days.

Nice work!

Nick


On Mon Dec 08 2014 at 2:58:08 PM Nicholas Chammas <
nicholas.chammas@gmail.com> wrote:

> I recently came across this blog post, which reminded me of this thread.
>
> How to Discourage Open Source Contributions
> <http://danluu.com/discourage-oss/>
>
> We are currently at 320+ open PRs, many of which haven't been updated in
> over a month. We have quite a few PRs that haven't been touched in 3-5
> months.
>
> *If you have the time and interest, please hop on over to the Spark PR
> Dashboard <https://spark-prs.appspot.com/>, sort the PRs by
> least-recently-updated, and update them where you can.*
>
> I share the blog author's opinion that letting PRs go stale discourages
> contributions, especially from first-time contributors, and especially more
> so when the PR author is waiting on feedback from a committer or
> contributor.
>
> I've been thinking about simple ways to make it easier for all of us to
> chip in on controlling stale PRs in an incremental way. For starters, would
> it help if an automated email went out to the dev list once a week that a)
> reported the number of stale PRs, and b) directly linked to the 5 least
> recently updated PRs?
>
> Nick
>
> On Sat Aug 30 2014 at 3:41:39 AM Nicholas Chammas <
> nicholas.chammas@gmail.com> wrote:
>
>> On Tue, Aug 26, 2014 at 2:02 AM, Patrick Wendell <pw...@gmail.com>
>> wrote:
>>
>>> it's actually procedurally difficult for us to close pull requests
>>
>>
>> Just an FYI: Seems like the GitHub-sanctioned work-around to having
>> issues-only permissions is to have a second, issues-only repository
>> <https://help.github.com/articles/issues-only-access-permissions>. Not a
>> very attractive work-around...
>>
>> Nick
>>
>

Re: Handling stale PRs

Posted by Nicholas Chammas <ni...@gmail.com>.
I recently came across this blog post, which reminded me of this thread.

How to Discourage Open Source Contributions
<http://danluu.com/discourage-oss/>

We are currently at 320+ open PRs, many of which haven't been updated in
over a month. We have quite a few PRs that haven't been touched in 3-5
months.

*If you have the time and interest, please hop on over to the Spark PR
Dashboard <https://spark-prs.appspot.com/>, sort the PRs by
least-recently-updated, and update them where you can.*

I share the blog author's opinion that letting PRs go stale discourages
contributions, especially from first-time contributors, and especially more
so when the PR author is waiting on feedback from a committer or
contributor.

I've been thinking about simple ways to make it easier for all of us to
chip in on controlling stale PRs in an incremental way. For starters, would
it help if an automated email went out to the dev list once a week that a)
reported the number of stale PRs, and b) directly linked to the 5 least
recently updated PRs?
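
For illustration, here is a minimal sketch of what such a report could look like, assuming the public GitHub search API. The repository name, result count, and report format below are placeholders for discussion, not an agreed design, and actually mailing the report to the list is left out:

```python
# Sketch: weekly stale-PR report for apache/spark using GitHub's search API.
# Assumptions (placeholders, not an agreed design): repo name, result count,
# and the plain-text report format.
import json
import urllib.request

REPO = "apache/spark"  # assumption: target repository


def fetch_least_recently_updated(n=5):
    """Return the n open PRs that were updated least recently."""
    url = (
        "https://api.github.com/search/issues"
        f"?q=repo:{REPO}+is:pr+is:open&sort=updated&order=asc&per_page={n}"
    )
    req = urllib.request.Request(
        url, headers={"Accept": "application/vnd.github+json"}
    )
    with urllib.request.urlopen(req) as resp:
        data = json.load(resp)
    # Each entry: (PR number, title, last-updated timestamp, link)
    return data.get("total_count", 0), [
        (item["number"], item["title"], item["updated_at"], item["html_url"])
        for item in data.get("items", [])
    ]


def format_report(prs, total_open):
    """Render the weekly report body as plain text for the dev list."""
    lines = [f"{total_open} open PRs; {len(prs)} least recently updated:"]
    for number, title, updated, url in prs:
        lines.append(f"  #{number} (last updated {updated}): {title}")
        lines.append(f"    {url}")
    return "\n".join(lines)


if __name__ == "__main__":
    total, prs = fetch_least_recently_updated()
    print(format_report(prs, total))
```

Run weekly from cron (or similar), this would cover both (a) the count of open PRs and (b) direct links to the 5 least recently updated ones.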

Nick

On Sat Aug 30 2014 at 3:41:39 AM Nicholas Chammas <
nicholas.chammas@gmail.com> wrote:

> On Tue, Aug 26, 2014 at 2:02 AM, Patrick Wendell <pw...@gmail.com>
> wrote:
>
>> it's actually procedurally difficult for us to close pull requests
>
>
> Just an FYI: Seems like the GitHub-sanctioned work-around to having
> issues-only permissions is to have a second, issues-only repository
> <https://help.github.com/articles/issues-only-access-permissions>. Not a
> very attractive work-around...
>
> Nick
>

Re: Handling stale PRs

Posted by Nicholas Chammas <ni...@gmail.com>.
On Tue, Aug 26, 2014 at 2:02 AM, Patrick Wendell <pw...@gmail.com> wrote:

> it's actually procedurally difficult for us to close pull requests


Just an FYI: Seems like the GitHub-sanctioned work-around to having
issues-only permissions is to have a second, issues-only repository
<https://help.github.com/articles/issues-only-access-permissions>. Not a
very attractive work-around...

Nick

Re: Handling stale PRs

Posted by Madhu <ma...@madhu.com>.
Sean Owen wrote
> Stale JIRAs are a symptom, not a problem per se. I also want to see
> the backlog cleared, but automatically closing doesn't help, if the
> problem is too many JIRAs and not enough committer-hours to look at
> them. Some noise gets closed, but some easy or important fixes may
> disappear as well.

Agreed. All of the problems mentioned in this thread are symptoms. There's
no shortage of talent and enthusiasm within the Spark community. The people
and the product are wonderful. The process: not so much. Spark has been
wildly successful; some growing pains are to be expected.

Given 100+ contributors, Spark is a big project. As with big data, big
projects can run into scaling issues. There's no magic to running a
successful big project, but it does require greater planning and discipline.
JIRA is great for issue tracking, but it's not a replacement for a project
plan. Quarterly releases are a great idea: everyone knows the schedule. What
we need is a concise plan for each release with a clear scope statement.
Without knowing what is in scope and out of scope for a release, we end up
with a laundry list of things to do, but no clear goal. Laundry lists don't
scale well.

I don't mind helping with planning and documenting releases. This is
especially helpful for new contributors who don't know where to start. I
have done that successfully on many projects using Jira and Confluence, so I
know it can be done. To address immediate concerns of open PRs and
excessive, overlapping Jira issues, we probably have to create a meta issue
and assign resources to fix it. I don't mind helping with that also.



-----
--
Madhu
https://www.linkedin.com/in/msiddalingaiah
--
View this message in context: http://apache-spark-developers-list.1001551.n3.nabble.com/Handling-stale-PRs-tp8015p8031.html
Sent from the Apache Spark Developers List mailing list archive at Nabble.com.



Re: Handling stale PRs

Posted by Erik Erlandson <ej...@redhat.com>.

----- Original Message -----

> >> Another thing is that we should, IMO, err on the side of explicitly saying
> >> "no" or "not yet" to patches, rather than letting them linger without
> >> attention. We do get patches where the user is well intentioned, but it is
> >
> > Completely agree. The solution is partly more supply of committer time
> > on JIRAs. But that is detracting from the work the committers
> > themselves want to do. More of the solution is reducing demand by
> > helping people create useful, actionable, non-duplicate JIRAs from the
> > start. Or encouraging people to resolve existing JIRAs and shepherding
> > those in.
> 
> saying no/not-yet is a vitally important piece of information.

+1, when I propose a contribution to a project, I consider an articulate (and hopefully polite) "no thanks", or "not-yet", or "needs-work", to be far more useful and pleasing than just radio silence.  It allows me to either address feedback, or just move on.

Although it takes effort to keep abreast of community contributions, I don't think it needs to be an overbearing or heavy-weight process.  I've seen other communities talk themselves out of better management because they conceived of the ticket workflow as more effort than it needed to be.  Much better to keep ticket triage and workflow fast/simple, but actually do it.



> 
> 
> > Elsewhere, I've found people reluctant to close JIRA for fear of
> > offending or turning off contributors. I think the opposite is true.
> > There is nothing wrong with "no" or "not now" especially accompanied
> > with constructive feedback. Better to state for the record what is not
> > being looked at and why, than let people work on and open the same
> > JIRAs repeatedly.
> 
> well stated!
> 
> 
> > I have also found in the past that a culture of tolerating eternal
> > JIRAs led people to file JIRAs in order to forget about a problem --
> > it's in JIRA, and it's "in progress", so it feels like someone else is
> > going to fix it later and so it can be forgotten now.
> 
> there's some value in these now-i-can-forget jira, though i'm not
> personally a fan. it can be good to keep them around and reachable by
> search, but they should be clearly marked as no/not-yet or something
> similar.
> 
> 
> > For what it's worth, I think these project and culture mechanics are
> > so important and it's my #1 concern for Spark at this stage. This
> > challenge exists so much more here, exactly because there is so much
> > potential. I'd love to help by trying to identify and close stale
> > JIRAs but am afraid that tagging them is just adding to the heap of
> > work.
> 
> +1 concern and potential!
> 
> 
> best,
> 
> 
> matt
> 
> 
> 



Re: Handling stale PRs

Posted by Matthew Farrellee <ma...@redhat.com>.
On 08/26/2014 04:57 AM, Sean Owen wrote:
> On Tue, Aug 26, 2014 at 7:02 AM, Patrick Wendell <pw...@gmail.com> wrote:
>> Most other ASF projects I know just ignore these patches. I'd prefer if we
>
> Agree, this drives me crazy. It kills part of JIRA's usefulness. Spark
> is blessed/cursed with incredible inbound load, but would love to
> still see the project get this right-er than, say, Hadoop.

totally agree, this applies to patches as well as jiras. i'll add that 
projects that let things simply linger are missing an opportunity to 
engage their community.

spark should capitalize on its momentum to build a smoothly running 
community (vs not and accept an unbounded backlog as inevitable).


>> The more important thing, maybe, is how we want to deal with this
>> culturally. And I think we need to do a better job of making sure no pull
>> requests go unattended (i.e. waiting for committer feedback). If patches go
>> stale, it should be because the user hasn't responded, not us.
>
> Stale JIRAs are a symptom, not a problem per se. I also want to see
> the backlog cleared, but automatically closing doesn't help, if the
> problem is too many JIRAs and not enough committer-hours to look at
> them. Some noise gets closed, but some easy or important fixes may
> disappear as well.

engagement in the community really needs to go both ways. it's 
reasonable for PRs that stop merging cleanly or have open comments that 
need resolution by the PR author to be loudly timed out. a similar thing 
goes for jiras: if there's a request for more information to resolve a 
bug and that information does not appear, half of the communication is 
gone and a loud time out is reasonable.

easy and important are in the eyes of the beholder. timeouts can go both 
ways. a jira or pr that has been around for a period of time (say 1/3 of 
the to-close timeout) should be bumped up for evaluation, hopefully 
resulting in few "easy" or "important" issues falling through the cracks.

fyi, i'm periodically going through the pyspark jiras, trying to 
reproduce issues, coalesce duplicates and ask for details. i've not been 
given any sort of permission to do this, i don't have any special 
position in the community to do this - in a well functioning community 
everyone should feel free to jump in and help.


>> Another thing is that we should, IMO, err on the side of explicitly saying
>> "no" or "not yet" to patches, rather than letting them linger without
>> attention. We do get patches where the user is well intentioned, but it is
>
> Completely agree. The solution is partly more supply of committer time
> on JIRAs. But that is detracting from the work the committers
> themselves want to do. More of the solution is reducing demand by
> helping people create useful, actionable, non-duplicate JIRAs from the
> start. Or encouraging people to resolve existing JIRAs and shepherding
> those in.

saying no/not-yet is a vitally important piece of information.


> Elsewhere, I've found people reluctant to close JIRA for fear of
> offending or turning off contributors. I think the opposite is true.
> There is nothing wrong with "no" or "not now" especially accompanied
> with constructive feedback. Better to state for the record what is not
> being looked at and why, than let people work on and open the same
> JIRAs repeatedly.

well stated!


> I have also found in the past that a culture of tolerating eternal
> JIRAs led people to file JIRAs in order to forget about a problem --
> it's in JIRA, and it's "in progress", so it feels like someone else is
> going to fix it later and so it can be forgotten now.

there's some value in these now-i-can-forget jira, though i'm not 
personally a fan. it can be good to keep them around and reachable by 
search, but they should be clearly marked as no/not-yet or something 
similar.


> For what it's worth, I think these project and culture mechanics are
> so important and it's my #1 concern for Spark at this stage. This
> challenge exists so much more here, exactly because there is so much
> potential. I'd love to help by trying to identify and close stale
> JIRAs but am afraid that tagging them is just adding to the heap of
> work.

+1 concern and potential!


best,


matt



Re: Handling stale PRs

Posted by Sean Owen <so...@cloudera.com>.
On Tue, Aug 26, 2014 at 7:02 AM, Patrick Wendell <pw...@gmail.com> wrote:
> Most other ASF projects I know just ignore these patches. I'd prefer if we

Agree, this drives me crazy. It kills part of JIRA's usefulness. Spark
is blessed/cursed with incredible inbound load, but would love to
still see the project get this right-er than, say, Hadoop.

> The more important thing, maybe, is how we want to deal with this
> culturally. And I think we need to do a better job of making sure no pull
> requests go unattended (i.e. waiting for committer feedback). If patches go
> stale, it should be because the user hasn't responded, not us.

Stale JIRAs are a symptom, not a problem per se. I also want to see
the backlog cleared, but automatically closing doesn't help, if the
problem is too many JIRAs and not enough committer-hours to look at
them. Some noise gets closed, but some easy or important fixes may
disappear as well.

> Another thing is that we should, IMO, err on the side of explicitly saying
> "no" or "not yet" to patches, rather than letting them linger without
> attention. We do get patches where the user is well intentioned, but it is

Completely agree. The solution is partly more supply of committer time
on JIRAs. But that is detracting from the work the committers
themselves want to do. More of the solution is reducing demand by
helping people create useful, actionable, non-duplicate JIRAs from the
start. Or encouraging people to resolve existing JIRAs and shepherding
those in.

Elsewhere, I've found people reluctant to close JIRA for fear of
offending or turning off contributors. I think the opposite is true.
There is nothing wrong with "no" or "not now" especially accompanied
with constructive feedback. Better to state for the record what is not
being looked at and why, than let people work on and open the same
JIRAs repeatedly.

I have also found in the past that a culture of tolerating eternal
JIRAs led people to file JIRAs in order to forget about a problem --
it's in JIRA, and it's "in progress", so it feels like someone else is
going to fix it later and so it can be forgotten now.

For what it's worth, I think these project and culture mechanics are
so important and it's my #1 concern for Spark at this stage. This
challenge exists so much more here, exactly because there is so much
potential. I'd love to help by trying to identify and close stale
JIRAs but am afraid that tagging them is just adding to the heap of
work.



Re: Handling stale PRs

Posted by Patrick Wendell <pw...@gmail.com>.
Hey Nicholas,

Thanks for bringing this up. There are a few dimensions to this... one is
that it's actually procedurally difficult for us to close pull requests.
I've proposed several different solutions to ASF infra to streamline the
process, but thus far they haven't been open to any of my ideas:

https://issues.apache.org/jira/browse/INFRA-7918
https://issues.apache.org/jira/browse/INFRA-8241

The more important thing, maybe, is how we want to deal with this
culturally. And I think we need to do a better job of making sure no pull
requests go unattended (i.e. waiting for committer feedback). If patches go
stale, it should be because the user hasn't responded, not us.

Another thing is that we should, IMO, err on the side of explicitly saying
"no" or "not yet" to patches, rather than letting them linger without
attention. We do get patches where the user is well intentioned, but it is
a feature that doesn't make sense to add, or isn't well thought out or
explained, or the review effort would be so large that it's not within our
capacity to look at it just yet.

Most other ASF projects I know just ignore these patches. I'd prefer if we
took the approach of politely explaining why in the current form the patch
isn't acceptable and closing it (potentially w/ tips on how to improve it
or narrow the scope).

- Patrick




On Mon, Aug 25, 2014 at 9:57 PM, Matei Zaharia <ma...@gmail.com>
wrote:

> Hey Nicholas,
>
> In general we've been looking at these periodically (at least I have) and
> asking people to close out of date ones, but it's true that the list has
> gotten fairly large. We should probably have an expiry time of a few months
> and close them automatically. I agree that it's daunting to see so many
> open PRs.
>
> Matei
>
> On August 25, 2014 at 9:03:09 PM, Nicholas Chammas (
> nicholas.chammas@gmail.com) wrote:
>
> Check this out:
>
> https://github.com/apache/spark/pulls?q=is%3Aopen+is%3Apr+sort%3Aupdated-asc
>
> We're hitting close to 300 open PRs. Those are the least recently updated
> ones.
>
> I think having a low number of stale (i.e. not recently updated) PRs is a
> good thing to shoot for. It doesn't leave contributors hanging (which feels
> bad for contributors), and reduces project clutter (which feels bad for
> maintainers/committers).
>
> What is our approach to tackling this problem?
>
> I think communicating and enforcing a clear policy on how stale PRs are
> handled might be a good way to reduce the number of stale PRs we have
> without making contributors feel rejected.
>
> I don't know what such a policy would look like, but it should be
> enforceable and "lightweight"--i.e. it shouldn't feel like a hammer used to
> reject people's work, but rather a necessary tool to keep the project's
> contributions relevant and manageable.
>
> Nick
>

Re: Handling stale PRs

Posted by Matei Zaharia <ma...@gmail.com>.
Hey Nicholas,

In general we've been looking at these periodically (at least I have) and asking people to close out of date ones, but it's true that the list has gotten fairly large. We should probably have an expiry time of a few months and close them automatically. I agree that it's daunting to see so many open PRs.

Matei

On August 25, 2014 at 9:03:09 PM, Nicholas Chammas (nicholas.chammas@gmail.com) wrote:

Check this out: 
https://github.com/apache/spark/pulls?q=is%3Aopen+is%3Apr+sort%3Aupdated-asc 

We're hitting close to 300 open PRs. Those are the least recently updated 
ones. 

I think having a low number of stale (i.e. not recently updated) PRs is a 
good thing to shoot for. It doesn't leave contributors hanging (which feels 
bad for contributors), and reduces project clutter (which feels bad for 
maintainers/committers). 

What is our approach to tackling this problem? 

I think communicating and enforcing a clear policy on how stale PRs are 
handled might be a good way to reduce the number of stale PRs we have 
without making contributors feel rejected. 

I don't know what such a policy would look like, but it should be 
enforceable and "lightweight"--i.e. it shouldn't feel like a hammer used to 
reject people's work, but rather a necessary tool to keep the project's 
contributions relevant and manageable. 

Nick