Posted to dev@arrow.apache.org by Wes McKinney <we...@gmail.com> on 2021/06/29 08:28:07 UTC

Improving PR workload management for Arrow maintainers

hi folks,

I've noted that the volume of PRs for Arrow has been steadily
increasing (and will likely continue to increase). While I've
personally had less time for development / maintenance / code reviews
over the last year, I would like to discuss how we could improve our
tooling so that maintainers spend their time tending to the PR queue
more efficiently. In my own experience, I have wasted a lot of time
digging around the queue looking for PRs that are awaiting feedback or
need to be merged.

I note first of all that around 70 out of 173 open PRs have been
updated in the last 7 days, so while there is some PR staleness, to
have about 40% of the PRs active is pretty good. That said, ~70
active PRs is a lot of PRs to tend to.

I scraped the project's code review comment history, and here are the
individuals who have left the most comments on PRs since genesis:

pitrou                6802
wesm                  5023
emkornfield           3032
bkietz                2834
kou                   1489
nealrichardson        1439
fsaintjacques         1356
kszucs                1250
alamb                 1133
jorisvandenbossche    1094
liyafan82              831
lidavidm               816
westonpace             794
xhochy                 770
nevi-me                643
BryanCutler            639
jorgecarleitao         635
cpcloud                551
sunchao                536
ianmcook               499
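
A tally like this could be produced along the following lines. This is
only a sketch, not the script that was actually used: it assumes the
comments have already been downloaded as a list of dicts shaped like
GitHub REST API comment objects (each carrying a user.login field). The
function name tally_comments and the sample records are made up for
illustration.

```python
from collections import Counter


def tally_comments(comments):
    """Count comments per author login, most active first.

    `comments` is assumed to be an iterable of dicts shaped like GitHub
    REST API comment objects, i.e. each has comment["user"]["login"].
    """
    counts = Counter()
    for comment in comments:
        user = comment.get("user")
        if user is None:  # skip records without a user, in case any exist
            continue
        counts[user["login"]] += 1
    return counts.most_common()


# Made-up sample records, for illustration only.
sample = [
    {"user": {"login": "alice"}},
    {"user": {"login": "bob"}},
    {"user": {"login": "alice"}},
    {"user": None},
]

for login, n in tally_comments(sample):
    print(f"{login:<20}{n:>6}")
```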

Since we're probably stuck using GitHub to receive code contributions
(as opposed to systems — Gerrit is one I'm familiar with — that
provide more structure for reviewers to track the patches they "own"
as well as the outgoing/incoming state of reviews), I am wondering
what kinds of tools we could create to make it easier for maintainers
to keep track of PRs they are shepherding through the contribution
process. Ideally this wouldn't involve maintainers having to engage in
some explicit action like assigning themselves as a PR reviewer.

Here's one idea: a web application that displays "your reviews", a
table of PRs that you have interacted with in any way (commented, left
code review, assigned as reviewer, someone mentioned you, etc.) sorted
either by last commit or last comment to assess "freshness". So if you
comment on a PR or leave a code review, it will automatically show up
in "your reviews". It could also indicate whether there has been
activity on the PR since the last time you interacted with it.

Having now used the GitHub API to pull comments from PRs for the above
analysis, there is certainly enough information available to help
create this kind of tool. I'd be willing to contribute to building the
backend of such a web application.
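
To make the "your reviews" idea a bit more concrete, here is a minimal
sketch of the core query. It assumes PR activity has already been
fetched and flattened into simple event records; the Event class, the
your_reviews function and the sample data are hypothetical names, and
mentions or reviewer assignments (where someone else is the actor)
would need extra handling that is left out here.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Event:
    """One activity on a PR (comment, review, commit, ...)."""
    pr: int
    actor: str
    when: datetime


def your_reviews(events, me):
    """Return rows (pr, last_activity, new_since_you) for PRs `me` touched.

    Rows are sorted by most recent activity first. `new_since_you` is True
    when someone else acted after `me` last interacted with the PR.
    """
    last_activity = {}  # pr -> latest event time by anyone
    last_mine = {}      # pr -> latest event time by `me`
    for ev in events:
        if ev.pr not in last_activity or ev.when > last_activity[ev.pr]:
            last_activity[ev.pr] = ev.when
        if ev.actor == me:
            if ev.pr not in last_mine or ev.when > last_mine[ev.pr]:
                last_mine[ev.pr] = ev.when
    rows = [
        (pr, last_activity[pr], last_activity[pr] > mine)
        for pr, mine in last_mine.items()
    ]
    rows.sort(key=lambda row: row[1], reverse=True)
    return rows


# Made-up events, for illustration only.
events = [
    Event(pr=101, actor="reviewer", when=datetime(2021, 6, 25, 9, 0)),
    Event(pr=101, actor="author", when=datetime(2021, 6, 28, 14, 0)),
    Event(pr=102, actor="author", when=datetime(2021, 6, 26, 8, 0)),
    Event(pr=102, actor="reviewer", when=datetime(2021, 6, 27, 8, 0)),
    Event(pr=103, actor="author", when=datetime(2021, 6, 29, 8, 0)),
]

for pr, when, fresh in your_reviews(events, "reviewer"):
    print(pr, when.isoformat(), "new activity" if fresh else "up to date")
```

PR 103 does not appear for "reviewer" because they never interacted
with it, which is the "automatic" membership described above.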

This is just one idea, but I am curious to hear from others who are
spending a lot of time doing code review / PR merging to see what
might help them use their time more effectively.

Thanks,
Wes

Re: Improving PR workload management for Arrow maintainers

Posted by Brian Hulette <bh...@apache.org>.
I review a decent number of PRs for Apache Beam, and I've built some of my
own tooling to help keep track of open PRs. I wrote a script that pulls
metadata about all relevant PRs and uses some heuristics to categorize them
into:
- incoming review
- outgoing review
- "CC'd" - where I've been mentioned but am not the reviewer or author

In the first two cases I try to highlight the ones that need my
attention, simply by detecting if I'm the person who took the most recent
action or not. This works reasonably well but gets tripped up on several
edge cases:
1) The author might push multiple commits before they're actually ready for
more feedback.
2) A PR might need feedback from multiple reviewers (e.g. people with
domain knowledge of certain areas).

I've been planning to make my script stateful so that I can mark a PR as
"not my turn" (i.e. unhighlight this until there is more activity), and
maybe "never my turn" (i.e. I've finished reviewing this, it's waiting on
someone else), to handle these cases.
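
Roughly, the heuristic looks like the sketch below. This is not my
actual script: the PullRequest fields, the function names and the
reading of "incoming" as "I am a reviewer" and "outgoing" as "I am the
author" are simplifying assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class PullRequest:
    number: int
    author: str
    reviewers: set = field(default_factory=set)
    mentioned: set = field(default_factory=set)
    last_actor: str = ""  # whoever took the most recent action


def categorize(pr, me):
    """Return 'outgoing', 'incoming', 'cc', or None if `me` is not involved."""
    if pr.author == me:
        return "outgoing"
    if me in pr.reviewers:
        return "incoming"
    if me in pr.mentioned:
        return "cc"
    return None


def needs_attention(pr, me, not_my_turn=frozenset()):
    """Highlight incoming/outgoing PRs where someone else acted last.

    `not_my_turn` holds PR numbers the user has manually un-highlighted.
    """
    if pr.number in not_my_turn:
        return False
    return categorize(pr, me) in ("incoming", "outgoing") and pr.last_actor != me
```

The not_my_turn set is the stateful part: a fuller version would clear
an entry again once new activity shows up on that PR.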

The idea of an "Addressing Feedback" -> "Waiting on Review" label that is
automatically transitioned when there is activity would run into these same
edge cases.
If a reviewer had the ability to bump the label back to "Addressing
Feedback", that would at least address #1.

I think Wes's proposal (a read-only web UI) would likely also run into
these edge cases since it stores no state of its own to deconflict in those
situations.

Brian

On Tue, Jun 29, 2021 at 6:26 AM Wes McKinney <we...@gmail.com> wrote:

> On Tue, Jun 29, 2021 at 3:10 PM Andrew Lamb <al...@influxdata.com> wrote:
> >
> > The thing that would make me more efficient reviewing PRs is figuring out
> > which one of the open reviews are ready for additional feedback.
>
> Yes, I think this would be the single most significant quality-of-life
> improvement for reviewers.
>
> > I think the idea of a webapp or something that shows active reviews would
> > be helpful (though I get most of that from appropriate email filters).
> >
> > What about a system involving labels (for which there is already a basic
> > GUI in github)? Something low tech like
> >
> > (Waiting for Review)
> > (Addressing Feedback)
> > (Approved, waiting for Merge)
> >
> > With maybe some automation prompting people to add the "Waiting on Review"
> > label when they want feedback
>
> I think it would have to be a bot that automatically sets the labels.
> If it requires contributors to take some action outside of pushing new
> work (new commits or a rebased version of the patch) to the PR and
> leaving responses to comments on the PR, the system is likely to fail
> some non-trivial percentage of the time.
>
> Given the quality of off-the-shelf web app components nowadays (e.g.
> https://material-ui.com), throwing together a read-only PR dashboard
> that shows what has changed since you last interacted with them (along
> with some other helpful things, like whether the build is passing) is
> "probably" not a super heavy lift. I haven't done any frontend
> development in years so while the backend part (writing Python code to
> wrangle data from GitHub's REST API and put it in a SQLite database)
> wouldn't take very long I would need some help on the front end
> portion and setting it up for deployment on DigitalOcean or somewhere.
>
> > Andrew
> >
> > On Tue, Jun 29, 2021 at 4:28 AM Wes McKinney <we...@gmail.com> wrote:
> >
> > > hi folks,
> > >
> > > I've noted that the volume of PRs for Arrow has been steadily
> > > increasing (and will likely continue to increase), and while I've
> > > personally had less time for development / maintenance / code reviews
> > > over the last year, I would like to have a discussion about what we
> > > could do to improve our tooling for maintainers to optimize the
> > > efficiency of time spent tending to the PR queue. In my own
> > > experience, I have felt that I have wasted a lot of time digging
> > > around the queue looking for PRs that are awaiting feedback or need to
> > > be merged.
> > >
> > > I note first of all that around 70 out of 173 open PRs have been
> > > updated in the last 7 days, so while there is some PR staleness, to
> > > have nearly half of the PRs active is pretty good. That said, ~70
> > > active PRs is a lot of PRs to tend to.
> > >
> > > I scraped the project's code review comment history, and here are the
> > > individuals who have left the most comments on PRs since genesis
> > >
> > > pitrou                6802
> > > wesm                  5023
> > > emkornfield           3032
> > > bkietz                2834
> > > kou                   1489
> > > nealrichardson        1439
> > > fsaintjacques         1356
> > > kszucs                1250
> > > alamb                 1133
> > > jorisvandenbossche    1094
> > > liyafan82              831
> > > lidavidm               816
> > > westonpace             794
> > > xhochy                 770
> > > nevi-me                643
> > > BryanCutler            639
> > > jorgecarleitao         635
> > > cpcloud                551
> > > sunchao                536
> > > ianmcook               499
> > >
> > > Since we're probably stuck using GitHub to receive code contributions
> > > (as opposed to systems — Gerrit is one I'm familiar with — that
> > > provide more structure for reviewers to track the patches they "own"
> > > as well as the outgoing/incoming state of reviews), I am wondering
> > > what kinds of tools we could create to make it easier for maintainers
> > > to keep track of PRs they are shepherding through the contribution
> > > process. Ideally this wouldn't involve maintainers having to engage in
> > > some explicit action like assigning themselves as a PR reviewer.
> > >
> > > Here's one idea: a web application that displays "your reviews", a
> > > table of PRs that you have interacted with in any way (commented, left
> > > code review, assigned as reviewer, someone mentioned you, etc.) sorted
> > > either by last commit or last comment to assess "freshness". So if you
> > > comment on a PR or leave a code review, it will automatically show up
> > > in "your reviews". It could also indicate whether there has been
> > > activity on the PR since the last time you interacted with it.
> > >
> > > Having now used the GitHub API to pull comments from PRs for the above
> > > analysis, there is certainly enough information available to help
> > > create this kind of tool. I'd be willing to contribute to building the
> > > backend of such a web application.
> > >
> > > This is just one idea, but I am curious to hear from others who are
> > > spending a lot of time doing code review / PR merging to see what
> > > might help them use their time more effectively.
> > >
> > > Thanks,
> > > Wes
> > >
>

Re: Improving PR workload management for Arrow maintainers

Posted by Weston Pace <we...@gmail.com>.
I investigated the CPython approach; the PR labelling is part of
the existing bedevere bot, which does a number of things (not all
relevant to Arrow).  Yesterday I created a standalone GitHub Action[1]
dedicated to this task, roughly based on my previous email.  It will
apply "awaiting-review" and "awaiting-changes" labels when
appropriate.  I think it's probably ready to try out at this point
(though I'm sure there will be some hiccups).  If any repo wants to
volunteer to be a guinea pig, I will work with you to get the action
configured and running.  I have it enabled on a dummy repository here[2]
and this is what it looks like in action[3].

[1] https://github.com/westonpace/pr-needs-review/
[2] https://github.com/westonpace/pr-needs-review-dummy-2/blob/main/.github/workflows/label-pr.yml
[3] https://github.com/westonpace/pr-needs-review-dummy-2/pull/13

On Thu, Jul 1, 2021 at 11:36 AM Adam Lippai <ad...@rigo.sk> wrote:
>
> Not sure if it's applicable, but GitHub is improving:
> https://github.blog/changelog/2021-06-23-whats-new-with-github-issues/
>
> That spreadsheet-like issue tracking looks concise.
>
> Best regards,
> Adam Lippai
>
> On Wed, Jun 30, 2021, 10:28 Antoine Pitrou <an...@python.org> wrote:
>
> >
> > Le 30/06/2021 à 10:04, Wes McKinney a écrit :
> > >
> > > I guess my concern with this is how to quickly separate out "PRs I am
> > > keeping an eye on". If there are 100 active PRs and only 20 of them
> > > are ones you've interacted with, how do you know which ones need your
> > > attention? GitHub does have the "reviewed-by" filter which could be
> > > good enough
> >
> > There's also the "involves" filter that can also select PRs you have
> > commented on without giving a formal review.
> >
> > However, those filters don't let you know which PRs are pending review
> > if you haven't already commented on them.
> >
> > Regards
> >
> > Antoine.
> >

Re: Improving PR workload management for Arrow maintainers

Posted by Adam Lippai <ad...@rigo.sk>.
Not sure if it's applicable, but GitHub is improving:
https://github.blog/changelog/2021-06-23-whats-new-with-github-issues/

That spreadsheet-like issue tracking looks concise.

Best regards,
Adam Lippai

On Wed, Jun 30, 2021, 10:28 Antoine Pitrou <an...@python.org> wrote:

>
> Le 30/06/2021 à 10:04, Wes McKinney a écrit :
> >
> > I guess my concern with this is how to quickly separate out "PRs I am
> > keeping an eye on". If there are 100 active PRs and only 20 of them
> > are ones you've interacted with, how do you know which ones need your
> > attention? GitHub does have the "reviewed-by" filter which could be
> > good enough
>
> There's also the "involves" filter that can also select PRs you have
> commented on without giving a formal review.
>
> However, those filters don't let you know which PRs are pending review
> if you haven't already commented on them.
>
> Regards
>
> Antoine.
>

Re: Improving PR workload management for Arrow maintainers

Posted by Antoine Pitrou <an...@python.org>.
Le 30/06/2021 à 10:04, Wes McKinney a écrit :
> 
> I guess my concern with this is how to quickly separate out "PRs I am
> keeping an eye on". If there are 100 active PRs and only 20 of them
> are ones you've interacted with, how do you know which ones need your
> attention? GitHub does have the "reviewed-by" filter which could be
> good enough

There's also the "involves" filter that can also select PRs you have 
commented on without giving a formal review.

However, those filters don't let you know which PRs are pending review 
if you haven't already commented on them.
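
For reference, links using these filters can be put together as in the
small sketch below. The helper name pr_search_url is made up; the
reviewed-by and involves qualifiers and the @me shorthand are regular
GitHub search syntax.

```python
from urllib.parse import urlencode


def pr_search_url(repo, query):
    """Build a GitHub pull request search URL for `repo` ("owner/name")."""
    return f"https://github.com/{repo}/pulls?" + urlencode({"q": query})


# Open PRs I have formally reviewed, and open PRs I am involved in at all.
print(pr_search_url("apache/arrow", "is:pr is:open reviewed-by:@me"))
print(pr_search_url("apache/arrow", "is:pr is:open involves:@me"))
```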

Regards

Antoine.

Re: Improving PR workload management for Arrow maintainers

Posted by Wes McKinney <we...@gmail.com>.
On Tue, Jun 29, 2021 at 8:05 PM Weston Pace <we...@gmail.com> wrote:
>
> I apologize.  I did plan on working on this but it's taken a back seat
> for a while.  I would still recommend shying away from a standalone
> UI.  You will end up making a lot of requests (and possibly running
> into GitHub throttles) if you want detailed PR information for all of
> the PRs.  To work around those limitations, the Spark example that I
> looked at kept a standalone database and polled GitHub on a regular
> basis.  This works, but then you have quite a bit of complexity (it's
> no longer a simple static web page you can just host somewhere; you'll
> need to pay for a backend server and also bear the cost of maintaining
> that server).  Also, you may find yourself continuously playing catch-up
> to add features that exist in GitHub or face users migrating away from
> the custom tool.

On this I will say:

* The rate limit for GitHub API calls is 5000 per hour per user, so if
you polled PRs once every 5 minutes, you could keep 200 or so PRs up
to date that way (assuming ~2 GitHub API calls per PR), and more if we
relied on a rotation of bot API tokens
* GitHub's REST API features are relatively slow-moving
* A small DigitalOcean server that would be adequate for this would
cost less than $100/month
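
The back-of-the-envelope arithmetic for the first point, written out
as a sketch. The constants are just the figures quoted above (5000
calls per hour, a 5 minute polling interval, about 2 calls per PR);
the function name and the tokens parameter are made up for
illustration.

```python
RATE_LIMIT_PER_HOUR = 5000   # GitHub API calls per hour per user
POLL_INTERVAL_MINUTES = 5
CALLS_PER_PR = 2             # assumption from the estimate above


def max_tracked_prs(tokens=1):
    """PRs that can be refreshed on every poll without exceeding the limit."""
    polls_per_hour = 60 // POLL_INTERVAL_MINUTES                        # 12
    calls_per_poll = (RATE_LIMIT_PER_HOUR * tokens) // polls_per_hour   # 416 for one token
    return calls_per_poll // CALLS_PER_PR


print(max_tracked_prs())    # 208
print(max_tracked_prs(3))   # 625
```

So a single token covers roughly 200 PRs, and a rotation of three
tokens would cover roughly 600.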

> The approach I was pursuing was a single Github action repository to
> add labels similar to those described by Andrew Lamb.  You could make
> it quite complex but I think a simple state machine would be:
>
> New PR Created (not in draft) -> Add "Needs review" label
> PR moved into draft -> Remove "Needs review" and "changes requested" labels.
> PR Review added with state "Changes Needed" -> Remove "Needs review",
> add "changes requested", add comment explaining how to report changes
> have been made
> Comment made with "I have completed all requested changes" -> Remove
> "changes requested", add "needs review", re-request all reviewers
> Nightly cron job -> Any PR that has had the "needs review" label for X
> days gets "needs attention" label
> Nightly cron job -> Any PR that has had the "changes requested" label
> for Y days gets "stale" label, add comment explaining why that
> happened and encouraging the user to state if they want someone else
> to take over the PR.

I guess my concern with this is how to quickly separate out "PRs I am
keeping an eye on". If there are 100 active PRs and only 20 of them
are ones you've interacted with, how do you know which ones need your
attention? GitHub does have the "reviewed-by" filter, which could be
good enough:

https://github.com/apache/arrow/pulls?q=is%3Apr+is%3Aclosed+reviewed-by%3A%40me

One potential benefit of the web app approach would be for reviewers
to be able to "watch" reviews that they want to show up in their "my
reviews" page even if they have yet to actually comment or review.

> I investigated automatically adding/removing labels based on passing /
> failing checks, but I think the checks in Arrow are not stable enough,
> and getting that information out of GitHub is rather tricky.
>
> I don't know that I'll have time to work on this at the moment but I
> think it'd be pretty straightforward to build such an action if anyone
> is interested.  Also, it sounds like CPython has something similar.  If
> it is simple enough we could just steal it.
>
> On Tue, Jun 29, 2021 at 6:00 AM Antoine Pitrou <an...@python.org> wrote:
> >
> >
> > Le 29/06/2021 à 15:25, Wes McKinney a écrit :
> > > On Tue, Jun 29, 2021 at 3:10 PM Andrew Lamb <al...@influxdata.com> wrote:
> > >>
> > >> The thing that would make me more efficient reviewing PRs is figuring out
> > >> which one of the open reviews are ready for additional feedback.
> > >
> > > Yes, I think this would be the single most significant quality-of-life
> > > improvement for reviewers.
> >
> > Agreed as well.
> >
> > The CPython project uses dedicated labels for that (some automatically
> > set/unset) as well as a bot that pesters contributors to mention when
> > their PR is ready for review again.  It helps assert that the labelled
> > PR status reflects their actual status accurately.
> >
> > See some examples here:
> > https://github.com/python/cpython/pull/26941#issuecomment-870643346
> > https://github.com/python/cpython/pull/26772#issuecomment-866020819
> > https://github.com/python/cpython/pull/26677#pullrequestreview-682724234
> >
> > Regards
> >
> > Antoine.
> >
> >
> > >
> > >> I think the idea of a webapp or something that shows active reviews would
> > >> be helpful (though I get most of that from appropriate email filters).
> > >>
> > >> What about a system involving labels (for which there is already a basic
> > >> GUI in github)? Something low tech like
> > >>
> > >> (Waiting for Review)
> > >> (Addressing Feedback)
> > >> (Approved, waiting for Merge)
> > >>
> > >> With maybe some automation prompting people to add the "Waiting on Review"
> > >> label when they want feedback
> > >
> > > I think it would have to be a bot that automatically sets the labels.
> > > If it requires contributors to take some action outside of pushing new
> > > work (new commits or a rebased version of the patch) to the PR and
> > > leaving responses to comments on the PR, the system is likely to fail
> > > some non-trivial percentage of the time.
> > >
> > > Given the quality of off-the-shelf web app components nowadays (e.g.
> > > https://material-ui.com), throwing together a read-only PR dashboard
> > > that shows what has changed since you last interacted with them (along
> > > with some other helpful things, like whether the build is passing) is
> > > "probably" not a super heavy lift. I haven't done any frontend
> > > development in years so while the backend part (writing Python code to
> > > wrangle data from GitHub's REST API and put it in a SQLite database)
> > > wouldn't take very long I would need some help on the front end
> > > portion and setting it up for deployment on DigitalOcean or somewhere.
> > >
> > >> Andrew
> > >>
> > >> On Tue, Jun 29, 2021 at 4:28 AM Wes McKinney <we...@gmail.com> wrote:
> > >>
> > >>> hi folks,
> > >>>
> > >>> I've noted that the volume of PRs for Arrow has been steadily
> > >>> increasing (and will likely continue to increase), and while I've
> > >>> personally had less time for development / maintenance / code reviews
> > >>> over the last year, I would like to have a discussion about what we
> > >>> could do to improve our tooling for maintainers to optimize the
> > >>> efficiency of time spent tending to the PR queue. In my own
> > >>> experience, I have felt that I have wasted a lot of time digging
> > >>> around the queue looking for PRs that are awaiting feedback or need to
> > >>> be merged.
> > >>>
> > >>> I note first of all that around 70 out of 173 open PRs have been
> > >>> updated in the last 7 days, so while there is some PR staleness, to
> > >>> have nearly half of the PRs active is pretty good. That said, ~70
> > >>> active PRs is a lot of PRs to tend to.
> > >>>
> > >>> I scraped the project's code review comment history, and here are the
> > >>> individuals who have left the most comments on PRs since genesis
> > >>>
> > >>> pitrou                6802
> > >>> wesm                  5023
> > >>> emkornfield           3032
> > >>> bkietz                2834
> > >>> kou                   1489
> > >>> nealrichardson        1439
> > >>> fsaintjacques         1356
> > >>> kszucs                1250
> > >>> alamb                 1133
> > >>> jorisvandenbossche    1094
> > >>> liyafan82              831
> > >>> lidavidm               816
> > >>> westonpace             794
> > >>> xhochy                 770
> > >>> nevi-me                643
> > >>> BryanCutler            639
> > >>> jorgecarleitao         635
> > >>> cpcloud                551
> > >>> sunchao                536
> > >>> ianmcook               499
> > >>>
> > >>> Since we're probably stuck using GitHub to receive code contributions
> > >>> (as opposed to systems — Gerrit is one I'm familiar with — that
> > >>> provide more structure for reviewers to track the patches they "own"
> > >>> as well as the outgoing/incoming state of reviews), I am wondering
> > >>> what kinds of tools we could create to make it easier for maintainers
> > >>> to keep track of PRs they are shepherding through the contribution
> > >>> process. Ideally this wouldn't involve maintainers having to engage in
> > >>> some explicit action like assigning themselves as a PR reviewer.
> > >>>
> > >>> Here's one idea: a web application that displays "your reviews", a
> > >>> table of PRs that you have interacted with in any way (commented, left
> > >>> code review, assigned as reviewer, someone mentioned you, etc.) sorted
> > >>> either by last commit or last comment to assess "freshness". So if you
> > >>> comment on a PR or leave a code review, it will automatically show up
> > >>> in "your reviews". It could also indicate whether there has been
> > >>> activity on the PR since the last time you interacted with it.
> > >>>
> > >>> Having now used the GitHub API to pull comments from PRs for the above
> > >>> analysis, there is certainly enough information available to help
> > >>> create this kind of tool. I'd be willing to contribute to building the
> > >>> backend of such a web application.
> > >>>
> > >>> This is just one idea, but I am curious to hear from others who are
> > >>> spending a lot of time doing code review / PR merging to see what
> > >>> might help them use their time more effectively.
> > >>>
> > >>> Thanks,
> > >>> Wes
> > >>>

Re: Improving PR workload management for Arrow maintainers

Posted by Weston Pace <we...@gmail.com>.
I apologize.  I did plan on working on this but it's taken a back seat
for a while.  I would still recommend shying away from a standalone
UI.  You will end up making a lot of requests (and possibly running
into GitHub throttles) if you want detailed PR information for all of
the PRs.  To work around those limitations, the Spark example that I
looked at kept a standalone database and polled GitHub on a regular
basis.  This works, but then you have quite a bit of complexity (it's
no longer a simple static web page you can just host somewhere; you'll
need to pay for a backend server and also bear the cost of maintaining
that server).  Also, you may find yourself continuously playing catch-up
to add features that exist in GitHub or face users migrating away from
the custom tool.

The approach I was pursuing was a single GitHub Action repository to
add labels similar to those described by Andrew Lamb.  You could make
it quite complex but I think a simple state machine would be:

New PR created (not in draft) -> Add "needs review" label
PR moved into draft -> Remove "needs review" and "changes requested" labels
PR review added with state "Changes Needed" -> Remove "needs review",
    add "changes requested", add comment explaining how to report that
    changes have been made
Comment made with "I have completed all requested changes" -> Remove
    "changes requested", add "needs review", re-request all reviewers
Nightly cron job -> Any PR that has had the "needs review" label for X
    days gets "needs attention" label
Nightly cron job -> Any PR that has had the "changes requested" label
    for Y days gets "stale" label; add comment explaining why that
    happened and encouraging the user to state if they want someone else
    to take over the PR
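
The event-driven transitions above could be sketched as a pure function
over the label set, as below. This is only an illustration, not an
actual GitHub Action: the event names are hypothetical rather than real
webhook event names, "opened" is assumed to mean a non-draft PR, and
the bot comments, reviewer re-requests and nightly cron rules are left
out.

```python
NEEDS_REVIEW = "needs review"
CHANGES_REQUESTED = "changes requested"
DONE_PHRASE = "I have completed all requested changes"


def apply_event(labels, event, payload=None):
    """Return the new label set after one PR event.

    `event` is one of the hypothetical names "opened",
    "converted_to_draft", "review_changes_requested" or "comment";
    `payload` is the comment body for "comment" events.
    """
    labels = set(labels)  # copy so the caller's set is not modified
    if event == "opened":
        labels.add(NEEDS_REVIEW)
    elif event == "converted_to_draft":
        labels -= {NEEDS_REVIEW, CHANGES_REQUESTED}
    elif event == "review_changes_requested":
        labels.discard(NEEDS_REVIEW)
        labels.add(CHANGES_REQUESTED)
    elif event == "comment" and payload and DONE_PHRASE in payload:
        labels.discard(CHANGES_REQUESTED)
        labels.add(NEEDS_REVIEW)
    return labels


# Walk one PR through a typical review round.
labels = set()
for event, payload in [
    ("opened", None),
    ("review_changes_requested", None),
    ("comment", "I have completed all requested changes"),
]:
    labels = apply_event(labels, event, payload)
    print(event, "->", sorted(labels))
```

Keeping the transitions in one small function like this would make
them easy to unit test separately from any GitHub API calls.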

I investigated automatically adding/removing labels based on passing /
failing checks, but I think the checks in Arrow are not stable enough,
and getting that information out of GitHub is rather tricky.

I don't know that I'll have time to work on this at the moment but I
think it'd be pretty straightforward to build such an action if anyone
is interested.  Also, it sounds like CPython has something similar.  If
it is simple enough we could just steal it.

On Tue, Jun 29, 2021 at 6:00 AM Antoine Pitrou <an...@python.org> wrote:
>
>
> Le 29/06/2021 à 15:25, Wes McKinney a écrit :
> > On Tue, Jun 29, 2021 at 3:10 PM Andrew Lamb <al...@influxdata.com> wrote:
> >>
> >> The thing that would make me more efficient reviewing PRs is figuring out
> >> which one of the open reviews are ready for additional feedback.
> >
> > Yes, I think this would be the single most significant quality-of-life
> > improvement for reviewers.
>
> Agreed as well.
>
> The CPython project uses dedicated labels for that (some automatically
> set/unset) as well as a bot that pesters contributors to mention when
> their PR is ready for review again.  It helps assert that the labelled
> PR status reflects their actual status accurately.
>
> See some examples here:
> https://github.com/python/cpython/pull/26941#issuecomment-870643346
> https://github.com/python/cpython/pull/26772#issuecomment-866020819
> https://github.com/python/cpython/pull/26677#pullrequestreview-682724234
>
> Regards
>
> Antoine.
>
>
> >
> >> I think the idea of a webapp or something that shows active reviews would
> >> be helpful (though I get most of that from appropriate email filters).
> >>
> >> What about a system involving labels (for which there is already a basic
> >> GUI in github)? Something low tech like
> >>
> >> (Waiting for Review)
> >> (Addressing Feedback)
> >> (Approved, waiting for Merge)
> >>
> >> With maybe some automation prompting people to add the "Waiting on Review"
> >> label when they want feedback
> >
> > I think it would have to be a bot that automatically sets the labels.
> > If it requires contributors to take some action outside of pushing new
> > work (new commits or a rebased version of the patch) to the PR and
> > leaving responses to comments on the PR, the system is likely to fail
> > some non-trivial percentage of the time.
> >
> > Given the quality of off-the-shelf web app components nowadays (e.g.
> > https://material-ui.com), throwing together a read-only PR dashboard
> > that shows what has changed since you last interacted with them (along
> > with some other helpful things, like whether the build is passing) is
> > "probably" not a super heavy lift. I haven't done any frontend
> > development in years so while the backend part (writing Python code to
> > wrangle data from GitHub's REST API and put it in a SQLite database)
> > wouldn't take very long I would need some help on the front end
> > portion and setting it up for deployment on DigitalOcean or somewhere.
> >
> >> Andrew
> >>
> >> On Tue, Jun 29, 2021 at 4:28 AM Wes McKinney <we...@gmail.com> wrote:
> >>
> >>> hi folks,
> >>>
> >>> I've noted that the volume of PRs for Arrow has been steadily
> >>> increasing (and will likely continue to increase), and while I've
> >>> personally had less time for development / maintenance / code reviews
> >>> over the last year, I would like to have a discussion about what we
> >>> could do to improve our tooling for maintainers to optimize the
> >>> efficiency of time spent tending to the PR queue. In my own
> >>> experience, I have felt that I have wasted a lot of time digging
> >>> around the queue looking for PRs that are awaiting feedback or need to
> >>> be merged.
> >>>
> >>> I note first of all that around 70 out of 173 open PRs have been
> >>> updated in the last 7 days, so while there is some PR staleness, to
> >>> have nearly half of the PRs active is pretty good. That said, ~70
> >>> active PRs is a lot of PRs to tend to.
> >>>
> >>> I scraped the project's code review comment history, and here are the
> >>> individuals who have left the most comments on PRs since genesis
> >>>
> >>> pitrou                6802
> >>> wesm                  5023
> >>> emkornfield           3032
> >>> bkietz                2834
> >>> kou                   1489
> >>> nealrichardson        1439
> >>> fsaintjacques         1356
> >>> kszucs                1250
> >>> alamb                 1133
> >>> jorisvandenbossche    1094
> >>> liyafan82              831
> >>> lidavidm               816
> >>> westonpace             794
> >>> xhochy                 770
> >>> nevi-me                643
> >>> BryanCutler            639
> >>> jorgecarleitao         635
> >>> cpcloud                551
> >>> sunchao                536
> >>> ianmcook               499
> >>>
> >>> Since we're probably stuck using GitHub to receive code contributions
> >>> (as opposed to systems — Gerrit is one I'm familiar with — that
> >>> provide more structure for reviewers to track the patches they "own"
> >>> as well as the outgoing/incoming state of reviews), I am wondering
> >>> what kinds of tools we could create to make it easier for maintainers
> >>> to keep track of PRs they are shepherding through the contribution
> >>> process. Ideally this wouldn't involve maintainers having to engage in
> >>> some explicit action like assigning themselves as a PR reviewer.
> >>>
> >>> Here's one idea: a web application that displays "your reviews", a
> >>> table of PRs that you have interacted with in any way (commented, left
> >>> code review, assigned as reviewer, someone mentioned you, etc.) sorted
> >>> either by last commit or last comment to assess "freshness". So if you
> >>> comment on a PR or leave a code review, it will automatically show up
> >>> in "your reviews". It could also indicate whether there has been
> >>> activity on the PR since the last time you interacted with it.
> >>>
> >>> Having now used the GitHub API to pull comments from PRs for the above
> >>> analysis, there is certainly enough information available to help
> >>> create this kind of tool. I'd be willing to contribute to building the
> >>> backend of such a web application.
> >>>
> >>> This is just one idea, but I am curious to hear from others who are
> >>> spending a lot of time doing code review / PR merging to see what
> >>> might help them use their time more effectively.
> >>>
> >>> Thanks,
> >>> Wes
> >>>

Re: Improving PR workload management for Arrow maintainers

Posted by Antoine Pitrou <an...@python.org>.
Le 29/06/2021 à 15:25, Wes McKinney a écrit :
> On Tue, Jun 29, 2021 at 3:10 PM Andrew Lamb <al...@influxdata.com> wrote:
>>
>> The thing that would make me more efficient reviewing PRs is figuring out
>> which one of the open reviews are ready for additional feedback.
> 
> Yes, I think this would be the single most significant quality-of-life
> improvement for reviewers.

Agreed as well.

The CPython project uses dedicated labels for that (some automatically
set/unset) as well as a bot that pesters contributors to mention when
their PR is ready for review again.  It helps ensure that the labelled
PR status accurately reflects the actual status.

See some examples here:
https://github.com/python/cpython/pull/26941#issuecomment-870643346
https://github.com/python/cpython/pull/26772#issuecomment-866020819
https://github.com/python/cpython/pull/26677#pullrequestreview-682724234

Regards

Antoine.


> 
>> I think the idea of a webapp or something that shows active reviews would
>> be helpful (though I get most of that from appropriate email filters).
>>
>> What about a system involving labels (for which there is already a basic
>> GUI in github)? Something low tech like
>>
>> (Waiting for Review)
>> (Addressing Feedback)
>> (Approved, waiting for Merge)
>>
>> With maybe some automation prompting people to add the "Waiting on Review"
>> label when they want feedback
> 
> I think it would have to be a bot that automatically sets the labels.
> If it requires contributors to take some action outside of pushing new
> work (new commits or a rebased version of the patch) to the PR and
> leaving responses to comments on the PR, the system is likely to fail
> some non-trivial percentage of the time.
> 
> Given the quality of off-the-shelf web app components nowadays (e.g.
> https://material-ui.com), throwing together a read-only PR dashboard
> that shows what has changed since you last interacted with them (along
> with some other helpful things, like whether the build is passing) is
> "probably" not a super heavy lift. I haven't done any frontend
> development in years so while the backend part (writing Python code to
> wrangle data from GitHub's REST API and put it in a SQLite database)
> wouldn't take very long I would need some help on the front end
> portion and setting it up for deployment on DigitalOcean or somewhere.
> 
>> Andrew
>>
>> On Tue, Jun 29, 2021 at 4:28 AM Wes McKinney <we...@gmail.com> wrote:
>>
>>> hi folks,
>>>
>>> I've noted that the volume of PRs for Arrow has been steadily
>>> increasing (and will likely continue to increase), and while I've
>>> personally had less time for development / maintenance / code reviews
>>> over the last year, I would like to have a discussion about what we
>>> could do to improve our tooling for maintainers to optimize the
>>> efficiency of time spent tending to the PR queue. In my own
>>> experience, I have felt that I have wasted a lot of time digging
>>> around the queue looking for PRs that are awaiting feedback or need to
>>> be merged.
>>>
>>> I note first of all that around 70 out of 173 open PRs have been
>>> updated in the last 7 days, so while there is some PR staleness, to
>>> have nearly half of the PRs active is pretty good. That said, ~70
>>> active PRs is a lot of PRs to tend to.
>>>
>>> I scraped the project's code review comment history, and here are the
>>> individuals who have left the most comments on PRs since genesis
>>>
>>> pitrou                6802
>>> wesm                  5023
>>> emkornfield           3032
>>> bkietz                2834
>>> kou                   1489
>>> nealrichardson        1439
>>> fsaintjacques         1356
>>> kszucs                1250
>>> alamb                 1133
>>> jorisvandenbossche    1094
>>> liyafan82              831
>>> lidavidm               816
>>> westonpace             794
>>> xhochy                 770
>>> nevi-me                643
>>> BryanCutler            639
>>> jorgecarleitao         635
>>> cpcloud                551
>>> sunchao                536
>>> ianmcook               499
>>>
>>> Since we're probably stuck using GitHub to receive code contributions
>>> (as opposed to systems — Gerrit is one I'm familiar with — that
>>> provide more structure for reviewers to track the patches they "own"
>>> as well as the outgoing/incoming state of reviews), I am wondering
>>> what kinds of tools we could create to make it easier for maintainers
>>> to keep track of PRs they are shepherding through the contribution
>>> process. Ideally this wouldn't involve maintainers having to engage in
>>> some explicit action like assigning themselves as a PR reviewer.
>>>
>>> Here's one idea: a web application that displays "your reviews", a
>>> table of PRs that you have interacted with in any way (commented, left
>>> code review, assigned as reviewer, someone mentioned you, etc.) sorted
>>> either by last commit or last comment to assess "freshness". So if you
>>> comment on a PR or leave a code review, it will automatically show up
>>> in "your reviews". It could also indicate whether there has been
>>> activity on the PR since the last time you interacted with it.
>>>
>>> Having now used the GitHub API to pull comments from PRs for the above
>>> analysis, there is certainly enough information available to help
>>> create this kind of tool. I'd be willing to contribute to building the
>>> backend of such a web application.
>>>
>>> This is just one idea, but I am curious to hear from others who are
>>> spending a lot of time doing code review / PR merging to see what
>>> might help them use their time more effectively.
>>>
>>> Thanks,
>>> Wes
>>>

Re: Improving PR workload management for Arrow maintainers

Posted by Wes McKinney <we...@gmail.com>.
On Tue, Jun 29, 2021 at 3:10 PM Andrew Lamb <al...@influxdata.com> wrote:
>
> The thing that would make me more efficient reviewing PRs is figuring out
> which one of the open reviews are ready for additional feedback.

Yes, I think this would be the single most significant quality-of-life
improvement for reviewers.

> I think the idea of a webapp or something that shows active reviews would
> be helpful (though I get most of that from appropriate email filters).
>
> What about a system involving labels (for which there is already a basic
> GUI in github)? Something low tech like
>
> (Waiting for Review)
> (Addressing Feedback)
> (Approved, waiting for Merge)
>
> With maybe some automation prompting people to add the "Waiting on Review"
> label when they want feedback

I think it would have to be a bot that automatically sets the labels.
If it requires contributors to take some action outside of pushing new
work (new commits or a rebased version of the patch) to the PR and
leaving responses to comments on the PR, the system is likely to fail
some non-trivial percentage of the time.
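
To make the bot idea concrete, here is a rough sketch of the label
transitions it could apply, driven only by things contributors and
reviewers already do (pushing, commenting, reviewing). The event names
and the next_label function are hypothetical and are not GitHub webhook
names; the three labels are the ones Andrew suggested.

```python
from enum import Enum


class Label(Enum):
    WAITING_FOR_REVIEW = "Waiting for Review"
    ADDRESSING_FEEDBACK = "Addressing Feedback"
    APPROVED = "Approved, waiting for Merge"


# Hypothetical event names; a real bot would map GitHub events onto these.
REVIEWER_FEEDBACK = {"review_changes_requested", "review_commented", "comment"}
AUTHOR_RESPONSE = {"push", "comment"}


def next_label(current, event, actor_is_author):
    """Return the label a PR should carry after `event` happens."""
    if not actor_is_author and event == "review_approved":
        return Label.APPROVED
    if not actor_is_author and event in REVIEWER_FEEDBACK:
        # A reviewer said something: the ball is in the author's court.
        return Label.ADDRESSING_FEEDBACK
    if actor_is_author and event in AUTHOR_RESPONSE:
        # New commits or a reply from the author: ready for another look.
        return Label.WAITING_FOR_REVIEW
    # Anything else (CI status, label changes, ...) leaves the label alone.
    return current
```

The point of the sketch is that no transition needs an explicit "I'm
ready" action from the contributor; whether a plain author comment
should flip the label back is a policy choice we would have to make.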

Given the quality of off-the-shelf web app components nowadays (e.g.
https://material-ui.com), throwing together a read-only PR dashboard
that shows what has changed on each PR since you last interacted with
it (along with some other helpful things, like whether the build is
passing) is "probably" not a super heavy lift. I haven't done any
frontend development in years, so while the backend part (writing
Python code to wrangle data from GitHub's REST API and put it in a
SQLite database) wouldn't take very long, I would need some help on
the frontend portion and with setting it up for deployment on
DigitalOcean or somewhere.
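
As a minimal sketch of what the backend query might look like, assume
PR activity has already been pulled from GitHub and flattened into
(pr, actor, kind, timestamp) rows. The table layout, the function names
and the sample data in the usage note are all made up for illustration;
fetching from the API is left out entirely.

```python
import sqlite3


def build_db(events):
    """Load (pr, actor, kind, ts) tuples into an in-memory SQLite database.

    `ts` is assumed to be an ISO 8601 UTC string, so string comparison
    orders rows chronologically.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE events (pr INTEGER, actor TEXT, kind TEXT, ts TEXT)")
    conn.executemany("INSERT INTO events VALUES (?, ?, ?, ?)", events)
    return conn


def your_reviews(conn, user):
    """Return (pr, last_activity, updated) for every PR `user` has touched.

    Rows are sorted freshest first; `updated` is 1 when someone else has
    acted on the PR after the user's own most recent event, else 0.
    """
    query = """
        SELECT e.pr,
               MAX(e.ts) AS last_activity,
               MAX(CASE WHEN e.actor != :user AND e.ts > mine.last_ts
                        THEN 1 ELSE 0 END) AS updated
        FROM events e
        JOIN (SELECT pr, MAX(ts) AS last_ts
              FROM events WHERE actor = :user GROUP BY pr) mine
          ON mine.pr = e.pr
        GROUP BY e.pr
        ORDER BY last_activity DESC
    """
    return conn.execute(query, {"user": user}).fetchall()
```

Usage would be something like your_reviews(build_db(rows), "alice").
This only counts events the user authored; mentions and reviewer
assignments would need to be recorded as rows attributed to that user
for them to show up too.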

> Andrew
>
> On Tue, Jun 29, 2021 at 4:28 AM Wes McKinney <we...@gmail.com> wrote:
>
> > hi folks,
> >
> > I've noted that the volume of PRs for Arrow has been steadily
> > increasing (and will likely continue to increase), and while I've
> > personally had less time for development / maintenance / code reviews
> > over the last year, I would like to have a discussion about what we
> > could do to improve our tooling for maintainers to optimize the
> > efficiency of time spent tending to the PR queue. In my own
> > experience, I have felt that I have wasted a lot of time digging
> > around the queue looking for PRs that are awaiting feedback or need to
> > be merged.
> >
> > I note first of all that around 70 out of 173 open PRs have been
> > updated in the last 7 days, so while there is some PR staleness, to
> > have nearly half of the PRs active is pretty good. That said, ~70
> > active PRs is a lot of PRs to tend to.
> >
> > I scraped the project's code review comment history, and here are the
> > individuals who have left the most comments on PRs since genesis
> >
> > pitrou                6802
> > wesm                  5023
> > emkornfield           3032
> > bkietz                2834
> > kou                   1489
> > nealrichardson        1439
> > fsaintjacques         1356
> > kszucs                1250
> > alamb                 1133
> > jorisvandenbossche    1094
> > liyafan82              831
> > lidavidm               816
> > westonpace             794
> > xhochy                 770
> > nevi-me                643
> > BryanCutler            639
> > jorgecarleitao         635
> > cpcloud                551
> > sunchao                536
> > ianmcook               499
> >
> > Since we're probably stuck using GitHub to receive code contributions
> > (as opposed to systems — Gerrit is one I'm familiar with — that
> > provide more structure for reviewers to track the patches they "own"
> > as well as the outgoing/incoming state of reviews), I am wondering
> > what kinds of tools we could create to make it easier for maintainers
> > to keep track of PRs they are shepherding through the contribution
> > process. Ideally this wouldn't involve maintainers having to engage in
> > some explicit action like assigning themselves as a PR reviewer.
> >
> > Here's one idea: a web application that displays "your reviews", a
> > table of PRs that you have interacted with in any way (commented, left
> > code review, assigned as reviewer, someone mentioned you, etc.) sorted
> > either by last commit or last comment to assess "freshness". So if you
> > comment on a PR or leave a code review, it will automatically show up
> > in "your reviews". It could also indicate whether there has been
> > activity on the PR since the last time you interacted with it.
> >
> > Having now used the GitHub API to pull comments from PRs for the above
> > analysis, there is certainly enough information available to help
> > create this kind of tool. I'd be willing to contribute to building the
> > backend of such a web application.
> >
> > This is just one idea, but I am curious to hear from others who are
> > spending a lot of time doing code review / PR merging to see what
> > might help them use their time more effectively.
> >
> > Thanks,
> > Wes
> >

Re: Improving PR workload management for Arrow maintainers

Posted by Jorge Cardoso Leitão <jo...@gmail.com>.
I just had a quick chat over the ASF Slack with Daniel Gruno from the
infra team, and they are rolling out the "triage role" [1] for
non-committers, which AFAIK offers useful tools in this context:

* add/remove labels
* assign reviewers
* mark duplicates
* close, reopen and assign issues and PRs

One does not preclude the other; I just thought it could be useful
information for this topic, as maybe this covers some ground?

Best,
Jorge

[1]
https://docs.github.com/en/organizations/managing-access-to-your-organizations-repositories/repository-permission-levels-for-an-organization


On Tue, Jun 29, 2021 at 3:10 PM Andrew Lamb <al...@influxdata.com> wrote:

> The thing that would make me more efficient reviewing PRs is figuring out
> which one of the open reviews are ready for additional feedback.
>
> I think the idea of a webapp or something that shows active reviews would
> be helpful (though I get most of that from appropriate email filters).
>
> What about a system involving labels (for which there is already a basic
> GUI in github)? Something low tech like
>
> (Waiting for Review)
> (Addressing Feedback)
> (Approved, waiting for Merge)
>
> With maybe some automation prompting people to add the "Waiting on Review"
> label when they want feedback
>
> Andrew
>
> On Tue, Jun 29, 2021 at 4:28 AM Wes McKinney <we...@gmail.com> wrote:
>
> > hi folks,
> >
> > I've noted that the volume of PRs for Arrow has been steadily
> > increasing (and will likely continue to increase), and while I've
> > personally had less time for development / maintenance / code reviews
> > over the last year, I would like to have a discussion about what we
> > could do to improve our tooling for maintainers to optimize the
> > efficiency of time spent tending to the PR queue. In my own
> > experience, I have felt that I have wasted a lot of time digging
> > around the queue looking for PRs that are awaiting feedback or need to
> > be merged.
> >
> > I note first of all that around 70 out of 173 open PRs have been
> > updated in the last 7 days, so while there is some PR staleness, to
> > have nearly half of the PRs active is pretty good. That said, ~70
> > active PRs is a lot of PRs to tend to.
> >
> > I scraped the project's code review comment history, and here are the
> > individuals who have left the most comments on PRs since genesis
> >
> > pitrou                6802
> > wesm                  5023
> > emkornfield           3032
> > bkietz                2834
> > kou                   1489
> > nealrichardson        1439
> > fsaintjacques         1356
> > kszucs                1250
> > alamb                 1133
> > jorisvandenbossche    1094
> > liyafan82              831
> > lidavidm               816
> > westonpace             794
> > xhochy                 770
> > nevi-me                643
> > BryanCutler            639
> > jorgecarleitao         635
> > cpcloud                551
> > sunchao                536
> > ianmcook               499
> >
> > Since we're probably stuck using GitHub to receive code contributions
> > (as opposed to systems — Gerrit is one I'm familiar with — that
> > provide more structure for reviewers to track the patches they "own"
> > as well as the outgoing/incoming state of reviews), I am wondering
> > what kinds of tools we could create to make it easier for maintainers
> > to keep track of PRs they are shepherding through the contribution
> > process. Ideally this wouldn't involve maintainers having to engage in
> > some explicit action like assigning themselves as a PR reviewer.
> >
> > Here's one idea: a web application that displays "your reviews", a
> > table of PRs that you have interacted with in any way (commented, left
> > code review, assigned as reviewer, someone mentioned you, etc.) sorted
> > either by last commit or last comment to assess "freshness". So if you
> > comment on a PR or leave a code review, it will automatically show up
> > in "your reviews". It could also indicate whether there has been
> > activity on the PR since the last time you interacted with it.
> >
> > Having now used the GitHub API to pull comments from PRs for the above
> > analysis, there is certainly enough information available to help
> > create this kind of tool. I'd be willing to contribute to building the
> > backend of such a web application.
> >
> > This is just one idea, but I am curious to hear from others who are
> > spending a lot of time doing code review / PR merging to see what
> > might help them use their time more effectively.
> >
> > Thanks,
> > Wes
> >
>

Re: Improving PR workload management for Arrow maintainers

Posted by Andrew Lamb <al...@influxdata.com>.
The thing that would make me more efficient reviewing PRs is figuring out
which of the open reviews are ready for additional feedback.

I think the idea of a webapp or something that shows active reviews would
be helpful (though I get most of that from appropriate email filters).

What about a system involving labels (for which there is already a basic
GUI in github)? Something low tech like

(Waiting for Review)
(Addressing Feedback)
(Approved, waiting for Merge)

With maybe some automation prompting people to add the "Waiting for Review"
label when they want feedback.

Andrew

On Tue, Jun 29, 2021 at 4:28 AM Wes McKinney <we...@gmail.com> wrote:

> hi folks,
>
> I've noted that the volume of PRs for Arrow has been steadily
> increasing (and will likely continue to increase), and while I've
> personally had less time for development / maintenance / code reviews
> over the last year, I would like to have a discussion about what we
> could do to improve our tooling for maintainers to optimize the
> efficiency of time spent tending to the PR queue. In my own
> experience, I have felt that I have wasted a lot of time digging
> around the queue looking for PRs that are awaiting feedback or need to
> be merged.
>
> I note first of all that around 70 out of 173 open PRs have been
> updated in the last 7 days, so while there is some PR staleness, to
> have nearly half of the PRs active is pretty good. That said, ~70
> active PRs is a lot of PRs to tend to.
>
> I scraped the project's code review comment history, and here are the
> individuals who have left the most comments on PRs since genesis
>
> pitrou                6802
> wesm                  5023
> emkornfield           3032
> bkietz                2834
> kou                   1489
> nealrichardson        1439
> fsaintjacques         1356
> kszucs                1250
> alamb                 1133
> jorisvandenbossche    1094
> liyafan82              831
> lidavidm               816
> westonpace             794
> xhochy                 770
> nevi-me                643
> BryanCutler            639
> jorgecarleitao         635
> cpcloud                551
> sunchao                536
> ianmcook               499
>
> Since we're probably stuck using GitHub to receive code contributions
> (as opposed to systems — Gerrit is one I'm familiar with — that
> provide more structure for reviewers to track the patches they "own"
> as well as the outgoing/incoming state of reviews), I am wondering
> what kinds of tools we could create to make it easier for maintainers
> to keep track of PRs they are shepherding through the contribution
> process. Ideally this wouldn't involve maintainers having to engage in
> some explicit action like assigning themselves as a PR reviewer.
>
> Here's one idea: a web application that displays "your reviews", a
> table of PRs that you have interacted with in any way (commented, left
> code review, assigned as reviewer, someone mentioned you, etc.) sorted
> either by last commit or last comment to assess "freshness". So if you
> comment on a PR or leave a code review, it will automatically show up
> in "your reviews". It could also indicate whether there has been
> activity on the PR since the last time you interacted with it.
>
> Having now used the GitHub API to pull comments from PRs for the above
> analysis, there is certainly enough information available to help
> create this kind of tool. I'd be willing to contribute to building the
> backend of such a web application.
>
> This is just one idea, but I am curious to hear from others who are
> spending a lot of time doing code review / PR merging to see what
> might help them use their time more effectively.
>
> Thanks,
> Wes
>