Posted to dev@daffodil.apache.org by "Beckerle, Mike" <mb...@owlcyberdefense.com> on 2021/06/24 19:40:58 UTC

So many automated PRs... do we need to count them separately?

One of the metrics of project health at the ASF is number of PRs and commits on projects.

Ours have been massively inflated by these scalabot and dependbot PRs. (dependbot is new, but I was already observing this just from the scala update bot).

This isn't terrible, and presumably as other projects adopt this improvement to the SDLC all the numbers will adjust upward, with expectations adjusting upward similarly.

I just wanted everyone to understand that "non-automated PRs" is perhaps a future metric of note, and that we're seeing an inflated number of PRs and commits and emails now due to these bots.

I believe that these improve the quality of our software, and reduce maintenance burdens on the team, so I hope more projects adopt this.

I just wanted everyone to understand that there are some odd ASF "community health" metric implications of this that I will raise in the next Apache Daffodil Board report, just to advise them that our project (like others) is experiencing this big flurry, and new steady state, of automated PRs.

I doubt this is news, but it's worth mentioning given that we're a small new project and the instant growth due to these bots is a one-time transient not reflective of (and in fact overwhelming) our organic community growth, which is non-zero, but slow. (I'm ok with it, thank you new contributors!)

I am interested in people's thoughts about this notion of counting automated PRs separately from human-originated PRs.

I am also interested in whether people find this flurry of constant bot activity disruptive. I admit I find it so. I am going to need to create email rules to segregate this email traffic into folders so it's not in my daily view, and I wonder if we need to have an informal policy that people aren't expected to respond to or review these more than once a week/month or some such.

Thoughts welcome.

-mikeb

Mike Beckerle | Principal Engineer

mbeckerle@owlcyberdefense.com

P +1-781-330-0412

Re: So many automated PRs... do we need to count them separately?

Posted by "Beckerle, Mike" <mb...@owlcyberdefense.com>.

This link you sent:

https://github.com/apache/daffodil/pulls?q=is%3Apr+is%3Aopen+sort%3Aupdated-desc

is really useful. Like I noticed all my WIP PRs are down at the bottom of the list 🙁

Re: So many automated PRs... do we need to count them separately?

Posted by Steve Lawrence <sl...@apache.org>.

I'd agree that counting the creation of autogenerated PRs as a sign of
an active community probably doesn't make sense.

However, I would argue that counting those bot PRs that are merged
isn't totally unreasonable. We don't just automatically click merge on
these bot PRs. Reviewing them usually requires a nontrivial amount of
time to scrutinize the changelog or commit history, licenses, etc. to
make sure that the change should and can be merged. And it's not
uncommon for these bot PRs to also require actual code changes due to
failing to compile or causing tests to fail, which requires additional
work. So there is definitely developer time required to merge these.

Because of this, I'd say that if these bot PRs aren't getting merged,
that's probably a sign of an unhealthy community, and so it's probably
reasonable to include them in the statistics somehow.

Perhaps we need to say something like

  X pull requests opened (Y from non-bots), Z pull requests merged

Or alternatively,

  Human pull requests: X opened, Y merged
  Bot pull requests: A opened, B merged

Then people can make their own judgements about health with all that
information, and we can more clearly see if we're slacking on certain
kinds of PRs.
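
A minimal sketch of how the second layout above might be produced from a
list of pull requests. Everything here is an assumption for illustration:
the PullRequest record, the is_bot rule (GitHub App accounts have logins
ending in "[bot]", and "scala-steward" is listed by hand as an example of
a bot that runs under a regular account), and the sample data. It is not
how ASF reporting tools actually compute their numbers.

```python
from dataclasses import dataclass

# Assumption: accounts treated as bots even though their login does not
# end in "[bot]". Adjust this set for the bots a project really uses.
BOT_LOGINS = {"scala-steward"}


@dataclass
class PullRequest:
    # Hypothetical record; a real script would fill these from the
    # hosting service's API or an export.
    author: str
    merged: bool


def is_bot(login: str) -> bool:
    """Treat GitHub App logins ("...[bot]") and listed accounts as bots."""
    return login.endswith("[bot]") or login in BOT_LOGINS


def summarize(prs) -> str:
    """Return the two-line human/bot summary of opened and merged PRs."""
    counts = {"Human": [0, 0], "Bot": [0, 0]}  # [opened, merged]
    for pr in prs:
        row = counts["Bot" if is_bot(pr.author) else "Human"]
        row[0] += 1
        if pr.merged:
            row[1] += 1
    return "\n".join(
        f"{kind} pull requests: {opened} opened, {merged} merged"
        for kind, (opened, merged) in counts.items()
    )
```

Calling summarize() on the PRs opened in a reporting period would give the
two lines to paste into a report, with bot activity visible but kept apart
from human activity.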

Though, eventually I would hope that the human pull requests outnumber
the bot pull requests by enough that those bot PRs are just a blip.
We're probably a ways off from that though.

As to the flurry, I don't personally find it too distracting, but I do
tend to just delete the emails and then I'm careful to check for updated
pull requests when I have the time. I've found this link to be useful
since it sorts PRs by most recently updated:

https://github.com/apache/daffodil/pulls?q=is%3Apr+is%3Aopen+sort%3Aupdated-desc

This way I can quickly tell if there's anything worth looking at when I
have some down time.

I would also hope that, now that everything is updated, we will see bot
PRs somewhat less frequently. We definitely went through a period of
lots of bot PRs, which was a bit of a slog to get through.

- Steve

On 6/24/21 3:40 PM, Beckerle, Mike wrote:
> One of the metrics of project health at the ASF is number of PRs and commits on 
> projects.
> 
> Ours have been massively inflated by these scalabot and dependbot PRs. 
> (dependbot is new, but I was already observing this just from the scala update 
> bot).
> 
> This isn't terrible, and presumably as other projects adopt this improvement to 
> the SDLC all the numbers will adjust upward, with expectations adjusting upward 
> similarly.
> 
> I just wanted everyone to understand that "non-automated PRs" is perhaps a 
> future metric of note, and that we're seeing an inflated number of PRs and 
> commits and emails now due to these bots.
> 
> I believe that these improve the quality of our software, and reduce maintenance 
> burdens on the team, so I hope more projects adopt this.
> 
> I just wanted everyone to understand that there are some odd ASF "community 
> health" metric implications of this that I will raise in the next Apache 
> Daffodil Board report, just to advise them that our project (like others) is 
> experiencing this big flurry and new steady state, of automated PRs.
> 
> I doubt this is news, but it's worth mentioning given that we're a small new 
> project and the instant growth due to these bots is a one-time transient not 
> reflective of (and in fact overwhelming) our organic community growth, which is 
> non-zero, but slow. (I'm ok with it, thank you new contributors!)
> 
> I am interested in people's thoughts about this notion of counting automated PRs 
> separately from human-originated PRs.
> 
> I am also interested in whether people find this flurry of constant bot activity 
> disruptive. I admit I find it so. I am going to need to create email rules to 
> segregate this email traffic into folders so they're not in my daily view, and I 
> wonder if we need to have an informal policy that people aren't expected to 
> respond/review these except but once a week/month or some such.
> 
> Thoughts welcome.
> 
> -mikeb
> 
> 
> 
> Mike Beckerle | Principal Engineer
> 
> mbeckerle@owlcyberdefense.com <ma...@owlcyberdefense.com>
> 
> P +1-781-330-0412
>