You are viewing a plain text version of this content. The canonical link for it is here.

Posted to dev@reef.apache.org by Dhruv Mahajan <dh...@gmail.com> on 2015/08/19 22:23:13 UTC

Agility vs. interpretability with JIRAs

Hi

I am sending this email merely to invoke a discussion and see what other
people feel about the subject.

I understand it is generally discouraged to have big commits with say 20-30
files or more to review for the purpose of both reviewing as well as
maintaining interpretability for user looking at it in future and one
should not do it. It is encouraged to divide the whole big commit in to sub
Jiras with each one being a logical sub-unit. But many times especially
with REEF being in its early stages and application level code being
starting to develop on top of it, this means that each application like
IMRU potentially can have many sub JIRAS each with 3-4 files. In my
experience each JIRA has an overhead, in terms of time interval with pull
request being created and somebody taking it up for review which is
understandable since many of us work on multiple things and there are few
reviewers at this point and then again same gap happens every time a review
is done and changes are made.

I want to know if other people also feel that there is a trade-off here
between agility and maintaining logs properly especially with things being
in early stages and dynamic. If yes, I want to know what people feel is the
sweet spot. Does it lie on extreme or something in between?

Dhruv

Re: Agility vs. interpretability with JIRAs

Posted by Markus Weimer <ma...@weimo.de>.

Hi,

+1 on [MINOR]

We need to figure out when to use it, though. GitHub can't provide the
same level of durable paper trail the ASF can. Hence, one rule of thumb
would be: If it should go into release notes, it can't be a [MINOR].
This leaves reformatting, trivial comment updates and small
implementation changes that don't change behavior (or, worse, APIs).

Is that a reasonable heuristic for the use of [MINOR]?

Markus

On 2015-08-24 00:14, Byung-Gon Chun wrote:
> I strongly agree that smaller PRs are better for code review and
> maintainability.
> But I think the overhead of creating a small PR for minor code changes is
> high.
> The [MINOR] tag Brian suggested seems to be a reasonable solution. I'd love
> to hear what others think.
> 
> -Gon
> 
> On Mon, Aug 24, 2015 at 2:00 PM, Brian Cho <ch...@gmail.com> wrote:
> 
>> I agree with Markus’s first message that generally smaller == more
>> reviewable for JIRAs. Taking the bait on the second message:
>>
>> I’ve seen some back-and-forth about making minor code formatting changes
>> within a bigger PR/JIRA. I agree with Markus’s (1), that making such
>> changes can confuse things for the reviewer. However, the “tax” of creating
>> a new JIRA is large for such small changes. I worry that the cost-reward is
>> such that these issues will never be dealt with.
>>
>> I like Spark’s solution to this. Surfing through the commit log, you can
>> see commits tagged as [MINOR] (e.g., [1]). These PRs keep JIRA completely
>> out of the equation. I think this will reduce the temptation to make such
>> changes within a real JIRA issue.
>>
>> Do other folks also see the need for a JIRA-less [MINOR] tag?
>>
>> [1] https://github.com/apache/spark/pull/8265 /
>> https://github.com/apache/spark/pull/8265
>>
>>
>> On Mon, Aug 24, 2015 at 7:20 AM, Markus Weimer <ma...@weimo.de> wrote:
>>
>>> Uhm, I did not mean to preempt this discussion :-) I think that we in
>>> fact have a lot of ceremonial overhead in our current JIRA<->GitHub
>>> approach. Maybe we can automate some of it? Or just stop doing other
>> parts?
>>>
>>> Markus
>>>
>>>
>>> On 2015-08-19 14:42, Markus Weimer wrote:
>>>> Hi,
>>>>
>>>> On 2015-08-19 13:23, Dhruv Mahajan wrote:
>>>>> I want to know if other people also feel that there is a trade-off
>>>>> here between agility and maintaining logs properly especially with
>>>>> things being in early stages and dynamic.
>>>>
>>>> I agree that there is a trade-off here, but I don't agree that it is
>>>> between agility and maintaining logs:
>>>>
>>>> As you said, each PR comes with a fixed effort: The contributor has to
>>>> prepare a neat branch, write a concise and meaningful commit
>>>> message and do the JIRA and GitHub stuff. The committer has to build,
>>>> test, rebase and merge the code as well as edit the commit message
>>>> and JIRA. My guestimate is that this effort is ~30 minutes each for
>>>> contributor and committer.
>>>>
>>>> Just considering these fixed, size-independent costs of a PR/JIRA,
>> there
>>>> should be as few JIRAs and PRs as possible to get functionality in. In
>>>> turn, this would suggest making the largest possible PRs. The
>>>> commit history would be clean and easy to follow. I don't see
>>>> "maintaining proper logs" as a major obstacle here.
>>>>
>>>> However, there are efforts which are variable with the size and, more
>>>> importantly, complexity of a code change:
>>>>
>>>>   * Trivially obvious, bigger and more complex PRs take more effort to
>>>> review. In my experience, that increase in effort is super-linear:
>>>> Larger PRs need more back and forth, each of which comes with the kind
>>>> of latency you mentioned. Anecdotely, the longest standing PRs have
>> been
>>>> the most complex. Shimoga comes to mind, but also the initial reorgs of
>>>> the .NET code.
>>>>
>>>>   * Bigger PRs have a greater chance of merge conflicts with
>>>> other (big) PRs in the pipeline. This is especially bad for
>>>> project-structure changes on the .NET side: the .sln file is *binary*
>>>> from git's perspective and all merges are manual.
>>>>
>>>>   * Bigger PRs are less likely to be reviewed. The time commitment for
>> a
>>>> review is substantial. And given other options, the larger PR will be
>>>> ignored. Case in point: Andrew's #375 has been skipped over several
>>>> times now in favor of smaller changes. Yes, also by me. It is just much
>>>> easier to squeeze half an hour of code reviews in than an hour or more.
>>>> This compounds with the other effects in a way that committing bigger
>>>> and more complex PRs takes time very much super-linear in size. <-- we
>>>> should actually measure that.
>>>>
>>>>> If yes, I want to know what people feel is the sweet spot. Does it
>>>>> lie on extreme or something in between?
>>>>
>>>> Hard to have an opinion without knowing what the extremes are. If less
>>>> than 10 lines of change are one extreme and one commit per release are
>>>> the other, surely there is a sweet spot in between :-)
>>>>
>>>> Seriously, though, I play this by ear. A couple of rules I use when
>>>> creating or reviewing PRs are:
>>>>
>>>>   (1) Don't mix formatting and functionality changes. It makes
>> debugging
>>>> a mess.
>>>>
>>>>   (2) Don't add new C# projects and their contents in one go. Adding
>>>> projects is fragile and needs to take priority over other PRs to avoid
>>>> merge conflicts.
>>>>
>>>> Beyond that, the hunches become more soft:
>>>>
>>>>   * If a PR has sat untouched for a week, I sometimes do a review with
>>>> the sole intent of suggesting a split. That is to increase its chances
>>>> to be reviewed for real.
>>>>
>>>>   * Sometimes, PRs solve more than one significant problem. Which might
>>>> make sense to split them.
>>>>
>>>> Sorry for the long email, but this is a complex beast :-)
>>>>
>>>> Markus
>>>>
>>>>
>>>>
>>>
>>
> 
> 
>

Re: Agility vs. interpretability with JIRAs

Posted by Byung-Gon Chun <bg...@gmail.com>.

I strongly agree that smaller PRs are better for code review and
maintainability.
But I think the overhead of creating a small PR for minor code changes is
high.
The [MINOR] tag Brian suggested seems to be a reasonable solution. I'd love
to hear what others think.

-Gon

On Mon, Aug 24, 2015 at 2:00 PM, Brian Cho <ch...@gmail.com> wrote:

> I agree with Markus’s first message that generally smaller == more
> reviewable for JIRAs. Taking the bait on the second message:
>
> I’ve seen some back-and-forth about making minor code formatting changes
> within a bigger PR/JIRA. I agree with Markus’s (1), that making such
> changes can confuse things for the reviewer. However, the “tax” of creating
> a new JIRA is large for such small changes. I worry that the cost-reward is
> such that these issues will never be dealt with.
>
> I like Spark’s solution to this. Surfing through the commit log, you can
> see commits tagged as [MINOR] (e.g., [1]). These PRs keep JIRA completely
> out of the equation. I think this will reduce the temptation to make such
> changes within a real JIRA issue.
>
> Do other folks also see the need for a JIRA-less [MINOR] tag?
>
> [1] https://github.com/apache/spark/pull/8265 /
> https://github.com/apache/spark/pull/8265
>
>
> On Mon, Aug 24, 2015 at 7:20 AM, Markus Weimer <ma...@weimo.de> wrote:
>
> > Uhm, I did not mean to preempt this discussion :-) I think that we in
> > fact have a lot of ceremonial overhead in our current JIRA<->GitHub
> > approach. Maybe we can automate some of it? Or just stop doing other
> parts?
> >
> > Markus
> >
> >
> > On 2015-08-19 14:42, Markus Weimer wrote:
> > > Hi,
> > >
> > > On 2015-08-19 13:23, Dhruv Mahajan wrote:
> > >> I want to know if other people also feel that there is a trade-off
> > >> here between agility and maintaining logs properly especially with
> > >> things being in early stages and dynamic.
> > >
> > > I agree that there is a trade-off here, but I don't agree that it is
> > > between agility and maintaining logs:
> > >
> > > As you said, each PR comes with a fixed effort: The contributor has to
> > > prepare a neat branch, write a concise and meaningful commit
> > > message and do the JIRA and GitHub stuff. The committer has to build,
> > > test, rebase and merge the code as well as edit the commit message
> > > and JIRA. My guestimate is that this effort is ~30 minutes each for
> > > contributor and committer.
> > >
> > > Just considering these fixed, size-independent costs of a PR/JIRA,
> there
> > > should be as few JIRAs and PRs as possible to get functionality in. In
> > > turn, this would suggest making the largest possible PRs. The
> > > commit history would be clean and easy to follow. I don't see
> > > "maintaining proper logs" as a major obstacle here.
> > >
> > > However, there are efforts which are variable with the size and, more
> > > importantly, complexity of a code change:
> > >
> > >   * Trivially obvious, bigger and more complex PRs take more effort to
> > > review. In my experience, that increase in effort is super-linear:
> > > Larger PRs need more back and forth, each of which comes with the kind
> > > of latency you mentioned. Anecdotely, the longest standing PRs have
> been
> > > the most complex. Shimoga comes to mind, but also the initial reorgs of
> > > the .NET code.
> > >
> > >   * Bigger PRs have a greater chance of merge conflicts with
> > > other (big) PRs in the pipeline. This is especially bad for
> > > project-structure changes on the .NET side: the .sln file is *binary*
> > > from git's perspective and all merges are manual.
> > >
> > >   * Bigger PRs are less likely to be reviewed. The time commitment for
> a
> > > review is substantial. And given other options, the larger PR will be
> > > ignored. Case in point: Andrew's #375 has been skipped over several
> > > times now in favor of smaller changes. Yes, also by me. It is just much
> > > easier to squeeze half an hour of code reviews in than an hour or more.
> > > This compounds with the other effects in a way that committing bigger
> > > and more complex PRs takes time very much super-linear in size. <-- we
> > > should actually measure that.
> > >
> > >> If yes, I want to know what people feel is the sweet spot. Does it
> > >> lie on extreme or something in between?
> > >
> > > Hard to have an opinion without knowing what the extremes are. If less
> > > than 10 lines of change are one extreme and one commit per release are
> > > the other, surely there is a sweet spot in between :-)
> > >
> > > Seriously, though, I play this by ear. A couple of rules I use when
> > > creating or reviewing PRs are:
> > >
> > >   (1) Don't mix formatting and functionality changes. It makes
> debugging
> > > a mess.
> > >
> > >   (2) Don't add new C# projects and their contents in one go. Adding
> > > projects is fragile and needs to take priority over other PRs to avoid
> > > merge conflicts.
> > >
> > > Beyond that, the hunches become more soft:
> > >
> > >   * If a PR has sat untouched for a week, I sometimes do a review with
> > > the sole intent of suggesting a split. That is to increase its chances
> > > to be reviewed for real.
> > >
> > >   * Sometimes, PRs solve more than one significant problem. Which might
> > > make sense to split them.
> > >
> > > Sorry for the long email, but this is a complex beast :-)
> > >
> > > Markus
> > >
> > >
> > >
> >
>



-- 
Byung-Gon Chun

Re: Agility vs. interpretability with JIRAs

Posted by Brian Cho <ch...@gmail.com>.

I agree with Markus’s first message that generally smaller == more
reviewable for JIRAs. Taking the bait on the second message:

I’ve seen some back-and-forth about making minor code formatting changes
within a bigger PR/JIRA. I agree with Markus’s (1), that making such
changes can confuse things for the reviewer. However, the “tax” of creating
a new JIRA is large for such small changes. I worry that the cost-reward is
such that these issues will never be dealt with.

I like Spark’s solution to this. Surfing through the commit log, you can
see commits tagged as [MINOR] (e.g., [1]). These PRs keep JIRA completely
out of the equation. I think this will reduce the temptation to make such
changes within a real JIRA issue.

Do other folks also see the need for a JIRA-less [MINOR] tag?

[1] https://github.com/apache/spark/pull/8265 /
https://github.com/apache/spark/pull/8265


On Mon, Aug 24, 2015 at 7:20 AM, Markus Weimer <ma...@weimo.de> wrote:

> Uhm, I did not mean to preempt this discussion :-) I think that we in
> fact have a lot of ceremonial overhead in our current JIRA<->GitHub
> approach. Maybe we can automate some of it? Or just stop doing other parts?
>
> Markus
>
>
> On 2015-08-19 14:42, Markus Weimer wrote:
> > Hi,
> >
> > On 2015-08-19 13:23, Dhruv Mahajan wrote:
> >> I want to know if other people also feel that there is a trade-off
> >> here between agility and maintaining logs properly especially with
> >> things being in early stages and dynamic.
> >
> > I agree that there is a trade-off here, but I don't agree that it is
> > between agility and maintaining logs:
> >
> > As you said, each PR comes with a fixed effort: The contributor has to
> > prepare a neat branch, write a concise and meaningful commit
> > message and do the JIRA and GitHub stuff. The committer has to build,
> > test, rebase and merge the code as well as edit the commit message
> > and JIRA. My guestimate is that this effort is ~30 minutes each for
> > contributor and committer.
> >
> > Just considering these fixed, size-independent costs of a PR/JIRA, there
> > should be as few JIRAs and PRs as possible to get functionality in. In
> > turn, this would suggest making the largest possible PRs. The
> > commit history would be clean and easy to follow. I don't see
> > "maintaining proper logs" as a major obstacle here.
> >
> > However, there are efforts which are variable with the size and, more
> > importantly, complexity of a code change:
> >
> >   * Trivially obvious, bigger and more complex PRs take more effort to
> > review. In my experience, that increase in effort is super-linear:
> > Larger PRs need more back and forth, each of which comes with the kind
> > of latency you mentioned. Anecdotely, the longest standing PRs have been
> > the most complex. Shimoga comes to mind, but also the initial reorgs of
> > the .NET code.
> >
> >   * Bigger PRs have a greater chance of merge conflicts with
> > other (big) PRs in the pipeline. This is especially bad for
> > project-structure changes on the .NET side: the .sln file is *binary*
> > from git's perspective and all merges are manual.
> >
> >   * Bigger PRs are less likely to be reviewed. The time commitment for a
> > review is substantial. And given other options, the larger PR will be
> > ignored. Case in point: Andrew's #375 has been skipped over several
> > times now in favor of smaller changes. Yes, also by me. It is just much
> > easier to squeeze half an hour of code reviews in than an hour or more.
> > This compounds with the other effects in a way that committing bigger
> > and more complex PRs takes time very much super-linear in size. <-- we
> > should actually measure that.
> >
> >> If yes, I want to know what people feel is the sweet spot. Does it
> >> lie on extreme or something in between?
> >
> > Hard to have an opinion without knowing what the extremes are. If less
> > than 10 lines of change are one extreme and one commit per release are
> > the other, surely there is a sweet spot in between :-)
> >
> > Seriously, though, I play this by ear. A couple of rules I use when
> > creating or reviewing PRs are:
> >
> >   (1) Don't mix formatting and functionality changes. It makes debugging
> > a mess.
> >
> >   (2) Don't add new C# projects and their contents in one go. Adding
> > projects is fragile and needs to take priority over other PRs to avoid
> > merge conflicts.
> >
> > Beyond that, the hunches become more soft:
> >
> >   * If a PR has sat untouched for a week, I sometimes do a review with
> > the sole intent of suggesting a split. That is to increase its chances
> > to be reviewed for real.
> >
> >   * Sometimes, PRs solve more than one significant problem. Which might
> > make sense to split them.
> >
> > Sorry for the long email, but this is a complex beast :-)
> >
> > Markus
> >
> >
> >
>

Re: Agility vs. interpretability with JIRAs

Posted by Markus Weimer <ma...@weimo.de>.

Uhm, I did not mean to preempt this discussion :-) I think that we in
fact have a lot of ceremonial overhead in our current JIRA<->GitHub
approach. Maybe we can automate some of it? Or just stop doing other parts?

Markus


On 2015-08-19 14:42, Markus Weimer wrote:
> Hi,
> 
> On 2015-08-19 13:23, Dhruv Mahajan wrote:
>> I want to know if other people also feel that there is a trade-off 
>> here between agility and maintaining logs properly especially with 
>> things being in early stages and dynamic.
> 
> I agree that there is a trade-off here, but I don't agree that it is
> between agility and maintaining logs:
> 
> As you said, each PR comes with a fixed effort: The contributor has to
> prepare a neat branch, write a concise and meaningful commit
> message and do the JIRA and GitHub stuff. The committer has to build,
> test, rebase and merge the code as well as edit the commit message
> and JIRA. My guestimate is that this effort is ~30 minutes each for
> contributor and committer.
> 
> Just considering these fixed, size-independent costs of a PR/JIRA, there
> should be as few JIRAs and PRs as possible to get functionality in. In
> turn, this would suggest making the largest possible PRs. The
> commit history would be clean and easy to follow. I don't see
> "maintaining proper logs" as a major obstacle here.
> 
> However, there are efforts which are variable with the size and, more
> importantly, complexity of a code change:
> 
>   * Trivially obvious, bigger and more complex PRs take more effort to
> review. In my experience, that increase in effort is super-linear:
> Larger PRs need more back and forth, each of which comes with the kind
> of latency you mentioned. Anecdotely, the longest standing PRs have been
> the most complex. Shimoga comes to mind, but also the initial reorgs of
> the .NET code.
> 
>   * Bigger PRs have a greater chance of merge conflicts with
> other (big) PRs in the pipeline. This is especially bad for
> project-structure changes on the .NET side: the .sln file is *binary*
> from git's perspective and all merges are manual.
> 
>   * Bigger PRs are less likely to be reviewed. The time commitment for a
> review is substantial. And given other options, the larger PR will be
> ignored. Case in point: Andrew's #375 has been skipped over several
> times now in favor of smaller changes. Yes, also by me. It is just much
> easier to squeeze half an hour of code reviews in than an hour or more.
> This compounds with the other effects in a way that committing bigger
> and more complex PRs takes time very much super-linear in size. <-- we
> should actually measure that.
> 
>> If yes, I want to know what people feel is the sweet spot. Does it 
>> lie on extreme or something in between?
> 
> Hard to have an opinion without knowing what the extremes are. If less
> than 10 lines of change are one extreme and one commit per release are
> the other, surely there is a sweet spot in between :-)
> 
> Seriously, though, I play this by ear. A couple of rules I use when
> creating or reviewing PRs are:
> 
>   (1) Don't mix formatting and functionality changes. It makes debugging
> a mess.
> 
>   (2) Don't add new C# projects and their contents in one go. Adding
> projects is fragile and needs to take priority over other PRs to avoid
> merge conflicts.
> 
> Beyond that, the hunches become more soft:
> 
>   * If a PR has sat untouched for a week, I sometimes do a review with
> the sole intent of suggesting a split. That is to increase its chances
> to be reviewed for real.
> 
>   * Sometimes, PRs solve more than one significant problem. Which might
> make sense to split them.
> 
> Sorry for the long email, but this is a complex beast :-)
> 
> Markus
> 
> 
>

Re: Agility vs. interpretability with JIRAs

Posted by Markus Weimer <ma...@weimo.de>.

Hi,

On 2015-08-19 13:23, Dhruv Mahajan wrote:
> I want to know if other people also feel that there is a trade-off 
> here between agility and maintaining logs properly especially with 
> things being in early stages and dynamic.

I agree that there is a trade-off here, but I don't agree that it is
between agility and maintaining logs:

As you said, each PR comes with a fixed effort: The contributor has to
prepare a neat branch, write a concise and meaningful commit
message and do the JIRA and GitHub stuff. The committer has to build,
test, rebase and merge the code as well as edit the commit message
and JIRA. My guestimate is that this effort is ~30 minutes each for
contributor and committer.

Just considering these fixed, size-independent costs of a PR/JIRA, there
should be as few JIRAs and PRs as possible to get functionality in. In
turn, this would suggest making the largest possible PRs. The
commit history would be clean and easy to follow. I don't see
"maintaining proper logs" as a major obstacle here.

However, there are efforts which are variable with the size and, more
importantly, complexity of a code change:

  * Trivially obvious, bigger and more complex PRs take more effort to
review. In my experience, that increase in effort is super-linear:
Larger PRs need more back and forth, each of which comes with the kind
of latency you mentioned. Anecdotely, the longest standing PRs have been
the most complex. Shimoga comes to mind, but also the initial reorgs of
the .NET code.

  * Bigger PRs have a greater chance of merge conflicts with
other (big) PRs in the pipeline. This is especially bad for
project-structure changes on the .NET side: the .sln file is *binary*
from git's perspective and all merges are manual.

  * Bigger PRs are less likely to be reviewed. The time commitment for a
review is substantial. And given other options, the larger PR will be
ignored. Case in point: Andrew's #375 has been skipped over several
times now in favor of smaller changes. Yes, also by me. It is just much
easier to squeeze half an hour of code reviews in than an hour or more.
This compounds with the other effects in a way that committing bigger
and more complex PRs takes time very much super-linear in size. <-- we
should actually measure that.

> If yes, I want to know what people feel is the sweet spot. Does it 
> lie on extreme or something in between?

Hard to have an opinion without knowing what the extremes are. If less
than 10 lines of change are one extreme and one commit per release are
the other, surely there is a sweet spot in between :-)

Seriously, though, I play this by ear. A couple of rules I use when
creating or reviewing PRs are:

  (1) Don't mix formatting and functionality changes. It makes debugging
a mess.

  (2) Don't add new C# projects and their contents in one go. Adding
projects is fragile and needs to take priority over other PRs to avoid
merge conflicts.

Beyond that, the hunches become more soft:

  * If a PR has sat untouched for a week, I sometimes do a review with
the sole intent of suggesting a split. That is to increase its chances
to be reviewed for real.

  * Sometimes, PRs solve more than one significant problem. Which might
make sense to split them.

Sorry for the long email, but this is a complex beast :-)

Markus