You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@storm.apache.org by Jungtaek Lim <ka...@gmail.com> on 2016/05/16 02:01:40 UTC

Re: [DISCUSSION] Squashing commits before merging in.

This thread was forgotten, but I would like to discuss this again and reach
final result.

1. I believe learning from other projects is the best way to evaluate our
process and change if others are better.

Let's take a look at commits page on some projects -
Hadoop <https://github.com/apache/hadoop/commits/trunk>, HBase
<https://github.com/apache/hbase/commits/master>, Kafka
<https://github.com/apache/kafka/commits/trunk>, Spark
<https://github.com/apache/spark/commits/master>, Flink
<https://github.com/apache/flink/commits/master>, Calcite
<https://github.com/apache/calcite/commits/master>, Zeppelin (incubator)
<https://github.com/apache/incubator-zeppelin/commits/master>.

You may notice that most of commit logs are started to JIRA ID (which is
what some of us are done implicitly), and there's one commit per one issue,
which may be squashed. (there seems to be some exceptions but seems minor)
Moreover, recently Github includes this feature to UI and many projects are
using it.

2. Since we merge commits, merge commit is placed to last before CHANGELOG
commit, but origin commits could be placed to outside of 1 page of commits
page. IMO, it is not important when contributors are committing their
commits are. Rearrange to be placed to latest are more deterministic. (Btw,
just rebasing also do the trick for this.)

3. It's also great for getting rid of maintaining CHANGELOG by hand. I
really don't like doing this by hand now, and squashed commits are
CHANGELOG. We don't even revert two commits (merge commit / CHANGELOG).

4. If merger squashes commits into one when merging, we can see author and
merger of commits for that commit.

I'm in favor of squashing commits, but some contributors could have hard
time to squash by themselves so I think it would be better to squash in
merging step. (committers are in responsible to squash, which should be
easier with script tool)

2015년 5월 12일 (화) 오전 2:16, Bobby Evans <ev...@yahoo-inc.com.invalid>님이 작성:

> I am fine with doing this, but I also don't see it as that big of a deal.
> Most of the time if I want to cherry pick something I will cherry pick the
> merge commit for the JIRA instead of each individual commit.  The only
> place for me that this is really nice is with blame, where I can see the
> exact JIRA that something is a part of, without having to trace it back to
> the merge commit and JIRA it was a part of. The downside is that now I have
> to run more commands when committing.  So for me, the questions is how
> often do I commit vs cherry pick, and I honestly could go either way.
>
> - Bobby
>
>
>
>      On Friday, May 8, 2015 7:31 PM, Harsha <st...@harsha.io> wrote:
>
>
>  Thanks for the reply.
>
> 1)  Major point is that commit's changeset size will be completely
> relied on
> > JIRA issue's size, and we can have another pain point when changeset
> > becomes huge.
>
>   I am planning adding JIRA title as the commit message. Its what we do
>   any way.
>
> 2)  Additional point is how we leave multiple commits' log messages into
> one.
> > Headline of log can contain useful information which describes code
> > change,
> > and applying this approach improperly we can lose them.
>
> If you are fixing something even if its a small issue, One should file a
> JIRA and fix it send a PR.  This mean each fix is one JIRA even if there
> are multiple commits thats gone in , in the end the patch fixes a JIRA.
> This is not going to apply for bigger contributions like a connector or
> a new component. But for those we can also squash lot of commits instead
> of pulling all of the commit messages into storm.  We can treat these
> contributions bit differently than regular JIRAs.
>
>
> > Furthermore, we're also having huge PR. (for example, STORM-561
> > <https://github.com/apache/storm/pull/546>) I wonder we can apply same
> > approach to this, or go with exception.
>
> Yep we can definitely make exceptions to bigger contributions. As I said
> above we can definitely bring down number of commits for the bigger
> contributions as well.
>
> Thanks,
> Harsha
>
>
> On Fri, May 8, 2015, at 04:19 PM, 임정택 wrote:
> > Hi.
> >
> > FYI, Spring Data Redis used your approach now, so we can take a look.
> > https://github.com/spring-projects/spring-data-redis
> >
> > At first, I like your approach.
> >
> > Major point is that commit's changeset size will be completely relied on
> > JIRA issue's size, and we can have another pain point when changeset
> > becomes huge.
> > I sought some of merged PRs from Storm repo, and their changesets seems
> > small enough to leave just one commit per PR.
> >
> > Additional point is how we leave multiple commits' log messages into one.
> > Headline of log can contain useful information which describes code
> > change,
> > and applying this approach improperly we can lose them.
> >
> > Furthermore, we're also having huge PR. (for example, STORM-561
> > <https://github.com/apache/storm/pull/546>) I wonder we can apply same
> > approach to this, or go with exception.
> >
> > Btw, real pain point from cherry-picking is conflicts.
> > Jedis uses same approach, and we have to believe collaborators and unit
> > tests, and we did some mistakes while cherry-picking and unit tests
> > couldn't catch it.
> > It would be not resolved though we squash commits into one.
> > Have to make cherry-picking branch and post a PR related to merging
> > cherry-picked branch into specific version?
> >
> > Sincerely,
> > Jungtaek Lim (HeartSaVioR)
> >
> >
> > 2015-05-09 6:14 GMT+09:00 Harsha <st...@harsha.io>:
> >
> > > Hi,
> > >      Today we are merging into storm master branch using PRs. These PRs
> > > are usually contain multiple commits on different time lines .
> > > There are few issues with this approach , the major pain point is if we
> > > want cherry-pick a single JIRA current approach makes it harder as we
> have
> > > track down all the commits that went in for a particular JIRA.
> > > I am proposing we should squash these multiple commits into one singe
> > > commit before merging into trunk and associate the STORM JIRA in commit
> > > message.
> > > Let me know what you think.
> > >
> > > Thanks,
> > > Harsha
> > >
> > >
> > >
> > >
> > >
> >
> >
> > --
> > Name : 임 정택
> > Blog : http://www.heartsavior.net / http://dev.heartsavior.net
> > Twitter : http://twitter.com/heartsavior
> > LinkedIn : http://www.linkedin.com/in/heartsavior
>
>

Re: [DISCUSSION] Squashing commits before merging in.

Posted by Harsha <st...@harsha.io>.
Thanks Jungtaek for starting this thread. 
Its on of my pet peeves that our commit log is very noisy. It would be
great if we can squash commits into one commit message before merging
the PR. This would keep the commit log clean and if someone wants to
refer to particular JIRA patch its easy to go back to that patch if
we've squashed the commits. Also cherry-picking will be easier .  
    In Kakfa community we use this process and I find it very helpful .
    I changed the kafka-merge-pr tool to work with storm  and PR
    available here https://github.com/apache/storm/pull/1468 . I
    commented out the line that pushes to apache git repo until I get
    some feedback but you can safely run it and see how the final commit
    looks like.

Thanks,
Harsha 

On Sun, May 15, 2016, at 07:04 PM, Jungtaek Lim wrote:
> Moving opinions from other threads,
> 
> Abhishek
> 
> 1* - Squashing commits is more of a guideline. Committers, specifically,
> should ensure that the patches are squashed or volunteer to do that as
> merging otherwise :) These guidelines along with 'how to git squash' can
> be
> added to http://storm.apache.org/contribute/Contributing-to-Storm.html.
> 
> Aaron
> 
> 1* I like the idea of squashing to a single commit and tagging that
> commit
> and PR with the JIRA ID.
> 
> Thanks,
> Jungtaek Lim (HeartSaVioR)
> 
> 2016년 5월 16일 (월) 오전 11:01, Jungtaek Lim <ka...@gmail.com>님이 작성:
> 
> > This thread was forgotten, but I would like to discuss this again and
> > reach final result.
> >
> > 1. I believe learning from other projects is the best way to evaluate our
> > process and change if others are better.
> >
> > Let's take a look at commits page on some projects -
> > Hadoop <https://github.com/apache/hadoop/commits/trunk>, HBase
> > <https://github.com/apache/hbase/commits/master>, Kafka
> > <https://github.com/apache/kafka/commits/trunk>, Spark
> > <https://github.com/apache/spark/commits/master>, Flink
> > <https://github.com/apache/flink/commits/master>, Calcite
> > <https://github.com/apache/calcite/commits/master>, Zeppelin (incubator)
> > <https://github.com/apache/incubator-zeppelin/commits/master>.
> >
> > You may notice that most of commit logs are started to JIRA ID (which is
> > what some of us are done implicitly), and there's one commit per one issue,
> > which may be squashed. (there seems to be some exceptions but seems minor)
> > Moreover, recently Github includes this feature to UI and many projects
> > are using it.
> >
> > 2. Since we merge commits, merge commit is placed to last before CHANGELOG
> > commit, but origin commits could be placed to outside of 1 page of commits
> > page. IMO, it is not important when contributors are committing their
> > commits are. Rearrange to be placed to latest are more deterministic. (Btw,
> > just rebasing also do the trick for this.)
> >
> > 3. It's also great for getting rid of maintaining CHANGELOG by hand. I
> > really don't like doing this by hand now, and squashed commits are
> > CHANGELOG. We don't even revert two commits (merge commit / CHANGELOG).
> >
> > 4. If merger squashes commits into one when merging, we can see author and
> > merger of commits for that commit.
> >
> > I'm in favor of squashing commits, but some contributors could have hard
> > time to squash by themselves so I think it would be better to squash in
> > merging step. (committers are in responsible to squash, which should be
> > easier with script tool)
> >
> > 2015년 5월 12일 (화) 오전 2:16, Bobby Evans <ev...@yahoo-inc.com.invalid>님이 작성:
> >
> >> I am fine with doing this, but I also don't see it as that big of a
> >> deal.  Most of the time if I want to cherry pick something I will cherry
> >> pick the merge commit for the JIRA instead of each individual commit.  The
> >> only place for me that this is really nice is with blame, where I can see
> >> the exact JIRA that something is a part of, without having to trace it back
> >> to the merge commit and JIRA it was a part of. The downside is that now I
> >> have to run more commands when committing.  So for me, the questions is how
> >> often do I commit vs cherry pick, and I honestly could go either way.
> >>
> >> - Bobby
> >>
> >>
> >>
> >>      On Friday, May 8, 2015 7:31 PM, Harsha <st...@harsha.io> wrote:
> >>
> >>
> >>  Thanks for the reply.
> >>
> >> 1)  Major point is that commit's changeset size will be completely
> >> relied on
> >> > JIRA issue's size, and we can have another pain point when changeset
> >> > becomes huge.
> >>
> >>   I am planning adding JIRA title as the commit message. Its what we do
> >>   any way.
> >>
> >> 2)  Additional point is how we leave multiple commits' log messages into
> >> one.
> >> > Headline of log can contain useful information which describes code
> >> > change,
> >> > and applying this approach improperly we can lose them.
> >>
> >> If you are fixing something even if its a small issue, One should file a
> >> JIRA and fix it send a PR.  This mean each fix is one JIRA even if there
> >> are multiple commits thats gone in , in the end the patch fixes a JIRA.
> >> This is not going to apply for bigger contributions like a connector or
> >> a new component. But for those we can also squash lot of commits instead
> >> of pulling all of the commit messages into storm.  We can treat these
> >> contributions bit differently than regular JIRAs.
> >>
> >>
> >> > Furthermore, we're also having huge PR. (for example, STORM-561
> >> > <https://github.com/apache/storm/pull/546>) I wonder we can apply same
> >> > approach to this, or go with exception.
> >>
> >> Yep we can definitely make exceptions to bigger contributions. As I said
> >> above we can definitely bring down number of commits for the bigger
> >> contributions as well.
> >>
> >> Thanks,
> >> Harsha
> >>
> >>
> >> On Fri, May 8, 2015, at 04:19 PM, 임정택 wrote:
> >> > Hi.
> >> >
> >> > FYI, Spring Data Redis used your approach now, so we can take a look.
> >> > https://github.com/spring-projects/spring-data-redis
> >> >
> >> > At first, I like your approach.
> >> >
> >> > Major point is that commit's changeset size will be completely relied on
> >> > JIRA issue's size, and we can have another pain point when changeset
> >> > becomes huge.
> >> > I sought some of merged PRs from Storm repo, and their changesets seems
> >> > small enough to leave just one commit per PR.
> >> >
> >> > Additional point is how we leave multiple commits' log messages into
> >> one.
> >> > Headline of log can contain useful information which describes code
> >> > change,
> >> > and applying this approach improperly we can lose them.
> >> >
> >> > Furthermore, we're also having huge PR. (for example, STORM-561
> >> > <https://github.com/apache/storm/pull/546>) I wonder we can apply same
> >> > approach to this, or go with exception.
> >> >
> >> > Btw, real pain point from cherry-picking is conflicts.
> >> > Jedis uses same approach, and we have to believe collaborators and unit
> >> > tests, and we did some mistakes while cherry-picking and unit tests
> >> > couldn't catch it.
> >> > It would be not resolved though we squash commits into one.
> >> > Have to make cherry-picking branch and post a PR related to merging
> >> > cherry-picked branch into specific version?
> >> >
> >> > Sincerely,
> >> > Jungtaek Lim (HeartSaVioR)
> >> >
> >> >
> >> > 2015-05-09 6:14 GMT+09:00 Harsha <st...@harsha.io>:
> >> >
> >> > > Hi,
> >> > >      Today we are merging into storm master branch using PRs. These
> >> PRs
> >> > > are usually contain multiple commits on different time lines .
> >> > > There are few issues with this approach , the major pain point is if
> >> we
> >> > > want cherry-pick a single JIRA current approach makes it harder as we
> >> have
> >> > > track down all the commits that went in for a particular JIRA.
> >> > > I am proposing we should squash these multiple commits into one singe
> >> > > commit before merging into trunk and associate the STORM JIRA in
> >> commit
> >> > > message.
> >> > > Let me know what you think.
> >> > >
> >> > > Thanks,
> >> > > Harsha
> >> > >
> >> > >
> >> > >
> >> > >
> >> > >
> >> >
> >> >
> >> > --
> >> > Name : 임 정택
> >> > Blog : http://www.heartsavior.net / http://dev.heartsavior.net
> >> > Twitter : http://twitter.com/heartsavior
> >> > LinkedIn : http://www.linkedin.com/in/heartsavior
> >>
> >>
> >
> >

Re: [DISCUSSION] Squashing commits before merging in.

Posted by Jungtaek Lim <ka...@gmail.com>.
Moving opinions from other threads,

Abhishek

1* - Squashing commits is more of a guideline. Committers, specifically,
should ensure that the patches are squashed or volunteer to do that as
merging otherwise :) These guidelines along with 'how to git squash' can be
added to http://storm.apache.org/contribute/Contributing-to-Storm.html.

Aaron

1* I like the idea of squashing to a single commit and tagging that commit
and PR with the JIRA ID.

Thanks,
Jungtaek Lim (HeartSaVioR)

2016년 5월 16일 (월) 오전 11:01, Jungtaek Lim <ka...@gmail.com>님이 작성:

> This thread was forgotten, but I would like to discuss this again and
> reach final result.
>
> 1. I believe learning from other projects is the best way to evaluate our
> process and change if others are better.
>
> Let's take a look at commits page on some projects -
> Hadoop <https://github.com/apache/hadoop/commits/trunk>, HBase
> <https://github.com/apache/hbase/commits/master>, Kafka
> <https://github.com/apache/kafka/commits/trunk>, Spark
> <https://github.com/apache/spark/commits/master>, Flink
> <https://github.com/apache/flink/commits/master>, Calcite
> <https://github.com/apache/calcite/commits/master>, Zeppelin (incubator)
> <https://github.com/apache/incubator-zeppelin/commits/master>.
>
> You may notice that most of commit logs are started to JIRA ID (which is
> what some of us are done implicitly), and there's one commit per one issue,
> which may be squashed. (there seems to be some exceptions but seems minor)
> Moreover, recently Github includes this feature to UI and many projects
> are using it.
>
> 2. Since we merge commits, merge commit is placed to last before CHANGELOG
> commit, but origin commits could be placed to outside of 1 page of commits
> page. IMO, it is not important when contributors are committing their
> commits are. Rearrange to be placed to latest are more deterministic. (Btw,
> just rebasing also do the trick for this.)
>
> 3. It's also great for getting rid of maintaining CHANGELOG by hand. I
> really don't like doing this by hand now, and squashed commits are
> CHANGELOG. We don't even revert two commits (merge commit / CHANGELOG).
>
> 4. If merger squashes commits into one when merging, we can see author and
> merger of commits for that commit.
>
> I'm in favor of squashing commits, but some contributors could have hard
> time to squash by themselves so I think it would be better to squash in
> merging step. (committers are in responsible to squash, which should be
> easier with script tool)
>
> 2015년 5월 12일 (화) 오전 2:16, Bobby Evans <ev...@yahoo-inc.com.invalid>님이 작성:
>
>> I am fine with doing this, but I also don't see it as that big of a
>> deal.  Most of the time if I want to cherry pick something I will cherry
>> pick the merge commit for the JIRA instead of each individual commit.  The
>> only place for me that this is really nice is with blame, where I can see
>> the exact JIRA that something is a part of, without having to trace it back
>> to the merge commit and JIRA it was a part of. The downside is that now I
>> have to run more commands when committing.  So for me, the questions is how
>> often do I commit vs cherry pick, and I honestly could go either way.
>>
>> - Bobby
>>
>>
>>
>>      On Friday, May 8, 2015 7:31 PM, Harsha <st...@harsha.io> wrote:
>>
>>
>>  Thanks for the reply.
>>
>> 1)  Major point is that commit's changeset size will be completely
>> relied on
>> > JIRA issue's size, and we can have another pain point when changeset
>> > becomes huge.
>>
>>   I am planning adding JIRA title as the commit message. Its what we do
>>   any way.
>>
>> 2)  Additional point is how we leave multiple commits' log messages into
>> one.
>> > Headline of log can contain useful information which describes code
>> > change,
>> > and applying this approach improperly we can lose them.
>>
>> If you are fixing something even if its a small issue, One should file a
>> JIRA and fix it send a PR.  This mean each fix is one JIRA even if there
>> are multiple commits thats gone in , in the end the patch fixes a JIRA.
>> This is not going to apply for bigger contributions like a connector or
>> a new component. But for those we can also squash lot of commits instead
>> of pulling all of the commit messages into storm.  We can treat these
>> contributions bit differently than regular JIRAs.
>>
>>
>> > Furthermore, we're also having huge PR. (for example, STORM-561
>> > <https://github.com/apache/storm/pull/546>) I wonder we can apply same
>> > approach to this, or go with exception.
>>
>> Yep we can definitely make exceptions to bigger contributions. As I said
>> above we can definitely bring down number of commits for the bigger
>> contributions as well.
>>
>> Thanks,
>> Harsha
>>
>>
>> On Fri, May 8, 2015, at 04:19 PM, 임정택 wrote:
>> > Hi.
>> >
>> > FYI, Spring Data Redis used your approach now, so we can take a look.
>> > https://github.com/spring-projects/spring-data-redis
>> >
>> > At first, I like your approach.
>> >
>> > Major point is that commit's changeset size will be completely relied on
>> > JIRA issue's size, and we can have another pain point when changeset
>> > becomes huge.
>> > I sought some of merged PRs from Storm repo, and their changesets seems
>> > small enough to leave just one commit per PR.
>> >
>> > Additional point is how we leave multiple commits' log messages into
>> one.
>> > Headline of log can contain useful information which describes code
>> > change,
>> > and applying this approach improperly we can lose them.
>> >
>> > Furthermore, we're also having huge PR. (for example, STORM-561
>> > <https://github.com/apache/storm/pull/546>) I wonder we can apply same
>> > approach to this, or go with exception.
>> >
>> > Btw, real pain point from cherry-picking is conflicts.
>> > Jedis uses same approach, and we have to believe collaborators and unit
>> > tests, and we did some mistakes while cherry-picking and unit tests
>> > couldn't catch it.
>> > It would be not resolved though we squash commits into one.
>> > Have to make cherry-picking branch and post a PR related to merging
>> > cherry-picked branch into specific version?
>> >
>> > Sincerely,
>> > Jungtaek Lim (HeartSaVioR)
>> >
>> >
>> > 2015-05-09 6:14 GMT+09:00 Harsha <st...@harsha.io>:
>> >
>> > > Hi,
>> > >      Today we are merging into storm master branch using PRs. These
>> PRs
>> > > are usually contain multiple commits on different time lines .
>> > > There are few issues with this approach , the major pain point is if
>> we
>> > > want cherry-pick a single JIRA current approach makes it harder as we
>> have
>> > > track down all the commits that went in for a particular JIRA.
>> > > I am proposing we should squash these multiple commits into one singe
>> > > commit before merging into trunk and associate the STORM JIRA in
>> commit
>> > > message.
>> > > Let me know what you think.
>> > >
>> > > Thanks,
>> > > Harsha
>> > >
>> > >
>> > >
>> > >
>> > >
>> >
>> >
>> > --
>> > Name : 임 정택
>> > Blog : http://www.heartsavior.net / http://dev.heartsavior.net
>> > Twitter : http://twitter.com/heartsavior
>> > LinkedIn : http://www.linkedin.com/in/heartsavior
>>
>>
>
>