Posted to common-dev@hadoop.apache.org by Steve Loughran <st...@hortonworks.com> on 2015/09/12 20:03:26 UTC

CHANGES.TXT

I've just been trying to get CHANGES.TXT between branch-2 and trunk more in sync, so that cherry picking patches from branch-2 up to trunk is more reliable.

Once you look closely, you see it is a mess, specifically:

trunk/CHANGES.TXT declares things as trunk-only which are already in branch-2 and/or actual releases


What to do?

1. audit trunk/CHANGES.TXT against branch-2/CHANGES.TXT; anything in branch-2's (i.e. to come in 2.8) is placed into trunk at that location, and trunk's "new in trunk" version of the entry is removed

2. go to JIRA-generated change logs. Though for that to be reliable, those fix-version fields have to be 100% accurate too.
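
A minimal sketch of the audit in option 1, assuming the caller has already cut the "trunk only" section out of trunk/CHANGES.TXT as a string. The function names are hypothetical, and the section layout of the real files is not modelled here; the sketch only finds JIRA keys that the trunk-only section claims but which branch-2 also lists.

```python
import re

# JIRA keys for the Hadoop projects, e.g. HADOOP-10759 or HDFS-1234.
JIRA_KEY = re.compile(r"\b(?:HADOOP|HDFS|YARN|MAPREDUCE)-\d+\b")


def jira_keys(text):
    """Return the set of JIRA keys mentioned anywhere in the given text."""
    return set(JIRA_KEY.findall(text))


def misfiled_in_trunk(trunk_only_section, branch2_changes):
    """Keys listed as trunk-only that also appear in branch-2's CHANGES.TXT.

    Both arguments are plain strings (hypothetical inputs: the caller is
    assumed to have extracted the trunk-only section beforehand).
    """
    return sorted(jira_keys(trunk_only_section) & jira_keys(branch2_changes))
```

Each key returned is a candidate for moving out of the trunk-only section; a person still has to decide where the entry belongs.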

Re: CHANGES.TXT

Posted by "Colin P. McCabe" <cm...@apache.org>.

I don't understand your negative tone.  What point specifically did I
and other people in the conversation miss?

Colin

On Tue, Sep 22, 2015 at 6:21 PM, Allen Wittenauer <aw...@altiscale.com> wrote:
>
> I don’t know whether your ability to completely miss my point every time we communicate with each other, regardless of the issue, is intentional or just a special talent.
>
> On Sep 22, 2015, at 8:52 AM, Colin P. McCabe <cm...@apache.org> wrote:
>
>> I think it's extremely unrealistic to expect Hadoop to ever follow a
>> branchless development model.  In fact, the recent trend has been in
>> the opposite direction-- prominent members of the community have
>> pointed out that we don't have enough long-running, well-tested and
>> well-supported branches.  Producing such a branch was the goal of the
>> ongoing 2.6.1 release effort.  Even if we did somehow switch to a
>> branchless development model, we have numerous people backporting
>> patches to their own repositories-- both Hadoop vendors and large
>> organizations that run Hadoop internally and have their own branches.
>>
>> Branchless development especially doesn't make sense for HDFS, since
>> it would force people to do time-consuming and potentially risky
>> layout version upgrades just to get small bugfixes.  Very few cluster
>> operators want to update the version of their data on-disk just to get
>> this month's urgent bugfix.  There are similar issues in other parts
>> of the stack such as YARN.
>>
>> Anyway, as Steve pointed out in his original post, merge conflicts in
>> CHANGES.txt are not the only problem caused by that file.  It's simply
>> very inaccurate and misleading, since it must be manually updated.  In
>> more than 3 years of working with Hadoop, I've never found CHANGES.txt
>> useful for anything.  git log and JIRA tell you everything you need to
>> know.  CHANGES.txt is a burden to update, misleading to operators, and
>> a relic that should have been removed years ago.
>>
>> I really hope this CHANGES.txt thread doesn't peter out like the rest
>> of them.  Please, let's fix this, finally.  Autogenerate this file.
>>
>> best,
>> Colin
>>
>> On Mon, Sep 14, 2015 at 7:10 PM, Allen Wittenauer <aw...@altiscale.com> wrote:
>>>
>>> On Sep 14, 2015, at 5:15 PM, Colin P. McCabe <cm...@apache.org> wrote:
>>>
>>>> Let's stay focused on the title of the thread-- CHANGES.txt-- and
>>>> discuss issues surrounding releasing trunk in a separate thread.
>>>
>>>
>>>        It directly addresses the thread:  if one isn’t cherry picking patches because there aren’t multiple primary branches in development, the changes.txt conflicts effectively go away.
>

Re: CHANGES.TXT

Posted by Allen Wittenauer <aw...@altiscale.com>.

I don’t know whether your ability to completely miss my point every time we communicate with each other, regardless of the issue, is intentional or just a special talent.

On Sep 22, 2015, at 8:52 AM, Colin P. McCabe <cm...@apache.org> wrote:

> I think it's extremely unrealistic to expect Hadoop to ever follow a
> branchless development model.  In fact, the recent trend has been in
> the opposite direction-- prominent members of the community have
> pointed out that we don't have enough long-running, well-tested and
> well-supported branches.  Producing such a branch was the goal of the
> ongoing 2.6.1 release effort.  Even if we did somehow switch to a
> branchless development model, we have numerous people backporting
> patches to their own repositories-- both Hadoop vendors and large
> organizations that run Hadoop internally and have their own branches.
> 
> Branchless development especially doesn't make sense for HDFS, since
> it would force people to do time-consuming and potentially risky
> layout version upgrades just to get small bugfixes.  Very few cluster
> operators want to update the version of their data on-disk just to get
> this month's urgent bugfix.  There are similar issues in other parts
> of the stack such as YARN.
> 
> Anyway, as Steve pointed out in his original post, merge conflicts in
> CHANGES.txt are not the only problem caused by that file.  It's simply
> very inaccurate and misleading, since it must be manually updated.  In
> more than 3 years of working with Hadoop, I've never found CHANGES.txt
> useful for anything.  git log and JIRA tell you everything you need to
> know.  CHANGES.txt is a burden to update, misleading to operators, and
> a relic that should have been removed years ago.
> 
> I really hope this CHANGES.txt thread doesn't peter out like the rest
> of them.  Please, let's fix this, finally.  Autogenerate this file.
> 
> best,
> Colin
> 
> On Mon, Sep 14, 2015 at 7:10 PM, Allen Wittenauer <aw...@altiscale.com> wrote:
>> 
>> On Sep 14, 2015, at 5:15 PM, Colin P. McCabe <cm...@apache.org> wrote:
>> 
>>> Let's stay focused on the title of the thread-- CHANGES.txt-- and
>>> discuss issues surrounding releasing trunk in a separate thread.
>> 
>> 
>>        It directly addresses the thread:  if one isn’t cherry picking patches because there aren’t multiple primary branches in development, the changes.txt conflicts effectively go away.

Re: CHANGES.TXT

Posted by "Colin P. McCabe" <cm...@apache.org>.

I think it's extremely unrealistic to expect Hadoop to ever follow a
branchless development model.  In fact, the recent trend has been in
the opposite direction-- prominent members of the community have
pointed out that we don't have enough long-running, well-tested and
well-supported branches.  Producing such a branch was the goal of the
ongoing 2.6.1 release effort.  Even if we did somehow switch to a
branchless development model, we have numerous people backporting
patches to their own repositories-- both Hadoop vendors and large
organizations that run Hadoop internally and have their own branches.

Branchless development especially doesn't make sense for HDFS, since
it would force people to do time-consuming and potentially risky
layout version upgrades just to get small bugfixes.  Very few cluster
operators want to update the version of their data on-disk just to get
this month's urgent bugfix.  There are similar issues in other parts
of the stack such as YARN.

Anyway, as Steve pointed out in his original post, merge conflicts in
CHANGES.txt are not the only problem caused by that file.  It's simply
very inaccurate and misleading, since it must be manually updated.  In
more than 3 years of working with Hadoop, I've never found CHANGES.txt
useful for anything.  git log and JIRA tell you everything you need to
know.  CHANGES.txt is a burden to update, misleading to operators, and
a relic that should have been removed years ago.

I really hope this CHANGES.txt thread doesn't peter out like the rest
of them.  Please, let's fix this, finally.  Autogenerate this file.

best,
Colin

On Mon, Sep 14, 2015 at 7:10 PM, Allen Wittenauer <aw...@altiscale.com> wrote:
>
> On Sep 14, 2015, at 5:15 PM, Colin P. McCabe <cm...@apache.org> wrote:
>
>> Let's stay focused on the title of the thread-- CHANGES.txt-- and
>> discuss issues surrounding releasing trunk in a separate thread.
>
>
>         It directly addresses the thread:  if one isn’t cherry picking patches because there aren’t multiple primary branches in development, the changes.txt conflicts effectively go away.

Re: CHANGES.TXT

Posted by Allen Wittenauer <aw...@altiscale.com>.

On Sep 14, 2015, at 5:15 PM, Colin P. McCabe <cm...@apache.org> wrote:

> Let's stay focused on the title of the thread-- CHANGES.txt-- and
> discuss issues surrounding releasing trunk in a separate thread.


	It directly addresses the thread:  if one isn’t cherry picking patches because there aren’t multiple primary branches in development, the changes.txt conflicts effectively go away.

Re: CHANGES.TXT

Posted by "Colin P. McCabe" <cm...@apache.org>.

Let's stay focused on the title of the thread-- CHANGES.txt-- and
discuss issues surrounding releasing trunk in a separate thread.

Colin

On Mon, Sep 14, 2015 at 3:59 PM, Allen Wittenauer <aw...@altiscale.com> wrote:
>
>         Given that we haven’t had a single minor release in the “stable” era of branch-2 that didn’t break compatibility, we should just declare defeat and make regular releases from trunk as the next 2.x.  Then the whole “back port” thing becomes moot.  We’ve already trained users that they need to double check everything they are currently doing, so might as well make it official and start cutting releases on a scheduled basis.

Re: CHANGES.TXT

Posted by Allen Wittenauer <aw...@altiscale.com>.

	Given that we haven’t had a single minor release in the “stable” era of branch-2 that didn’t break compatibility, we should just declare defeat and make regular releases from trunk as the next 2.x.  Then the whole “back port” thing becomes moot.  We’ve already trained users that they need to double check everything they are currently doing, so might as well make it official and start cutting releases on a scheduled basis.

Re: CHANGES.TXT

Posted by "Colin P. McCabe" <cm...@apache.org>.

On Sat, Sep 12, 2015 at 11:28 AM, Haohui Mai <ri...@gmail.com> wrote:
> CHANGES.txt is always a pain. *sigh*
>
> It seems that relying on human efforts to maintain the CHANGES.txt is
> error-prone and not sustainable. It is always a pain to fix them.
>
> I think aw has some scripts for option 2.
>
> I would like to propose option 3 which might be more robust: (1) do a
> git log on the branch to figure out the jiras that are committed to
> the branch, and (2) generate CHANGES.txt by going through these jiras.
> That might eliminate the fix-version issue.

+1000.

CHANGES.txt is a huge pain when doing backports (even just from trunk
to branch-2) and is much more unreliable than a simple "git log."  We
should just have a script to generate this file during the release,
similar to releasenotes.txt.  In the rare case where git log is wrong
(because of an incorrect commit message or something), we can have a
side file checked into the repo containing corrections that the
CHANGES.txt script can read and use to modify the git log.
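
A rough sketch of how such a side file could be applied, assuming a made-up format of one `drop KEY` or `add KEY` per line with `#` comments. Neither the format nor the function name comes from any existing Hadoop script; they are illustrative only.

```python
def apply_corrections(keys, corrections_text):
    """Apply 'drop KEY' / 'add KEY' lines to a list of JIRA keys.

    The side-file format is hypothetical: one action per line, blank lines
    and '#' comments ignored. The input list is not modified.
    """
    result = list(keys)
    for raw in corrections_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # strip comments and whitespace
        if not line:
            continue
        action, key = line.split()
        if action == "drop":
            result = [k for k in result if k != key]
        elif action == "add":
            if key not in result:
                result.append(key)
        else:
            raise ValueError(f"unknown correction: {raw!r}")
    return result
```

A reverted change such as HADOOP-10759, mentioned elsewhere in this thread, would be handled with a single `drop HADOOP-10759` line.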

best,
Colin

>
> I can volunteer some effort to help on this.
>
> ~Haohui
>
>
> On Sat, Sep 12, 2015 at 11:03 AM, Steve Loughran <st...@hortonworks.com> wrote:
>>
>> I've just been trying to get CHANGES.TXT between branch-2 and trunk more in sync, so that cherry picking patches from branch-2 up to trunk is more reliable.
>>
>> Once you look closely, you see it is a mess, specifically:
>>
>> trunk/CHANGES.TXT declares things as trunk-only which are already in branch-2 and/or actual releases
>>
>>
>> What to do?
>>
>> 1. audit trunk/CHANGES.TXT against branch-2/CHANGES.TXT; anything in branch-2's (i.e. to come in 2.8) is placed into trunk at that location, and trunk's "new in trunk" version of the entry is removed
>>
>> 2. go to JIRA-generated change logs. Though for that to be reliable, those fix-version fields have to be 100% accurate too.
>>
>>

Re: CHANGES.TXT

Posted by Haohui Mai <ri...@gmail.com>.

CHANGES.txt is always a pain. *sigh*

It seems that relying on human efforts to maintain the CHANGES.txt is
error-prone and not sustainable. It is always a pain to fix them.

I think aw has some scripts for option 2.

I would like to propose option 3 which might be more robust: (1) do a
git log on the branch to figure out the jiras that are committed to
the branch, and (2) generate CHANGES.txt by going through these jiras.
That might eliminate the fix-version issue.
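
Step (1) of option 3 might look roughly like the sketch below. It assumes commit subject lines mention the JIRA key, and the helper names are hypothetical. Only standard `git log` options are used; step (2), looking each key up in JIRA, is left out.

```python
import re
import subprocess

# JIRA keys for the Hadoop projects, e.g. HADOOP-10759 or HDFS-1234.
JIRA_KEY = re.compile(r"\b(?:HADOOP|HDFS|YARN|MAPREDUCE)-\d+\b")


def keys_from_subjects(subjects):
    """Collect JIRA keys from commit subjects, in first-seen order, no repeats."""
    seen = []
    for subject in subjects:
        for key in JIRA_KEY.findall(subject):
            if key not in seen:
                seen.append(key)
    return seen


def commit_subjects(rev_range, repo="."):
    """Return one subject line per commit in rev_range, as printed by git log."""
    out = subprocess.run(
        ["git", "-C", repo, "log", "--pretty=format:%s", rev_range],
        check=True, capture_output=True, text=True,
    ).stdout
    return out.splitlines()
```

For example, `keys_from_subjects(commit_subjects("branch-2.7..branch-2"))` would list the keys on branch-2 that are not on branch-2.7; the branch names here are only an example.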

I can volunteer some effort to help on this.

~Haohui

On Sat, Sep 12, 2015 at 11:03 AM, Steve Loughran <st...@hortonworks.com> wrote:
>
> I've just been trying to get CHANGES.TXT between branch-2 and trunk more in sync, so that cherry picking patches from branch-2 up to trunk is more reliable.
>
> Once you look closely, you see it is a mess, specifically:
>
> trunk/CHANGES.TXT declares things as trunk-only which are already in branch-2 and/or actual releases
>
>
> What to do?
>
> 1. audit trunk/CHANGES.TXT against branch-2/CHANGES.TXT; anything in branch-2's (i.e. to come in 2.8) is placed into trunk at that location, and trunk's "new in trunk" version of the entry is removed
>
> 2. go to JIRA-generated change logs. Though for that to be reliable, those fix-version fields have to be 100% accurate too.
>
>

Re: CHANGES.TXT

Posted by Chris Douglas <cd...@apache.org>.

On Mon, Sep 14, 2015 at 1:09 PM, Andrew Wang <an...@cloudera.com> wrote:
> I put some time into this a little while back, doing releasedocmaker
> backports from the Yetus branch. IIRC from Allen, we still need to get lint
> mode working before it's good to go. Probably a good bit of JIRA cleanup
> will be required to get a clean lint run.

The lint tests will evolve as we discover new inconsistencies. Does it
need to be a prerequisite?

We've had this discussion before. Using JIRA is at least as accurate
as CHANGES.txt, we should just use Allen's script (now part of Yetus?)
to generate the release notes, and stop rehashing this. If someone
wants to cross-reference other sources to improve accuracy, that would
be a fine followup. -C

> On Sun, Sep 13, 2015 at 4:38 AM, Steve Loughran <st...@hortonworks.com>
> wrote:
>
>>
>> > On 12 Sep 2015, at 20:26, Allen Wittenauer <aw...@altiscale.com> wrote:
>> >
>> >
>> > On Sep 12, 2015, at 12:25 PM, Allen Wittenauer <aw...@altiscale.com> wrote:
>> >
>> >>
>> >> On Sep 12, 2015, at 11:03 AM, Steve Loughran <st...@hortonworks.com>
>> wrote:
>> >>
>> >>>
>> >>> 2. go to JIRA-generated change logs. Though for that to be reliable,
>> those fix-version fields have to be 100% accurate too.
>> >
>> >
>> > P.S.,
>> https://github.com/aw-altiscale/eco-release-metadata/blob/master/HADOOP/README.md
>> >
>> >
>>
>> nice. I'd prefer this, as long as we keep up with the versioning. In
>> particular, any cherry pick back to a point release (2.7.2 etc.) will need
>> the entry re-edited
>>
>> in the meantime, I celebrate Haohui's volunteering, though I need to warn that
>> it was the thought of making the 2.7.x and 2.6.x changes consistent which
>> worried me there: you end up having to re-arrange entries across four files
>>
>> -steve
>>

Re: CHANGES.TXT

Posted by Andrew Wang <an...@cloudera.com>.

I put some time into this a little while back, doing releasedocmaker
backports from the Yetus branch. IIRC from Allen, we still need to get lint
mode working before it's good to go. Probably a good bit of JIRA cleanup
will be required to get a clean lint run.

On Sun, Sep 13, 2015 at 4:38 AM, Steve Loughran <st...@hortonworks.com>
wrote:

>
> > On 12 Sep 2015, at 20:26, Allen Wittenauer <aw...@altiscale.com> wrote:
> >
> >
> > On Sep 12, 2015, at 12:25 PM, Allen Wittenauer <aw...@altiscale.com> wrote:
> >
> >>
> >> On Sep 12, 2015, at 11:03 AM, Steve Loughran <st...@hortonworks.com>
> wrote:
> >>
> >>>
> >>> 2. go to JIRA-generated change logs. Though for that to be reliable,
> those fix-version fields have to be 100% accurate too.
> >
> >
> > P.S.,
> https://github.com/aw-altiscale/eco-release-metadata/blob/master/HADOOP/README.md
> >
> >
>
> nice. I'd prefer this, as long as we keep up with the versioning. In
> particular, any cherry pick back to a point release (2.7.2 etc.) will need
> the entry re-edited
>
> in the meantime, I celebrate Haohui's volunteering, though I need to warn that
> it was the thought of making the 2.7.x and 2.6.x changes consistent which
> worried me there: you end up having to re-arrange entries across four files
>
> -steve
>

Re: CHANGES.TXT

Posted by Steve Loughran <st...@hortonworks.com>.

> On 12 Sep 2015, at 20:26, Allen Wittenauer <aw...@altiscale.com> wrote:
> 
> 
> On Sep 12, 2015, at 12:25 PM, Allen Wittenauer <aw...@altiscale.com> wrote:
> 
>> 
>> On Sep 12, 2015, at 11:03 AM, Steve Loughran <st...@hortonworks.com> wrote:
>> 
>>> 
>>> 2. go to JIRA-generated change logs. Though for that to be reliable, those fix-version fields have to be 100% accurate too.
> 
> 
> P.S., https://github.com/aw-altiscale/eco-release-metadata/blob/master/HADOOP/README.md
> 
> 

nice. I'd prefer this, as long as we keep up with the versioning. In particular, any cherry pick back to a point release (2.7.2 etc.) will need the entry re-edited

in the meantime, I celebrate Haohui's volunteering, though I need to warn that it was the thought of making the 2.7.x and 2.6.x changes consistent which worried me there: you end up having to re-arrange entries across four files

-steve

Re: CHANGES.TXT

Posted by Allen Wittenauer <aw...@altiscale.com>.

On Sep 12, 2015, at 12:25 PM, Allen Wittenauer <aw...@altiscale.com> wrote:

> 
> On Sep 12, 2015, at 11:03 AM, Steve Loughran <st...@hortonworks.com> wrote:
> 
>> 
>> 2. go to JIRA-generated change logs. Though for that to be reliable, those fix-version fields have to be 100% accurate too.


P.S., https://github.com/aw-altiscale/eco-release-metadata/blob/master/HADOOP/README.md

Re: CHANGES.TXT

Posted by Allen Wittenauer <aw...@altiscale.com>.

On Sep 12, 2015, at 11:03 AM, Steve Loughran <st...@hortonworks.com> wrote:

> 
> 1. audit trunk/CHANGES.TXT against branch-2/CHANGES.TXT; anything in branch-2's (i.e. to come in 2.8) is placed into trunk at that location, and trunk's "new in trunk" version of the entry is removed
> 
> 2. go to JIRA-generated change logs. Though for that to be reliable, those fix-version fields have to be 100% accurate too.
> 
> 

BTW, HADOOP-10759 was reverted so it shouldn’t be in any CHANGES.TXT files.  I’ve already taken out the fix version in JIRA.