You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@subversion.apache.org by Johan Corveleyn <jc...@gmail.com> on 2017/12/04 11:16:37 UTC

[PATCH] release.py: write-changelog function (POC)

As promised on IRC last Friday, here is a POC implementation of the
"generate changelog from commit messages" functionality, added to
release.py (I'm not very experienced in Python; I mainly depend on
google, SO, copy-paste, ... so don't focus on coding style etc).

This patch is intended to enable further discussion of this idea
(which was already discussed on this list 4 years ago [1]).

The idea is that we agree on some structured syntax for adding
(optionally) a changelog text to commit messages, to make it easier
for RMs to generate the text for CHANGES (and push the responsibility
for writing a good CHANGES entry first and foremost to the original
committer; and keep the relevant info coupled with the commit). RMs
can use the output of write-changelog as a draft, they can still edit
it, summarize items, shuffle things around, ... But it gives them a
rough version to start from.

Proposal for changelog syntax in commit messages (as implemented in this patch):
[[[
Changelog lines are lines with the following format:
  '['<visibility>:<section>']' <message>
where <visibility> = U (User-visible) or D (Developer-visible)
      <section> =
general|major|minor|client|server|client-server|other|api|bindings
                  (section is treated case-insensitively)
      <message> = the actual text for CHANGES

Examples:
  [U:major] Better interactive conflict resolution for tree conflicts
  [U:minor] ra_serf: Adjustments for serf versions with HTTP/2 support
  [U:client] Fix 'svn diff URL@REV WC' wrongly looks up URL@HEAD (issue #4597)
  [U:client-server] Fix bug with canonicalizing Window-specific
drive-relative URL
  [D:api] New svn_ra_list() API function
  [D:bindings] JavaHL: Allow access to constructors of a couple JavaHL classes

Q: Shorter prefix syntax ([U:C], [U:CS], [U:MJ], ...) to keep lines
short, or longer (and put message on next line) to make it more
human-readable? While making it easily rememberable for devs so they
don't have to look it up all the time when they just want to commit
...

Q: What to do with merged revisions? Use 'log -g', or make sure
relevant changelog entry is part of the commit message of the merge? I
vote for the latter, it's simpler and has less surprises. In case of
feature branches, a generic "Add feature foo" message on the
reintegrate commit usually suffices. In case of backports perhaps our
backport script can scrape the relevant changelog entries from the
revisions-to-be-merged and add them to the commit message of the
merge.
]]]


To get a rough idea: since we don't have any commit messages
containing such lines in our history, I've added a --pocfirstlines
option, which just takes the first line of every log message (ignoring
lines with 'STATUS', 'CHANGES', 'bump', or starting with '*'), putting
them in the "User -> General" section.

Here is the usage output:
[[[
$ ./release.py write-changelog -h
usage: release.py write-changelog [-h] [--pocfirstlines] branch previous

positional arguments:
  branch           The branch (or tag or trunk), relative to ^/subversion/, of
                   which to generate the changelog, when compared to
                   "previous".
  previous         The "previous" branch or tag, relative to ^/subversion/, to
                   compare "branch" against.

optional arguments:
  -h, --help       show this help message and exit
  --pocfirstlines  Proof of concept: just take the first line of each relevant
                   commit messages (except if it contains 'STATUS', 'CHANGES'
                   or 'bump' or starts with '*'), and put it in User:General.
]]]


Example output:
[[[
$ ./release.py write-changelog --pocfirstlines branches/1.9.x tags/1.9.7
 User-visible changes:
  - General:
    * Merge r1804691 and r1804692 from trunk: (r1804698)
  - Client-side bugfixes:
    (none)
  - Server-side bugfixes:
    (none)
  - Bindings bugfixes:
    (none)

 Developer-visible changes:
  - General:
    (none)
  - API changes:
    (none)

$ ./release.py write-changelog --pocfirstlines branches/1.8.x tags/1.8.19
 User-visible changes:
  - General:
    * Merge r1804691 from trunk: (r1804723)
    * On the 1.8.x branch: Merge r1804692 from trunk. (r1804737)
  - Client-side bugfixes:
    (none)
  - Server-side bugfixes:
    (none)
  - Bindings bugfixes:
    (none)

 Developer-visible changes:
  - General:
    (none)
  - API changes:
    (none)

$ ./release.py write-changelog --pocfirstlines trunk tags/1.9.7
 User-visible changes:
  - General:
    * A bug fix and minor tweaks in 'svnmover'. (r1715781)
    * A cosmetic tweak: add a final comma to lists of tests in a few
test files (r1706965)
    * A few FSFS-only tests apply to FSX just as well.  So, run them
for (r1667524)
    * A follow-up to r1715354. (r1715358)
    * A minor code cleanup in FSFS. (r1740722)
    * A minor tweak in 'svnmover'. (r1717793)
    * A small step towards making 'svnmover merge' operate into a new
temporary (r1717957)
    * Abbreviate the potentially rather long list of revisions shown
for tree (r1736063)
    * Actually use some helpful error messaging that we bother to
create in (r1683161)
    * Add "merge_" prefix to the names of conflict resolver merge test
sandboxes. (r1749457)
    * Add '--include' and '--exclude' options to 'svnadmin dump'. (r1811992)
    * Add '--search' option support to 'svnbench null-list'. (r1767202)
    * Add 'http-compression=auto' mode on the client, now used by
default. (r1803899)
...
]]]


Thoughts?

[1] https://lists.apache.org/thread.html/c80dd19a7bbafc4f535382b3f361f76ba6535ab3d447a8b988594bfc@1377814810@%3Cdev.subversion.apache.org%3E

-- 
Johan

Re: [PATCH] release.py: write-changelog function (POC)

Posted by Daniel Shahaf <d....@daniel.shahaf.name>.
Johan Corveleyn wrote on Tue, 05 Dec 2017 00:04 +0100:
> On Mon, Dec 4, 2017 at 11:01 PM, Julian Foad <ju...@apache.org> wrote:
> > Johan Corveleyn wrote:
> >> Just thinking out loud here ...
> >
> > [...]> Hmmmm
> >
> > Now you're over-thinking it. What you said first, what you use at work, is
> > fine. Run with it!
> 
> Hehe, maybe :-).
> 
> OTOH: Subversion also has a 15+ year old log message style that has
> served it well. Before giving this system a try (if we agree we
> should),

Let's go ahead and backfill log messages using the new format for trunk since
1.10.0-alpha3 (excepting changes already released in 1.9.x patch releases).
That would (a) get our feet wet with the new format, and make any shortcomings
or redundancies in it obvious (as well as whether their frequency warrants
discussing them and implementing special syntaxes for them), (b) be technical
credit¹ towards writing the 1.10.0-rc1 changelog.

So we're all on the same page, I propose we start by using the [U:foo]/[D:bar]
format from jcorvel's original message, plus the [U]/[D] fallback Julian
proposed.  We can add further extensions and solve edge cases (e.g., rfc822
line unfolding) as we run across them.

Personally, I wouldn't worry about duplication for now; we can always invent
additional syntaxes later that will reduce duplication in log messages.

Cheers,

Daniel

¹ "technical credit" is the opposite of technical debt.

> I think we should think carefully how to fit this into the
> existing style, without breaking it. It's especially important to get
> some buy-in from the people who create the most commits, and that's
> certainly not me :-).
> 
> At work we have no such strong log message style as SVN. We limit
> ourselves to a couple of lines, and every line is *required* to have
> such a "tagged" prefix (which is enforced by a pre-commit hook, which
> on error gives a reminder of the precise syntax). It also looks a
> little different, with square brackets around the different parts:
> 
>    [U][General][SVN-1111] Fix crash in 'svn co'.
> 
> (the issue annotation is optional, the other two are mandatory).
> 
>     [D][API] Add new api svn__blah() as entry point to the blahing feature.

Re: [PATCH] release.py: write-changelog function (POC)

Posted by Johan Corveleyn <jc...@gmail.com>.
On Mon, Dec 4, 2017 at 11:01 PM, Julian Foad <ju...@apache.org> wrote:
> Johan Corveleyn wrote:
>>
>> [...]
>> Hm, yes, I agree with the "don't write the same thing twice". But
>> perhaps the "general description" above the list of affected files
>> would be a better place:
>
> [...]
>>
>> Though, indeed, we're not required to always have a "general
>> description", and can just start with the affected files, if the
>> change is simple. So ... not sure.
>>
>> That's the thing I'm most uncertain of at the moment: how to fit this
>> scheme precisely into our current log message style, without
>> interfering too much, keeping them as readable as possible for human
>> readers.
>>
>> Maybe a syntax with '@' would be better, like annotations in Java or
>> doxygen. Like:
>
> [...]
>>
>> or as a suffix:
>
> [...]
>>
>> Just thinking out loud here ...
>
> [...]> Hmmmm
>
> Now you're over-thinking it. What you said first, what you use at work, is
> fine. Run with it!

Hehe, maybe :-).

OTOH: Subversion also has a 15+ year old log message style that has
served it well. Before giving this system a try (if we agree we
should), I think we should think carefully how to fit this into the
existing style, without breaking it. It's especially important to get
some buy-in from the people who create the most commits, and that's
certainly not me :-).

At work we have no such strong log message style as SVN. We limit
ourselves to a couple of lines, and every line is *required* to have
such a "tagged" prefix (which is enforced by a pre-commit hook, which
on error gives a reminder of the precise syntax). It also looks a
little different, with square brackets around the different parts:

   [U][General][SVN-1111] Fix crash in 'svn co'.

(the issue annotation is optional, the other two are mandatory).

    [D][API] Add new api svn__blah() as entry point to the blahing feature.

-- 
Johan

Re: [PATCH] release.py: write-changelog function (POC)

Posted by Johan Corveleyn <jc...@gmail.com>.
On Tue, Dec 5, 2017 at 1:00 PM, Branko Čibej <br...@apache.org> wrote:
> On 04.12.2017 23:01, Julian Foad wrote:
>> Johan Corveleyn wrote:
>>> [...]
>>> Hm, yes, I agree with the "don't write the same thing twice". But
>>> perhaps the "general description" above the list of affected files
>>> would be a better place:
>> [...]
>>> Though, indeed, we're not required to always have a "general
>>> description", and can just start with the affected files, if the
>>> change is simple. So ... not sure.
>>>
>>> That's the thing I'm most uncertain of at the moment: how to fit this
>>> scheme precisely into our current log message style, without
>>> interfering too much, keeping them as readable as possible for human
>>> readers.
>>>
>>> Maybe a syntax with '@' would be better, like annotations in Java or
>>> doxygen. Like:
>> [...]
>>> or as a suffix:
>> [...]
>>> Just thinking out loud here ...
>> [...]> Hmmmm
>>
>> Now you're over-thinking it. What you said first, what you use at
>> work, is fine. Run with it!
>
>
> IMO the important thing is to make the tagging in the log messages as
> human-friendly as possible. That's what made the contribulyzer work so
> well: there were no $[@foo;\\} syntactical quirks to remember and the
> syntax is permissive.
>
> Therefore I think stsp's suggestion of just taking the summary line in
> the log message as the CHANGES entry is more likely to produce useful
> results than any magic tagging. I don't think it's all that important to
> classify the change into user-facing, etc., at commit time. All we need,
> IMO, is a way to signal whether the change is or is not important.
>
> So I'd suggest we use stsp's proposal with a small difference: by
> default, every log message that has a summary line is important. If it's
> not, start the first line with a # to tell the script to ignore it. That
> way, it takes extra thought and effort to exclude a commit from the
> CHANGES summary, which is IMO preferable to having to put in extra
> thought and effort to have it included. It's easier to prune unnecessary
> entries from the summary than it is to dig into commit history to find
> the missing ones.

I'm not sure. I think we're very much accustomed to a certain style
for CHANGES, and another style and information-density for the summary
of commit messages. If these could converge, then sure, this could be
a good idea. If not, I think we haven't gained much (the RM would have
to rephrase them all anyway).

For example, CHANGES for 1.10 contains (just taking a couple of
entries under User-visible "Minor new features and improvements"):

    * New '--file' option for several svnadmin subcommands (r1738021)
    * ra_serf: Adjustments for serf versions with HTTP/2 support (r1716400)
    * windows: Correctly check result from LoadLibrary() call (r1755983)
    * FSFS: Open transaction's proto revision in write-only mode (r1759135)

And the corresponding lines out of "./release.py write-changelog
--pocfirstlines trunk tags/1.9.7" are (just grepping on the revision
numbers):

    * Introduce `--file' option for svnadmin dump, dump-revprops, load
and (r1738021)
    * Apply some minor tweaks to libsvn_ra_serf to handle some http/2
cases. (r1716400)
    * Correctly check result from LoadLibrary() call: LoadLibrary()
returns NULL (r1755983)
    * FSFS: Open transaction's proto revision in write-only mode.
Read/write mode (r1759135)

That last one is actually the same, nice :-).

I think these could converge, i.e. we *can* get used to write our
summaries in the CHANGES style (for instance: put the relevant
"component" in front, like "ra_serf:", "windows:", ...; and make the
first line of the summary shorter). But it'll take getting used to
:-).

Another thing is that there are a lot more commits than entries in CHANGES.
* CHANGES for 1.10 is currently 244 lines.
* output of "./release.py write-changelog --pocfirstlines trunk
tags/1.9.7" is 1633 lines.

That means we'd have to use the "#" (or other symbol) escape a lot.
That, or a lot of those commits are reverts (we'd need to handle these
well in any case), or are summarized with "(... et al)".

The flip side would be, if we could get the hang of this, we'd need no
special syntax, no duplication of "almost the same text", and our
"commit message summaries" are automatically fit for CHANGES (and it's
safer to forget to exclude a commit). Which also means an option for
"svn log" showing only the first line would be interesting, and would
approximately yield CHANGES plus the "#-escaped" summaries.

Hmmm, red pill or blue pill.

-- 
Johan

Re: [PATCH] release.py: write-changelog function (POC)

Posted by Branko Čibej <br...@apache.org>.
On 04.12.2017 23:01, Julian Foad wrote:
> Johan Corveleyn wrote:
>> [...]
>> Hm, yes, I agree with the "don't write the same thing twice". But
>> perhaps the "general description" above the list of affected files
>> would be a better place:
> [...]
>> Though, indeed, we're not required to always have a "general
>> description", and can just start with the affected files, if the
>> change is simple. So ... not sure.
>>
>> That's the thing I'm most uncertain of at the moment: how to fit this
>> scheme precisely into our current log message style, without
>> interfering too much, keeping them as readable as possible for human
>> readers.
>>
>> Maybe a syntax with '@' would be better, like annotations in Java or
>> doxygen. Like:
> [...]
>> or as a suffix:
> [...]
>> Just thinking out loud here ...
> [...]> Hmmmm
>
> Now you're over-thinking it. What you said first, what you use at
> work, is fine. Run with it!


IMO the important thing is to make the tagging in the log messages as
human-friendly as possible. That's what made the contribulyzer work so
well: there were no $[@foo;\\} syntactical quirks to remember and the
syntax is permissive.

Therefore I think stsp's suggestion of just taking the summary line in
the log message as the CHANGES entry is more likely to produce useful
results than any magic tagging. I don't think it's all that important to
classify the change into user-facing, etc., at commit time. All we need,
IMO, is a way to signal whether the change is or is not important.

So I'd suggest we use stsp's proposal with a small difference: by
default, every log message that has a summary line is important. If it's
not, start the first line with a # to tell the script to ignore it. That
way, it takes extra thought and effort to exclude a commit from the
CHANGES summary, which is IMO preferable to having to put in extra
thought and effort to have it included. It's easier to prune unnecessary
entries from the summary than it is to dig into commit history to find
the missing ones.

-- Brane

Re: [PATCH] release.py: write-changelog function (POC)

Posted by Julian Foad <ju...@apache.org>.
Johan Corveleyn wrote:
> [...]
> Hm, yes, I agree with the "don't write the same thing twice". But
> perhaps the "general description" above the list of affected files
> would be a better place:
[...]
> Though, indeed, we're not required to always have a "general
> description", and can just start with the affected files, if the
> change is simple. So ... not sure.
> 
> That's the thing I'm most uncertain of at the moment: how to fit this
> scheme precisely into our current log message style, without
> interfering too much, keeping them as readable as possible for human
> readers.
> 
> Maybe a syntax with '@' would be better, like annotations in Java or
> doxygen. Like:
[...]
> or as a suffix:
[...]
> Just thinking out loud here ...
[...]> Hmmmm

Now you're over-thinking it. What you said first, what you use at work, 
is fine. Run with it!

Thanks,
- Julian


Re: [PATCH] release.py: write-changelog function (POC)

Posted by Johan Corveleyn <jc...@gmail.com>.
On Mon, Dec 4, 2017 at 3:15 PM, Julian Foad <ju...@apache.org> wrote:
> Stefan Sperling wrote:
>>
>> Johan Corveleyn wrote:
>>>
>>> The intention is that the RM doesn't need to go through all the full
>>> log messages in detail, but that he can trust the output of
>>> write-changelog, combined with trust in the quality of the log
>>> messages involved.
>>
>>
>> OK, in that light it all makes sense and I agree this is worth a try.
>> And if we don't manage to pull it off well enough, nothing is lost
>> compared to the status quo.
>
>
> Johan, you told us at the hackathon (and mentioned in the older linked
> thread) that your team has been using a system like this for some time at
> your workplace. For me, that gives the suggestion a lot more credibility.

Yes, absolutely. We're using a similar method for over 15 years now,
with weekly releases of a big software system (healthcare software,
being used in around 20 Belgian hospitals). It's more strict /
structured than what I've proposed here, but that's normal I guess ...
corporate world vs. open source. And it's not perfect of course, but
it has served us well and saves us a lot of time. I'm trying to find a
pragmatic way to apply the same core idea and get the benefits of "all
committers writing their own changelog entries at the time they're
writing the code".

...
> In the case where a user-facing but very simple change is made, I think we
> would only need to write the tagged user-facing line in the log message --
> we should never need to write two lines that say the same thing. Example log
> message:
>
> [[[
> * subversion/svn/svn.c:
>   [U:client] Add a 'reintegrate merge' section to the 'svn help merge'.
> ]]]

Hm, yes, I agree with the "don't write the same thing twice". But
perhaps the "general description" above the list of affected files
would be a better place:

[[[
[U:client] Add a 'reintegrate merge' section to the 'svn help merge'.

* subversion/svn/svn.c:
  (blah): Add help text.
]]]

Though, indeed, we're not required to always have a "general
description", and can just start with the affected files, if the
change is simple. So ... not sure.

That's the thing I'm most uncertain of at the moment: how to fit this
scheme precisely into our current log message style, without
interfering too much, keeping them as readable as possible for human
readers.

Maybe a syntax with '@' would be better, like annotations in Java or
doxygen. Like:
[[[
@U:client Add a 'reintegrate merge' section to the 'svn help merge'.

* subversion/svn/svn.c:
  (blah): Add help text.
]]]

or
[[[
* subversion/svn/svn.c:
  @U:client Add a 'reintegrate merge' section to the 'svn help merge'.
]]]

or as a suffix:
[[[
* subversion/svn/svn.c:
  Add a 'reintegrate merge' section to the 'svn help merge'. [U:client]
]]]

Just thinking out loud here ...
Or we put it at the bottom as an "extra", like the contribulyzer
lines, but then we have the problem of potentially having to write the
same line twice:
[[[
* subversion/svn/svn.c:
  (blah): Add a 'reintegrate merge' section to the 'svn help merge'.

[U:client] Add a 'reintegrate merge' section to the 'svn help merge'
]]]

Hmmmm

> I think one of the keys to making this successful is to keep the burden low.
> For example, if "[U:section]" is the general tag for a user-visible change
> then just "[U]" should be allowed so the dev can go ahead with the commit
> when a suitable section is not defined or not obvious.

Yes, good idea. Anything with [U] would become a simple '*' bullet
below "User-visible changes", and the RM can decide where it goes.

-- 
Johan

Re: [PATCH] release.py: write-changelog function (POC)

Posted by Julian Foad <ju...@apache.org>.
Stefan Sperling wrote:
> Johan Corveleyn wrote:
>> The intention is that the RM doesn't need to go through all the full
>> log messages in detail, but that he can trust the output of
>> write-changelog, combined with trust in the quality of the log
>> messages involved.
> 
> OK, in that light it all makes sense and I agree this is worth a try.
> And if we don't manage to pull it off well enough, nothing is lost
> compared to the status quo.

Johan, you told us at the hackathon (and mentioned in the older linked 
thread) that your team has been using a system like this for some time 
at your workplace. For me, that gives the suggestion a lot more credibility.

The 'contribulyzer' syntax we use in log messages is the same sort of 
scheme. I am slightly surprised it was as successful and long-lasting as 
it is. When it was introduced I think a lot of gentle reminders were 
issued until most people got accustomed to writing it.

I like the idea of something that encourages us to put more user-facing 
descriptions in log messages, for changes that have a user-facing 
impact. Sometimes I check the change messages for system security 
updates I am installing, and I notice the huge difference between those 
that say "change sys.foo.bar back to 18 to fix #4321" and those that say 
"disable auto-switching the wifi channel because it wasn't reliable 
(#4321)".

In the case where a user-facing but very simple change is made, I think 
we would only need to write the tagged user-facing line in the log 
message -- we should never need to write two lines that say the same 
thing. Example log message:

[[[
* subversion/svn/svn.c:
   [U:client] Add a 'reintegrate merge' section to the 'svn help merge'.
]]]

I think one of the keys to making this successful is to keep the burden 
low. For example, if "[U:section]" is the general tag for a user-visible 
change then just "[U]" should be allowed so the dev can go ahead with 
the commit when a suitable section is not defined or not obvious.

- Julian

Re: [PATCH] release.py: write-changelog function (POC)

Posted by Stefan Sperling <st...@apache.org>.
On Mon, Dec 04, 2017 at 02:17:59PM +0100, Johan Corveleyn wrote:
> > I believe that whatever we do, somebody will still have to read the full log
> > and check each entry in CHANGES to avoid listing a lot of trivial stuff,
> > and to make sure the most impactful changes appear at the top.
> 
> I don't think so. Not if the changelog annotations are used well (as
> with writing log message in general, this probably requires practice
> to get right -- it'd be a part of reviewing each other's changes to
> potentially give feedback on the (optional) changelog annotation).
> 
> The intention is that the RM doesn't need to go through all the full
> log messages in detail, but that he can trust the output of
> write-changelog, combined with trust in the quality of the log
> messages involved.

OK, in that light it all makes sense and I agree this is worth a try.
And if we don't manage to pull it off well enough, nothing is lost
compared to the status quo.

Thanks for driving this forward!

Re: [PATCH] release.py: write-changelog function (POC)

Posted by Johan Corveleyn <jc...@gmail.com>.
On Mon, Dec 4, 2017 at 1:45 PM, Stefan Sperling <st...@apache.org> wrote:
> On Mon, Dec 04, 2017 at 12:16:37PM +0100, Johan Corveleyn wrote:
>> Examples:
>>   [U:major] Better interactive conflict resolution for tree conflicts
>>   [U:minor] ra_serf: Adjustments for serf versions with HTTP/2 support
>>   [U:client] Fix 'svn diff URL@REV WC' wrongly looks up URL@HEAD (issue #4597)
>>   [U:client-server] Fix bug with canonicalizing Window-specific
>> drive-relative URL
>>   [D:api] New svn_ra_list() API function
>>   [D:bindings] JavaHL: Allow access to constructors of a couple JavaHL classes
>
>> To get a rough idea: since we don't have any commit messages
>> containing such lines in our history, I've added a --pocfirstlines
>> option, which just takes the first line of every log message (ignoring
>> lines with 'STATUS', 'CHANGES', 'bump', or starting with '*'), putting
>> them in the "User -> General" section.
>
> If we're gonna add some form of structured information to our log messages,
> I'd prefer the git model with a simple heuristic which builds an uncategorized
> list:
>
> The first line of the message is what will end up in CHANGES. If the line
> mentions an issue number, that will appear in CHANGES as (issue #N), else
> the revision number will appear in changes as (rN). If the line matches
> '[Rr]evert rN' then a prior entry for rN is removed again. If the log message
> matches '[Ff]ollow(ing)-up to' the entry is dropped. If the log message
> contains '[Nn]o functional change' anywhere, the entry is dropped.

That implies that every commit has something useful to add to CHANGES
(except reverts, followups and non-functional changes). That's not the
case. In my proposal the changelog annotation is entirely optional.
Committers need to think whether their commit merits a changelog entry
(and the phrasing of the changelog entry can be different from the
first line of the commit message, which is more aimed at fellow
developers trying to understand a particular commit / how / why /
...).

> That should suffice for the most tedious part of the task: Getting a list
> of eligible entries. Categorizing the resulting list can be done by hand,
> and remaining unimportant changes must be removed by hand.
>
> I would refrain from trying to over-automate this task. It's not a lot of work
> compared to the time we spend writing code. I think I have added most of the
> 1.10 entries in CHANGES myself just by reading through the log, and even that
> wasn't too much effort. It took about a day and a half, and it also forced me
> to read through the entries in detail which helped understand the impact of
> each change. This was a much bigger problem from the project's origins through
> to 1.8, when we had a lot of active developers. For 1.9/1.10 we had a lot less
> activity overall and we can expect it to decrease further in the future.
>
> Even if we tag log messages as you suggest, we'll need some mechanism to
> deal with tags which are wrong or missing, and that's also tedious and
> requires checking each entry anyway. Also, it is unclear how tagging as
> suggested helps with features developed on trunk over long periods of time.
> For example, the conflict resolver probably got a couple hundred commits
> but it only requires one CHANGES entry. Should these all be tagged 'U:major'?

No, none of those individual commits requires a changelog entry. I'd
say for big new features it makes little sense to add tags for
incremental steps in functionality. Though entries such as "Add
interactive tree conflict resolver", "TC resolver: handle incoming
move vs. local edit", etc... (I'm just making these up) might be
useful.

> Or do we now enforce branch-based development so that only the final merge
> commit is tagged, with the downsides that implies, such as no testing during
> development by anyone except the original developer?

No, I'm not suggesting that.

> Or do we add feature tags
> which our script can recognize, such as [U:conflict-resolver]? Could we now
> end up needing multiple tags for some messages? Will it become too complicated?

No specific feature tags I'd say (but "sections", for the existing
sections in CHANGES). Multiple changelog entries in one commit message
would be allowed, but it'd be rather exceptional (for instance, if one
commit adds something User-visible and Developer-visible at the same
time, or a new feature and a corresponding "API change" -- i.e. you
want a changelog entry in both sections).

> I believe that whatever we do, somebody will still have to read the full log
> and check each entry in CHANGES to avoid listing a lot of trivial stuff,
> and to make sure the most impactful changes appear at the top.

I don't think so. Not if the changelog annotations are used well (as
with writing log message in general, this probably requires practice
to get right -- it'd be a part of reviewing each other's changes to
potentially give feedback on the (optional) changelog annotation).

The intention is that the RM doesn't need to go through all the full
log messages in detail, but that he can trust the output of
write-changelog, combined with trust in the quality of the log
messages involved. Basically CHANGES is built up incrementally as we
go in the commit messages themselves (and reviewed as part of our
normal workflow while the work is being done and everything is fresh
in everyone's memory).

Of course, the RM might see in the output of write-changelog that some
entries can be combined or summarized further, rephrased or reordered.
That's fine (in some cases he might want to correct the changelog
annotation in the underlying log message). But he shouldn't have to go
read all the details, unless something's wrong.

-- 
Johan

Re: [PATCH] release.py: write-changelog function (POC)

Posted by Stefan Sperling <st...@apache.org>.
On Mon, Dec 04, 2017 at 12:16:37PM +0100, Johan Corveleyn wrote:
> Examples:
>   [U:major] Better interactive conflict resolution for tree conflicts
>   [U:minor] ra_serf: Adjustments for serf versions with HTTP/2 support
>   [U:client] Fix 'svn diff URL@REV WC' wrongly looks up URL@HEAD (issue #4597)
>   [U:client-server] Fix bug with canonicalizing Window-specific
> drive-relative URL
>   [D:api] New svn_ra_list() API function
>   [D:bindings] JavaHL: Allow access to constructors of a couple JavaHL classes

> To get a rough idea: since we don't have any commit messages
> containing such lines in our history, I've added a --pocfirstlines
> option, which just takes the first line of every log message (ignoring
> lines with 'STATUS', 'CHANGES', 'bump', or starting with '*'), putting
> them in the "User -> General" section.

If we're gonna add some form of structured information to our log messages,
I'd prefer the git model with a simple heuristic which builds an uncategorized
list:

The first line of the message is what will end up in CHANGES. If the line
mentions an issue number, that will appear in CHANGES as (issue #N), else
the revision number will appear in changes as (rN). If the line matches
'[Rr]evert rN' then a prior entry for rN is removed again. If the log message
matches '[Ff]ollow(ing)-up to' the entry is dropped. If the log message
contains '[Nn]o functional change' anywhere, the entry is dropped.

That should suffice for the most tedious part of the task: Getting a list
of eligible entries. Categorizing the resulting list can be done by hand,
and remaining unimportant changes must be removed by hand.

I would refrain from trying to over-automate this task. It's not a lot of work
compared to the time we spend writing code. I think I have added most of the
1.10 entries in CHANGES myself just by reading through the log, and even that
wasn't too much effort. It took about a day and a half, and it also forced me
to read through the entries in detail which helped understand the impact of
each change. This was a much bigger problem from the project's origins through
to 1.8, when we had a lot of active developers. For 1.9/1.10 we had a lot less
activity overall and we can expect it to decrease further in the future.

Even if we tag log messages as you suggest, we'll need some mechanism to
deal with tags which are wrong or missing, and that's also tedious and
requires checking each entry anyway. Also, it is unclear how tagging as
suggested helps with features developed on trunk over long periods of time.
For example, the conflict resolver probably got a couple hundred commits
but it only requires one CHANGES entry. Should these all be tagged 'U:major'?
Or do we now enforce branch-based development so that only the final merge
commit is tagged, with the downsides that implies, such as no testing during
development by anyone except the original developer? Or do we add feature tags
which our script can recognize, such as [U:conflict-resolver]? Could we now
end up needing multiple tags for some messages? Will it become too complicated?

I believe that whatever we do, somebody will still have to read the full log
and check each entry in CHANGES to avoid listing a lot of trivial stuff,
and to make sure the most impactful changes appear at the top.

Re: [PATCH] release.py: write-changelog function (POC)

Posted by Johan Corveleyn <jc...@gmail.com>.
Op 22 dec. 2017 00:53 schreef "Johan Corveleyn":

Also, feel free to improve the code ... I'm far from a python expert
(depending heavily on copy-paste and StackOverflow).


To clarify about the provenance of the code: I didn't actually copy-paste
anything. I was merely joking to illustrate my skill level in python.

-- 
Johan

Re: [PATCH] release.py: write-changelog function (POC)

Posted by Johan Corveleyn <jc...@gmail.com>.
After some further thought and tweaking, I committed a write-changelog
function in r1818988.

I tried to make it flexible enough so we can experiment a bit with it.

- The "changes labels" can be used as prefix or suffix.
- They can be [U:client], [D:api], ... or [U], [D], or simply []
- There's an option --include-unlabeled-summaries to include summary
lines without a label, unless [skip], [ignore], [c:skip] or [c:ignore]
is found somewhere in the commit message. Summary lines containing
'STATUS', 'CHANGES', 'Post-release housekeeping' or 'Follow-up' are
ignored.

I guess further tweaks will come up if / when we start to use this.
Also, feel free to improve the code ... I'm far from a python expert
(depending heavily on copy-paste and StackOverflow).

We might need to do something special with the words 'Revert' and
'Merge', but I'm not yet sure what.

-- 
Johan

Re: [PATCH] release.py: write-changelog function (POC)

Posted by Johan Corveleyn <jc...@gmail.com>.
On Thu, Dec 14, 2017 at 6:58 PM, Daniel Shahaf <d....@daniel.shahaf.name> wrote:
> Johan Corveleyn wrote on Mon, Dec 04, 2017 at 12:16:37 +0100:
>> As promised on IRC last Friday, here is a POC implementation of the
>> "generate changelog from commit messages" functionality,
>
> We had some more discussion about this on IRC last Monday¹; I'll try to
> summarize.

Thanks, Daniel. Sorry I didn't get around to dev@-summarizing ...
Some comments below.

...
> Personally, my answers are^W^W proposal is:
>
> - Use a whitelist approach.  To ensure everything noteworthy gets mentioned, we
>   look for missing CHANGES taglines in log messages as part of our ordinary
>   post-commit review of commits@ traffic.  The rationale here is not to encumber
>   trivial commits with the need to explicitly opt-out.

Okay, I like the whitelist approach most too. The fact that for 1.10
we only have 244 CHANGES out of 1633 log-message-summaries (less than
1 out of 6) indicates that most commits won't have anything to say
"externally". So I follow your point on "not encumbering trivial
commits". OTOH, I can also live with the blacklist approach as a way
to make sure the RM certainly doesn't miss things, if others would
insist.

> - We invent some convention (possibly, but not necessarily, a parseable one) to
>   explicitly indicate "This commit shouldn't be added to CHANGES", in order to
>   avoid people wondering whether the omission of a CHANGES tagline was
>   deliberate or an oversight.  (I believe this aligns with Stefan Hett's
>   $WORK's experience with this convention.)
>
>   Thus, every significant commit should have either a CHANGES tagline or
>   CHANGES opt-out tag; and if a significant commit has neither, then we ask
>   about that in a post-commit review and, if needed, propedit the log message
>   to add a CHANGES tagline.

I'd like to add (a heuristic to determine "significance" if you will):
log messages without a summary, usually parseable by seeing a "*" as
the first character, are implicitly "not interesting for CHANGES". No
need to exclude them explicitly. We could also add a couple of other
conventions (I did this in my POC patch, with the --pocfirstlines
option): having STATUS or CHANGES in the summary line usually
indicates: not interesting for adding to CHANGES.

> - The CHANGES tagline can be either a separate line in the log message, or the
>   first line of the log message.  The rationale here is that the target
>   audiences are different, so sometimes the descriptions should be different,
>   but when the descriptions happen to be identical, avoid duplicated effort by
>   the log message author.  For example:
>
>   [[[
>   svnsync: Make everything better.  U[client]
>
>   * subversion/svnsync/svnsync.c
>   ⋮
>   ]]]
>
>   [[[
>   ra_svn: Fix a 30 years old bug in the protocol version number comparison
>   logic.
>
>   U[client and server]: ra_svn: Restore interoperability with 1.42.x, broken by 1.42.3.
>
>   * subversion/libsvn_ra_svn/greeting.c
>   ⋮
>   ]]]

+1, I like having both options (CHANGES-ify the summary, or write
CHANGES entry on a separate line).

> - For the syntax, U[client] for "User-visible changes"/"Client-side bugfixes"
>   and similar syntaxes for the other CHANGES groups.  The parser will look for
>   U[…] and D[…] where the ellipses stand for zero or more words.  We'll
>   probably come up with a standardish set of abbreviations here, reflecting the
>   categorisation of items in CHANGES.  The syntax for "explicitly not for
>   CHANGES" can then be the words "Not for CHANGES." as a standard formula (like
>   "No functional change." is a standard formula) or some parseable thing (e.g.,
>   "U[no]").
>
>   Whenever the CHANGES tagline is not the first line of the log message, we
>   might find it useful to give it some constant prefix, e.g.,
>   .
>       [[[
>       ra_svn: Fix a 30 years old bug in the protocol version number comparison
>       logic.
>
>       c: U[client and server]: ra_svn: Restore interoperability with 1.42.x, broken by 1.42.3.
>       ]]]
>   .
>   for easier / unambiguous parsing.

I must say I still prefer [U:client], [U], [no] or [ignore] over
U[client] etc. The first reason is that I find it more easily
parseable / recognizable as a human reader, but I admit this is very
subjective. Another reason is that I think it eliminates the need to
add that constant prefix like "c: " to make it stand out on a separate
line. I think having a line start with '[' immediately makes it
special enough (both for human readers as for parsers).

      [[[
      ra_svn: Fix a 30 years old bug in the protocol version number comparison
      logic.

      [U:clientserver]: ra_svn: Restore interoperability with 1.42.x,
broken by 1.42.3.
      ]]]


Finally: something else I mentioned on IRC a couple of days ago: if we
assume a maximum line length of 80 characters in CHANGES, we have only
57 or 63 characters available for a "CHANGES entry", because of the '
  * ' in front, and the ' (r1234567)' or ' (r1234567 et al)' at the
end. Don't know if we need to do something about that, or just "let
CHANGES go over 80 characters if your summary is 70-character-ish ...
whatever".

-- 
Johan

Re: [PATCH] release.py: write-changelog function (POC)

Posted by Julian Foad <ju...@apache.org>.
Daniel Shahaf wrote:
[...]
>    explicitly indicate "This commit shouldn't be added to CHANGES",
[...]
> WDYT?

Rather than "This should (or should not) be listed in our specific 
'CHANGES' file", the semantics of an annotated log message summary 
should be "This change is (or is not) externally significant."

Subtle but important.

- Julian

Re: [PATCH] release.py: write-changelog function (POC)

Posted by Daniel Shahaf <d....@daniel.shahaf.name>.
Johan Corveleyn wrote on Mon, Dec 04, 2017 at 12:16:37 +0100:
> As promised on IRC last Friday, here is a POC implementation of the
> "generate changelog from commit messages" functionality,

We had some more discussion about this on IRC last Monday¹; I'll try to
summarize.

Goal: reduce the workload of generating CHANGES before a (primarily minor)
release, via tags in log messages.

Open question #1: Whether this should be a whitelist or a blacklist (i.e.,
whether the script takes all log messages except those that opt out, or vice
versa).

Open question #2: With a whitelist approach, should we have a way to indicate
"This commit does not need to be added to CHANGES" explicitly (as opposed to
implicitly through the lack of any CHANGES magic syntax).

Open question #3: Should we generate the CHANGES line from the first line of
the commit message, or from a separate line (analogous to "Patch by:" lines).

Open question #4: What should the syntax be exactly.

Personally, my answers are^W^W proposal is:

- Use a whitelist approach.  To ensure everything noteworthy gets mentioned, we
  look for missing CHANGES taglines in log messages as part of our ordinary
  post-commit review of commits@ traffic.  The rationale here is not to encumber
  trivial commits with the need to explicitly opt-out.

- We invent some convention (possibly, but not necessarily, a parseable one) to
  explicitly indicate "This commit shouldn't be added to CHANGES", in order to
  avoid people wondering whether the omission of a CHANGES tagline was
  deliberate or an oversight.  (I believe this aligns with Stefan Hett's
  $WORK's experience with this convention.)

  Thus, every significant commit should have either a CHANGES tagline or
  CHANGES opt-out tag; and if a significant commit has neither, then we ask
  about that in a post-commit review and, if needed, propedit the log message
  to add a CHANGES tagline.

- The CHANGES tagline can be either a separate line in the log message, or the
  first line of the log message.  The rationale here is that the target
  audiences are different, so sometimes the descriptions should be different,
  but when the descriptions happen to be identical, avoid duplicated effort by
  the log message author.  For example:

  [[[
  svnsync: Make everything better.  U[client]

  * subversion/svnsync/svnsync.c
  ⋮
  ]]]

  [[[
  ra_svn: Fix a 30 years old bug in the protocol version number comparison
  logic.

  U[client and server]: ra_svn: Restore interoperability with 1.42.x, broken by 1.42.3.

  * subversion/libsvn_ra_svn/greeting.c
  ⋮
  ]]]

- For the syntax, U[client] for "User-visible changes"/"Client-side bugfixes"
  and similar syntaxes for the other CHANGES groups.  The parser will look for
  U[…] and D[…] where the ellipses stand for zero or more words.  We'll
  probably come up with a standardish set of abbreviations here, reflecting the
  categorisation of items in CHANGES.  The syntax for "explicitly not for
  CHANGES" can then be the words "Not for CHANGES." as a standard formula (like
  "No functional change." is a standard formula) or some parseable thing (e.g.,
  "U[no]").

  Whenever the CHANGES tagline is not the first line of the log message, we
  might find it useful to give it some constant prefix, e.g.,
  .
      [[[
      ra_svn: Fix a 30 years old bug in the protocol version number comparison
      logic.

      c: U[client and server]: ra_svn: Restore interoperability with 1.42.x, broken by 1.42.3.
      ]]]
  .
  for easier / unambiguous parsing.

WDYT?

Cheers,

Daniel

P.S. I'm going to include CHANGES information in my future commits, because
that will make our job writing CHANGES easier.  After all, CHANGES text is
better written by the author of the commit than by the RM, isn't it?

¹ http://colabti.org/irclogger/irclogger_log/svn-dev?date=2017-12-11#l157