Posted to dev@arrow.apache.org by Andy Grove <an...@gmail.com> on 2023/03/14 15:36:39 UTC

[Rust][MaybeNotJustRust] PR titles and generating change logs

We have been using github-changelog-generator [1] to generate changelogs
for the Rust projects for some time now. It has served us well but is no
longer workable, at least for DataFusion. The tool appears to pull down the
entire project history through the GitHub API, so we had to artificially
slow it down to avoid hitting API rate limits. It is now unusable given the
number of issues and PRs in this repo.

This weekend, I built a much simpler changelog generator in Python [2]
that I am now using for the projects that I am the release manager for
(datafusion, datafusion-python, ballista). It has almost the same
functionality that we were getting from the previous generator, but takes
less than a minute to run, compared to 30+ minutes for the old generator.
It only hits the GitHub API for information about commits and pull requests
in the release being documented, rather than accessing the entire project
history.
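
One way to keep API traffic proportional to the release is to start from
the commit subjects between two tags (for example, the output of
`git log --format=%s old..new`) and look up only the PRs they mention. The
following is a minimal sketch, assuming squash-merged commits whose subject
ends in "(#1234)". The `pr_numbers` helper is hypothetical and is not taken
from changelog-genie.

```python
import re

# Assumption: squash-merged commits end their subject line with "(#1234)".
PR_SUFFIX = re.compile(r"\(#(\d+)\)\s*$")


def pr_numbers(subjects):
    """Return the PR numbers named in commit subjects, in order, without duplicates."""
    seen = []
    for subject in subjects:
        match = PR_SUFFIX.search(subject)
        if match:
            number = int(match.group(1))
            if number not in seen:
                seen.append(number)
    return seen
```

Only the numbers returned here would then need to be fetched from the
GitHub API, rather than every issue and PR in the repository.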

I followed the same approach of using GitHub labels to categorize PRs
(enhancements, bug fixes, docs, etc.), but this requires a small amount of
manual effort to add those labels and re-generate the changelog.

I noticed that some contributors are already prefixing PR titles with
"feat:", "feature:", "fix:", "docs:", etc. I plan on updating the changelog
generator to recognize these prefixes as well, to help automate my job.
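
Recognizing such prefixes could be as simple as the sketch below. The
section names and the `SECTIONS`, `section_for`, and `group_by_section`
names are assumptions made for illustration; they are not what
changelog-genie actually uses.

```python
# Hypothetical mapping from PR title prefix to changelog section.
SECTIONS = {
    "feat": "Implemented enhancements",
    "feature": "Implemented enhancements",
    "fix": "Fixed bugs",
    "docs": "Documentation updates",
}


def section_for(title, default="Merged pull requests"):
    """Pick a changelog section from the text before the first colon."""
    prefix, sep, _ = title.partition(":")
    if not sep:
        # No colon at all, so there is no prefix to recognize.
        return default
    return SECTIONS.get(prefix.strip().lower(), default)


def group_by_section(titles):
    """Group PR titles under their changelog section, keeping input order."""
    grouped = {}
    for title in titles:
        grouped.setdefault(section_for(title), []).append(title)
    return grouped
```

Titles without a recognized prefix fall into the default section, so
label-based categorization could still be applied to those by hand.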

I wonder if it is worth formalizing these "semantic titles" more, and maybe
having CI enforce them. It would improve the quality of our changelogs and
reduce the burden on release managers.

I would appreciate any feedback on this idea.

Thanks,

Andy.


[1] https://github.com/github-changelog-generator/github-changelog-generator
[2] https://github.com/andygrove/changelog-genie

Re: [Rust][MaybeNotJustRust] PR titles and generating change logs

Posted by Will Jones <wi...@gmail.com>.
Thanks for sharing this script!

> I noticed that some contributors are already prefixing PR titles with
> "feat:", "feature:", "fix:", "docs:", etc. I plan on updating the changelog
> generator to recognize these prefixes as well, to help automate my job.

I believe most people are doing this out of inspiration from the
Conventional Commits standard [1] (at least I am). This standard is used
and enforced in CI in the Substrait main repository, for example. [2]

I've found them not too bad to work with, but I am definitely rebasing and
squashing commits more often to make my messages conform. This can make it
harder for people to see the incremental changes in a PR when re-reviewing.
Other than that, I see no downside.

[1] https://www.conventionalcommits.org/en/v1.0.0/
[2] https://github.com/substrait-io/substrait/actions/runs/4408812075/jobs/7724306882

Re: [Rust][MaybeNotJustRust] PR titles and generating change logs

Posted by David Li <li...@apache.org>.
That sort of commit format is usually called "conventional commits" [1] and can also include subcomponent information and self-reported breaking changes. (You might be referencing that format already; it wasn't clear to me.)
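
For illustration, the header line in that format looks like "type(scope)!: description", where the scope and the "!" breaking-change marker are optional. Below is a minimal sketch of parsing it; `HEADER` and `parse_title` are hypothetical names, and the sketch only looks at the header line, not at "BREAKING CHANGE:" footers in the message body.

```python
import re

# Header shape from Conventional Commits 1.0.0: type, optional (scope),
# optional "!" for a breaking change, then ": " and the description.
HEADER = re.compile(
    r"^(?P<type>[A-Za-z]+)"
    r"(?:\((?P<scope>[^()]+)\))?"
    r"(?P<breaking>!)?"
    r": (?P<description>.+)$"
)


def parse_title(title):
    """Return the parts of a conventional-commit style title, or None if it does not match."""
    match = HEADER.match(title)
    if match is None:
        return None
    return {
        "type": match.group("type").lower(),
        "scope": match.group("scope"),
        "breaking": match.group("breaking") is not None,
        "description": match.group("description"),
    }
```

A title such as "feat(python)!: drop old API" yields type "feat", scope "python", and breaking set to True, while a free-form title such as "Update readme" yields None.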

And then there are various tools that can generate changelogs from that format using purely the commit log (e.g. arrow-adbc uses [2]), as well as enforcing the commit format (in e.g. pre-commit).

What arrow-adbc has done is use an action to enforce that the PR title is in conventional commit format, and also ask INFRA to configure the repository so that merges squash and use the PR title and description as the commit message.

[1]: https://www.conventionalcommits.org/en/v1.0.0/
[2]: https://commitizen-tools.github.io/commitizen/

Re: [Rust][MaybeNotJustRust] PR titles and generating change logs

Posted by Andy Grove <an...@gmail.com>.
I filed an issue in the datafusion repo as well, since not everyone is on
the mailing list.

https://github.com/apache/arrow-datafusion/issues/5601
