Posted to dev@arrow.apache.org by Neal Richardson <ne...@gmail.com> on 2021/01/19 23:16:20 UTC

[Proposal] Modify release process to vote only on source release

Hi all,
Over the past year, there's been a lot of discussion around the challenges
we face as a project in doing releases. Because they are costly to do, we
don't do them often; because we don't do them often, they become even
costlier.

There are only a small number of people (PMC members with GPG keys
registered with ASF) who could possibly be release manager, and because of
the amount of time required (I saw Krisztián say on the 3.0 release thread
something like "I'll start a new rc, it'll be done in 12 hours"), even fewer
people could be expected to take on the burden. Indeed, this is Krisztián's
10th release in a row as release manager, and over the course of the
project, 2/3 of all release candidates have been made by just 2 people.

I'd like to propose a change to our release procedure: instead of having
the release candidate vote include Python wheels, Linux system packages, or
any other binary packages, we should only vote on the source release.
Binary artifacts would be produced as post-release tasks, using the
official source release.

This would greatly reduce the time and effort it takes to produce a release
candidate--tar, sign, and upload, that's it--and it would remove a bunch of
points of failure from the release-candidate making process (timeouts, CI
flakiness, etc.). It would also mean fewer release-blocking issues--we
still have to fix the packaging builds, but doing so can happen in parallel
with the verification process. If we found problems in the packaging
scripts, fixes could either be applied as patch steps to the binary
artifact build scripts, or if fixes can be produced quickly, we collect
them and cut another (cheap) release candidate. Right now, our only option
is the latter, which makes for a slow, stressful release process where
there are so many places where a simple issue can block the whole release
or set us back an additional week (a full day to produce a release
candidate plus another three to vote).
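
As a rough illustration of how small the "tar, sign, and upload" step could be, here is a minimal Python sketch. It is not Arrow's actual release tooling: the artifact name `example-<version>`, the helper names, and the choice of SHA-512 are assumptions made for the example, and the GPG command is only constructed, not executed.

```python
from __future__ import annotations

import hashlib
import tarfile
from pathlib import Path


def make_source_tarball(source_dir: Path, version: str, out_dir: Path) -> Path:
    """Pack source_dir into <out_dir>/example-<version>.tar.gz."""
    root = f"example-{version}"  # hypothetical artifact name
    tarball = out_dir / f"{root}.tar.gz"
    with tarfile.open(tarball, "w:gz") as tar:
        # Store everything under a single top-level directory.
        tar.add(source_dir, arcname=root)
    return tarball


def write_checksum(tarball: Path, algorithm: str = "sha512") -> Path:
    """Write '<hex digest>  <file name>' next to the tarball."""
    digest = hashlib.new(algorithm)
    with tarball.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    checksum_file = tarball.with_name(f"{tarball.name}.{algorithm}")
    checksum_file.write_text(f"{digest.hexdigest()}  {tarball.name}\n")
    return checksum_file


def gpg_sign_command(tarball: Path, key_id: str) -> list[str]:
    """Return (but do not run) a detached, ASCII-armored signing command."""
    return [
        "gpg", "--armor", "--local-user", key_id,
        "--output", f"{tarball}.asc",
        "--detach-sign", str(tarball),
    ]
```

The release manager would run the returned command with their own key and then upload the tarball, signature, and checksum; none of these steps waits on a binary build.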

If we went this direction, we could still choose to vote separately on
binary packages like wheels, though I'm not sure that's worth the effort.
Many of the packages that people use (conda, homebrew, CRAN, etc.) are
already "unofficial" releases because they're packaged by someone else, and
I don't think the distinction is meaningful to our users.

To be clear, this doesn't reduce the general maintenance burden of the
project. We still have to monitor nightly builds, fix packaging scripts
that break, and deal with CI service interruptions. This change would just
reduce the burden on the release manager and allow us to spread more
broadly the costs of packaging and releasing. It also solves questions such
as "Why should the Rust release be blocked just because we're having a
problem building Python wheels on macOS?"

There are also other things we could do that would, on a technical level,
improve our ability to make releases more efficiently. Andy Grove's change
in the use of maven in the release process will help, as would a number of
CI/CD improvements. I view these as complementary to this proposal, which
is a governance question with technical/logistical implications.

Thoughts?

Neal

Re: [Proposal] Modify release process to vote only on source release

Posted by Neal Richardson <ne...@gmail.com>.

Comments inline.

On Thu, Jan 21, 2021 at 6:19 AM Krisztián Szűcs <sz...@gmail.com>
wrote:

> On Thu, Jan 21, 2021 at 8:11 AM Sutou Kouhei <ko...@clear-code.com> wrote:
> >
> > Hi,
> >
> > I'm not sure how much this change will improve our release
> > process but I'm OK with this try.
> >
> > Here are technical blockers for this try:
> >
> >   * Java packaging: WIP: https://github.com/apache/arrow/pull/9155
> >     * It takes 10m+.
> >     * It may be failed because a release manager needs to prepare
> >       local environment to do this.
> Preferably we should dockerize this step as well.
> >
> >   * GLib source archive preparation:
> >     https://github.com/apache/arrow/blob/master/dev/release/source/build.sh
> >     * It takes 1m+.
> >     * It may not be failed because most tasks are done in Docker.
> >       But it means that a release manager needs to prepare Docker.
> I had multiple failures during this step before containerization,
> since then it never fails.
> >
> > There are still some small tasks(*) to build source archive
> > but they aren't blockers.
> >
> > (*) https://github.com/apache/arrow/blob/master/dev/release/02-source.sh#L84-L97
> >
> > We can avoid GLib source archive preparation by dropping
> > support for GNU Autotools. They are used on CentOS 7 and
> > Ubuntu 16.04. We can use alternative build system (Meson) on
> > CentOS 7. We'll drop support for Ubuntu 16.04 soon. (Ubuntu
> > 16.04's EOL is 2021-04.)
> >
> >
> > > I'll start a new rc, it'll be done in 12 hours
> >
> > As my past release manager experience, here are time
> > consumption tasks:
> >
> >   1. Fixing nightly builds
> >      * Generally, we always have failure builds.
> >      * I needed 2~3 days for this.
> >      * I'm still working on this even when I'm not a release manager.
> >
> >   2. Build source including Java packages preparation
> >      * I always failed this with some problems and retried
> >        multiple times.
> I experienced the same and each iteration takes 10+ minutes.
> >      * For example: https://issues.apache.org/jira/browse/ARROW-5764
> >        [Java] Failed to build document with OpenJDK 11
> >        (This is not fixed yet.)
> >      * I can't go to the next step while this task isn't completed.
> >
> >   3. Building binary packages
> >      * I just need to wait 1~2 hours.
> It usually took around 3 hours. Appveyor was the slowest component
> here because it offered no parallelization, so we had to wait 4 wheel
> builds each taking around 50 minutes.
> This is the first release where we build the windows wheels on github
> actions, now the overall time to build the binaries is just a bit
> above one hour.
> >        * We'll be able to speed up this by using cache such as
> >          ccache for C++ in Crossbow tasks: 1~2 hours -> 10~20 minutes
> We always create new branches, so it would require tricky workaround
> to utilize github actions cache plugin, see the cache scope at
> https://github.com/actions/cache#cache-scopes
> >      * Generally, this isn't failed because nightly builds are fixed.
> >
> >   4. Downloading built binary packages and uploading binary packages
> >      * It takes 1~2 hours because we have many files.
> Downloading takes 10-15 minutes on a 500Mbit/s network with a single
> thread.
> I tried to parallelize it before, but quickly hit the github api abuse
> limit, see
> https://docs.github.com/en/rest/overview/resources-in-the-rest-api#abuse-rate-limits
>
> Uploading binaries is the slowest part of the process, it takes around
> 2 hours despite that we upload the binaries concurrently. Bintray also
> tends to reject requests so I need to restart the uploading script
> multiple times before completion. Occasionally I switch to cellular
> network to make the uploading process slower but more stable.
>

These hours add up. And a big reason you have been the release manager for
the last 10 releases is that it's too much of a commitment for most other
PMC members to sign up for. But strip these tasks away and the release
manager can be more of a "manager" and just make sure that the work gets
done by someone.

>
> >   5. Verifying RC before starting vote
> >      * I can start source verification while building binary packages.
> >      * It takes 1~2 hours.
> >      * Generally, I find some problems and fix them with the first RC.
> >        * Most problems are caused by outdated verification script.
> >        * It takes +0.5-1 hour per problem.
> >        * I'm still working on this even when I'm not a release manager.
> This caused the current release to take more time.
> >
> > This proposal will defer costs of 3., 4. and part of 5.
> > 1. still exists because we can't keep green nightly builds
> > for now.
>

I see it as a mix of deferring and decentralizing those costs. For example,
the Homebrew formula is already outside of the release process; it's a
post-release task. On several occasions, we've had to add a patch step to
the formula after the release to fix some issue that only happens in the
Homebrew environment. That's fine: it has no bearing on the release, and
those in the community who care to ensure that we have an up-to-date
formula take care of it. Likewise with conda. This proposal would move
wheels et al. to have the same status as those packages.

I get the impression that the two of you (Kou and Krisztián) aren't
envisioning much benefit to this proposal because you think "well I already
spend all this time fixing the Linux packages/Python wheels, and I'll still
have to do that even if the vote is only on the source". That may be true
because you are stakeholders who care about ensuring those packages exist.
What the proposal entails is that you don't also have to own all of the
other packages at release time.


> >
> >
> > > It also solves questions such as "Why should the Rust
> > > release be blocked just because we're having a problem
> > > building Python wheels on macOS?"
> >
> > It solves the question only when the problem is only related
> > to packaging. If we have a non-packaging problem such as
> > integration test failure, our release will be blocked.
>

Indeed. One of my points with this proposal is that we have too many
potential release blockers, and as the project grows, we only add more
blockers (packaging systems, platforms, languages). It would help if we
could scale back the number of truly blocking issues.


> >
> >
> >
> > I still think that continuous (nightly) release verification
> > needs to be implemented and maintained. If we keep release
> > verification green, we'll always be able to cut an RC
> > without problems.
>

I agree completely! And this is not mutually exclusive with my proposal.

> I would like this approach more. If we could simulate the release
> process and its verification on a nightly basis then we shouldn't have
> any major surprises.
>

In my ideal world, cutting a release is a formality, and release
verification never fails because we essentially release and verify every
night. We should work towards that. However, we should also acknowledge
that after every release, we collect a set of JIRAs that would improve our
release verification, we do some of them but not all of them, and then we
add more after the next release. And although there certainly have been
improvements made, I don't think the overall cost of doing a release has
gone down since I joined the project two years ago.

So while I would like us to pursue technical improvements, I'm less
optimistic that we can code our way out of these problems. Sometimes the
most efficient solution to a problem is to redefine it.


> >
> >
> > Thanks,
> > --
> > kou
> >

Re: [Proposal] Modify release process to vote only on source release

Posted by Krisztián Szűcs <sz...@gmail.com>.

On Thu, Jan 21, 2021 at 8:11 AM Sutou Kouhei <ko...@clear-code.com> wrote:
>
> Hi,
>
> I'm not sure how much this change will improve our release
> process but I'm OK with this try.
>
> Here are technical blockers for this try:
>
>   * Java packaging: WIP: https://github.com/apache/arrow/pull/9155
>     * It takes 10m+.
>     * It may be failed because a release manager needs to prepare
>       local environment to do this.
Preferably we should dockerize this step as well.
>
>   * GLib source archive preparation:
>     https://github.com/apache/arrow/blob/master/dev/release/source/build.sh
>     * It takes 1m+.
>     * It may not be failed because most tasks are done in Docker.
>       But it means that a release manager needs to prepare Docker.
I had multiple failures during this step before containerization;
since then it has never failed.
>
> There are still some small tasks(*) to build source archive
> but they aren't blockers.
>
> (*) https://github.com/apache/arrow/blob/master/dev/release/02-source.sh#L84-L97
>
> We can avoid GLib source archive preparation by dropping
> support for GNU Autotools. They are used on CentOS 7 and
> Ubuntu 16.04. We can use alternative build system (Meson) on
> CentOS 7. We'll drop support for Ubuntu 16.04 soon. (Ubuntu
> 16.04's EOL is 2021-04.)
>
>
> > I'll start a new rc, it'll be done in 12 hours
>
> As my past release manager experience, here are time
> consumption tasks:
>
>   1. Fixing nightly builds
>      * Generally, we always have failure builds.
>      * I needed 2~3 days for this.
>      * I'm still working on this even when I'm not a release manager.
>
>   2. Build source including Java packages preparation
>      * I always failed this with some problems and retried
>        multiple times.
I experienced the same and each iteration takes 10+ minutes.
>      * For example: https://issues.apache.org/jira/browse/ARROW-5764
>        [Java] Failed to build document with OpenJDK 11
>        (This is not fixed yet.)
>      * I can't go to the next step while this task isn't completed.
>
>   3. Building binary packages
>      * I just need to wait 1~2 hours.
It usually took around 3 hours. Appveyor was the slowest component
here because it offered no parallelization, so we had to wait for 4
wheel builds, each taking around 50 minutes.
This is the first release where we build the Windows wheels on GitHub
Actions; now the overall time to build the binaries is just a bit
above one hour.
>        * We'll be able to speed up this by using cache such as
>          ccache for C++ in Crossbow tasks: 1~2 hours -> 10~20 minutes
We always create new branches, so it would require a tricky workaround
to use the GitHub Actions cache plugin; see the cache scopes at
https://github.com/actions/cache#cache-scopes
>      * Generally, this isn't failed because nightly builds are fixed.
>
>   4. Downloading built binary packages and uploading binary packages
>      * It takes 1~2 hours because we have many files.
Downloading takes 10-15 minutes on a 500Mbit/s network with a single thread.
I tried to parallelize it before, but quickly hit the GitHub API abuse
limit, see https://docs.github.com/en/rest/overview/resources-in-the-rest-api#abuse-rate-limits
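
One possible middle ground between a single thread and unbounded parallelism is a small, fixed-size worker pool. The sketch below is hypothetical: `fetch` stands in for whatever actually downloads one artifact, and the default of 4 workers is an arbitrary assumption, not a value known to stay under GitHub's limits.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, List, TypeVar

T = TypeVar("T")


def download_all(
    urls: Iterable[str],
    fetch: Callable[[str], T],
    max_workers: int = 4,
) -> List[T]:
    """Apply fetch to every URL with at most max_workers calls in flight.

    Results are returned in the same order as the input URLs.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(fetch, urls))
```

Capping the pool size only limits how many requests run at once; whether that is enough to avoid the abuse limit would have to be tested.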

Uploading binaries is the slowest part of the process: it takes around
2 hours even though we upload the binaries concurrently. Bintray also
tends to reject requests, so I need to restart the upload script
multiple times before completion. Occasionally I switch to a cellular
network, which makes uploading slower but more stable.
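
Manual restarts like these are commonly replaced by automatic retries with exponential backoff. The following is a generic sketch of that technique, not the existing upload script: `action` is a placeholder for a single upload call, and the attempt count and delays are assumptions.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retry(
    action: Callable[[], T],
    attempts: int = 5,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call action until it succeeds, doubling the wait after each failure."""
    for attempt in range(attempts):
        try:
            return action()
        except Exception:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            sleep(base_delay * 2 ** attempt)
    raise ValueError("attempts must be at least 1")
```

The `sleep` parameter is injectable so the waiting can be replaced in tests; a real script would likely catch only the specific errors the upload service raises rather than every exception.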
>
>   5. Verifying RC before starting vote
>      * I can start source verification while building binary packages.
>      * It takes 1~2 hours.
>      * Generally, I find some problems and fix them with the first RC.
>        * Most problems are caused by outdated verification script.
>        * It takes +0.5-1 hour per problem.
>        * I'm still working on this even when I'm not a release manager.
This caused the current release to take more time.
>
> This proposal will defer costs of 3., 4. and part of 5.
> 1. still exists because we can't keep green nightly builds
> for now.
>
>
> > It also solves questions such as "Why should the Rust
> > release be blocked just because we're having a problem
> > building Python wheels on macOS?"
>
> It solves the question only when the problem is only related
> to packaging. If we have a non-packaging problem such as
> integration test failure, our release will be blocked.
>
>
>
> I still think that continuous (nightly) release verification
> needs to be implemented and maintained. If we keep release
> verification green, we'll always be able to cut an RC
> without problems.
I would like this approach more. If we could simulate the release
process and its verification on a nightly basis then we shouldn't have
any major surprises.
>
>
> Thanks,
> --
> kou
>

Re: [Proposal] Modify release process to vote only on source release

Posted by Sutou Kouhei <ko...@clear-code.com>.

Hi,

I'm not sure how much this change will improve our release
process, but I'm OK with trying it.

Here are the technical blockers for trying it:

  * Java packaging: WIP: https://github.com/apache/arrow/pull/9155
    * It takes 10m+.
    * It may fail because the release manager needs to prepare a
      local environment to do this.

  * GLib source archive preparation:
    https://github.com/apache/arrow/blob/master/dev/release/source/build.sh
    * It takes 1m+.
    * It is unlikely to fail because most tasks are done in Docker.
      But it means that the release manager needs to set up Docker.

There are still some small tasks(*) to build the source archive,
but they aren't blockers.

(*) https://github.com/apache/arrow/blob/master/dev/release/02-source.sh#L84-L97

We can avoid GLib source archive preparation by dropping
support for GNU Autotools. They are used on CentOS 7 and
Ubuntu 16.04. We can use an alternative build system (Meson) on
CentOS 7. We'll drop support for Ubuntu 16.04 soon. (Ubuntu
16.04's EOL is 2021-04.)


> I'll start a new rc, it'll be done in 12 hours

From my past experience as release manager, here are the
time-consuming tasks:

  1. Fixing nightly builds
     * Generally, we always have some failing builds.
     * I needed 2~3 days for this.
     * I'm still working on this even when I'm not a release manager.

  2. Building the source release, including Java package preparation
     * This always failed for me with some problem, and I retried
       multiple times.
     * For example: https://issues.apache.org/jira/browse/ARROW-5764
       [Java] Failed to build document with OpenJDK 11
       (This is not fixed yet.)
     * I can't go to the next step until this task is completed.

  3. Building binary packages
     * I just need to wait 1~2 hours.
       * We could speed this up by using a cache such as
         ccache for C++ in Crossbow tasks: 1~2 hours -> 10~20 minutes
     * Generally, this doesn't fail because the nightly builds have
       already been fixed.

  4. Downloading built binary packages and uploading binary packages
     * It takes 1~2 hours because we have many files.

  5. Verifying RC before starting vote
     * I can start source verification while building binary packages.
     * It takes 1~2 hours.
     * Generally, I find some problems and fix them with the first RC.
       * Most problems are caused by an outdated verification script.
       * Each problem takes an additional 0.5~1 hour.
       * I'm still working on this even when I'm not a release manager.

This proposal will defer the costs of 3., 4., and part of 5.
The cost of 1. remains because we can't keep the nightly builds
green for now.
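
To make the arithmetic concrete, the per-step estimates above can be tallied. This is only an illustrative sketch: it uses the hour ranges quoted in this message, leaves out steps 1 and 2 because no hour figures are given for them, and treats step 5 as not deferred because only an unspecified part of it would move.

```python
# (low, high) estimates in hours, taken from the list above.
STEP_HOURS = {
    3: (1, 2),  # building binary packages
    4: (1, 2),  # downloading and uploading binary packages
    5: (1, 2),  # verifying the RC before the vote
}
# Assumption: step 5 is only partly deferred, so it is excluded here.
FULLY_DEFERRED = {3, 4}


def total_hours(steps):
    """Sum the low and high estimates for the given step numbers."""
    low = sum(STEP_HOURS[s][0] for s in steps)
    high = sum(STEP_HOURS[s][1] for s in steps)
    return low, high


deferred = total_hours(FULLY_DEFERRED)
remaining = total_hours(set(STEP_HOURS) - FULLY_DEFERRED)
print(f"deferred: {deferred[0]}~{deferred[1]} hours")     # deferred: 2~4 hours
print(f"remaining: {remaining[0]}~{remaining[1]} hours")  # remaining: 1~2 hours
```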


> It also solves questions such as "Why should the Rust
> release be blocked just because we're having a problem
> building Python wheels on macOS?"

It answers the question only when the problem is purely
packaging-related. If we have a non-packaging problem such as an
integration test failure, our release will still be blocked.



I still think that continuous (nightly) release verification
needs to be implemented and maintained. If we keep release
verification green, we'll always be able to cut an RC
without problems.
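
A nightly verification job of this kind could, at its simplest, run every check and report which ones failed. The following is a hypothetical sketch: `run_checks`, `is_releasable`, and the idea of passing checks as named callables are invented for illustration and are not part of Arrow's tooling.

```python
from typing import Callable, Dict, List


def run_checks(checks: Dict[str, Callable[[], None]]) -> List[str]:
    """Run every check, even after a failure, and return the names that failed."""
    failed = []
    for name, check in checks.items():
        try:
            check()
        except Exception as exc:
            print(f"FAILED {name}: {exc}")
            failed.append(name)
    return failed


def is_releasable(checks: Dict[str, Callable[[], None]]) -> bool:
    """The tree counts as releasable only when no check fails."""
    return not run_checks(checks)
```

Running all checks rather than stopping at the first failure means one nightly run reports every broken piece at once.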


Thanks,
--
kou


Re: [Proposal] Modify release process to vote only on source release

Posted by Neal Richardson <ne...@gmail.com>.

Agreed, there are multiple issues to resolve in order for our release
process to be manageable and scalable for the project. This procedural
change is not a silver bullet, and if we agree to it, it doesn't mean that
our releases are "fixed". But it's the only change where the solution is a
discussion and vote, not a JIRA and pull request.

Neal

On Tue, Jan 19, 2021 at 6:18 PM Wes McKinney <we...@gmail.com> wrote:

> I'm OK with moving to source only releases, but we need to take a step
> back and consider how our CI/CD is failing to notify us in a suitably
> timely and automated way about the packages being broken. For example,
> the fact that we had 2 failed RCs as the result of packaging issues
> points to a broken process.
>
> So there are a couple issues at play:
>
> * The act of _producing_ the package artifacts should not stop a
> release vote from proceeding like it does now (the "12 hours" you
> refer to that's caused by slow iteration time with Crossbow — this is
> also a problem, can we not fix this?)
> * We need a better feedback loop to determine whether master is in a
> releasable state, including all relevant packages
>
> If we commit ourselves to solving one problem but not both, I fear
> that we will find ourselves suffering from other kinds of problems in
> future release cycles
>
> On Tue, Jan 19, 2021 at 5:16 PM Neal Richardson
> <ne...@gmail.com> wrote:
> >
> > Hi all,
> > Over the past year, there's been a lot of discussion around the challenges
> > we face as a project in doing releases. Because they are costly to do, we
> > don't do them often; because we don't do them often, they become even
> > costlier.
> >
> > There are only a small number of people (PMC members with GPG keys
> > registered with ASF) who could possibly be release manager, and because of
> > the amount of time required (I saw Krisztián say on the 3.0 release thread
> > something like "I'll start a new rc, it'll be done in 12 hours"), even fewer
> > people could be expected to take on the burden. Indeed, this is Krisztián's
> > 10th release in a row as release manager, and over the course of the
> > project, 2/3 of all release candidates have been made by just 2 people.
> >
> > I'd like to propose a change to our release procedure: instead of having
> > the release candidate vote include Python wheels, Linux system packages, or
> > any other binary packages, we should only vote on the source release.
> > Binary artifacts would be produced as post-release tasks, using the
> > official source release.
> >
> > This would greatly reduce the time and effort it takes to produce a release
> > candidate--tar, sign, and upload, that's it--and it would remove a bunch of
> > points of failure from the release-candidate making process (timeouts, CI
> > flakiness, etc.). It would also mean fewer release-blocking issues--we
> > still have to fix the packaging builds, but doing so can happen in parallel
> > with the verification process. If we found problems in the packaging
> > scripts, fixes could either be applied as patch steps to the binary
> > artifact build scripts, or if fixes can be produced quickly, we collect
> > them and cut another (cheap) release candidate. Right now, our only option
> > is the latter, which makes for a slow, stressful release process where
> > there are so many places where a simple issue can block the whole release
> > or set us back an additional week (a full day to produce a release
> > candidate plus another three to vote).
> >
> > If we went this direction, we could still choose to vote separately on
> > binary packages like wheels, though I'm not sure that's worth the effort.
> > Many of the packages that people use (conda, homebrew, CRAN, etc.) are
> > already "unofficial" releases because they're packaged by someone else, and
> > I don't think the distinction is meaningful to our users.
> >
> > To be clear, this doesn't reduce the general maintenance burden of the
> > project. We still have to monitor nightly builds, fix packaging scripts
> > that break, and deal with CI service interruptions. This change would just
> > reduce the burden on the release manager and allow us to spread more
> > broadly the costs of packaging and releasing. It also solves questions such
> > as "Why should the Rust release be blocked just because we're having a
> > problem building Python wheels on macOS?"
> >
> > There are also other things we could do that would, on a technical level,
> > improve our ability to make releases more efficiently. Andy Grove's change
> > in the use of maven in the release process will help, as would a number of
> > CI/CD improvements. I view these as complementary to this proposal, which
> > is a governance question with technical/logistical implications.
> >
> > Thoughts?
> >
> > Neal
>

Re: [Proposal] Modify release process to vote only on source release

Posted by Wes McKinney <we...@gmail.com>.

I'm OK with moving to source only releases, but we need to take a step
back and consider how our CI/CD is failing to notify us in a suitably
timely and automated way about the packages being broken. For example,
the fact that we had 2 failed RCs as the result of packaging issues
points to a broken process.

So there are a couple issues at play:

* The act of _producing_ the package artifacts should not stop a
release vote from proceeding like it does now (the "12 hours" you
refer to is caused by slow iteration time with Crossbow, which is
also a problem; can we not fix this?)
* We need a better feedback loop to determine whether master is in a
releasable state, including all relevant packages

If we commit ourselves to solving one problem but not both, I fear
that we will find ourselves suffering from other kinds of problems in
future release cycles.

On Tue, Jan 19, 2021 at 5:16 PM Neal Richardson
<ne...@gmail.com> wrote:
>
> Hi all,
> Over the past year, there's been a lot of discussion around the challenges
> we face as a project in doing releases. Because they are costly to do, we
> don't do them often; because we don't do them often, they become even
> costlier.
>
> There are only a small number of people (PMC members with GPG keys
> registered with ASF) who could possibly be release manager, and because of
> the amount of time required (I saw Krisztián say on the 3.0 release thread
> something like "I'll start a new rc, it'll be done in 12 hours"), even fewer
> people could be expected to take on the burden. Indeed, this is Krisztián's
> 10th release in a row as release manager, and over the course of the
> project, 2/3 of all release candidates have been made by just 2 people.
>
> I'd like to propose a change to our release procedure: instead of having
> the release candidate vote include Python wheels, Linux system packages, or
> any other binary packages, we should only vote on the source release.
> Binary artifacts would be produced as post-release tasks, using the
> official source release.
>
> This would greatly reduce the time and effort it takes to produce a release
> candidate--tar, sign, and upload, that's it--and it would remove a bunch of
> points of failure from the release-candidate making process (timeouts, CI
> flakiness, etc.). It would also mean fewer release-blocking issues--we
> still have to fix the packaging builds, but doing so can happen in parallel
> with the verification process. If we found problems in the packaging
> scripts, fixes could either be applied as patch steps to the binary
> artifact build scripts, or if fixes can be produced quickly, we collect
> them and cut another (cheap) release candidate. Right now, our only option
> is the latter, which makes for a slow, stressful release process where
> there are so many places where a simple issue can block the whole release
> or set us back an additional week (a full day to produce a release
> candidate plus another three to vote).
>
> If we went this direction, we could still choose to vote separately on
> binary packages like wheels, though I'm not sure that's worth the effort.
> Many of the packages that people use (conda, homebrew, CRAN, etc.) are
> already "unofficial" releases because they're packaged by someone else, and
> I don't think the distinction is meaningful to our users.
>
> To be clear, this doesn't reduce the general maintenance burden of the
> project. We still have to monitor nightly builds, fix packaging scripts
> that break, and deal with CI service interruptions. This change would just
> reduce the burden on the release manager and allow us to spread more
> broadly the costs of packaging and releasing. It also solves questions such
> as "Why should the Rust release be blocked just because we're having a
> problem building Python wheels on macOS?"
>
> There are also other things we could do that would, on a technical level,
> improve our ability to make releases more efficiently. Andy Grove's change
> in the use of maven in the release process will help, as would a number of
> CI/CD improvements. I view these as complementary to this proposal, which
> is a governance question with technical/logistical implications.
>
> Thoughts?
>
> Neal