Posted to dev@pulsar.apache.org by Lari Hotari <lh...@apache.org> on 2023/06/02 08:24:45 UTC

From Tribal Knowledge to Transparency: Enhancing and Documenting the LTS Maintenance & Cherry-Picking Process

Dear Apache Pulsar Committers,

I wish to address a few pressing concerns that emerged while I was
working on cherry-picking PR #20461 [1]. This PR was aimed at upgrading
Jetty from 9.4.48.v20220622 to 9.4.51.v20230217 to address the CVEs
(CVE-2023-26048 and CVE-2023-26049). I discovered that Jetty had already
been upgraded in the maintenance branches through four separate PRs
(#20162, #20226, #20227, and #20228), all titled "[improve][build]
Upgrade dependencies to reduce CVE" [2].

1. The newly adopted process of combining multiple dependency updates
   into a single PR, while omitting changes to the master branch, has
   not been discussed on the mailing list.
2. Our current process, which is based on cherry-picking, should
   maintain traceability across maintenance branches to discern whether
   a change made to the master branch is available in the maintenance
   branches. The bundled approach breaks this traceability.
3. It is advised that each dependency (or group of related dependencies)
   should be upgraded in its own PR, rather than upgrading multiple
   unrelated dependencies in a single PR.
4. We should aim for all changes to be first made to the master branch
   and then cherry-picked to other branches to prevent the maintenance
   branches from diverging from the master branch.
5. The compilation of release notes becomes challenging when PRs aren't
   atomic.
6. Similarly, detecting regressions can be problematic when PRs aren't
   atomic.
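To make point 2 concrete, here is a minimal sketch of a traceability check. It assumes cherry-picks are made with `git cherry-pick -x`, which appends a "(cherry picked from commit <sha>)" line to the commit message. It also assumes the commit messages of a maintenance branch have already been collected, for example from `git log`. The function names are hypothetical and are not part of any existing Pulsar tooling. A bundled PR carries no such trailer, so every master commit it covers shows up as missing:

```python
import re
from typing import Iterable, List, Set

# `git cherry-pick -x` appends "(cherry picked from commit <sha>)" to the message.
# Assumes SHA-1 repositories (40 hex characters per object name).
TRAILER = re.compile(r"\(cherry picked from commit ([0-9a-f]{40})\)")


def picked_sources(branch_messages: Iterable[str]) -> Set[str]:
    """Collect the original master SHAs recorded in -x trailers on a branch."""
    sources: Set[str] = set()
    for message in branch_messages:
        sources.update(TRAILER.findall(message))
    return sources


def missing_on_branch(master_shas: Iterable[str], branch_messages: Iterable[str]) -> List[str]:
    """Return master commits (full SHAs) with no traceable cherry-pick on the branch.

    Hypothetical helper for illustration only.
    """
    picked = picked_sources(branch_messages)
    return [sha for sha in master_shas if sha not in picked]


if __name__ == "__main__":
    fix_sha = "a" * 40    # placeholder SHAs for illustration
    other_sha = "b" * 40
    branch = [
        f"[fix][broker] Some fix\n\n(cherry picked from commit {fix_sha})",
        "[improve][build] Upgrade dependencies to reduce CVE",  # bundled PR, no trailer
    ]
    print(missing_on_branch([fix_sha, other_sha], branch))
```

Here `other_sha` is reported as missing even if the bundled PR actually contains an equivalent change, which is exactly the loss of traceability described above.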

Now, I want to clarify that I'm not entirely supportive of the
cherry-picking process as it currently stands. I personally believe that
a merge-based strategy could be more effective. This strategy would
entail initially making changes to the oldest maintenance branch where a
feature (or a dependency, as in this instance) exists. Subsequently, we
would propagate all changes in a maintenance branch forward towards the
master branch using git merges, effectively managing and resolving any
merge conflicts that might arise along the way. Features wouldn't be
added to maintenance branches. This strategy is employed in several open
source projects, such as Grails [3] and Micronaut [4].
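The forward-merge flow can be modeled in a few lines. This is a toy sketch, not real git. It treats each branch as a set of commit ids, and it models a merge as adding everything from the older branch to the newer one. The branch name "branch-2.10" and the commit ids are assumptions for illustration, and real merges would also involve resolving conflicts:

```python
from typing import Dict, List, Set


def merge_forward(branches: Dict[str, Set[str]], order: List[str]) -> None:
    """Merge each branch into the next newer one, oldest maintenance branch first.

    `order` runs from the oldest maintained branch to master. Because the loop
    walks forward, a fix on the oldest branch reaches every newer branch.
    """
    for older, newer in zip(order, order[1:]):
        # Modeled merge: the newer branch gains every commit of the older one.
        branches[newer] |= branches[older]


if __name__ == "__main__":
    branches = {
        "branch-2.10": {"base"},
        "branch-3.0": {"base", "lts-change"},
        "master": {"base", "lts-change", "new-feature"},
    }
    # A fix lands first on the oldest branch where the affected code exists.
    branches["branch-2.10"].add("jetty-upgrade")  # hypothetical commit id
    merge_forward(branches, ["branch-2.10", "branch-3.0", "master"])
    for name, commits in branches.items():
        print(name, sorted(commits))
```

Nothing ever flows backwards, so "new-feature" stays on master. This matches the rule that features wouldn't be added to maintenance branches.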

Indeed, there might be exceptions, and for such instances,
cherry-picking would still be a tool within our strategy. The principal
advantage of this proposed approach is that it allocates adequate focus
on the maintenance branch, thereby curbing the instability typically
experienced with our intermediate maintenance versions.

Additionally, the merge-based approach addresses the issue with CI
pipelines. If the PR is made to the maintenance branch, it ensures the
changes integrate well and all tests pass in the maintenance version,
enhancing stability. I understand the counter-argument that this could
confuse our contributors if they have to make the PR against the
maintenance branch. However, this could be mitigated by guidance from
the PR reviewer and adding further information in the contribution guide
and PR template. There are also more radical solutions, such as making
the main maintenance branch the default branch, like the "4.1" branch in
Netty.

The merge strategy also helps ensure that the LTS maintenance branch
is always in a releasable state. Currently, it takes a significant
amount of time to "stabilize" the branch before releasing. This is a
counterproductive pattern and a waste of time that we must address and
improve.

There seem to be inherent obstacles in our existing process, evidenced
by the recent adoption of bundled PR types that circumvent our
cherry-picking process. Ordinarily, we insist on creating atomic PRs to
the master branch prior to initiating cherry-picking and backporting. I
would be keen to hear about the issues others have encountered with the
cherry-picking process. Identifying these pain points is the first step
towards refining and optimizing our process.

With Pulsar's recent transition to a new Long-Term Support (LTS) release
strategy, the stability of the LTS release has emerged as a vital
concern. Our current cherry-picking process, which has sometimes led to
insufficient integration testing within the maintenance branches, has
been proven ineffective at maintaining the requisite stability. If we do
not revisit our maintenance processes, the new LTS release strategy could
encounter the same instability issues. Thus, in order to fully reap the
benefits of the LTS release strategy, we must prioritize the improvement
of our maintenance processes.

In the existing procedure, the task of cherry-picking individual commits
can become quite tedious, especially when it necessitates crafting a new
PR for each cherry-picked commit. One possible solution to this
inefficiency may be to enhance the coordination of cherry-picking. Under
such a system, the committer could instigate a test run encompassing a
sensible quantity of cherry-picked commits, thereby circumventing the
need for separate PRs for each cherry-picked item. Furthermore, the
implementation of a nightly build for all maintenance branches, set to
execute if any changes have transpired since the last run, could be
advantageous. By employing this approach, we can consistently maintain
our branches in an optimal and release-ready state.
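The nightly build should only run when a branch has changed since the last run. That condition reduces to comparing each branch tip with the tip tested by the previous run. This is a minimal sketch that assumes the CI job can look up both SHAs. The `BranchState` type, the function name and the SHA values are hypothetical:

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class BranchState:
    name: str
    head_sha: str                   # current tip of the maintenance branch
    last_tested_sha: Optional[str]  # tip built by the previous nightly run, if any


def branches_to_build(states: List[BranchState]) -> List[str]:
    """Return the branches whose tip changed since the last nightly run.

    A branch that was never tested (last_tested_sha is None) is always built.
    """
    return [s.name for s in states if s.head_sha != s.last_tested_sha]


if __name__ == "__main__":
    states = [
        BranchState("branch-3.0", head_sha="c2", last_tested_sha="c1"),
        BranchState("branch-2.10", head_sha="d1", last_tested_sha="d1"),
        BranchState("branch-2.9", head_sha="e1", last_tested_sha=None),
    ]
    print(branches_to_build(states))
```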

A significant deficiency in our current cherry-picking process is its
status as tribal knowledge, without a clearly documented description in
place. While we do possess a release process guide [5], it does not
adequately elaborate on the procedure. Similarly, our release policy [6]
does not delve into the specifics of this process either. This lack of
comprehensive documentation leaves a significant knowledge gap in our
workflow.

Our existing documentation [6] on the cherry-picking process states,
"Generally, one committer shall volunteer as the release manager (RM) for
a specific release. For feature releases and LTS releases, the last 3
weeks of the release cycle will be marked as a code-freeze period. The
RM will branch off from master, and the RM is also responsible for
selecting the changes that will be cherry-picked in the release branch."

Unfortunately, this description falls short of the actual process. As it
stands, we frequently cherry-pick commits as soon as the master branch
PR has been merged. The description states that the release manager (RM)
is responsible for selecting the changes, which isn't even the usual case.
This practice is opaque and problematic. This situation prompts several
crucial questions — what decision-making criteria does the RM use, and
how do they manage quality assurance? It's currently the case that we
need a substantial amount of time to prepare a maintenance branch for
release, which clearly underscores that our current process requires
significant enhancement.

Moreover, while the recent implementation of the Long-Term Support (LTS)
strategy is a significant step, it doesn't appear to have brought about
a radical shift in our approach. Aside from committing to maintain a
specific version for a longer duration, our operational methodology
hasn't undergone substantial enhancements. To truly honor our commitment
to long-term support, it's incumbent upon us to reform our processes,
making them more efficient, reliable, and effective. Merely increasing
the responsibilities of the Release Manager isn't the solution.

An enterprise IT professional might suggest the introduction of a Change
Advisory Board (CAB). However, such a measure doesn't necessarily
address the core issue at hand. As the book "Accelerate: The Science of
Lean Software and DevOps" [7] describes, approval by an external body
(such as a manager or CAB), contrary to common belief, often does not
result in higher levels of stability and can actually slow down the
development process. We need to seek strategies that not only preserve
stability but also promote agility and efficiency in our workflows.

Thank you for your attention, and I look forward to hearing your
thoughts on these matters. Meanwhile, I kindly request that we stick to
our established cherry-picking process until a collective decision is
made on a potential alternative. This implies discontinuing the current
practice of bundling multiple changes in PRs to maintenance branches.

Moreover, I earnestly hope for widespread involvement in refining this
process. Specifically, I look forward to significant participation from
the Apache Pulsar committers and PMC members in this pivotal discussion.
Your collective insights and contributions will be important in
effecting the much-needed improvements. 

In addition to discussions, there will also be a need for substantial
effort. We must document the process thoroughly and continuously improve
it as we gather more feedback during its progress.

I'm looking forward to an active discussion and concrete contributions
as PRs to our release policy & process documentation! Sharing the tribal
knowledge is also welcome if you don't feel like contributing directly
to documentation. ;)

-Lari

[1] - https://github.com/apache/pulsar/pull/20461
[2] - https://github.com/apache/pulsar/pulls?q=is%3Apr+%22Upgrade+dependencies+to+reduce+CVE%22+is%3Aclosed
[3] - https://github.com/grails/grails-core
[4] - https://github.com/micronaut-projects/micronaut-core
[5] - https://pulsar.apache.org/contribute/release-process/
[6] - https://pulsar.apache.org/contribute/release-policy/
[7] - https://itrevolution.com/book/accelerate/

Appendix: 
Quote from "Accelerate: The Science of Lean Software and DevOps" [7]
related to change approval by an external body (such as a manager or
Change Advisory Board):

"We investigated further the case of approval by an external body to see
if this practice correlated with stability. We found that external
approvals were negatively correlated with lead time, deployment
frequency, and restore time, and had no correlation with change fail
rate. In short, approval by an external body (such as a manager or CAB)
simply doesn’t work to increase the stability of production systems,
measured by the time to restore service and change fail rate. However,
it certainly slows things down. It is, in fact, worse than having no
change approval process at all.

Our recommendation based on these results is to use a lightweight change
approval process based on peer review, such as pair programming or
intrateam code review, combined with a deployment pipeline to detect and
reject bad changes. This process can be used for all kinds of changes,
including code, infrastructure, and database changes."

Re: From Tribal Knowledge to Transparency: Enhancing and Documenting the LTS Maintenance & Cherry-Picking Process

Posted by Yunze Xu <xy...@apache.org>.
Hi Lari,

Generally I agree with your points. The cherry-picking criteria are
ambiguous and unclear. I have also noticed some unexpected cherry-picks
recently. For example, #16014 [1] breaks compatibility in how a built-in
Pulsar client determines whether to use a TLS service URL. Though that
might be acceptable for a major release IMO, it's unacceptable to
cherry-pick it into release branches without addressing the review
comment here [2]. KoP is also affected by this commit, and we had to
handle the breaking change in these two PRs ([3] and [4]).

The root cause is that there are no clear cherry-picking criteria, so
adding the `release/x.y.z` labels is not strictly reviewed. The labels
are added very casually.

Based on the example above, could you explain more about the merge
strategy and how it could avoid such cases? This thread talks a lot
about the pros and cons but not much about the strategy itself.
From my understanding, assuming the LTS version is 3.0 and the next
feature release version is 3.1, we need to:
1. Open PRs to branch-3.0 for bug fixes and merge them into branch-3.1?
2. Open PRs to branch-3.1 for new features?

So when releasing Pulsar 3.1.0, we don't need to determine which PRs
to cherry-pick?

[1] https://github.com/apache/pulsar/pull/16014
[2] https://github.com/apache/pulsar/pull/16014#discussion_r896163915
[3] https://github.com/streamnative/kop/pull/1848
[4] https://github.com/streamnative/kop/pull/1870

Thanks,
Yunze

On Fri, Jun 9, 2023 at 3:14 AM Michael Marshall <mm...@apache.org> wrote:

Re: From Tribal Knowledge to Transparency: Enhancing and Documenting the LTS Maintenance & Cherry-Picking Process

Posted by Michael Marshall <mm...@apache.org>.
Thanks for this discussion, Lari. I agree with your points and will
just make a few additions.

@Lari - I wonder if there is a way we can show the strategy more
definitively. Have you looked into what it would take to merge
branch-3.0 to master? If that is too hard, we could pilot merging
branch-3.1 into master once it is cut.

> don't think we should propagate commits from the least recent stable
> version; especially it's hard to tell which one is the least recently
> maintained stable version (2.10? 2.9? or even 2.8?).

If we switch to a merge strategy, I think we would need a dual mode
for some time where we only merge branches that are easy to merge and
we cherry-pick backwards as needed.

Another benefit of the merge strategy is that it would make it much
easier for us to cut frequent releases. It reduces the release manager's
workload by removing the need to make a bunch of cherry-picks. The work
is pushed to committers and will have the benefit of some batching of
commits.

As it stands, releases take a very long time, which takes up valuable
committer time and makes it harder to stick to release schedules.

One interesting detail is that Pulsar's commit velocity has decreased.
That might make it easier to test out a new strategy now.

Thanks,
Michael


On Thu, Jun 8, 2023 at 5:34 AM tison <wa...@gmail.com> wrote:

Re: From Tribal Knowledge to Transparency: Enhancing and Documenting the LTS Maintenance & Cherry-Picking Process

Posted by tison <wa...@gmail.com>.
Hi Lari,

Thanks for starting this discussion! I did some cherry-picking recently
and ran into a few cases relevant to your email. What surprised me most
is that we cherry-pick commits directly without running any regression
tests (some picks go through PRs, but that's not the normal case).

Let me reply to this thread inline.

> merge-based strategy

I don't think we should propagate commits forward from the oldest stable
version, especially since it's hard to tell which one is the oldest
still-maintained stable version (2.10? 2.9? Or even 2.8?).

Flink handles cherry-picking well by running regression tests before any
commit lands, including cherry-picked ones. I don't understand why we
allow cherry-picking commits without CI coverage, since different
branches can be regarded as quite different codebases. Besides, Flink
uses JIRA to manage tickets, which makes it easy to associate a ticket
with several fix versions, and every PR should be associated with a
ticket (in Pulsar, more than 50% of PRs are standalone).

However, given that 3.0 is an LTS version, it is possible to follow
Netty's strategy. OpenJDK even creates separate repositories for its LTS
versions.

> a test run encompassing a
> sensible quantity of cherry-picked commits

I'd call this a manual rollup; Rust uses bors to batch PRs
automatically [1]. Given that we enable the "Rebase and merge" button on
the main repo, it's possible to keep an atomic commit history while
rolling up PRs.

[1] https://forge.rust-lang.org/release/rollups.html
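When a rolled-up batch fails, someone still has to find the commit that broke it. One common technique, the same idea behind `git bisect`, is a binary search over prefixes of the batch. This is a minimal sketch under several assumptions:

* The branch without the batch is green.
* Cherry-picks are applied in order.
* Once a breaking commit is included, every longer prefix also fails.

`passes` is a hypothetical hook standing in for a CI run. It is not an existing Pulsar or bors API:

```python
from typing import Callable, Optional, Sequence


def first_breaking(
    commits: Sequence[str], passes: Callable[[Sequence[str]], bool]
) -> Optional[str]:
    """Return the first commit in the batch whose inclusion makes CI fail.

    Returns None if the batch is empty or passes as a whole. After the
    full-batch run, this needs only about log2(n) further CI runs.
    """
    if not commits or passes(commits):
        return None
    # Invariant: the prefix of length `good` passes, the prefix of length `bad` fails.
    good, bad = 0, len(commits)
    while bad - good > 1:
        mid = (good + bad) // 2
        if passes(commits[:mid]):
            good = mid
        else:
            bad = mid
    return commits[bad - 1]


if __name__ == "__main__":
    batch = ["pick-1", "pick-2", "pick-3", "pick-4", "pick-5"]
    # Simulated CI: any prefix containing pick-4 fails.
    print(first_breaking(batch, lambda prefix: "pick-4" not in prefix))
```

Testing prefixes rather than arbitrary halves keeps each tested state applicable, because later cherry-picks may depend on earlier ones.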

> what decision-making criteria does the RM use, and
> how do they manage quality assurance?

I agree this is vague now. At a minimum, we can require a PR for each
cherry-pick so that CI workflows run. That way we know at least the
regression tests passed, and there is a window for comments (commenting
on commits is possible but has much less visibility). Furthermore, we
currently don't require approvals or status checks for PRs against
maintenance branches.

That said, I agree it's mainly a transparency issue rather than a
process issue: Flink doesn't enforce these rules either, and a committer
can push a commit or merge a PR without meeting any requirement. But
Flink committers agree that every change requires a PR, and they
generally actively ask for reviews.

Back to the criteria: at a minimum, we should restrict picks to
(security) fixes. This is not easy because, from my observation:

1. Fixes can depend on a dependency upgrade, while the dependency version
   that contains the fix can introduce unexpected changes.
2. Some features are unstable but not marked as beta, so we end up
   wanting to "fix" them in previous versions even when it means a
   significant rewrite.

Java uses "--enable-preview" and "--add-modules jdk.incubator.xxx" to
fence off unstable features, and fixes to them are not backported.

Best,
tison.

Re: From Tribal Knowledge to Transparency: Enhancing and Documenting the LTS Maintenance & Cherry-Picking Process

Posted by Lari Hotari <lh...@apache.org>.
Shortly after sending my previous email, I recalled that our
cherry-picking process has been a topic of prior discussion. I'd like to
draw your attention to a relevant mailing list thread initiated by
Michael Marshall back in December 2021, titled
"[DISCUSS] Add definition to our cherry-picking process":
https://lists.apache.org/thread/zqdqz4jd641vszkj3mzdn6zc3yt56rsk

This thread contains numerous insightful comments.

Furthermore, I discovered Yong Zhang's "[DISCUSS] Introduce a
cherry-pick command for cherry-picking PRs automatically" from October
2020: https://lists.apache.org/thread/49dj5j9yjjqpzjssoc3mqh0yxss5z041

There is also "[DISCUSS] Propose More Formal Policy for Security Patches
and EOL of Versions" by Michael Marshall from May 2021:
https://lists.apache.org/thread/2bgznyt9fxnosymprot4wyfd01mv0m58

Michael Marshall also initiated the "[DISCUSS] Updating our Pulsar
Release Plan" discussion in March 2022:
https://lists.apache.org/thread/wkm1slrg341kbq7m83nms97df28kl4of

Yunze Xu started "[DISCUSS] Improvements on the release process" in
September 2022:
https://lists.apache.org/thread/zmwf5mozjqq164fk2r4m2jzv6s1kxyxk

And finally, Yunze Xu also started "[Discuss] Add a phase to process
pending PRs before code freeze" in April 2023:
https://lists.apache.org/thread/p8vgfsg2wfzsnmnwmcnj9xtz54nq45xb

For the sake of completeness, I would like to mention that our current
release policy and process are documented in the links I shared in my
previous email (https://pulsar.apache.org/contribute/release-policy/ and
https://pulsar.apache.org/contribute/release-process/).

A review of our archives and PIPs reveals that Michael, Penghui, Matteo,
Yunze Xu, and Tison (Zili Chen) have been instrumental in enhancing our
process and its documentation, effectively bringing it to its current
documented state. I want to express my gratitude to all of you for your
diligent contributions. I also extend my appreciation to everyone else
who has contributed to this area in the past.

I would like to extend a special thank you to Michael Marshall, who has
been unwavering in his pursuit of refining the release process for
Apache Pulsar. His transparent approach to fostering enhancements within
our community is highly commendable. His impact is well-documented in
our mailing list archives and the Pulsar community meeting minutes [1].

In Apache projects, mailing lists serve as a critical avenue for
community engagement. The phrase "If it's not on the mailing list, it
didn't happen" underscores the importance of transparency,
inclusivity, and openness within the community. These discussions are
integral for tracking project decisions and changes over time. The
accessibility of the mailing lists ensures that every community member
has an opportunity to contribute to and learn from the collective
knowledge.

Let's maintain this momentum and continue enhancing our process! Your
contributions are welcome!

-Lari

[1] - Link to meeting minutes available at
https://github.com/apache/pulsar/wiki/Community-Meeting

On 2023/06/02 08:24:45 Lari Hotari wrote:
> Dear Apache Pulsar Committers,
> 
> I wish to address a few pressing concerns that emerged while I was
> working on cherry-picking PR #20461 [1]. This PR was aimed at upgrading
> Jetty from 9.4.48.v20220622 to 9.4.51.v20230217 to address the CVEs
> (CVE-2023-26048 and CVE-2023-26049). I discovered that Jetty had already
> been upgraded in the maintenance branches through four separate PRs
> (#20162, #20226, #20227, and #20228), all titled "[improve][build]
> Upgrade dependencies to reduce CVE" [2].
> 
> 1. The newly adopted process of combining multiple dependency updates
>    into a single PR, while omitting changes to the master branch, has
>    not been discussed on the mailing list.
> 2. Our current process, which is based on cherry-picking, should
>    maintain traceability across maintenance branches to discern whether
>    a change made to the master branch is available in the maintenance
>    branches. This breaks with the approach that was used.
> 3. It is advised that each dependency (or group of related dependencies)
>    should be upgraded in its own PR, rather than upgrading multiple
>    unrelated dependencies in a single PR.
> 4. We should aim for all changes to be first made to the master branch
>    and then cherry-picked to other branches to prevent the maintenance
>    branches from diverging from the master branch.
> 5. The compilation of release notes becomes challenging when PRs aren't
>    atomic.
> 6. Similarly, detecting regressions can be problematic when PRs aren't
>    atomic.
> 
> Now, I want to clarify that I'm not entirely supportive of the
> cherry-picking process as it currently stands. I personally believe that
> a merge-based strategy could be more effective. This strategy would
> entail initially making changes to the oldest maintenance branch where a
> feature (or a dependency, as in this instance) exists. Subsequently, we
> would propagate all changes in a maintenance branch forward towards the
> master branch using git merges, effectively managing and resolving any
> merge conflicts that might arise along the way. Features wouldn't be
> added to maintenance branches. This strategy is employed in several open
> source projects, such as Grails [3] and Micronaut [4].
> 
> Indeed, there might be exceptions, and for such instances,
> cherry-picking would still be a tool within our strategy. The principal
> advantage of this proposed approach is that it allocates adequate focus
> on the maintenance branch, thereby curbing the instability typically
> experienced with our intermediate maintenance versions.
> 
> Additionally, the merge-based approach addresses the issue with CI
> pipelines. If the PR is made to the maintenance branch, it ensures the
> changes integrate well and all tests pass in the maintenance version,
> enhancing stability. I understand the counter-argument that this could
> confuse our contributors if they have to make the PR against the
> maintenance branch. However, this could be mitigated by guidance from
> the PR reviewer and adding further information in the contribution guide
> and PR template. There are also more radical solutions, such as making
> the main maintenance branch the default branch, like the "4.1" branch in
> Netty.
> 
> The merge strategy also helps ensuring that the LTS maintenance branch
> is always in a releasable state. Currently, it takes a significant
> amount of time to "stabilize" the branch before releasing. This is a
> counterproductive pattern and a waste of time that we must address and
> improve.
> 
> There seem to be inherent obstacles in our existing process, evidenced
> by the recent adoption of bundled PR types that circumvent our
> cherry-picking process. Ordinarily, we insist on creating atomic PRs to
> the master branch prior to initiating cherry-picking and backporting. I
> would be keen to hear about the issues others have encountered with the
> cherry-picking process. Identifying these pain points is the first step
> towards refining and optimizing our process.
> 
> With Pulsar's recent transition to a new Long-Term Support (LTS) release
> strategy, the stability of the LTS release has emerged as a vital
> concern. Our current cherry-picking process, which has sometimes led to
> insufficient integration testing within the maintenance branches, has
> been proven ineffective at maintaining the requisite stability. If we do
> not revisit our maintenance processes, the new LTS release strategy could
> encounter the same instability issues. Thus, in order to fully reap the
> benefits of the LTS release strategy, we must prioritize the improvement
> of our maintenance processes.
> 
> In the existing procedure, cherry-picking individual commits can become
> quite tedious, especially when it requires creating a new PR for each
> cherry-picked commit. One possible remedy for this inefficiency is to
> coordinate cherry-picking better. Under such a system, a committer could
> trigger a single test run covering a reasonable batch of cherry-picked
> commits, avoiding the need for a separate PR for each cherry-picked
> commit. Furthermore, a nightly build for all maintenance branches,
> running only if there have been changes since the last run, could be
> advantageous. With this approach, we can consistently keep our branches
> in a release-ready state.
> 
> A significant deficiency in our current cherry-picking process is its
> status as tribal knowledge, without a clearly documented description in
> place. While we do possess a release process guide [5], it does not
> adequately elaborate on the procedure. Similarly, our release policy [6]
> does not delve into the specifics of this process either. This lack of
> comprehensive documentation leaves a significant knowledge gap in our
> workflow.
> 
> Our existing documentation [6] on the cherry-picking process states,
> "Generally, one committer shall volunteer as the release manager (RM) for
> a specific release. For feature releases and LTS releases, the last 3
> weeks of the release cycle will be marked as a code-freeze period. The
> RM will branch off from master, and the RM is also responsible for
> selecting the changes that will be cherry-picked in the release branch."
> 
> Unfortunately, this description falls short of the actual process. As it
> stands, we frequently cherry-pick commits as soon as the master branch
> PR has been merged. The description states that the Release Manager (RM)
> is responsible for selecting the changes, which isn't even the usual
> case. This practice is opaque and problematic. This situation prompts
> several crucial questions: what decision-making criteria does the RM
> use, and how do they manage quality assurance? Currently, we need a
> substantial amount of time to prepare a maintenance branch for release,
> which clearly shows that our current process requires significant
> improvement.
> 
> Moreover, while the recent implementation of the Long-Term Support (LTS)
> strategy is a significant step, it doesn't appear to have brought about
> a radical shift in our approach. Aside from committing to maintain a
> specific version for a longer duration, our operational methodology
> hasn't undergone substantial enhancements. To truly honor our commitment
> to long-term support, it's incumbent upon us to reform our processes,
> making them more efficient, reliable, and effective. Merely increasing
> the responsibilities of the Release Manager isn't the solution.
> 
> An enterprise IT professional might suggest the introduction of a Change
> Advisory Board (CAB). However, such a measure doesn't necessarily
> address the core issue at hand. As the book "Accelerate: The Science of
> Lean Software and DevOps" [7] describes, approval by an external body
> (such as a manager or CAB), contrary to common belief, often does not
> result in higher levels of stability and can actually slow down the
> development process. We need to seek strategies that not only preserve
> stability but also promote agility and efficiency in our workflows.
> 
> Thank you for your attention, and I look forward to hearing your
> thoughts on these matters. Meanwhile, I kindly request that we stick to
> our established cherry-picking process until a collective decision is
> made on a potential alternative. This implies discontinuing the current
> practice of bundling multiple changes in PRs to maintenance branches.
> 
> Moreover, I earnestly hope for widespread involvement in refining this
> process. Specifically, I look forward to significant participation from
> the Apache Pulsar committers and PMC members in this pivotal discussion.
> Your collective insights and contributions will be important in
> effecting the much-needed improvements. 
> 
> In addition to discussions, substantial effort will also be needed. We
> must document the process thoroughly and continuously improve it as we
> gather more feedback.
> 
> I'm looking forward to an active discussion and concrete contributions
> as PRs to our release policy & process documentation! Sharing the tribal
> knowledge is also welcome if you don't feel like contributing directly
> to documentation. ;)
> 
> -Lari
> 
> [1] - https://github.com/apache/pulsar/pull/20461
> [2] - https://github.com/apache/pulsar/pulls?q=is%3Apr+%22Upgrade+dependencies+to+reduce+CVE%22+is%3Aclosed
> [3] - https://github.com/grails/grails-core
> [4] - https://github.com/micronaut-projects/micronaut-core
> [5] - https://pulsar.apache.org/contribute/release-process/
> [6] - https://pulsar.apache.org/contribute/release-policy/
> [7] - https://itrevolution.com/book/accelerate/
> 
> Appendix: 
> Quote from "Accelerate: The Science of Lean Software and DevOps" [7]
> related to change approval by an external body (such as a manager or
> Change Advisory Board):
> 
> "We investigated further the case of approval by an external body to see
> if this practice correlated with stability. We found that external
> approvals were negatively correlated with lead time, deployment
> frequency, and restore time, and had no correlation with change fail
> rate. In short, approval by an external body (such as a manager or CAB)
> simply doesn’t work to increase the stability of production systems,
> measured by the time to restore service and change fail rate. However,
> it certainly slows things down. It is, in fact, worse than having no
> change approval process at all.
> 
> Our recommendation based on these results is to use a lightweight change
> approval process based on peer review, such as pair programming or
> intrateam code review, combined with a deployment pipeline to detect and
> reject bad changes. This process can be used for all kinds of changes,
> including code, infrastructure, and database changes."
>