Posted to dev@pulsar.apache.org by Lari Hotari <lh...@apache.org> on 2024/03/01 14:01:55 UTC

[DISCUSS] Broken builds and CI Failures in Maintenance Branches; improving maintenance strategy to address root causes

Dear Pulsar Community,

As we prepare for new releases in our maintenance branches, we have once
again encountered issues with our cherry-picking process. Some of our
maintenance branches are currently broken or were recently broken,
containing compilation errors or failing tests. Many have encountered
these issues, as we have seen new PRs come in to address the
problems. The compilation problems are already being addressed by
Heesung (release manager for 3.0.3) and myself. We aim to resolve these
issues as soon as possible. Please join the #dev channel on the Apache
Pulsar Slack to collaborate in real time and to get updates.

The cherry-picking process in Apache Pulsar has always been problematic.
It often leaves our maintenance branches broken, especially as we
approach release dates and begin cherry-picking fixes. This recurring
issue has been the subject of multiple discussions over the years. The
"feature freeze" in the release process does not mitigate the key
problem with the cherry-picking approach.

Furthermore, the cherry-picking process is mostly based on tribal
knowledge and lacks clear documentation. I have previously expressed my
concerns about this on the mailing list in this thread:
https://lists.apache.org/thread/69mwjso51kzkrv5xgdmw04d9wngbg8br

Many problems with cherry-picking arise because cherry-picks occur in
the wrong order, or because dependent changes are not picked. Some
dependent changes must not be picked at all: when a bug is fixed in the
master branch, master may already contain changes for new features that
shouldn't be applied to maintenance branches. In those cases, a backport
of the fix is needed. The original author of the PR might not be
available to do it, and the release can be significantly delayed if
delivering the backport takes time.
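To make the ordering and dependency problem concrete, here is a minimal
Python sketch under simplifying assumptions: each commit carries a
hypothetical `depends_on` set (Git does not record this; it would have
to be annotated or inferred), and the names `Commit` and
`plan_cherry_picks` are illustrative only. It is not the tool referenced
in the release-process document. It sorts the selected commits into the
order they landed on master and reports dependencies that are neither
selected nor already present on the maintenance branch.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Commit:
    sha: str
    # Hypothetical metadata: SHAs of earlier commits this change builds on.
    # Git does not record this; it would have to be annotated or inferred.
    depends_on: frozenset = field(default_factory=frozenset)


def plan_cherry_picks(master_history, selected, already_on_branch=frozenset()):
    """Order selected commits as they landed on master and report gaps.

    master_history: Commit objects in master order, oldest first.
    selected: SHAs chosen for the maintenance branch, in any order.
    already_on_branch: SHAs the maintenance branch already contains.
    Returns (ordered_shas, missing), where missing maps a selected SHA to
    the dependencies that are neither selected nor already present.
    """
    position = {c.sha: i for i, c in enumerate(master_history)}
    by_sha = {c.sha: c for c in master_history}
    # Apply picks in master order, not in the order they were requested.
    ordered = sorted(selected, key=lambda sha: position[sha])
    missing = {}
    for sha in ordered:
        gaps = sorted(
            dep for dep in by_sha[sha].depends_on
            if dep not in selected and dep not in already_on_branch
        )
        if gaps:
            missing[sha] = gaps
    return ordered, missing


history = [
    Commit("a1"),
    Commit("b2", frozenset({"a1"})),
    Commit("c3"),
    Commit("d4", frozenset({"b2"})),
]
print(plan_cherry_picks(history, {"d4", "a1"}))
# (['a1', 'd4'], {'d4': ['b2']})
```

Here `d4` is selected without `b2`, which is the missing-dependency case
described above; passing `already_on_branch={"b2"}` clears the report.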

When cherry-picking and backporting are delegated to other developers,
it causes delays as well as coordination problems, and commits can end
up being picked and applied in an order that results in even more merge
conflicts. Thankfully, this isn't usually too painful, but it does
happen once in a while.

A few days ago, I began working on improving the documentation of the
current process. I have added a section where I share some thoughts and
a tool to prevent future problems. You can find the document here:
https://pulsar.apache.org/contribute/release-process/#cherry-picking-changes-scheduled-for-the-release.
However, this does not fully describe the current process and will only
help to some extent.

The added section should help prevent cherry-picking in the wrong order,
but it still has many gaps. Many developers do not have proper merge
conflict resolution tools configured. Without proper 3-way diff
visualization and merge tools, it's very difficult to resolve many of
the merge conflicts without making mistakes. This also requires a deep
understanding of the module where the conflicts occur.
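Why the base version matters can be shown with a toy 3-way merge in
Python. This sketch assumes the three versions are already aligned line
by line (no insertions or deletions); real tools such as git first align
the versions with a diff algorithm, so `merge3` is a hypothetical
illustration, not how git implements merging. With the common ancestor
available, a line changed on only one side can be taken automatically;
with a plain 2-way diff there is no way to tell which side made the
change.

```python
def merge3(base, ours, theirs):
    """Toy line-by-line 3-way merge of aligned, equal-length versions.

    Returns (merged_lines, conflicting_line_indexes).
    """
    if not len(base) == len(ours) == len(theirs):
        raise ValueError("sketch only handles aligned, equal-length inputs")
    merged, conflicts = [], []
    for i, (b, o, t) in enumerate(zip(base, ours, theirs)):
        if o == t:        # unchanged, or changed identically on both sides
            merged.append(o)
        elif o == b:      # only "theirs" changed this line
            merged.append(t)
        elif t == b:      # only "ours" changed this line
            merged.append(o)
        else:             # both sides changed it differently: a real conflict
            merged.append(f"<<<<<<< ours\n{o}\n=======\n{t}\n>>>>>>> theirs")
            conflicts.append(i)
    return merged, conflicts


print(merge3(["a", "b", "c"], ["a", "B", "c"], ["a", "b", "C"]))
# (['a', 'B', 'C'], [])
```

Only the lines where both sides diverge from the base need a human, and
a 3-way merge tool shows exactly those lines with all three versions
side by side.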

After we have made the next set of maintenance releases, I plan to
propose an alternative to the cherry-picking process that will address
the main issues that the Apache Pulsar project has been struggling with
every time we do releases.

The alternative would be to designate the LTS branch as the default
branch, make bug fixes primarily in the LTS branch, merge those fixes
into newer branches, and cherry-pick them to any older branches. This
approach, common in many projects, leverages what Git does well:
handling development across multiple branches. It keeps our LTS branch
in an immediately releasable state at all times. The LTS branch would
also become the most stable version of Pulsar, since bug-fix PRs would
target the LTS branch and be continuously validated there by our CI.
Stability was the original goal of PIP-175, which introduced the LTS
concept to Pulsar.
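The proposed flow (fix on the LTS branch, merge forward into newer
branches, cherry-pick back to older ones) can be sketched as a small
Python planner that only lists the git commands it would run. The branch
list below is an illustrative assumption, not the project's actual set
of maintained branches, and `plan_fix_propagation` is a hypothetical
helper. Conflict resolution, pushing, and CI runs, which happen at each
step in practice, are left out.

```python
def plan_fix_propagation(branches, lts, fix_sha):
    """Return git commands that propagate a fix committed on `lts`.

    branches: maintained branches ordered oldest to newest, ending with master.
    Newer branches get the fix through a chain of merges, one hop at a time;
    older branches get it through cherry-picks.
    """
    i = branches.index(lts)
    commands = []
    # Merge forward: lts -> next newer branch -> ... -> master.
    for older, newer in zip(branches[i:], branches[i + 1:]):
        commands += [f"git checkout {newer}", f"git merge {older}"]
    # Cherry-pick backward, starting with the nearest older branch.
    # -x records "(cherry picked from commit ...)" in the new commit message.
    for older in reversed(branches[:i]):
        commands += [f"git checkout {older}", f"git cherry-pick -x {fix_sha}"]
    return commands


# Illustrative branch names only.
example = ["branch-2.10", "branch-2.11", "branch-3.0", "branch-3.2", "master"]
for cmd in plan_fix_propagation(example, "branch-3.0", "abc1234"):
    print(cmd)
```

Because a merge carries every commit the target does not have yet, a fix
cannot reach a newer branch without the LTS commits it builds on, which
is the ordering guarantee that independent cherry-picks lack.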

I hope that our community will be open to changing the maintenance
strategy to help resolve the pain we have to deal with each time we
make releases. Sometimes this "cherry-picking vs. merging branches"
discussion turns into a pointless "tabs vs. spaces" debate where
personal preferences dominate. I hope that we can avoid that and
acknowledge that releasing Apache Pulsar LTS with this cherry-picking
process is painful, and that we must fix it to make progress as a
development community.

-Lari

Re: [DISCUSS] Broken builds and CI Failures in Maintenance Branches; improving maintenance strategy to address root causes

Posted by PengHui Li <pe...@apache.org>.
Hi, Lari

Thanks for driving the discussion. I agree that cherry-picking is
painful, especially when we need to maintain old branches for a long
time.

Frankly, my first impression is that targeting bug fixes to branch-3.0
but features and improvements to the master branch will put more of a
burden on contributors and committers. They might merge changes to the
wrong branches for a while because they need time to build muscle
memory. Of course, we can use CI to check the labels and the target
branch, so it will not be a blocker.
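The CI check mentioned here could be as simple as mapping a PR's type
label to the base branch it must target. A minimal Python sketch
follows; the label names and the `REQUIRED_BASE` table are assumptions
for illustration, not Pulsar's actual CI configuration. A real check
would also need an override for bug fixes in features that are missing
from the LTS branch, since those would be fixed on the branch that
introduced the feature.

```python
# Hypothetical policy: which base branch each label requires.
REQUIRED_BASE = {
    "type/bug": "branch-3.0",
    "type/feature": "master",
    "type/enhancement": "master",
}


def check_target_branch(labels, base_branch):
    """Return a list of problems; an empty list means the PR passes."""
    expected = {REQUIRED_BASE[label] for label in labels if label in REQUIRED_BASE}
    if not expected:
        return ["no label that determines the target branch"]
    if len(expected) > 1:
        return [f"labels require different branches: {sorted(expected)}"]
    (required,) = expected
    if base_branch != required:
        return [f"PR targets {base_branch}, expected {required}"]
    return []


print(check_target_branch(["type/bug"], "master"))
# ['PR targets master, expected branch-3.0']
```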

I agree that the merge branch solution will resolve the ordering and
coordination issues arising from the cherry-pick solution. By
coordination, I mean deciding whether a PR should be cherry-picked
(Yunze pointed this out to me).

I have a few questions about the merge branch solution.

- It looks like we will eventually employ both the merge branch and
  cherry-pick solutions once we have 4.0, because at that point the
  target branch for bug fixes will be branch-4.0, and we will still have
  an 18-month overlap.

- With the existing cherry-picking solution, if a commit can't be
  cherry-picked due to too many conflicts, we usually create a separate
  PR for the release branch directly. How do we handle this case with
  the merge branch solution? If I understand correctly, we can also push
  separate PRs to the newer branches and always keep the newer branches'
  version when resolving merge conflicts from this commit?

- Is it possible to cherry-pick commits from master to the LTS branch?
  I'm asking because a PR might be classified as an improvement, but
  someone may later find that it should be included in the LTS version.
  For example, https://github.com/apache/pulsar/pull/21739. Maybe there
  are other ways to handle this case, e.g., pushing a PR directly,
  because we might get many more conflicts at that time.

- Do we need to wait for the PRs targeting branch-3.0 to be merged
  before cutting branch-4.0? If there are many comments on an existing
  PR, we don't want to ask the author to create a new one targeting
  branch-4.0 to continue the review. Usually, after cutting a branch, we
  spend at least 3 weeks preparing the release. That sounds like a
  challenge, because we will only allow regression fixes to branch-4.0
  during that time. We need to find a solution for it.

- Does the committer performing the branch merge need to resolve all
  the conflicts? I mean, if we have 20 commits to merge and maybe only
  one of them is urgent to merge to the newer branch for a patch
  release, with the cherry-pick solution you can cherry-pick only that
  commit and create the patch release. I think we must merge all the
  commits with the merge branch solution. Maybe I'm wrong.
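The last question hinges on a property of `git merge`: merging a branch,
or a specific commit (`git merge <commit>`), brings in that commit
together with every ancestor the target does not have yet. The Python
model below illustrates this under the simplifying assumption of a
linear LTS history; the function names are hypothetical. It shows that a
prefix of the pending commits can be merged, but an urgent later commit
cannot be merged without the earlier pending ones, whereas a cherry-pick
could take it alone.

```python
def commits_brought_by_merge(source_log, target_has):
    """SHAs that merging the tip of `source_log` into the target brings in.

    source_log: linear history of the source branch, oldest first.
    target_has: SHAs already reachable from the target branch.
    """
    return [sha for sha in source_log if sha not in target_has]


def commits_brought_by_merge_up_to(source_log, target_has, sha):
    """Merging a specific commit also brings every pending ancestor of it."""
    cutoff = source_log.index(sha) + 1
    return commits_brought_by_merge(source_log[:cutoff], target_has)


lts_log = ["c1", "c2", "c3"]   # c3 is the urgent fix
newer_has = {"c1"}             # the newer branch already contains c1
print(commits_brought_by_merge(lts_log, newer_has))              # ['c2', 'c3']
print(commits_brought_by_merge_up_to(lts_log, newer_has, "c2"))  # ['c2']
print(commits_brought_by_merge_up_to(lts_log, newer_has, "c3"))  # ['c2', 'c3']
```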

I would support the merge branch solution, and we also need
documentation to clarify the points to note. If I understand correctly,
we can also go back to the current solution if we find something isn't
working, right? Cherry-picking remains very flexible even when merges
happen between branches. It's at least worth trying.

Regards,
Penghui


Re: [DISCUSS] Broken builds and CI Failures in Maintenance Branches; improving maintenance strategy to address root causes

Posted by Yunze Xu <xy...@apache.org>.
> However, in async work, people should have more patience to read and write.

I mean, it would be better to have something like a "TL;DR". Anyway,
I'd like to apply this change starting from the next feature release (3.3.0).

Thanks,
Yunze

Re: [DISCUSS] Broken builds and CI Failures in Maintenance Branches; improving maintenance strategy to address root causes

Posted by Lari Hotari <lh...@apache.org>.
Thanks for the comments, Yunze.

On 2024/03/18 05:48:39 Yunze Xu wrote:
> I'm afraid many people don't have patience to read all the contents.

I agree. However, in async work, people should have more patience to read and write. Synchronous meetings aren't a good solution either. The lack of patience could be caused by a lack of interest. There isn't a large group of people in our community who are interested in improving the maintenance strategy and also committed to investing their time and effort in these activities. I hope more people sign up for these kinds of efforts and show their interest and commitment in improving Apache Pulsar.

> Here is my summary in short (please correct me if I'm wrong):
> - For bug fixes, the target branch should be branch-3.0. Once the PR
> is merged into branch-3.0, checkout the branch-3.x and run `git merge
> branch-3.0` and resolve the conflicts

I didn't describe the details of how this would be handled. It is different in practice.

> - For features, the target branch should be branch-3.x

New features would continue to go to master (or "main" if we decide to rename it). Bugs would be fixed in the branch where the feature containing the bug was introduced if it is missing from the LTS branch.

> Since we introduced the LTS concept, I agree that we should make
> branch-3.0 as the default branch. Cherry-picking is a disaster when
> cherry-picks happen in the wrong order.

Yes. 

-Lari

> On Tue, Mar 5, 2024 at 8:38 PM Lari Hotari <lh...@apache.org> wrote:
> >
> > To enhance our maintenance processes, I've created a guide for
> > configuring "git mergetool" to resolve merge conflicts:
> >
> > https://pulsar.apache.org/contribute/setup-mergetool/
> >
> > For Apache Pulsar core developers, managing git merge conflict
> > resolution is a necessary task. To streamline this process, it's crucial
> > to set up tools that aid in visualizing and resolving these conflicts.
> >
> > I encourage you to follow the guide to set up a git mergetool. Your
> > feedback is valuable, and you're welcome to contribute improvements
> > directly to the website. You can do this by creating a PR by editing
> > https://github.com/apache/pulsar-site/edit/main/contribute/setup-mergetool.md
> > directly in your browser.
> >
> > -Lari
> >
> > On 2024/03/01 14:01:55 Lari Hotari wrote:
> > > Dear Pulsar Community,
> > >
> > > As we prepare for new releases in our maintenance branches, we have once
> > > again encountered issues with our cherry-picking process. Some of our
> > > maintenance branches are currently broken or were recently broken,
> > > containing compilation errors or failing tests. Many have encountered
> > > these issues, as we have seen new PRs come in to address the
> > > problems. The compilation problems are already being addressed by
> > > Heesung (release manager for 3.0.3) and myself. We aim to resolve these
> > > issues as soon as possible. Please join #dev channel on Apache Pulsar
> > > Slack to collaborate in real time to help with this and get updates.
> > >
> > > The cherry-picking process has always been problematic and lacks clear
> > > documentation in Apache Pulsar. This often leads to our maintenance
> > > branches breaking, especially as we approach release dates and begin
> > > cherry-picking fixes. This recurring issue has been the subject of
> > > multiple discussions over the years. The "feature freeze" in the release
> > > process does not mitigate the key problem with the cherry-picking
> > > approach.
> > >
> > > Furthermore, the cherry-picking process is mostly based on tribal
> > > knowledge and lacks clear documentation. I have previously expressed my
> > > concerns about this on the mailing list in this thread:
> > > https://lists.apache.org/thread/69mwjso51kzkrv5xgdmw04d9wngbg8br
> > >
> > > Many problems with cherry-picking arise because cherry-picks occur in
> > > the wrong order, or dependent changes are not picked. Some dependent
> > > changes shouldn't be picked since when we have made bug fixes in the
> > > master branch, it can already contain changes for new features that
> > > shouldn't be applied to maintenance branches. In those cases
> > > a backport of the fix is needed and the original developer of the
> > > PR might not be available to do this and there could be a significant
> > > delay for the release if delivering the backport takes time.
> > >
> > > When cherry-picking and backporting is delegated to other developers,
> > > in addition to delays, it can lead to coordination problems and commits
> > > being picked and applied in an order that results in even more merge
> > > conflicts. Thankfully, this isn't usually too painful, but it does
> > > happen once in a while.
> > >
> > > A few days ago, I began working on improving the documentation of the
> > > current process. I have added a section where I share some thoughts and
> > > a tool to prevent future problems. You can find the document here:
> > > https://pulsar.apache.org/contribute/release-process/#cherry-picking-changes-scheduled-for-the-release.
> > > However, this does not fully describe the current process and will only
> > > help to some extent.
> > >
> > > The added section should help prevent cherry-picking in the wrong order,
> > > but it still has many gaps. Many developers do not have proper merge
> > > conflict resolution tools configured. Without proper 3-way diff
> > > visualization and merge tools, it's very difficult to resolve many of
> > > the merge conflicts without making mistakes. This also requires a deep
> > > understanding of the module where the conflicts occur.
> > >
> > > After we have made the next set of maintenance releases, I plan to
> > > propose an alternative to the cherry-picking process that will address
> > > the main issues that the Apache Pulsar project struggles with every
> > > time we make releases.
> > >
> > > The alternative would be to designate the LTS branch as the default
> > > branch, make bug fixes primarily in the LTS branch, merge fixes to newer
> > > branches, and cherry-pick them to older branches where needed. This
> > > common approach, used in many projects, leverages what Git does well:
> > > handling development across multiple branches. It ensures that our LTS
> > > branch is always in a releasable state, and the branch will also become
> > > the most stable version of Pulsar, since bug fix PRs target the LTS
> > > branch and are continuously validated there by our CI.
> > > Stability was the original goal of PIP-175, which introduced the LTS
> > > concept to Pulsar.
> > >
> > > I hope that our community will be open to changing the maintenance
> > > strategy to help resolve the pain that we have to deal with each time
> > > we make releases. Sometimes, this "cherry-picking vs. merging
> > > branches" discussion becomes a pointless "tabs vs. spaces" type of
> > > discussion where personal preferences are emphasized. I hope that we
> > > can avoid that and acknowledge that releasing Apache Pulsar LTS with
> > > this cherry-picking process is painful and that we must fix it to make
> > > progress as a development community.
> > >
> > > -Lari
> > >
> 
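The ordering problem described in the quoted message (cherry-picks applied in the wrong order, or without the changes they depend on) can be illustrated with a small shell sketch. It builds a throwaway repository in which a bug fix on master depends on an earlier refactoring, then cherry-picks both onto branch-3.0 in the order they landed on master, regardless of the order in which they were listed. The helper name order_like_branch, the file names and the commit messages are hypothetical; this is not the tool referenced in the release-process documentation, only an illustration of the idea.

```shell
#!/usr/bin/env bash
# Sketch: cherry-pick selected commits in the order they landed on the source branch.
# All names here (helper function, files, commit messages) are hypothetical.
set -euo pipefail

# Print the given full commit SHAs in the order they appear in the
# first-parent history of branch $1, oldest first.
order_like_branch() {
  local branch="$1"; shift
  local wanted=" $* "
  git rev-list --reverse --first-parent "$branch" | while read -r sha; do
    case "$wanted" in *" $sha "*) echo "$sha" ;; esac
  done
}

repo="$(mktemp -d)"
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example Dev"
git config user.email "dev@example.com"

echo "base" > a.txt
git add a.txt
git commit -q -m "Base"
git branch branch-3.0              # the maintenance branch forks here

echo "step1" > b.txt               # refactoring that the fix depends on
git add b.txt
git commit -q -m "Refactor"
first="$(git rev-parse HEAD)"

echo "step2" >> b.txt              # the bug fix modifies the refactored file
git commit -q -am "Fix bug"
second="$(git rev-parse HEAD)"

# The commits are listed in the "wrong" order on purpose: picking $second
# first would conflict, because b.txt does not exist on branch-3.0 yet.
git checkout -q branch-3.0
for sha in $(order_like_branch master "$second" "$first"); do
  git cherry-pick -x "$sha"        # -x records the original commit in the message
done
```

Sorting by the first-parent history of the source branch is one simple way to approximate "the order in which changes were merged"; it does not detect dependencies on commits that were never selected.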

Re: [DISCUSS] Broken builds and CI Failures in Maintenance Branches; improving maintenance strategy to address root causes

Posted by Yunze Xu <xy...@apache.org>.
I'm afraid many people don't have the patience to read all of the content.
Here is a short summary (please correct me if I'm wrong):
- For bug fixes, the target branch should be branch-3.0. Once the PR
is merged into branch-3.0, check out branch-3.x, run
`git merge branch-3.0`, and resolve any conflicts.
- For features, the target branch should be branch-3.x.
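The bug fix flow in the summary above can be tried out in a throwaway repository with the following shell sketch. It uses the branch names from the summary; the file names and commit messages are hypothetical, and in practice the fix would arrive through a merged PR rather than a local commit.

```shell
#!/usr/bin/env bash
# Sketch: "fix in the LTS branch, then merge forward" in a throwaway repository.
# Branch names follow the summary; files and commit messages are hypothetical.
set -euo pipefail

repo="$(mktemp -d)"
cd "$repo"
git init -q
git config user.name "Example Dev"
git config user.email "dev@example.com"

# Shared history starts on the LTS branch.
git checkout -q -b branch-3.0
echo "v1" > broker.txt
git add broker.txt
git commit -q -m "Initial LTS code"

# The newer branch already carries a feature that must not reach branch-3.0.
git checkout -q -b branch-3.x
echo "feature" > feature.txt
git add feature.txt
git commit -q -m "Add new feature"

# 1. The bug fix targets branch-3.0 and is merged there.
git checkout -q branch-3.0
echo "v1-fixed" > broker.txt
git commit -q -am "Fix bug in broker"

# 2. Merge the LTS branch forward into the newer branch.
#    On conflicts, resolve them (for example with `git mergetool`) and commit.
git checkout -q branch-3.x
git merge -q --no-edit branch-3.0

git log --oneline --graph branch-3.x
```

Afterwards branch-3.x contains both the feature and the fix, while branch-3.0 never receives the feature commit. Because the fix exists as a single commit reachable from both branches, later merges know it has already been applied, which is what avoids the ordering problems of cherry-picking.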

Since we introduced the LTS concept, I agree that we should make
branch-3.0 the default branch. Cherry-picking is a disaster when
cherry-picks happen in the wrong order.


Thanks,
Yunze

On Tue, Mar 5, 2024 at 8:38 PM Lari Hotari <lh...@apache.org> wrote:
>
> To enhance our maintenance processes, I've created a guide for
> configuring "git mergetool" to resolve merge conflicts:
>
> https://pulsar.apache.org/contribute/setup-mergetool/
>
> For Apache Pulsar core developers, managing git merge conflict
> resolution is a necessary task. To streamline this process, it's crucial
> to set up tools that aid in visualizing and resolving these conflicts.
>
> I encourage you to follow the guide to set up a git mergetool. Your
> feedback is valuable, and you're welcome to contribute improvements
> directly to the website by editing
> https://github.com/apache/pulsar-site/edit/main/contribute/setup-mergetool.md
> in your browser and opening a PR.
>
> -Lari

Re: [DISCUSS] Broken builds and CI Failures in Maintenance Branches; improving maintenance strategy to address root causes

Posted by Lari Hotari <lh...@apache.org>.
To enhance our maintenance processes, I've created a guide for
configuring "git mergetool" to resolve merge conflicts:

https://pulsar.apache.org/contribute/setup-mergetool/

For Apache Pulsar core developers, managing git merge conflict
resolution is a necessary task. To streamline this process, it's crucial
to set up tools that aid in visualizing and resolving these conflicts.

I encourage you to follow the guide to set up a git mergetool. Your
feedback is valuable, and you're welcome to contribute improvements
directly to the website by editing
https://github.com/apache/pulsar-site/edit/main/contribute/setup-mergetool.md
in your browser and opening a PR.
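As a rough illustration of what such a setup involves, the following commands use standard git configuration keys to select a merge tool and to show the common ancestor in conflict markers. The choice of meld is an assumption made only for this example; the guide linked above may recommend a different tool or additional settings.

```shell
#!/usr/bin/env bash
# Sketch: global git settings for resolving merge conflicts with a 3-way view.
# Assumption: meld is installed; replace it with the tool you prefer.
set -euo pipefail

git config --global merge.tool meld             # tool launched by `git mergetool`
git config --global mergetool.prompt false      # don't ask before opening each file
git config --global mergetool.keepBackup false  # don't leave *.orig files behind
git config --global merge.conflictStyle diff3   # include the common ancestor in conflict markers

# After a conflicted `git merge` or `git cherry-pick`, run:
#   git mergetool
```

Git 2.35 and later also accept zdiff3 as a more compact variant of the diff3 conflict style.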

-Lari
