Posted to dev@pulsar.apache.org by Ali Ahmed <ah...@gmail.com> on 2021/02/01 20:40:06 UTC

Re: [Discuss] draft PIP for "Changes to GitHub Actions based Pulsar CI"

I recommend we move the connectors away from the pulsar repo to reduce the
load on the main CI pipeline. The new repo seems ready:
https://github.com/apache/pulsar-connectors

-Ali

On Fri, Jan 29, 2021 at 9:22 AM Sijie Guo <gu...@gmail.com> wrote:

> Currently, GitHub Actions resources are shared across one large `apache`
> organization. Besides flaky tests, that is the main problem for GA-based CI.
>
> If we use Azure Pipelines, we can have a dedicated project for Pulsar,
> so we will have more resources to run builds.
> That would solve the problem that this proposal tries to solve. The approach
> has been used by Flink. We have started some experiments and will share
> some of the results here next week.
>
> Thanks,
> Sijie
>
> On Fri, Jan 29, 2021 at 8:34 AM Lari Hotari <La...@hotari.net> wrote:
>
> > Hi Sijie,
> >
> > Let's keep this work going, since resolving the problems with Pulsar CI
> > is urgent.
> >
> > I took a quick glance at the Azure Pipelines solution in Flink. By Googling
> > I found https://cwiki.apache.org/confluence/display/FLINK/Azure+Pipelines .
> > In the repository I found
> > https://github.com/apache/flink/blob/master/azure-pipelines.yml which
> > references
> > https://github.com/apache/flink/blob/master/tools/azure-pipelines/jobs-template.yml
> >
> > It uses the build matrix feature to parallelize the execution:
> > https://github.com/apache/flink/blob/dd0ee24e55dab4ae76201103c76495bc4fa0f73b/tools/azure-pipelines/jobs-template.yml#L88-L107
> >
> > What would be the key benefit for Pulsar CI of using Azure Pipelines over
> > GitHub Actions?
> >
> > -Lari
> >
> > On Fri, Jan 29, 2021 at 6:03 PM Sijie Guo <gu...@gmail.com> wrote:
> >
> > > Lari,
> > >
> > > Yes, we can keep this proposal open for discussion. That's for sure.
> > >
> > > I just don't have any good solution at this moment with a multiple-workflow
> > > approach using GitHub Actions.
> > >
> > > An alternative is to look into Azure Pipelines, which the Flink community is
> > > using.
> > > We are still learning there. We will post our thoughts here once we have a
> > > better idea.
> > >
> > > Thanks,
> > > Sijie
> > >
> > > On Fri, Jan 29, 2021 at 5:07 AM Lari Hotari <La...@hotari.net> wrote:
> > >
> > > > Thanks for the feedback, Sijie.
> > > >
> > > > > If this proposal is blocked by the other proposal, we should focus on
> > > > > getting the changes for the other proposal before talking about merging
> > > > > them.
> > > >
> > > > Yes, the current proposal depends on the draft PIP for "Changes to flaky
> > > > test handling". I'll follow up on fixing the flaky test in a new email
> > > > thread.
> > > >
> > > > I hope we can get the discussions going on both draft PIPs and find
> > > > consensus together as a community.
> > > > During the discussions, more solution options will come up. Each solution
> > > > has trade-offs.
> > > > It would be useful to document the options when the community doesn't
> > > > immediately agree on a single choice.
> > > > I was thinking that these options could be documented in the same draft PIP
> > > > documents.
> > > >
> > > > I can give multiple authors editing access to the Google Docs so that we
> > > > can keep on editing a single document for both draft PIPs.
> > > > Anyone who wants to add more solution options to the documents, please
> > > > let me know so that I can add editing access.
> > > >
> > > > Sijie, would you like to document the option of keeping the workflow as
> > > > multiple smaller workflows?
> > > > As I understand it, the problems that have come up with the Pulsar CI
> > > > regarding resource consumption would have to be resolved in that
> > > > alternative as well.
> > > >
> > > > I believe that everyone is open to any set of solution alternatives that
> > > > solves the problems we have with Pulsar CI.
> > > > We all know that it's urgent to fix Pulsar CI asap. We can do it together.
> > > >
> > > > BR, Lari
> > > >
> > > >
> > > > On Fri, Jan 29, 2021 at 11:51 AM Sijie Guo <gu...@gmail.com> wrote:
> > > >
> > > > > Lari,
> > > > >
> > > > > Thank you for bringing this proposal up! This is a great initiative.
> > > > >
> > > > > However, I agree with Yong. We have spent tons of effort splitting one
> > > > > large workflow into multiple smaller workflows.
> > > > >
> > > > > If this proposal is blocked by the other proposal, we should focus on
> > > > > getting the changes for the other proposal before talking about merging
> > > > > them.
> > > > >
> > > > > Thanks,
> > > > > Sijie
> > > > >
> > > > > On Thu, Jan 28, 2021 at 9:55 PM Lari Hotari <La...@hotari.net> wrote:
> > > > >
> > > > > > Thank you for the comments, Penghui.
> > > > > >
> > > > > > Exactly as you said, we should make the tests stable.
> > > > > > The proposals in the other draft PIP, "Changes to flaky test handling",
> > > > > > deal with that.
> > > > > > It's currently a draft and needs more eyes. Would you be able to take a
> > > > > > closer look at that too?
> > > > > >
> > > > > > BR, Lari
> > > > > >
> > > > > > On Fri, Jan 29, 2021 at 6:41 AM PengHui Li <codelipenghui@gmail.com> wrote:
> > > > > >
> > > > > > > Currently, especially for the integration tests, a lot of time is spent
> > > > > > > building Pulsar distributions and Docker images.
> > > > > > > I think we should make the tests stable before merging the test workflows;
> > > > > > > otherwise rerunning the tests will become more expensive.
> > > > > > >
> > > > > > > Thanks,
> > > > > > > Penghui
> > > > > > > On Jan 29, 2021, 11:55 AM +0800, Yong Zhang <zhangyong1025.zy@gmail.com>, wrote:
> > > > > > > > I am not sure that merging all the workflows into one workflow is a good
> > > > > > > > idea. As far as I know, GitHub Actions doesn't allow rerunning a single job
> > > > > > > > in a workflow. That means that if there is any failure in the workflow, we
> > > > > > > > need to rerun all steps/stages. In the worst case, a different test fails
> > > > > > > > on each rerun, and it takes even more time to pass the CI.
> > > > > > > >
> > > > > > > > ---
> > > > > > > > Yong
> > > > > > > >
> > > > > > > > On Fri, 29 Jan 2021 at 01:14, Lari Hotari <lari.hotari@sagire.fi> wrote:
> > > > > > > >
> > > > > > > > > Dear Pulsar community members,
> > > > > > > > >
> > > > > > > > > Currently, the Pulsar GitHub Actions workflows are consuming the majority
> > > > > > > > > of the shared pool of resources allocated for github.com/apache projects.
> > > > > > > > > Other Apache projects have been impacted and there is a demand to improve
> > > > > > > > > the Pulsar CI
> > > > > > > > > <https://github.com/apache/pulsar/pull/9159#issuecomment-766915396>
> > > > > > > > > asap.
> > > > > > > > >
> > > > > > > > > In GitHub Actions Runners, the unit of resources is the time that a Runner
> > > > > > > > > is occupied. I observed the workflow runs for handling a single Pull
> > > > > > > > > Request (in my personal fork) and these were the running durations:
> > > > > > > > > Workflow name                               Duration
> > > > > > > > > CI - Build - MacOS                          0:17:23
> > > > > > > > > CI - Go Functions style check               0:02:38
> > > > > > > > > CI - Unit - Brokers - Other                 0:15:40
> > > > > > > > > CI - Unit - Brokers - Client Impl           0:16:28
> > > > > > > > > CI - Misc                                   0:16:51
> > > > > > > > > CI - Unit - Proxy                           0:14:23
> > > > > > > > > CI - Go Functions Tests                     0:22:08
> > > > > > > > > CI - CPP, Python Tests                      0:23:30
> > > > > > > > > CI - Unit                                   0:42:11
> > > > > > > > > CI - Integration - Sql                      1:00:13
> > > > > > > > > CI - Integration - Tiered JCloud            1:00:18
> > > > > > > > > CI - Integration - Tiered FileSystem        1:00:13
> > > > > > > > > CI - Integration - Function State           1:00:12
> > > > > > > > > CI - Integration - Cli                      1:10:22
> > > > > > > > > CI - Integration - Transaction              1:16:34
> > > > > > > > > CI - Integration - Process                  1:11:23
> > > > > > > > > CI - Shade - Test                           1:15:45
> > > > > > > > > CI - Unit - Brokers - Client Api            0:26:13
> > > > > > > > > CI - Unit - Brokers - Broker Group 2        0:35:05
> > > > > > > > > CI - Integration - Standalone               0:45:29
> > > > > > > > > CI - Integration - Messaging                1:00:23
> > > > > > > > > CI - Integration - Thread                   1:00:19
> > > > > > > > > CI - Integration - Backwards Compatibility  1:00:19
> > > > > > > > > CI - Integration - Schema                   1:00:19
> > > > > > > > > CI - Unit - Brokers - Broker Group 1        2:02:31
> > > > > > > > > TOTAL                                       19:36:50
> > > > > > > > >
> > > > > > > > > *In this case, the total resource consumption of GitHub Actions Runners is
> > > > > > > > > 19 hours 36 minutes 50 seconds for a single pull request to apache/pulsar.*
> > > > > > > > >
> > > > > > > > > Since GitHub Actions Runner resource pool utilization is very high, the
> > > > > > > > > build queue grows and takes a long time to process.
> > > > > > > > >
> > > > > > > > > I have been looking for ways to improve the Pulsar CI for the last 3
> > > > > > > > > months. During this period I worked on a few experiments. The learnings
> > > > > > > > > from the past experiments are documented at a high level in the following
> > > > > > > > > draft PIP document.
> > > > > > > > >
> > > > > > > > > *The draft PIP "Changes to GitHub Actions based Pulsar CI" document is a
> > > > > > > > > Google doc:*
> > > > > > > > >
> > > > > > > > > https://docs.google.com/document/d/1FNEWD3COdnNGMiryO9qBUW_83qtzAhqjDI5wwmPD-YE/edit?usp=sharing
> > > > > > > > >
> > > > > > > > > *Please participate* so that we get the plan adjusted based on the feedback
> > > > > > > > > asap. If there's already a similar effort ongoing, I hope we can join
> > > > > > > > > efforts.
> > > > > > > > >
> > > > > > > > > *Let's fix Pulsar CI!*
> > > > > > > > >
> > > > > > > > > BR, Lari
> > > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>
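
The 19:36:50 total in the quoted workflow table can be re-derived by summing
the per-workflow durations. Below is a minimal sketch, assuming the durations
are copied as H:MM:SS strings from the table; the helper names
(parse_duration, total_runner_time, format_hms) are illustrative and not part
of any Pulsar tooling.

```python
from datetime import timedelta

# Durations copied from the quoted table, one entry per workflow run (H:MM:SS).
DURATIONS = [
    "0:17:23", "0:02:38", "0:15:40", "0:16:28", "0:16:51",
    "0:14:23", "0:22:08", "0:23:30", "0:42:11", "1:00:13",
    "1:00:18", "1:00:13", "1:00:12", "1:10:22", "1:16:34",
    "1:11:23", "1:15:45", "0:26:13", "0:35:05", "0:45:29",
    "1:00:23", "1:00:19", "1:00:19", "1:00:19", "2:02:31",
]


def parse_duration(text: str) -> timedelta:
    """Parse an H:MM:SS string into a timedelta."""
    hours, minutes, seconds = (int(part) for part in text.split(":"))
    return timedelta(hours=hours, minutes=minutes, seconds=seconds)


def total_runner_time(durations) -> timedelta:
    """Sum the runner time occupied by all workflow runs."""
    return sum((parse_duration(d) for d in durations), timedelta())


def format_hms(delta: timedelta) -> str:
    """Format a timedelta as H:MM:SS without rolling hours over into days."""
    total_seconds = int(delta.total_seconds())
    hours, remainder = divmod(total_seconds, 3600)
    minutes, seconds = divmod(remainder, 60)
    return f"{hours}:{minutes:02d}:{seconds:02d}"


if __name__ == "__main__":
    print(format_hms(total_runner_time(DURATIONS)))  # prints 19:36:50
```

Since the unit of cost is occupied runner time, the sum rather than the
longest single workflow is the number that matters for the shared pool.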


-- 
-Ali

Re: [Discuss] draft PIP for "Changes to GitHub Actions based Pulsar CI"

Posted by Lari Hotari <La...@hotari.net>.
Hi Sijie,

> Yes. I was doing the work there and hopefully will get there soon.

Is the plan to do the 2.8.0 release with the new repositories and
repository structure in place?

I'd be interested in understanding how releasing is intended to be carried
out with multiple repositories without running into circular dependency
issues.

Let's say that there's the pulsar repository and the pulsar-connectors
repository. How do you release these if they depend on each other (circular
dependency at the repository level)? Assuming that a git version tag
applies to all files in the complete repository and that the source code
download has to match the repository content, I don't see how a release
could happen if there are circular dependencies at the repository level.
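
To make the concern concrete, here is a minimal sketch that treats each
repository as a node and asks for an order in which every repository is
released after the ones it depends on. The dependency edges are assumptions
for illustration only, not the actual build configuration, and release_order
is a hypothetical helper built on Python's standard graphlib module
(Python 3.9+).

```python
from graphlib import CycleError, TopologicalSorter


def release_order(depends_on):
    """Return repositories ordered so that dependencies come first.

    `depends_on` maps a repository name to the set of repositories it
    depends on. Returns None when the graph has a cycle, i.e. when no
    valid release order exists.
    """
    try:
        return list(TopologicalSorter(depends_on).static_order())
    except CycleError:
        return None


if __name__ == "__main__":
    # Hypothetical: connectors depend on pulsar, but not the other way around.
    one_way = {"pulsar": set(), "pulsar-connectors": {"pulsar"}}
    print(release_order(one_way))

    # Hypothetical: both repositories depend on each other.
    circular = {"pulsar": {"pulsar-connectors"}, "pulsar-connectors": {"pulsar"}}
    print(release_order(circular))
```

With a one-way dependency there is a valid order; with a mutual dependency
the helper returns None, which is the situation I'm asking about.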

I guess that the new repository https://github.com/apache/pulsar-release
could be part of the solution. Is this documented somewhere already? What
is the role of the "pulsar-release" repository?

Another concern I have had is about CI for the multiple repositories. There
are dependencies between the repositories. If an upstream library changes,
do we intend to run the tests in the downstream components to ensure that
the interfaces don't break?
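
One way to reason about this is to compute, for a changed repository, the set
of downstream repositories whose tests would need to run. The sketch below is
only an illustration under assumed dependencies: the edges in DEPENDENTS and
the name "downstream-example" are hypothetical, and downstream_of is not an
existing tool.

```python
from collections import deque

# Hypothetical map from a repository to the repositories that consume it.
DEPENDENTS = {
    "pulsar": ["pulsar-connectors", "pulsar-adapters"],
    "pulsar-connectors": ["downstream-example"],
}


def downstream_of(changed, dependents):
    """Return all repositories that transitively depend on `changed`."""
    seen = {changed}
    queue = deque([changed])
    while queue:
        repo = queue.popleft()
        for child in dependents.get(repo, ()):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    # The changed repository itself is not part of its own downstream set.
    seen.discard(changed)
    return sorted(seen)


if __name__ == "__main__":
    print(downstream_of("pulsar", DEPENDENTS))
```

Running downstream tests for every upstream change would add runner time, so
this trades some of the CI savings for earlier detection of interface breaks.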

Is the documentation for PIP-62 up-to-date? (PIP-62 link:
https://github.com/apache/pulsar/wiki/PIP-62%3A-Move-connectors%2C-adapters-and-Pulsar-Presto-to-separate-repositories
)

Please let me know how I can help with the solution. What are the next
steps? Perhaps we could share the tasks and make faster progress together?

BR, Lari

On Mon, Feb 1, 2021 at 11:17 PM Sijie Guo <gu...@gmail.com> wrote:

> Yes. I was doing the work there and hopefully will get there soon.
>
> - Sijie
>
> On Mon, Feb 1, 2021 at 12:40 PM Ali Ahmed <ah...@gmail.com> wrote:
>
> > I recommend we move the connectors away from the pulsar repo to reduce the
> > load on the main CI pipeline. The new repo seems ready:
> > https://github.com/apache/pulsar-connectors
> >
> > -Ali

Re: [Discuss] draft PIP for "Changes to GitHub Actions based Pulsar CI"

Posted by Lari Hotari <La...@hotari.net>.
Thank you, Ali. This sounds good!

BR, Lari

On Mon, Feb 1, 2021 at 11:47 PM Ali Ahmed <ah...@gmail.com> wrote:

>  We will need some simple connectors without dependencies, to replace the
> existing ones for basic integration testing. I can write those.
>
>
> On Mon, Feb 1, 2021 at 1:17 PM Sijie Guo <gu...@gmail.com> wrote:
>
> > Yes. I was doing the work there and hopefully will get there soon.
> >
> > - Sijie
> >
> > On Mon, Feb 1, 2021 at 12:40 PM Ali Ahmed <ah...@gmail.com> wrote:
> >
> > > I recommend we move the connectors away from the pulsar repo to reduce the
> > > load on the main CI pipeline. The new repo seems ready:
> > > https://github.com/apache/pulsar-connectors
> > >
> > > -Ali
> > > > > > > > > > > > Request (in my personal fork) and these were the
> > running
> > > > > > > durations:
> > > > > > > > > > > > Workflow name Duration
> > > > > > > > > > > > CI - Build - MacOS 0:17:23
> > > > > > > > > > > > CI - Go Functions style check 0:02:38
> > > > > > > > > > > > CI - Unit - Brokers - Other 0:15:40
> > > > > > > > > > > > CI - Unit - Brokers - Client Impl 0:16:28
> > > > > > > > > > > > CI - Misc 0:16:51
> > > > > > > > > > > > CI - Unit - Proxy 0:14:23
> > > > > > > > > > > > CI - Go Functions Tests 0:22:08
> > > > > > > > > > > > CI - CPP, Python Tests 0:23:30
> > > > > > > > > > > > CI - Unit 0:42:11
> > > > > > > > > > > > CI - Integration - Sql 1:00:13
> > > > > > > > > > > > CI - Integration - Tiered JCloud 1:00:18
> > > > > > > > > > > > CI - Integration - Tiered FileSystem 1:00:13
> > > > > > > > > > > > CI - Integration - Function State 1:00:12
> > > > > > > > > > > > CI - Integration - Cli 1:10:22
> > > > > > > > > > > > CI - Integration - Transaction 1:16:34
> > > > > > > > > > > > CI - Integration - Process 1:11:23
> > > > > > > > > > > > CI - Shade - Test 1:15:45
> > > > > > > > > > > > CI - Unit - Brokers - Client Api 0:26:13
> > > > > > > > > > > > CI - Unit - Brokers - Broker Group 2 0:35:05
> > > > > > > > > > > > CI - Integration - Standalone 0:45:29
> > > > > > > > > > > > CI - Integration - Messaging 1:00:23
> > > > > > > > > > > > CI - Integration - Thread 1:00:19
> > > > > > > > > > > > CI - Integration - Backwards Compatibility 1:00:19
> > > > > > > > > > > > CI - Integration - Schema 1:00:19
> > > > > > > > > > > > CI - Unit - Brokers - Broker Group 1 2:02:31
> > > > > > > > > > > > TOTAL 19:36:50
> > > > > > > > > > > >
> > > > > > > > > > > > *In this case, the total resource consumption of
> GitHub
> > > > > Actions
> > > > > > > > > > Runners is
> > > > > > > > > > > > 19 hours 36 minutes 50 seconds for a single pull
> > request
> > > to
> > > > > > > > > > apache/pulsar.*
> > > > > > > > > > > >
> > > > > > > > > > > > Since GitHub Actions Runner resource pool utilization
> > is
> > > > very
> > > > > > > high,
> > > > > > > > > > this
> > > > > > > > > > > > leads to the build queue to grow and take a long time
> > to
> > > > > > process.
> > > > > > > > > > > >
> > > > > > > > > > > > I have been looking for ways to improve the Pulsar CI
> > for
> > > > the
> > > > > > > last
> > > > > > > > 3
> > > > > > > > > > > > months. During this period I worked on a few
> > experiments.
> > > > The
> > > > > > > > > learnings
> > > > > > > > > > > > from the past experiments are documented at a high
> > level
> > > in
> > > > > the
> > > > > > > > > > following
> > > > > > > > > > > > draft PIP document.
> > > > > > > > > > > >
> > > > > > > > > > > > *The draft PIP "Changes to GitHub Actions based
> Pulsar
> > > CI"
> > > > > > > document
> > > > > > > > > is
> > > > > > > > > > a
> > > > > > > > > > > > Google doc:*
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> https://docs.google.com/document/d/1FNEWD3COdnNGMiryO9qBUW_83qtzAhqjDI5wwmPD-YE/edit?usp=sharing
> > > > > > > > > > > >
> > > > > > > > > > > > *Please participate* so that we get the plan adjusted
> > > based
> > > > > on
> > > > > > > the
> > > > > > > > > > feedback
> > > > > > > > > > > > asap. If there's already a similar effort ongoing, I
> > hope
> > > > we
> > > > > > can
> > > > > > > > join
> > > > > > > > > > > > efforts.
> > > > > > > > > > > >
> > > > > > > > > > > > *Let's fix Pulsar CI!*
> > > > > > > > > > > >
> > > > > > > > > > > > BR, Lari
> > > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> > >
> > > --
> > > -Ali
> > >
> >
>
>
> --
> -Ali
>

Re: [Discuss] draft PIP for "Changes to GitHub Actions based Pulsar CI"

Posted by Ali Ahmed <ah...@gmail.com>.
We will need some simple connectors without dependencies to replace the
existing ones for basic integration testing. I can write those.


On Mon, Feb 1, 2021 at 1:17 PM Sijie Guo <gu...@gmail.com> wrote:

> Yes. I was doing the work there and hopefully will get there soon.
>
> - Sijie
>
> On Mon, Feb 1, 2021 at 12:40 PM Ali Ahmed <ah...@gmail.com> wrote:
>
> > I recommend we move the connectors away from the pulsar repo to reduce the
> > load on the main CI pipeline. The new repo seems ready:
> > https://github.com/apache/pulsar-connectors
> >
> > -Ali
> >
> > On Fri, Jan 29, 2021 at 9:22 AM Sijie Guo <gu...@gmail.com> wrote:
> >
> > > Currently, GitHub Actions are shared across one large `apache`
> > > organization. It is the main problem for GA-based CI besides flaky
> > > tests.
> > >
> > > If we use Azure Pipelines, we can have a dedicated project for Pulsar,
> > > so we will have more resources to run.
> > > It will solve the problem that this proposal tries to solve. The
> > > approach has been used by Flink. We have started some experiments. We
> > > will share some of the results here next week.
> > >
> > > Thanks,
> > > Sijie
> > >
> > > On Fri, Jan 29, 2021 at 8:34 AM Lari Hotari <La...@hotari.net> wrote:
> > >
> > > > Hi Sijie,
> > > >
> > > > Let's keep this work going since resolving the problems with Pulsar
> CI
> > > are
> > > > urgent.
> > > >
> > > > I took a quick glance on the Azure Pipelines solution in Flink. By
> > > Googling
> > > > I found
> > > https://cwiki.apache.org/confluence/display/FLINK/Azure+Pipelines
> > > > .
> > > > In the repository I found
> > > > https://github.com/apache/flink/blob/master/azure-pipelines.yml
> which
> > > > references
> > > >
> > > >
> > >
> >
> https://github.com/apache/flink/blob/master/tools/azure-pipelines/jobs-template.yml
> > > >
> > > > It uses the build matrix feature to parallelize the execution:
> > > >
> > > >
> > >
> >
> https://github.com/apache/flink/blob/dd0ee24e55dab4ae76201103c76495bc4fa0f73b/tools/azure-pipelines/jobs-template.yml#L88-L107
> > > >
> > > > What would be the key benefit for Pulsar CI of using Azure Pipelines
> > over
> > > > GitHub Actions?
> > > >
> > > > -Lari
> > > >
> > > > On Fri, Jan 29, 2021 at 6:03 PM Sijie Guo <gu...@gmail.com>
> wrote:
> > > >
> > > > > Lari,
> > > > >
> > > > > Yes, we can keep this proposal open for discussion. That's for
> sure.
> > > > >
> > > > > I just don't have any good solution at this moment with a
> > > > multiple-workflow
> > > > > approach using Github Actions.
> > > > >
> > > > > An alternative is to look into Azure Pipeline, which the Flink
> > > community
> > > > is
> > > > > using.
> > > > > We are still learning there. Will post thoughts here once we have a
> > > > better
> > > > > idea.
> > > > >
> > > > > Thanks,
> > > > > Sijie
> > > > >
> > > > > On Fri, Jan 29, 2021 at 5:07 AM Lari Hotari <La...@hotari.net>
> wrote:
> > > > >
> > > > > > Thanks for the feedback, Sijie.
> > > > > >
> > > > > > > If this proposal is blocked by the other proposal, we should
> > focus
> > > on
> > > > > > getting the changes for the other proposal before talking about
> > > merging
> > > > > > them.
> > > > > >
> > > > > > Yes, the current proposal depends on the draft PIP for "Changes
> to
> > > > flaky
> > > > > > test handling". I'll follow up on fixing the flaky test in a new
> > > email
> > > > > > thread.
> > > > > >
> > > > > > I hope we could get the discussions going on both draft PIPs and
> > find
> > > > > > consensus together as a community.
> > > > > > During the discussions, more solution options will come up. Each
> > > > solution
> > > > > > has trade offs.
> > > > > > It would be useful to document the options when the community
> > doesn't
> > > > > > immediately agree on a single choice.
> > > > > > I was thinking that these options could be documented in the same
> > > draft
> > > > > PIP
> > > > > > documents.
> > > > > >
> > > > > > I can give multiple authors editing access to the Google Docs so
> > that
> > > > we
> > > > > > can keep on editing a single document for both draft PIPs.
> > > > > > Anyone who would want to add more solution options to the
> > documents,
> > > > > please
> > > > > > let me know so that I'll add editing access.
> > > > > >
> > > > > > Sijie, would you like to document the option around keeping the
> > > > workflow
> > > > > as
> > > > > > multiple smaller workflows?
> > > > > > I have understood that the problems that have come up with the
> > Pulsar
> > > > CI
> > > > > > regarding resource consumption would have to be resolved in that
> > > > > > alternative as well.
> > > > > >
> > > > > > I believe that everyone is open to any set of solution
> alternatives
> > > > which
> > > > > > solves the problems that we have with Pulsar CI.
> > > > > > We all know that it's urgent to fix Pulsar CI asap. We can do it
> > > > > together.
> > > > > >
> > > > > > BR, Lari
> > > > > >
> > > > > >
> > > > > > On Fri, Jan 29, 2021 at 11:51 AM Sijie Guo <gu...@gmail.com>
> > > wrote:
> > > > > >
> > > > > > > Lari,
> > > > > > >
> > > > > > > Thank you for bringing this proposal up! This is a great
> > > initiative.
> > > > > > >
> > > > > > > However, I agreed with Yong. We have spent tons of effort
> > splitting
> > > > one
> > > > > > > large workflow into multiple smaller workflows.
> > > > > > >
> > > > > > > If this proposal is blocked by the other proposal, we should
> > focus
> > > on
> > > > > > > getting the changes for the other proposal before talking about
> > > > merging
> > > > > > > them.
> > > > > > >
> > > > > > > Thanks,
> > > > > > > Sijie
> > > > > > >
> > > > > > > On Thu, Jan 28, 2021 at 9:55 PM Lari Hotari <La...@hotari.net>
> > > wrote:
> > > > > > >
> > > > > > > > Thank you for the comments Penghui.
> > > > > > > >
> > > > > > > > Exactly what you said, we should make the tests stable.
> > > > > > > > The proposals in the other draft PIP "Changes to flaky test
> > > > handling"
> > > > > > > deals
> > > > > > > > with that.
> > > > > > > > It's currently a draft and needs more eyes. Would you be able
> > to
> > > > > take a
> > > > > > > > closer look at that too?
> > > > > > > >
> > > > > > > > BR, Lari
> > > > > > > >
> > > > > > > > On Fri, Jan 29, 2021 at 6:41 AM PengHui Li <
> > > > codelipenghui@gmail.com>
> > > > > > > > wrote:
> > > > > > > >
> > > > > > > > > Currently, especially for the integration tests, a lot time
> > to
> > > > > build
> > > > > > > > > pulsar distributions and docker images.
> > > > > > > > > I think before merge tests we should to make the test
> stable,
> > > > > > otherwise
> > > > > > > > > rerun the test will become more expensive.
> > > > > > > > >
> > > > > > > > > Thanks,
> > > > > > > > > Penghui
> > > > > > > > > On Jan 29, 2021, 11:55 AM +0800, Yong Zhang <
> > > > > > > zhangyong1025.zy@gmail.com
> > > > > > > > >,
> > > > > > > > > wrote:
> > > > > > > > > > I am not sure that merge all the workflows into one
> > workflow
> > > > is a
> > > > > > > good
> > > > > > > > > > idea. As
> > > > > > > > > > I know, Github Actions doesn't allow to rerun a single
> job
> > > in a
> > > > > > > > workflow.
> > > > > > > > > > That means
> > > > > > > > > > if there has any failure in the workflow, we need to
> rerun
> > > all
> > > > > > > > > > steps/stage. There has
> > > > > > > > > > a worst-case is we failed in the different tests when
> > > rerunning
> > > > > it
> > > > > > > and
> > > > > > > > > this
> > > > > > > > > > would take
> > > > > > > > > > more time to pass the CI.
> > > > > > > > > >
> > > > > > > > > > ---
> > > > > > > > > > Yong
> > > > > > > > > >
> > > > > > > > > > On Fri, 29 Jan 2021 at 01:14, Lari Hotari <
> > > > lari.hotari@sagire.fi
> > > > > >
> > > > > > > > wrote:
> > > > > > > > > >
> > > > > > > > > > > Dear Pulsar community members,
> > > > > > > > > > >
> > > > > > > > > > > Currently, the Pulsar GitHub Actions workflows are
> > > consuming
> > > > > the
> > > > > > > > > majority
> > > > > > > > > > > of the shared pool of resources allocated for
> > > > > github.com/apache
> > > > > > > > > projects.
> > > > > > > > > > > Other Apache projects have been impacted and there is a
> > > > demand
> > > > > to
> > > > > > > > > improve
> > > > > > > > > > > the Pulsar CI
> > > > > > > > > > > <
> > > > > >
> https://github.com/apache/pulsar/pull/9159#issuecomment-766915396
> > > > > > > >
> > > > > > > > > asap.
> > > > > > > > > > >
> > > > > > > > > > > In GitHub Actions Runners, the unit of resources is the
> > > time
> > > > > > that a
> > > > > > > > > Runner
> > > > > > > > > > > is occupied. I observed the workflow runs for handling
> a
> > > > single
> > > > > > > Pull
> > > > > > > > > > > Request (in my personal fork) and these were the
> running
> > > > > > durations:
> > > > > > > > > > > Workflow name Duration
> > > > > > > > > > > CI - Build - MacOS 0:17:23
> > > > > > > > > > > CI - Go Functions style check 0:02:38
> > > > > > > > > > > CI - Unit - Brokers - Other 0:15:40
> > > > > > > > > > > CI - Unit - Brokers - Client Impl 0:16:28
> > > > > > > > > > > CI - Misc 0:16:51
> > > > > > > > > > > CI - Unit - Proxy 0:14:23
> > > > > > > > > > > CI - Go Functions Tests 0:22:08
> > > > > > > > > > > CI - CPP, Python Tests 0:23:30
> > > > > > > > > > > CI - Unit 0:42:11
> > > > > > > > > > > CI - Integration - Sql 1:00:13
> > > > > > > > > > > CI - Integration - Tiered JCloud 1:00:18
> > > > > > > > > > > CI - Integration - Tiered FileSystem 1:00:13
> > > > > > > > > > > CI - Integration - Function State 1:00:12
> > > > > > > > > > > CI - Integration - Cli 1:10:22
> > > > > > > > > > > CI - Integration - Transaction 1:16:34
> > > > > > > > > > > CI - Integration - Process 1:11:23
> > > > > > > > > > > CI - Shade - Test 1:15:45
> > > > > > > > > > > CI - Unit - Brokers - Client Api 0:26:13
> > > > > > > > > > > CI - Unit - Brokers - Broker Group 2 0:35:05
> > > > > > > > > > > CI - Integration - Standalone 0:45:29
> > > > > > > > > > > CI - Integration - Messaging 1:00:23
> > > > > > > > > > > CI - Integration - Thread 1:00:19
> > > > > > > > > > > CI - Integration - Backwards Compatibility 1:00:19
> > > > > > > > > > > CI - Integration - Schema 1:00:19
> > > > > > > > > > > CI - Unit - Brokers - Broker Group 1 2:02:31
> > > > > > > > > > > TOTAL 19:36:50
> > > > > > > > > > >
> > > > > > > > > > > *In this case, the total resource consumption of GitHub
> > > > Actions
> > > > > > > > > Runners is
> > > > > > > > > > > 19 hours 36 minutes 50 seconds for a single pull
> request
> > to
> > > > > > > > > apache/pulsar.*
> > > > > > > > > > >
> > > > > > > > > > > Since GitHub Actions Runner resource pool utilization
> is
> > > very
> > > > > > high,
> > > > > > > > > this
> > > > > > > > > > > leads to the build queue to grow and take a long time
> to
> > > > > process.
> > > > > > > > > > >
> > > > > > > > > > > I have been looking for ways to improve the Pulsar CI
> for
> > > the
> > > > > > last
> > > > > > > 3
> > > > > > > > > > > months. During this period I worked on a few
> experiments.
> > > The
> > > > > > > > learnings
> > > > > > > > > > > from the past experiments are documented at a high
> level
> > in
> > > > the
> > > > > > > > > following
> > > > > > > > > > > draft PIP document.
> > > > > > > > > > >
> > > > > > > > > > > *The draft PIP "Changes to GitHub Actions based Pulsar
> > CI"
> > > > > > document
> > > > > > > > is
> > > > > > > > > a
> > > > > > > > > > > Google doc:*
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> https://docs.google.com/document/d/1FNEWD3COdnNGMiryO9qBUW_83qtzAhqjDI5wwmPD-YE/edit?usp=sharing
> > > > > > > > > > >
> > > > > > > > > > > *Please participate* so that we get the plan adjusted
> > based
> > > > on
> > > > > > the
> > > > > > > > > feedback
> > > > > > > > > > > asap. If there's already a similar effort ongoing, I
> hope
> > > we
> > > > > can
> > > > > > > join
> > > > > > > > > > > efforts.
> > > > > > > > > > >
> > > > > > > > > > > *Let's fix Pulsar CI!*
> > > > > > > > > > >
> > > > > > > > > > > BR, Lari
> > > > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> >
> > --
> > -Ali
> >
>


-- 
-Ali

Re: [Discuss] draft PIP for "Changes to GitHub Actions based Pulsar CI"

Posted by Sijie Guo <gu...@gmail.com>.
Yes. I was doing the work there and hopefully will get there soon.

- Sijie

On Mon, Feb 1, 2021 at 12:40 PM Ali Ahmed <ah...@gmail.com> wrote:

> I recommend we move the connectors away from the pulsar repo to reduce the
> load on the main CI pipeline. The new repo seems ready:
> https://github.com/apache/pulsar-connectors
>
> -Ali
>
> On Fri, Jan 29, 2021 at 9:22 AM Sijie Guo <gu...@gmail.com> wrote:
>
> > Currently, GitHub Actions are shared across one large `apache`
> > organization. It is the main problem for GA-based CI besides flaky tests.
> >
> > If we use Azure Pipelines, we can have a dedicated project for Pulsar,
> > so we will have more resources to run.
> > It will solve the problem that this proposal tries to solve. The approach
> > has been used by Flink. We have started some experiments. We will share
> > some of the results here next week.
> >
> > Thanks,
> > Sijie
> >
> > On Fri, Jan 29, 2021 at 8:34 AM Lari Hotari <La...@hotari.net> wrote:
> >
> > > Hi Sijie,
> > >
> > > Let's keep this work going since resolving the problems with Pulsar CI
> > > is urgent.
> > >
> > > I took a quick glance at the Azure Pipelines solution in Flink. By
> > > Googling I found
> > > https://cwiki.apache.org/confluence/display/FLINK/Azure+Pipelines .
> > > In the repository I found
> > > https://github.com/apache/flink/blob/master/azure-pipelines.yml which
> > > references
> > > https://github.com/apache/flink/blob/master/tools/azure-pipelines/jobs-template.yml
> > >
> > > It uses the build matrix feature to parallelize the execution:
> > > https://github.com/apache/flink/blob/dd0ee24e55dab4ae76201103c76495bc4fa0f73b/tools/azure-pipelines/jobs-template.yml#L88-L107
> > >
> > > What would be the key benefit for Pulsar CI of using Azure Pipelines over
> > > GitHub Actions?
> > >
> > > -Lari
> > >
> > > On Fri, Jan 29, 2021 at 6:03 PM Sijie Guo <gu...@gmail.com> wrote:
> > >
> > > > Lari,
> > > >
> > > > Yes, we can keep this proposal open for discussion. That's for sure.
> > > >
> > > > I just don't have any good solution at this moment with a
> > > > multiple-workflow approach using GitHub Actions.
> > > >
> > > > An alternative is to look into Azure Pipelines, which the Flink
> > > > community is using.
> > > > We are still learning there. Will post thoughts here once we have a
> > > > better idea.
> > > >
> > > > Thanks,
> > > > Sijie
> > > >
> > > > On Fri, Jan 29, 2021 at 5:07 AM Lari Hotari <La...@hotari.net> wrote:
> > > >
> > > > > Thanks for the feedback, Sijie.
> > > > >
> > > > > > If this proposal is blocked by the other proposal, we should focus
> > > > > > on getting the changes for the other proposal before talking about
> > > > > > merging them.
> > > > >
> > > > > Yes, the current proposal depends on the draft PIP for "Changes to
> > > > > flaky test handling". I'll follow up on fixing the flaky tests in a
> > > > > new email thread.
> > > > >
> > > > > I hope we could get the discussions going on both draft PIPs and find
> > > > > consensus together as a community.
> > > > > During the discussions, more solution options will come up. Each
> > > > > solution has trade-offs.
> > > > > It would be useful to document the options when the community doesn't
> > > > > immediately agree on a single choice.
> > > > > I was thinking that these options could be documented in the same
> > > > > draft PIP documents.
> > > > >
> > > > > I can give multiple authors editing access to the Google Docs so that
> > > > > we can keep on editing a single document for both draft PIPs.
> > > > > Anyone who wants to add more solution options to the documents,
> > > > > please let me know so that I can grant editing access.
> > > > >
> > > > > Sijie, would you like to document the option of keeping the
> > > > > workflow as multiple smaller workflows?
> > > > > I understand that the Pulsar CI resource consumption problems that
> > > > > have come up would have to be resolved in that alternative as well.
> > > > >
> > > > > I believe that everyone is open to any set of solution alternatives
> > > > > which solves the problems that we have with Pulsar CI.
> > > > > We all know that it's urgent to fix Pulsar CI asap. We can do it
> > > > > together.
> > > > >
> > > > > BR, Lari
> > > > >
> > > > >
> > > > > On Fri, Jan 29, 2021 at 11:51 AM Sijie Guo <gu...@gmail.com> wrote:
> > > > >
> > > > > > Lari,
> > > > > >
> > > > > > Thank you for bringing this proposal up! This is a great initiative.
> > > > > >
> > > > > > However, I agree with Yong. We have spent tons of effort splitting
> > > > > > one large workflow into multiple smaller workflows.
> > > > > >
> > > > > > If this proposal is blocked by the other proposal, we should focus
> > > > > > on getting the changes for the other proposal before talking about
> > > > > > merging them.
> > > > > >
> > > > > > Thanks,
> > > > > > Sijie
> > > > > >
> > > > > > On Thu, Jan 28, 2021 at 9:55 PM Lari Hotari <La...@hotari.net> wrote:
> > > > > >
> > > > > > > Thank you for the comments, Penghui.
> > > > > > >
> > > > > > > Exactly as you said, we should make the tests stable.
> > > > > > > The proposals in the other draft PIP "Changes to flaky test
> > > > > > > handling" deal with that.
> > > > > > > It's currently a draft and needs more eyes. Would you be able to
> > > > > > > take a closer look at that too?
> > > > > > >
> > > > > > > BR, Lari
> > > > > > >
> > > > > > > On Fri, Jan 29, 2021 at 6:41 AM PengHui Li <codelipenghui@gmail.com>
> > > > > > > wrote:
> > > > > > >
> > > > > > > > Currently, especially for the integration tests, a lot of time
> > > > > > > > is spent building Pulsar distributions and Docker images.
> > > > > > > > I think that before merging the tests we should make them
> > > > > > > > stable; otherwise rerunning the tests will become more expensive.
> > > > > > > >
> > > > > > > > Thanks,
> > > > > > > > Penghui
> > > > > > > > On Jan 29, 2021, 11:55 AM +0800, Yong Zhang
> > > > > > > > <zhangyong1025.zy@gmail.com>, wrote:
> > > > > > > > > I am not sure that merging all the workflows into one workflow
> > > > > > > > > is a good idea. As far as I know, GitHub Actions doesn't allow
> > > > > > > > > rerunning a single job in a workflow. That means that if there
> > > > > > > > > is any failure in the workflow, we need to rerun all
> > > > > > > > > steps/stages. In the worst case, different tests fail when
> > > > > > > > > rerunning it, and it would take even more time to pass the CI.
> > > > > > > > >
> > > > > > > > > ---
> > > > > > > > > Yong
> > > > > > > > >
> > > > > > > > > On Fri, 29 Jan 2021 at 01:14, Lari Hotari <lari.hotari@sagire.fi>
> > > > > > > > > wrote:
> > > > > > > > >
> > > > > > > > > > Dear Pulsar community members,
> > > > > > > > > >
> > > > > > > > > > Currently, the Pulsar GitHub Actions workflows are consuming
> > > > > > > > > > the majority of the shared pool of resources allocated for
> > > > > > > > > > github.com/apache projects.
> > > > > > > > > > Other Apache projects have been impacted and there is a
> > > > > > > > > > demand to improve the Pulsar CI
> > > > > > > > > > <https://github.com/apache/pulsar/pull/9159#issuecomment-766915396>
> > > > > > > > > > asap.
> > > > > > > > > >
> > > > > > > > > > In GitHub Actions Runners, the unit of resources is the time
> > > > > > > > > > that a Runner is occupied. I observed the workflow runs for
> > > > > > > > > > handling a single Pull Request (in my personal fork) and these
> > > > > > > > > > were the running durations:
> > > > > > > > > >
> > > > > > > > > > Workflow name                               Duration
> > > > > > > > > > CI - Build - MacOS                          0:17:23
> > > > > > > > > > CI - Go Functions style check               0:02:38
> > > > > > > > > > CI - Unit - Brokers - Other                 0:15:40
> > > > > > > > > > CI - Unit - Brokers - Client Impl           0:16:28
> > > > > > > > > > CI - Misc                                   0:16:51
> > > > > > > > > > CI - Unit - Proxy                           0:14:23
> > > > > > > > > > CI - Go Functions Tests                     0:22:08
> > > > > > > > > > CI - CPP, Python Tests                      0:23:30
> > > > > > > > > > CI - Unit                                   0:42:11
> > > > > > > > > > CI - Integration - Sql                      1:00:13
> > > > > > > > > > CI - Integration - Tiered JCloud            1:00:18
> > > > > > > > > > CI - Integration - Tiered FileSystem        1:00:13
> > > > > > > > > > CI - Integration - Function State           1:00:12
> > > > > > > > > > CI - Integration - Cli                      1:10:22
> > > > > > > > > > CI - Integration - Transaction              1:16:34
> > > > > > > > > > CI - Integration - Process                  1:11:23
> > > > > > > > > > CI - Shade - Test                           1:15:45
> > > > > > > > > > CI - Unit - Brokers - Client Api            0:26:13
> > > > > > > > > > CI - Unit - Brokers - Broker Group 2        0:35:05
> > > > > > > > > > CI - Integration - Standalone               0:45:29
> > > > > > > > > > CI - Integration - Messaging                1:00:23
> > > > > > > > > > CI - Integration - Thread                   1:00:19
> > > > > > > > > > CI - Integration - Backwards Compatibility  1:00:19
> > > > > > > > > > CI - Integration - Schema                   1:00:19
> > > > > > > > > > CI - Unit - Brokers - Broker Group 1        2:02:31
> > > > > > > > > > TOTAL                                       19:36:50
> > > > > > > > > >
> > > > > > > > > > *In this case, the total resource consumption of GitHub Actions
> > > > > > > > > > Runners is 19 hours 36 minutes 50 seconds for a single pull
> > > > > > > > > > request to apache/pulsar.*
> > > > > > > > > >
> > > > > > > > > > Since GitHub Actions Runner resource pool utilization is very
> > > > > > > > > > high, the build queue grows and takes a long time to process.
> > > > > > > > > >
> > > > > > > > > > I have been looking for ways to improve the Pulsar CI for the
> > > > > > > > > > last 3 months. During this period I worked on a few
> > > > > > > > > > experiments. The learnings from the past experiments are
> > > > > > > > > > documented at a high level in the following draft PIP document.
> > > > > > > > > >
> > > > > > > > > > *The draft PIP "Changes to GitHub Actions based Pulsar CI"
> > > > > > > > > > document is a Google doc:*
> > > > > > > > > >
> > > > > > > > > > https://docs.google.com/document/d/1FNEWD3COdnNGMiryO9qBUW_83qtzAhqjDI5wwmPD-YE/edit?usp=sharing
> > > > > > > > > >
> > > > > > > > > > *Please participate* so that we get the plan adjusted based on
> > > > > > > > > > the feedback asap. If there's already a similar effort ongoing,
> > > > > > > > > > I hope we can join efforts.
> > > > > > > > > >
> > > > > > > > > > *Let's fix Pulsar CI!*
> > > > > > > > > >
> > > > > > > > > > BR, Lari
> > > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>
>
> --
> -Ali
>
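
As a quick sanity check on the TOTAL row of the workflow duration table
quoted in this thread, the per-workflow durations can be summed with a short
script. This is only a sketch and not part of the original discussion: the
DURATIONS list is copied by hand from the quoted table, and the helper names
parse_hms and format_hms are made up for illustration.

```python
from datetime import timedelta

# Durations copied by hand from the quoted workflow table (H:MM:SS),
# in the same order as the rows; the TOTAL row is left out.
DURATIONS = """
0:17:23 0:02:38 0:15:40 0:16:28 0:16:51
0:14:23 0:22:08 0:23:30 0:42:11 1:00:13
1:00:18 1:00:13 1:00:12 1:10:22 1:16:34
1:11:23 1:15:45 0:26:13 0:35:05 0:45:29
1:00:23 1:00:19 1:00:19 1:00:19 2:02:31
""".split()


def parse_hms(text):
    """Turn an 'H:MM:SS' string into a timedelta (hypothetical helper)."""
    hours, minutes, seconds = (int(part) for part in text.split(":"))
    return timedelta(hours=hours, minutes=minutes, seconds=seconds)


def format_hms(delta):
    """Format a timedelta as 'H:MM:SS' without rolling hours into days."""
    total_seconds = int(delta.total_seconds())
    hours, remainder = divmod(total_seconds, 3600)
    minutes, seconds = divmod(remainder, 60)
    return f"{hours}:{minutes:02d}:{seconds:02d}"


# Sum all workflow durations, starting from a zero timedelta.
total = sum((parse_hms(d) for d in DURATIONS), timedelta())
print(format_hms(total))  # prints 19:36:50
```

The sum agrees with the 19 hours 36 minutes 50 seconds of runner time
reported for a single pull request.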