Posted to dev@pulsar.apache.org by Lari Hotari <lh...@apache.org> on 2022/09/15 08:36:01 UTC

[CI] Change to be made in Pulsar CI to mitigate CI resource consumption issues

Hi all,

The GitHub Actions based Pulsar CI has been experiencing issues for
multiple weeks. The situation is currently better, but the resource
shortage remains. CI builds still take a long time to complete, even
after many optimizations.

There's a long email thread with some details about the past issues:
https://lists.apache.org/thread/p7rb04vf1mt0kk3v2r7xl9dvb3zkhtxf

I filed an issue with GitHub support about the CI issues over a week
ago, and I finally received an answer a few hours ago. However, the
GitHub support person didn't answer my questions at all, but instead
suggested a beta program where it's possible to pay for
more resources. That solution isn't suitable for our case, since it
doesn't seem to be possible to assign GitHub Actions Runner VM resources
to a specific Apache project only. I'll follow up with GitHub support, but
I don't expect that to resolve our problems in the near term. We need
to make changes to our CI resource consumption.

In a the-asf Slack thread [1] about Pulsar CI issues, Martin Grigorov
suggested: "Apache Spark project requires that all PRs are executed in
the contributor's GHA quota. Maybe Pulsar can do the same ?!"

The Apache Spark contributing guide contains details about this in the
"Pull request" section, https://spark.apache.org/contributing.html .

"Before creating a pull request in Apache Spark, it is important to
check if tests can pass on your branch because our GitHub Actions
workflows automatically run tests for your pull request/following
commits and every run burdens the limited resources of GitHub Actions in
Apache Spark repository. "

In Pulsar, we will need to do the same. As a solution, Tison
suggested that we not run all tests for a PR unless there's a
"ready-to-test" label on the PR.

I think this is a good suggestion. We could extend the existing
"pulsarbot" to help with the automation.

A reviewer could comment "/pulsarbot ready-to-test" on the PR, and
pulsarbot would add the label and restart the CI workflow so that
it proceeds and runs the tests.
pulsarbot would check that the commenter is an authorized user. One simple
approach would be to add a file ".pulsarci.yaml" to the apache/pulsar
repository with the relevant information:

committer_github_ids:
  - committer1
  - committer2
  ...

ready_to_test:
  authorized_github_ids:
    - userid1
    - userid2
    ...

We would have a script to synchronize all Pulsar committers to this file
periodically (a manual step whenever there's a new committer). ASF provides
public JSON files for project members at
https://whimsy.apache.org/public/public_ldap_projects.json , however the
mapping to GitHub usernames seems to be missing. That could be done
with a custom script since ASF LDAP contains the GitHub username.
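
A minimal Python sketch of such a synchronization step follows. It
assumes that the JSON file has a "projects" -> "pulsar" -> "members"
layout and that a mapping from ASF ID to GitHub username has already
been obtained from ASF LDAP by some other means. The function name and
the sample data are hypothetical and only illustrate the idea.

```python
import json

def committer_github_ids(ldap_projects, asf_to_github, project="pulsar"):
    """Return the sorted GitHub usernames of a project's committers."""
    # Assumed layout: {"projects": {"<name>": {"members": [...]}}}
    members = ldap_projects["projects"][project]["members"]
    # ASF IDs without a known GitHub username are skipped
    return sorted(asf_to_github[m] for m in members if m in asf_to_github)

# Stand-in data shaped like the assumed JSON layout (not real committers)
sample = json.loads(
    '{"projects": {"pulsar": {"members": ["alice", "bob", "carol"]}}}'
)
# Hypothetical ASF ID -> GitHub username mapping
mapping = {"alice": "alice-gh", "carol": "carol-gh"}

ids = committer_github_ids(sample, mapping)

# Emit the list in the shape used by .pulsarci.yaml
print("committer_github_ids:")
for github_id in ids:
    print(f"  - {github_id}")
```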

All Pulsar committers would have access. In addition, there could be other
users who are authorized to use "/pulsarbot ready-to-test".
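
The authorization check could look roughly like the following Python
sketch. It assumes that ".pulsarci.yaml" has already been parsed into a
dictionary with the keys shown above. The function names are
hypothetical and this is not the actual pulsarbot implementation.

```python
COMMAND = "/pulsarbot ready-to-test"

def is_authorized(config, github_id):
    """True if the user is a committer or explicitly authorized."""
    committers = config.get("committer_github_ids", [])
    extra = config.get("ready_to_test", {}).get("authorized_github_ids", [])
    return github_id in committers or github_id in extra

def should_label(config, comment_body, github_id):
    """True if the comment contains the command on a line of its own
    and the commenter is allowed to use it."""
    lines = [line.strip() for line in comment_body.splitlines()]
    return COMMAND in lines and is_authorized(config, github_id)

# Parsed form of the example file above (placeholder user IDs)
config = {
    "committer_github_ids": ["committer1", "committer2"],
    "ready_to_test": {"authorized_github_ids": ["userid1", "userid2"]},
}
```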

This solution would also require changes in the GitHub Actions workflows
so that the workflow fails in an early step unless there's a
ready-to-test label on the PR.
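
As an illustration, the early check could be a small script like the
Python sketch below. It assumes that the workflow passes in the
pull_request event payload, where labels appear as a list of objects
with a "name" field. The function names are hypothetical.

```python
def has_label(event, label="ready-to-test"):
    """True if the pull_request event payload carries the given label."""
    labels = event.get("pull_request", {}).get("labels", [])
    return any(entry.get("name") == label for entry in labels)

def gate_exit_code(event):
    """Exit code for the early step: 0 lets the workflow continue,
    1 fails it before any expensive jobs start."""
    return 0 if has_label(event) else 1

# Minimal stand-in payloads
labeled = {"pull_request": {"labels": [{"name": "ready-to-test"}]}}
unlabeled = {"pull_request": {"labels": [{"name": "doc"}]}}
```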

With the above solution, we would be able to cut the number of
unnecessary builds and get the excessive resource consumption
under control. PR authors would be instructed to run initial PR
builds in their own fork, and the reviewer should check that this is done
before approving the PR for testing with "/pulsarbot ready-to-test".

I would suggest proceeding quickly on this matter without separate PIPs
or votes. We could follow the Apache lazy consensus
(https://community.apache.org/committers/lazyConsensus.html) principle
and make this happen if there aren't objections in the next 72 hours.
The improvement suggestions to this proposal would obviously be taken
into account and if someone objects, we wouldn't have reached lazy
consensus and we wouldn't proceed.

-Lari

1 - https://the-asf.slack.com/archives/CBX4TSBQ8/p1661849820238809?thread_ts=1661512133.913279&cid=CBX4TSBQ8

Re: [CI] Change to be made in Pulsar CI to mitigate CI resource consumption issues

Posted by Lari Hotari <lh...@apache.org>.

I have pushed a fix in PR https://github.com/apache/pulsar/pull/17723 , please review.

-Lari


Re: [CI] Change to be made in Pulsar CI to mitigate CI resource consumption issues

Posted by Lari Hotari <lh...@apache.org>.

Unfortunately, a PR approval isn't currently detected by the solution.
Until this is fixed, the reviewer will have to add the "ready-to-test" label before commenting "/pulsarbot rerun-failure-checks" on the PR so that the PR can eventually be merged.
I'm sorry about this inconvenience.

-Lari


Re: [CI] Change to be made in Pulsar CI to mitigate CI resource consumption issues

Posted by Lari Hotari <lh...@apache.org>.

I'll now merge the changes to master. The contributor docs will be updated in a separate pull request. Pull requests that are currently in progress will get the changes when they are updated.

-Lari


Re: [CI] Change to be made in Pulsar CI to mitigate CI resource consumption issues

Posted by Lari Hotari <lh...@apache.org>.

On 2022/09/16 10:09:51 PengHui Li wrote:
> After I go through all the comments here.
> Do we really need a new label?

Good suggestion. It was also suggested yesterday by Matteo that a PR approval should be sufficient. I have modified the solution in https://github.com/apache/pulsar/pull/17693 so that either a PR approval or the "ready-to-test" label is required for running tests in apache/pulsar.

I have retained the ready-to-test label solution since there might be cases where the reviewer wants to run the tests in the apache/pulsar repository before approving the PR. This ensures that we don't change the meaning of a PR approval into a mere trigger for running tests.
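
The "either a PR approval or the label" rule can be sketched in Python
as follows. This is an illustration only, not the logic in the PR. It
assumes the reviews are given in chronological order as objects with
"user.login" and "state" fields, and that only the latest decisive
review of each reviewer counts.

```python
def is_approved(reviews):
    """True if at least one reviewer's latest decisive review approves."""
    latest = {}
    for review in reviews:  # assumed to be in chronological order
        state = review["state"]
        # Plain comments do not change a reviewer's earlier decision
        if state in ("APPROVED", "CHANGES_REQUESTED", "DISMISSED"):
            latest[review["user"]["login"]] = state
    return "APPROVED" in latest.values()

def may_run_tests(labels, reviews):
    """Tests run if the label is present or the PR has an approval."""
    return "ready-to-test" in labels or is_approved(reviews)

# Stand-in review data with hypothetical reviewers
approved = [{"user": {"login": "reviewer1"}, "state": "APPROVED"}]
dismissed = approved + [{"user": {"login": "reviewer1"}, "state": "DISMISSED"}]
commented = [{"user": {"login": "reviewer2"}, "state": "COMMENTED"}]
```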

In the first phase, I won't be adding "/pulsarbot ready-to-test" at all. Instead, we'd use PR approval + "/pulsarbot rerun-failure-checks" to trigger the build pipeline after an approval. The label can be added manually in the GitHub UI if that approach is used.

I hope we are ready to merge https://github.com/apache/pulsar/pull/17693 on Monday. I'm confident that everyone will be much happier with the revised CI, where it's possible to get almost instant CI feedback without the hours of delay that slow down PR processing and merging.

-Lari

On 2022/09/16 10:09:51 PengHui Li wrote:
> Thanks, Lari
> 
> After I go through all the comments here.
> Do we really need a new label?
> 
> It looks like if a committer thinks we should trigger the CI in the Pulsar
> repo, for example because:
> 
> - Tests passed in the fork repo
> - No changes are requested on this PR
> - ...
> 
> they can just run "/pulsarbot ready-to-test" or "/pulsarbot trigger-ci".
> I think it makes sense to have a committer take a look at the PR first
> and then trigger the CI; approval is not required.
> 
> During the PR review, the author could also push many commits to address
> the comments.
> After the comments have been addressed, we can trigger the CI again.
> 
> Maybe I missed something about the label approach.
> 
> Thanks,
> Penghui
> 
> 
> On Fri, Sep 16, 2022 at 5:45 PM Lari Hotari <lh...@apache.org> wrote:
> 
> > I have created a draft PR for making the changes in Pulsar CI:
> > https://github.com/apache/pulsar/pull/17693
> >
> > I'm looking forward to further practical improvements. I'd like to remind
> > everyone that we must make this change to address the CI slowness. After
> > this change, the experience of Pulsar CI will improve for everyone.
> >
> > In addition to the above PR, the contributor guide and Pulsarbot changes
> > will be needed. I expect that we would be able to complete the changes
> > during next week.
> >
> > -Lari
> >
> > On 2022/09/15 08:36:01 Lari Hotari wrote:
> > > Hi all,
> > >
> > > The GitHub Actions based Pulsar CI has been experiencing issues for
> > > multiple weeks. The condition is currently better, but the resource
> > > shortage issue remains. CI builds will take a long time to complete even
> > > after many optimizations have been made.
> > >
> > > There's a long email thread with some details about the past issues:
> > > https://lists.apache.org/thread/p7rb04vf1mt0kk3v2r7xl9dvb3zkhtxf
> > >
> > > I have filed an issue to GitHub support about the CI issues over a week
> > > ago, and I finally received an answer a few hours ago. However the
> > > GitHub support person didn't reply to my questions at all, but instead
> > > suggested that there's a beta program where it's possible to pay for
> > > more resources. That solution isn't suitable for our case, since it
> > > doesn't seem to be possible to assign GitHub Actions Runner VM resources
> > > > only for a specific Apache project. I'll follow up with GitHub support, but
> > > I don't expect that to resolve our problems in the near term. We need
> > > to make changes in our CI resource consumption.
> > >
> > > In a the-asf Slack thread [1] about Pulsar CI issues, Martin Grigorov
> > > suggested: "Apache Spark project requires that all PRs are executed in
> > > the contributor's GHA quota. Maybe Pulsar can do the same ?!"
> > >
> > > The Apache Spark contributing guide contains details about this in the
> > > "Pull request" section, https://spark.apache.org/contributing.html .
> > >
> > > "Before creating a pull request in Apache Spark, it is important to
> > > check if tests can pass on your branch because our GitHub Actions
> > > workflows automatically run tests for your pull request/following
> > > commits and every run burdens the limited resources of GitHub Actions in
> > > Apache Spark repository. "
> > >
> > > In Pulsar, we will need to do the same. As a solution to this, Tison
> > > suggested that we would not run all tests for the PR unless there's a
> > > "ready-to-test" label on the PR.
> > >
> > > I think this is a good suggestion. We could extend the existing
> > > "pulsarbot" to help with the automation.
> > >
> > > A reviewer could comment "/pulsarbot ready-to-test" on the PR and
> > > pulsarbot would add the label and also restart the CI workflow to make
> > > it proceed and run the tests.
> > > pulsarbot would check for authorized users. One simple
> > > approach would be to add a file ".pulsarci.yaml" in apache/pulsar
> > > repository with the relevant information:
> > >
> > > committer_github_ids:
> > >   - committer1
> > >   - committer2
> > >   ...
> > >
> > > ready_to_test:
> > >   authorized_github_ids:
> > >     - userid1
> > >     - userid2
> > >     ...
> > >
> > > We would have a script to synchronize all Pulsar committers to this file
> > > > periodically (manual step after there's a new committer). ASF provides
> > > public json files for project members at
> > > https://whimsy.apache.org/public/public_ldap_projects.json , however the
> > > mapping to github user names seems to be missing. That could be done
> > > with a custom script since ASF LDAP contains the github username.
> > >
> > > > All Pulsar committers would have access. In addition, there could be other
> > > > users that are authorized for using "/pulsarbot ready-to-test".
> > >
> > > This solution would also require changes in the GitHub Actions workflows
> > > so that the workflow is failed in an early step unless there's a
> > > ready-to-test label for the PR.
> > >
> > > With the above solution, we would be able to cut the amount of
> > > unnecessary builds and get the excessive resource consumption issue
> > > under control. The PR authors would be instructed to run initial PR
> > > builds in their own fork and the reviewer should check that this is done
> > > before approving the PR for testing with "/pulsarbot ready-to-test".
> > >
> > > I would suggest proceeding quickly on this matter without separate PIPs
> > > or votes. We could follow the Apache lazy consensus
> > > (https://community.apache.org/committers/lazyConsensus.html) principle
> > > and make this happen if there aren't objections in the next 72 hours.
> > > The improvement suggestions to this proposal would obviously be taken
> > > into account and if someone objects, we wouldn't have reached lazy
> > > consensus and we wouldn't proceed.
> > >
> > > -Lari
> > >
> > >
> > > 1 -
> > https://the-asf.slack.com/archives/CBX4TSBQ8/p1661849820238809?thread_ts=1661512133.913279&cid=CBX4TSBQ8
> > >
> >
>

Re: [CI] Change to be made in Pulsar CI to mitigate CI resource consumption issues

Posted by PengHui Li <pe...@apache.org>.

Thanks, Lari

After going through all the comments here, do we really need a new label?

It looks like a committer could simply trigger the CI in the Pulsar repo
when they think it is appropriate, for example when:

- The tests have passed in the fork repo
- No changes have been requested for the PR
- ...

They would just run "/pulsarbot ready-to-test" or "/pulsarbot trigger-ci".
I think it makes sense to have a committer take a look at the PR first
and then trigger the CI; approval is not required.

During the PR review, the author could also push many commits to address
the comments.
After the comments have been addressed, we can trigger the CI again.

Maybe I missed something about the label approach.

Thanks,
Penghui


On Fri, Sep 16, 2022 at 5:45 PM Lari Hotari <lh...@apache.org> wrote:

> I have created a draft PR for making the changes in Pulsar CI:
> https://github.com/apache/pulsar/pull/17693
>
> I'm looking forward to further practical improvements. I'd like to remind
> everyone that we must make this change to address the CI slowness. After
> this change, the experience of Pulsar CI will improve for everyone.
>
> In addition to the above PR, the contributor guide and Pulsarbot changes
> will be needed. I expect that we would be able to complete the changes
> during next week.
>
> -Lari
>
> On 2022/09/15 08:36:01 Lari Hotari wrote:
> > Hi all,
> >
> > The GitHub Actions based Pulsar CI has been experiencing issues for
> > multiple weeks. The condition is currently better, but the resource
> > shortage issue remains. CI builds will take a long time to complete even
> > after many optimizations have been made.
> >
> > There's a long email thread with some details about the past issues:
> > https://lists.apache.org/thread/p7rb04vf1mt0kk3v2r7xl9dvb3zkhtxf
> >
> > I have filed an issue to GitHub support about the CI issues over a week
> > ago, and I finally received an answer a few hours ago. However the
> > GitHub support person didn't reply to my questions at all, but instead
> > suggested that there's a beta program where it's possible to pay for
> > more resources. That solution isn't suitable for our case, since it
> > doesn't seem to be possible to assign GitHub Actions Runner VM resources
> > only for a specific Apache project. I'll follow up with GitHub support, but
> > I don't expect that to resolve our problems in the near term. We need
> > to make changes in our CI resource consumption.
> >
> > In a the-asf Slack thread [1] about Pulsar CI issues, Martin Grigorov
> > suggested: "Apache Spark project requires that all PRs are executed in
> > the contributor's GHA quota. Maybe Pulsar can do the same ?!"
> >
> > The Apache Spark contributing guide contains details about this in the
> > "Pull request" section, https://spark.apache.org/contributing.html .
> >
> > "Before creating a pull request in Apache Spark, it is important to
> > check if tests can pass on your branch because our GitHub Actions
> > workflows automatically run tests for your pull request/following
> > commits and every run burdens the limited resources of GitHub Actions in
> > Apache Spark repository. "
> >
> > In Pulsar, we will need to do the same. As a solution to this, Tison
> > suggested that we would not run all tests for the PR unless there's a
> > "ready-to-test" label on the PR.
> >
> > I think this is a good suggestion. We could extend the existing
> > "pulsarbot" to help with the automation.
> >
> > A reviewer could comment "/pulsarbot ready-to-test" on the PR and
> > pulsarbot would add the label and also restart the CI workflow to make
> > it proceed and run the tests.
> > pulsarbot would check for authorized users. One simple
> > approach would be to add a file ".pulsarci.yaml" in apache/pulsar
> > repository with the relevant information:
> >
> > committer_github_ids:
> >   - committer1
> >   - committer2
> >   ...
> >
> > ready_to_test:
> >   authorized_github_ids:
> >     - userid1
> >     - userid2
> >     ...
> >
> > We would have a script to synchronize all Pulsar committers to this file
> > periodically (manual step after there's a new committer). ASF provides
> > public json files for project members at
> > https://whimsy.apache.org/public/public_ldap_projects.json , however the
> > mapping to github user names seems to be missing. That could be done
> > with a custom script since ASF LDAP contains the github username.
> >
> > All Pulsar committers would have access. In addition, there could be other
> > users that are authorized for using "/pulsarbot ready-to-test".
> >
> > This solution would also require changes in the GitHub Actions workflows
> > so that the workflow is failed in an early step unless there's a
> > ready-to-test label for the PR.
> >
> > With the above solution, we would be able to cut the amount of
> > unnecessary builds and get the excessive resource consumption issue
> > under control. The PR authors would be instructed to run initial PR
> > builds in their own fork and the reviewer should check that this is done
> > before approving the PR for testing with "/pulsarbot ready-to-test".
> >
> > I would suggest proceeding quickly on this matter without separate PIPs
> > or votes. We could follow the Apache lazy consensus
> > (https://community.apache.org/committers/lazyConsensus.html) principle
> > and make this happen if there aren't objections in the next 72 hours.
> > The improvement suggestions to this proposal would obviously be taken
> > into account and if someone objects, we wouldn't have reached lazy
> > consensus and we wouldn't proceed.
> >
> > -Lari
> >
> >
> > 1 - https://the-asf.slack.com/archives/CBX4TSBQ8/p1661849820238809?thread_ts=1661512133.913279&cid=CBX4TSBQ8
> >
>

Re: [CI] Change to be made in Pulsar CI to mitigate CI resource consumption issues

Posted by Lari Hotari <lh...@apache.org>.

I have created a draft PR for making the changes in Pulsar CI:
https://github.com/apache/pulsar/pull/17693

I'm looking forward to further practical improvements. I'd like to remind everyone that we must make this change to address the CI slowness. After this change, the experience of Pulsar CI will improve for everyone. 

In addition to the above PR, the contributor guide and Pulsarbot changes will be needed. I expect that we would be able to complete the changes during next week.

-Lari

On 2022/09/15 08:36:01 Lari Hotari wrote:
> Hi all,
> 
> The GitHub Actions based Pulsar CI has been experiencing issues for
> multiple weeks. The condition is currently better, but the resource
> shortage issue remains. CI builds will take a long time to complete even
> after many optimizations have been made.
> 
> There's a long email thread with some details about the past issues:
> https://lists.apache.org/thread/p7rb04vf1mt0kk3v2r7xl9dvb3zkhtxf
> 
> I have filed an issue to GitHub support about the CI issues over a week
> ago, and I finally received an answer a few hours ago. However the
> GitHub support person didn't reply to my questions at all, but instead
> suggested that there's a beta program where it's possible to pay for
> more resources. That solution isn't suitable for our case, since it
> doesn't seem to be possible to assign GitHub Actions Runner VM resources
> only for a specific Apache project. I'll follow up with GitHub support, but
> I don't expect that to resolve our problems in the near term. We need
> to make changes in our CI resource consumption.
> 
> In a the-asf Slack thread [1] about Pulsar CI issues, Martin Grigorov
> suggested: "Apache Spark project requires that all PRs are executed in
> the contributor's GHA quota. Maybe Pulsar can do the same ?!"
> 
> The Apache Spark contributing guide contains details about this in the
> "Pull request" section, https://spark.apache.org/contributing.html .
> 
> "Before creating a pull request in Apache Spark, it is important to
> check if tests can pass on your branch because our GitHub Actions
> workflows automatically run tests for your pull request/following
> commits and every run burdens the limited resources of GitHub Actions in
> Apache Spark repository. "
> 
> In Pulsar, we will need to do the same. As a solution to this, Tison
> suggested that we would not run all tests for the PR unless there's a
> "ready-to-test" label on the PR.
> 
> I think this is a good suggestion. We could extend the existing
> "pulsarbot" to help with the automation.
> 
> A reviewer could comment "/pulsarbot ready-to-test" on the PR and
> pulsarbot would add the label and also restart the CI workflow to make
> it proceed and run the tests.
> pulsarbot would check for authorized users. One simple
> approach would be to add a file ".pulsarci.yaml" in apache/pulsar
> repository with the relevant information:
> 
> committer_github_ids:
>   - committer1
>   - committer2
>   ...
> 
> ready_to_test:
>   authorized_github_ids:
>     - userid1
>     - userid2
>     ...
> 
> We would have a script to synchronize all Pulsar committers to this file
> periodically (manual step after there's a new committer). ASF provides
> public json files for project members at
> https://whimsy.apache.org/public/public_ldap_projects.json , however the
> mapping to github user names seems to be missing. That could be done
> with a custom script since ASF LDAP contains the github username.
> 
> All Pulsar committers would have access. In addition, there could be other
> users that are authorized for using "/pulsarbot ready-to-test".
> 
> This solution would also require changes in the GitHub Actions workflows
> so that the workflow is failed in an early step unless there's a
> ready-to-test label for the PR.
> 
> With the above solution, we would be able to cut the amount of
> unnecessary builds and get the excessive resource consumption issue
> under control. The PR authors would be instructed to run initial PR
> builds in their own fork and the reviewer should check that this is done
> before approving the PR for testing with "/pulsarbot ready-to-test".
> 
> I would suggest proceeding quickly on this matter without separate PIPs
> or votes. We could follow the Apache lazy consensus
> (https://community.apache.org/committers/lazyConsensus.html) principle
> and make this happen if there aren't objections in the next 72 hours.
> The improvement suggestions to this proposal would obviously be taken
> into account and if someone objects, we wouldn't have reached lazy
> consensus and we wouldn't proceed.
> 
> -Lari
> 
> 
> 1 - https://the-asf.slack.com/archives/CBX4TSBQ8/p1661849820238809?thread_ts=1661512133.913279&cid=CBX4TSBQ8
>

Re: [CI] Change to be made in Pulsar CI to mitigate CI resource consumption issues

Posted by Yunze Xu <yz...@streamnative.io.INVALID>.

Good explanation! I agree.

Thanks,
Yunze




> On Sep 15, 2022, at 19:42, Lari Hotari <lh...@apache.org> wrote:
> 
>> I’m a little confused about what will CI do in this case? I think the “ready-to-test” label should
>> be removed in this case because the new code might not pass the tests. I thought the author
>> should request committers to add this label again after the tests passed in his own repo.
> 
> This is possible. I'd rather postpone it since it would require more
> effort to implement and perhaps just be an unnecessary burden.
> 
> The reviewer shouldn't be doing "/pulsarbot ready-to-test" for PRs
> that aren't ready.
> The PR author can continue to iterate the test runs in the fork until
> PR comments have been addressed.
> It's technically possible to have the same branch in PRs in
> apache/pulsar and your own fork. When the branch gets updated, the
> builds in the forked repository would run. That's why I think it's
> better to keep on iterating on the PR until it's really ready for final
> testing in the apache/pulsar repository.
> 
> -Lari
> 
> On Thu, Sep 15, 2022 at 2:27 PM Yunze Xu <yz...@streamnative.io.invalid> wrote:
>> 
>>> If there are later changes in the PR after the "ready-to-test" label has been added, we could simply let the Pulsar CI handle the builds.
>> 
>> I’m a little confused about what will CI do in this case? I think the “ready-to-test” label should
>> be removed in this case because the new code might not pass the tests. I thought the author
>> should request committers to add this label again after the tests passed in his own repo.
>> 
>> Thanks,
>> Yunze
>> 
>> 
>> 
>> 
>>> On Sep 15, 2022, at 19:20, Lari Hotari <lh...@apache.org> wrote:
>>> 
>>>> In short, IIUC, each contributor should:
>>>> 1. Follow https://pulsar.apache.org/contributing/#ci-testing-in-your-fork to
>>>> 2. Paste the link of the same PR in contributor’s fork to the PR in Apache repo
>>>> 
>>>> Then a committer should run `/pulsarbot ready-to-test` after the PR in
>>>> contributor's private repo passed all tests, right?
>>> 
>>> Exactly. One small detail: It should be the PR author's responsibility to follow up and request for a review and an approval after the tests pass.
>>> If there are later changes in the PR after the "ready-to-test" label has been added, we could simply let the Pulsar CI handle the builds.
>>> 
>>> -Lari
>>> 
>>> On 2022/09/15 10:11:53 Yunze Xu wrote:
>>>> Hi Lari,
>>>> 
>>>> This proposal LGTM. But I have some questions about the details.
>>>> 
>>>> In short, IIUC, each contributor should:
>>>> 1. Follow https://pulsar.apache.org/contributing/#ci-testing-in-your-fork to
>>>> 2. Paste the link of the same PR in contributor’s fork to the PR in Apache repo
>>>> 
>>>> Then a committer should run `/pulsarbot ready-to-test` after the PR in
>>>> contributor's private repo passed all tests, right?
>>>> 
>>>> Thanks,
>>>> Yunze
>>>> 
>>>> 
>>>> 
>>>> 
>>>>> On Sep 15, 2022, at 17:54, Lari Hotari <lh...@apache.org> wrote:
>>>>> 
>>>>> Thanks for the comment.
>>>>> 
>>>>> The question isn't about trusting PRs.
>>>>> The CI resource consumption problem is also caused by current committer PRs. That's why it
>>>>> is necessary to handle all PRs in the same way.
>>>>> The benefit of the proposed solution is that we could decide to run some light checks automatically before the step that requires the "ready-to-test" label.
>>>>> 
>>>>> After I sent the proposal, I found out that the Pulsar committer information including the GitHub
>>>>> user names is available in JSON format at https://whimsy.apache.org/roster/committee/pulsar.json .
>>>>> Pulsarbot can use this information for authorizing users who have access to
>>>>> "/pulsarbot ready-to-test".
>>>>> 
>>>>> I agree that we can skip adding a separate reviewer role, let's
>>>>> simply use https://whimsy.apache.org/roster/committee/pulsar.json as the source of truth
>>>>> for authorization.
>>>>> 
>>>>> -Lari
>>>>> 
>>>>> On 2022/09/15 09:22:18 tison wrote:
>>>>>> Hi Lari,
>>>>>> 
>>>>>> Thanks for starting this discussion. The overall proposal looks good and
>>>>>> it's really great that you can spend some time on such a significant
>>>>>> infrastructure.
>>>>>> 
>>>>>> One comment here is that we can start with all "authorized" users to
>>>>>> trigger the CI in the committer group instead of introducing a new concept
>>>>>> "reviewer" - it will be another topic to discuss and I generally prefer
>>>>>> more committership to encourage participation instead of a complicated
>>>>>> membership structure.
>>>>>> 
>>>>>> Besides, a quick fixup for reducing traffic is setting "Fork pull request
>>>>>> workflows from outside collaborators" option[1] as "Require approval for
>>>>>> all outside collaborators". This is provided out-of-the-box by GitHub and
>>>>>> requires NO development[2]. Although it doesn't restrict people who are
>>>>>> already apache org members but are not Pulsar committers, I believe the
>>>>>> trust level is acceptable. An INFRA team member will be asked to perform
>>>>>> the settings change if we want this.
>>>>>> 
>>>>>> Best,
>>>>>> tison.
>>>>>> 
>>>>>> [1] https://github.com/apache/pulsar/settings/actions
>>>>>> [2]
>>>>>> https://docs.github.com/en/actions/managing-workflow-runs/approving-workflow-runs-from-public-forks
>>>>>> 
>>>>>> 
>>>>>> On Thu, Sep 15, 2022 at 16:36, Lari Hotari <lh...@apache.org> wrote:
>>>>>> 
>>>>>>> Hi all,
>>>>>>> 
>>>>>>> The GitHub Actions based Pulsar CI has been experiencing issues for
>>>>>>> multiple weeks. The condition is currently better, but the resource
>>>>>>> shortage issue remains. CI builds will take a long time to complete even
>>>>>>> after many optimizations have been made.
>>>>>>> 
>>>>>>> There's a long email thread with some details about the past issues:
>>>>>>> https://lists.apache.org/thread/p7rb04vf1mt0kk3v2r7xl9dvb3zkhtxf
>>>>>>> 
>>>>>>> I have filed an issue to GitHub support about the CI issues over a week
>>>>>>> ago, and I finally received an answer a few hours ago. However the
>>>>>>> GitHub support person didn't reply to my questions at all, but instead
>>>>>>> suggested that there's a beta program where it's possible to pay for
>>>>>>> more resources. That solution isn't suitable for our case, since it
>>>>>>> doesn't seem to be possible to assign GitHub Actions Runner VM resources
>>>>>>> only for a specific Apache project. I'll follow up with GitHub support, but
>>>>>>> I don't expect that to resolve our problems in the near term. We need
>>>>>>> to make changes in our CI resource consumption.
>>>>>>> 
>>>>>>> In a the-asf Slack thread [1] about Pulsar CI issues, Martin Grigorov
>>>>>>> suggested: "Apache Spark project requires that all PRs are executed in
>>>>>>> the contributor's GHA quota. Maybe Pulsar can do the same ?!"
>>>>>>> 
>>>>>>> The Apache Spark contributing guide contains details about this in the
>>>>>>> "Pull request" section, https://spark.apache.org/contributing.html .
>>>>>>> 
>>>>>>> "Before creating a pull request in Apache Spark, it is important to
>>>>>>> check if tests can pass on your branch because our GitHub Actions
>>>>>>> workflows automatically run tests for your pull request/following
>>>>>>> commits and every run burdens the limited resources of GitHub Actions in
>>>>>>> Apache Spark repository. "
>>>>>>> 
>>>>>>> In Pulsar, we will need to do the same. As a solution to this, Tison
>>>>>>> suggested that we would not run all tests for the PR unless there's a
>>>>>>> "ready-to-test" label on the PR.
>>>>>>> 
>>>>>>> I think this is a good suggestion. We could extend the existing
>>>>>>> "pulsarbot" to help with the automation.
>>>>>>> 
>>>>>>> A reviewer could comment "/pulsarbot ready-to-test" on the PR and
>>>>>>> pulsarbot would add the label and also restart the CI workflow to make
>>>>>>> it proceed and run the tests.
>>>>>>> pulsarbot would check for authorized users. One simple
>>>>>>> approach would be to add a file ".pulsarci.yaml" in apache/pulsar
>>>>>>> repository with the relevant information:
>>>>>>> 
>>>>>>> committer_github_ids:
>>>>>>>   - committer1
>>>>>>>   - committer2
>>>>>>>   ...
>>>>>>> 
>>>>>>> ready_to_test:
>>>>>>>   authorized_github_ids:
>>>>>>>     - userid1
>>>>>>>     - userid2
>>>>>>>     ...
>>>>>>> 
>>>>>>> We would have a script to synchronize all Pulsar committers to this file
>>>>>>> periodically (manual step after there's a new committer). ASF provides
>>>>>>> public json files for project members at
>>>>>>> https://whimsy.apache.org/public/public_ldap_projects.json , however the
>>>>>>> mapping to github user names seems to be missing. That could be done
>>>>>>> with a custom script since ASF LDAP contains the github username.
>>>>>>> 
>>>>>>> All Pulsar committers would have access. In addition, there could be other
>>>>>>> users that are authorized for using "/pulsarbot ready-to-test".
>>>>>>> 
>>>>>>> This solution would also require changes in the GitHub Actions workflows
>>>>>>> so that the workflow is failed in an early step unless there's a
>>>>>>> ready-to-test label for the PR.
>>>>>>> 
>>>>>>> With the above solution, we would be able to cut the amount of
>>>>>>> unnecessary builds and get the excessive resource consumption issue
>>>>>>> under control. The PR authors would be instructed to run initial PR
>>>>>>> builds in their own fork and the reviewer should check that this is done
>>>>>>> before approving the PR for testing with "/pulsarbot ready-to-test".
>>>>>>> 
>>>>>>> I would suggest proceeding quickly on this matter without separate PIPs
>>>>>>> or votes. We could follow the Apache lazy consensus
>>>>>>> (https://community.apache.org/committers/lazyConsensus.html) principle
>>>>>>> and make this happen if there aren't objections in the next 72 hours.
>>>>>>> The improvement suggestions to this proposal would obviously be taken
>>>>>>> into account and if someone objects, we wouldn't have reached lazy
>>>>>>> consensus and we wouldn't proceed.
>>>>>>> 
>>>>>>> -Lari
>>>>>>> 
>>>>>>> 
>>>>>>> 1 -
>>>>>>> https://the-asf.slack.com/archives/CBX4TSBQ8/p1661849820238809?thread_ts=1661512133.913279&cid=CBX4TSBQ8
>>>>>>> 
>>>>>> 
>>>> 
>>>> 
>>

Re: [CI] Change to be made in Pulsar CI to mitigate CI resource consumption issues

Posted by Lari Hotari <lh...@apache.org>.

> I’m a little confused about what will CI do in this case? I think the “ready-to-test” label should
> be removed in this case because the new code might not pass the tests. I thought the author
> should request committers to add this label again after the tests passed in his own repo.

This is possible. I'd rather postpone it since it would require more
effort to implement and perhaps just be an unnecessary burden.

The reviewer shouldn't be doing "/pulsarbot ready-to-test" for PRs
that aren't ready.
The PR author can continue to iterate the test runs in the fork until
PR comments have been addressed.
It's technically possible to have the same branch in PRs in
apache/pulsar and your own fork. When the branch gets updated, the
builds in the forked repository would run. That's why I think it's
better to keep on iterating on the PR until it's really ready for final
testing in the apache/pulsar repository.

-Lari

On Thu, Sep 15, 2022 at 2:27 PM Yunze Xu <yz...@streamnative.io.invalid> wrote:
>
> > If there are later changes in the PR after the "ready-to-test" label has been added, we could simply let the Pulsar CI handle the builds.
>
> I’m a little confused about what will CI do in this case? I think the “ready-to-test” label should
> be removed in this case because the new code might not pass the tests. I thought the author
> should request committers to add this label again after the tests passed in his own repo.
>
> Thanks,
> Yunze
>
>
>
>
> > On Sep 15, 2022, at 19:20, Lari Hotari <lh...@apache.org> wrote:
> >
> >> In short, IIUC, each contributor should:
> >> 1. Follow https://pulsar.apache.org/contributing/#ci-testing-in-your-fork to
> >> 2. Paste the link of the same PR in contributor’s fork to the PR in Apache repo
> >>
> >> Then a committer should run `/pulsarbot ready-to-test` after the PR in
> >> contributor's private repo passed all tests, right?
> >
> > Exactly. One small detail: It should be the PR author's responsibility to follow up and request for a review and an approval after the tests pass.
> > If there are later changes in the PR after the "ready-to-test" label has been added, we could simply let the Pulsar CI handle the builds.
> >
> > -Lari
> >
> > On 2022/09/15 10:11:53 Yunze Xu wrote:
> >> Hi Lari,
> >>
> >> This proposal LGTM. But I have some questions about the details.
> >>
> >> In short, IIUC, each contributor should:
> >> 1. Follow https://pulsar.apache.org/contributing/#ci-testing-in-your-fork to
> >> 2. Paste the link of the same PR in contributor’s fork to the PR in Apache repo
> >>
> >> Then a committer should run `/pulsarbot ready-to-test` after the PR in
> >> contributor's private repo passed all tests, right?
> >>
> >> Thanks,
> >> Yunze
> >>
> >>
> >>
> >>
> >>> On Sep 15, 2022, at 17:54, Lari Hotari <lh...@apache.org> wrote:
> >>>
> >>> Thanks for the comment.
> >>>
> >>> The question isn't about trusting PRs.
> >>> The CI resource consumption problem is also caused by current committer PRs. That's why it
> >>> is necessary to handle all PRs in the same way.
> >>> The benefit of the proposed solution is that we could decide to run some light checks automatically before the step that requires the "ready-to-test" label.
> >>>
> >>> After I sent the proposal, I found out that the Pulsar committer information including the GitHub
> >>> user names is available in JSON format at https://whimsy.apache.org/roster/committee/pulsar.json .
> >>> Pulsarbot can use this information for authorizing users who have access to
> >>> "/pulsarbot ready-to-test".
> >>>
> >>> I agree that we can skip adding a separate reviewer role, let's
> >>> simply use https://whimsy.apache.org/roster/committee/pulsar.json as the source of truth
> >>> for authorization.
> >>>
> >>> -Lari
> >>>
> >>> On 2022/09/15 09:22:18 tison wrote:
> >>>> Hi Lari,
> >>>>
> >>>> Thanks for starting this discussion. The overall proposal looks good and
> >>>> it's really great that you can spend some time on such a significant
> >>>> infrastructure.
> >>>>
> >>>> One comment here is that we can start with all "authorized" users to
> >>>> trigger the CI in the committer group instead of introducing a new concept
> >>>> "reviewer" - it will be another topic to discuss and I generally prefer
> >>>> more committership to encourage participation instead of a complicated
> >>>> membership structure.
> >>>>
> >>>> Besides, a quick fixup for reducing traffic is setting "Fork pull request
> >>>> workflows from outside collaborators" option[1] as "Require approval for
> >>>> all outside collaborators". This is provided out-of-the-box by GitHub and
> >>>> requires NO development[2]. Although it doesn't restrict people who are
> >>>> already apache org members but are not Pulsar committers, I believe the
> >>>> trust level is acceptable. An INFRA team member will be asked to perform
> >>>> the settings change if we want this.
> >>>>
> >>>> Best,
> >>>> tison.
> >>>>
> >>>> [1] https://github.com/apache/pulsar/settings/actions
> >>>> [2]
> >>>> https://docs.github.com/en/actions/managing-workflow-runs/approving-workflow-runs-from-public-forks
> >>>>
> >>>>
> >>>> On Thu, Sep 15, 2022 at 16:36, Lari Hotari <lh...@apache.org> wrote:
> >>>>
> >>>>> Hi all,
> >>>>>
> >>>>> The GitHub Actions based Pulsar CI has been experiencing issues for
> >>>>> multiple weeks. The condition is currently better, but the resource
> >>>>> shortage issue remains. CI builds will take a long time to complete even
> >>>>> after many optimizations have been made.
> >>>>>
> >>>>> There's a long email thread with some details about the past issues:
> >>>>> https://lists.apache.org/thread/p7rb04vf1mt0kk3v2r7xl9dvb3zkhtxf
> >>>>>
> >>>>> I have filed an issue to GitHub support about the CI issues over a week
> >>>>> ago, and I finally received an answer a few hours ago. However the
> >>>>> GitHub support person didn't reply to my questions at all, but instead
> >>>>> suggested that there's a beta program where it's possible to pay for
> >>>>> more resources. That solution isn't suitable for our case, since it
> >>>>> doesn't seem to be possible to assign GitHub Actions Runner VM resources
> >>>>> only for a specific Apache project. I'll follow up with GitHub support, but
> >>>>> I don't expect that to resolve our problems in the near term. We need
> >>>>> to make changes in our CI resource consumption.
> >>>>>
> >>>>> In a the-asf Slack thread [1] about Pulsar CI issues, Martin Grigorov
> >>>>> suggested: "Apache Spark project requires that all PRs are executed in
> >>>>> the contributor's GHA quota. Maybe Pulsar can do the same ?!"
> >>>>>
> >>>>> The Apache Spark contributing guide contains details about this in the
> >>>>> "Pull request" section, https://spark.apache.org/contributing.html .
> >>>>>
> >>>>> "Before creating a pull request in Apache Spark, it is important to
> >>>>> check if tests can pass on your branch because our GitHub Actions
> >>>>> workflows automatically run tests for your pull request/following
> >>>>> commits and every run burdens the limited resources of GitHub Actions in
> >>>>> Apache Spark repository. "
> >>>>>
> >>>>> In Pulsar, we will need to do the same. As a solution to this, Tison
> >>>>> suggested that we would not run all tests for the PR unless there's a
> >>>>> "ready-to-test" label on the PR.
> >>>>>
> >>>>> I think this is a good suggestion. We could extend the existing
> >>>>> "pulsarbot" to help with the automation.
> >>>>>
> >>>>> A reviewer could comment "/pulsarbot ready-to-test" on the PR and
> >>>>> pulsarbot would add the label and also restart the CI workflow to make
> >>>>> it proceed and run the tests.
> >>>>> pulsarbot would check for authorized users. One simple
> >>>>> approach would be to add a file ".pulsarci.yaml" in apache/pulsar
> >>>>> repository with the relevant information:
> >>>>>
> >>>>> committer_github_ids:
> >>>>>   - committer1
> >>>>>   - committer2
> >>>>>   ...
> >>>>>
> >>>>> ready_to_test:
> >>>>>   authorized_github_ids:
> >>>>>     - userid1
> >>>>>     - userid2
> >>>>>     ...
> >>>>>
> >>>>> We would have a script to synchronize all Pulsar committers to this file
> >>>>> periodically (manual step after there's a new committer). ASF provides
> >>>>> public json files for project members at
> >>>>> https://whimsy.apache.org/public/public_ldap_projects.json , however the
> >>>>> mapping to github user names seems to be missing. That could be done
> >>>>> with a custom script since ASF LDAP contains the github username.
> >>>>>
> >>>>> All Pulsar committers would have access. In addition, there could be other
> >>>>> users that are authorized for using "/pulsarbot ready-to-test".
> >>>>>
> >>>>> This solution would also require changes in the GitHub Actions workflows
> >>>>> so that the workflow is failed in an early step unless there's a
> >>>>> ready-to-test label for the PR.
> >>>>>
> >>>>> With the above solution, we would be able to cut the amount of
> >>>>> unnecessary builds and get the excessive resource consumption issue
> >>>>> under control. The PR authors would be instructed to run initial PR
> >>>>> builds in their own fork and the reviewer should check that this is done
> >>>>> before approving the PR for testing with "/pulsarbot ready-to-test".
> >>>>>
> >>>>> I would suggest proceeding quickly on this matter without separate PIPs
> >>>>> or votes. We could follow the Apache lazy consensus
> >>>>> (https://community.apache.org/committers/lazyConsensus.html) principle
> >>>>> and make this happen if there aren't objections in the next 72 hours.
> >>>>> The improvement suggestions to this proposal would obviously be taken
> >>>>> into account and if someone objects, we wouldn't have reached lazy
> >>>>> consensus and we wouldn't proceed.
> >>>>>
> >>>>> -Lari
> >>>>>
> >>>>>
> >>>>> 1 -
> >>>>> https://the-asf.slack.com/archives/CBX4TSBQ8/p1661849820238809?thread_ts=1661512133.913279&cid=CBX4TSBQ8
> >>>>>
> >>>>
> >>
> >>
>

Re: [CI] Change to be made in Pulsar CI to mitigate CI resource consumption issues

Posted by Yunze Xu <yz...@streamnative.io.INVALID>.

> If there are later changes in the PR after the "ready-to-test" label has been added, we could simply let the Pulsar CI handle the builds.

I'm a little confused about what the CI will do in this case. I think the "ready-to-test" label
should be removed, because the new code might not pass the tests. I thought the author
should ask committers to add the label again after the tests have passed in the author's own fork.
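
To make the two options concrete, here is a minimal sketch of both policies.
The helper names and the keep_label_on_push switch are hypothetical and only
illustrate the difference; this is not pulsarbot's actual code.

```python
READY_LABEL = "ready-to-test"


def labels_after_push(labels, keep_label_on_push):
    """Return the PR's labels after new commits are pushed.

    keep_label_on_push=True models "let the Pulsar CI handle the builds".
    keep_label_on_push=False models removing the label, so that a committer
    has to approve the PR for testing again.
    """
    if keep_label_on_push:
        return set(labels)
    return {label for label in labels if label != READY_LABEL}


def full_ci_runs(labels):
    """The full test suite runs only while the label is present."""
    return READY_LABEL in labels
```

With the first policy, follow-up commits keep consuming Apache CI resources;
with the second, each round of changes needs a new approval.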

Thanks,
Yunze





Re: [CI] Change to be made in Pulsar CI to mitigate CI resource consumption issues

Posted by Lari Hotari <lh...@apache.org>.

> In short, IIUC, each contributor should:
> 1. Follow https://pulsar.apache.org/contributing/#ci-testing-in-your-fork to 
> 2. Paste the link of the same PR in contributor’s fork to the PR in Apache repo
> 
> Then a committer should run `/pulsarbot ready-to-test` after the PR in
> contributor's private repo passed all tests, right?

Exactly. One small detail: It should be the PR author's responsibility to follow up and request a review and an approval after the tests pass.
If there are later changes in the PR after the "ready-to-test" label has been added, we could simply let the Pulsar CI handle the builds.

-Lari


Re: [CI] Change to be made in Pulsar CI to mitigate CI resource consumption issues

Posted by Yunze Xu <yz...@streamnative.io.INVALID>.

Hi Lari,

This proposal LGTM. But I have some questions about the details.

In short, IIUC, each contributor should:
1. Follow https://pulsar.apache.org/contributing/#ci-testing-in-your-fork to 
2. Paste a link to the corresponding PR in the contributor's fork into the PR in the Apache repo

Then a committer should run `/pulsarbot ready-to-test` after the PR in
the contributor's fork has passed all tests, right?
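
The command check in the last step could look roughly like this. It is only
a sketch: the exact command syntax and the function name are assumptions, and
the real pulsarbot may parse comments differently.

```python
import re

# Assumed syntax: the command must be alone on its own line of the comment.
COMMAND = re.compile(
    r"^[ \t]*/pulsarbot[ \t]+ready-to-test[ \t]*\r?$",
    re.MULTILINE,
)


def is_ready_to_test_command(comment_body):
    """Return True if any line of the comment is the ready-to-test command."""
    return COMMAND.search(comment_body) is not None
```

Requiring the command to be on its own line avoids triggering the CI when
somebody merely mentions the command in a sentence.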

Thanks,
Yunze





Re: [CI] Change to be made in Pulsar CI to mitigate CI resource consumption issues

Posted by Lari Hotari <lh...@apache.org>.

Thanks for the comment.

The question isn't about trusting PRs.
The CI resource consumption problem is also caused by PRs from current committers. That's why it
is necessary to handle all PRs in the same way.
The benefit of the proposed solution is that we could decide to run some light checks automatically before the step that requires the "ready-to-test" label.
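
As a rough illustration of that gating, assuming the label is named
"ready-to-test" as in the proposal and using made-up stage names:

```python
READY_LABEL = "ready-to-test"


def stages_to_run(pr_labels):
    """Light checks always run; the heavy test stage needs the label.

    The stage names are hypothetical placeholders, not actual Pulsar CI jobs.
    """
    stages = ["light-checks"]
    if READY_LABEL in pr_labels:
        stages.append("full-tests")
    return stages
```

A PR without the label would then stop after the light checks and consume
only a small fraction of the runner time.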

After I sent the proposal, I found out that the Pulsar committer information including the GitHub
user names is available in JSON format at https://whimsy.apache.org/roster/committee/pulsar.json .
Pulsarbot can use this information for authorizing users who have access to 
"/pulsarbot ready-to-test". 

I agree that we can skip adding a separate reviewer role. Let's
simply use https://whimsy.apache.org/roster/committee/pulsar.json as the source of truth
for authorization.
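
A minimal sketch of such an authorization check follows. The JSON shape used
here (a top-level "roster" object whose entries carry a "githubUsername"
field) is an assumption for illustration and should be verified against the
actual file. The sample data is made up and no network request is made.

```python
# Hypothetical, hand-written sample; the real file may be structured differently.
SAMPLE_ROSTER = {
    "roster": {
        "asfid1": {"name": "Committer One", "githubUsername": "committer1"},
        "asfid2": {"name": "Committer Two", "githubUsername": ["Committer2"]},
        "asfid3": {"name": "No GitHub Id"},
    }
}


def authorized_github_ids(roster_doc):
    """Collect lower-cased GitHub logins from the (assumed) roster layout."""
    ids = set()
    for member in roster_doc.get("roster", {}).values():
        github = member.get("githubUsername")
        if isinstance(github, str):
            github = [github]
        for login in github or []:
            ids.add(login.lower())
    return ids


def is_authorized(login, roster_doc):
    """GitHub logins are case-insensitive, so compare in lower case."""
    return login.lower() in authorized_github_ids(roster_doc)
```

Members without a GitHub user name in the roster are skipped, so they would
have to add one before they could use the command.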

-Lari

On 2022/09/15 09:22:18 tison wrote:
> Hi Lari,
> 
> Thanks for starting this discussion. The overall proposal looks good and
> it's really great that you can spend some time on such a significant
> infrastructure.
> 
> One comment here is that we can start with all "authorized" users to
> trigger the CI in the committer group instead of introducing a new concept
> "reviewer" - it will be another topic to discuss and I generally prefer
> more committership to encourage participation instead of a complicated
> membership structure.
> 
> Besides, a quick fixup for reducing traffic is setting "Fork pull request
> workflows from outside collaborators" option[1] as "Require approval for
> all outside collaborators". This is provided out-of-the-box by GitHub and
> requires NO development[2]. Although it doesn't restrict people who are
> already apache org members but are not Pulsar committers, I believe the
> trust level is acceptable. An INFRA team member will be asked to perform
> the settings change if we want this.
> 
> Best,
> tison.
> 
> [1] https://github.com/apache/pulsar/settings/actions
> [2]
> https://docs.github.com/en/actions/managing-workflow-runs/approving-workflow-runs-from-public-forks
> 
> 
> On Thu, Sep 15, 2022 at 16:36, Lari Hotari <lh...@apache.org> wrote:
> 
> > Hi all,
> >
> > The GitHub Actions based Pulsar CI has been experiencing issues for
> > multiple weeks. The condition is currently better, but the resource
> > shortage issue remains. CI builds will take a long time to complete even
> > after many optimizations have been made.
> >
> > There's a long email thread with some details about the past issues:
> > https://lists.apache.org/thread/p7rb04vf1mt0kk3v2r7xl9dvb3zkhtxf
> >
> > I have filed an issue to GitHub support about the CI issues over a week
> > ago, and I finally received an answer a few hours ago. However the
> > GitHub support person didn't reply to my questions at all, but instead
> > suggested that there's a beta program where it's possible to pay for
> > more resources. That solution isn't suitable for our case, since it
> > doesn't seem to be possible to assign GitHub Actions Runner VM resources
> > only for a specific Apache project. I'll follow up with GitHub support, but
> > I don't expect that to resolve our problems in the near term. We need
> > to make changes in our CI resource consumption.
> >
> > In a the-asf Slack thread [1] about Pulsar CI issues, Martin Grigorov
> > suggested: "Apache Spark project requires that all PRs are executed in
> > the contributor's GHA quota. Maybe Pulsar can do the same ?!"
> >
> > The Apache Spark contributing guide contains details about this in the
> > "Pull request" section, https://spark.apache.org/contributing.html .
> >
> > "Before creating a pull request in Apache Spark, it is important to
> > check if tests can pass on your branch because our GitHub Actions
> > workflows automatically run tests for your pull request/following
> > commits and every run burdens the limited resources of GitHub Actions in
> > Apache Spark repository. "
> >
> > In Pulsar, we will need to do the same. As a solution to this, Tison
> > suggested that we would not run all tests for the PR unless there's a
> > "ready-to-test" label on the PR.
> >
> > I think this is a good suggestion. We could extend the existing
> > "pulsarbot" to help with the automation.
> >
> > A reviewer could comment "/pulsarbot ready-to-test" on the PR and
> > pulsarbot would add the label and also restart the CI workflow to make
> > it proceed and run the tests.
> > pulsarbot would check for authorized users. One simple
> > approach would be to add a file ".pulsarci.yaml" in apache/pulsar
> > repository with the relevant information:
> >
> > committer_github_ids:
> >   - committer1
> >   - committer2
> >   ...
> >
> > ready_to_test:
> >   authorized_github_ids:
> >     - userid1
> >     - userid2
> >     ...
> >
> > We would have a script to synchronize all Pulsar committers to this file
> > peridiotically (manual step after there's a new committer). ASF provides
> > public json files for project members at
> > https://whimsy.apache.org/public/public_ldap_projects.json , however the
> > mapping to github user names seems to be missing. That could be done
> > with a custom script since ASF LDAP contains the github username.
> >
> > All Pulsar committers would have access. In addition, there could be other
> > users that are authorized for using "/pulsarbot ready-to-test".
> >
> > This solution would also require changes in the GitHub Actions workflows
> > so that the workflow is failed in an early step unless there's a
> > ready-to-test label for the PR.
> >
> > With the above solution, we would be able to cut the amount of
> > unnecessary builds and get the excessive resource consumption issue
> > under control. The PR authors would be instructed to run initial PR
> > builds in their own fork and the reviewer should check that this is done
> > before approving the PR for testing with "/pulsarbot ready-to-test".
> >
> > I would suggest proceeding quickly on this matter without separate PIPs
> > or votes. We could follow the Apache lazy consensus
> > (https://community.apache.org/committers/lazyConsensus.html) principle
> > and make this happen if there aren't objections in the next 72 hours.
> > The improvement suggestions to this proposal would obviously be taken
> > into account and if someone objects, we wouldn't have reached lazy
> > consensus and we wouldn't proceed.
> >
> > -Lari
> >
> >
> > 1 -
> > https://the-asf.slack.com/archives/CBX4TSBQ8/p1661849820238809?thread_ts=1661512133.913279&cid=CBX4TSBQ8
> >
>

Re: [CI] Change to be made in Pulsar CI to mitigate CI resource consumption issues

Posted by Lari Hotari <lh...@apache.org>.

> One more comment: you should take `/pulsarbot run-failure-checks` into
> consideration. It can currently be triggered by any actor and signals a
> rerun on behalf of @codelipenghui. Following your proposal, I suggest that
> this command be restricted as well. It also means that our committers
> should handle PRs more actively.

Good idea. We could consider doing this additional step if we continue to see CI resource shortage after applying the ready-to-test solution.

By the way, we should fix the pulsarbot workflow so that it grants sufficient permissions to the workflow's own GITHUB_TOKEN
instead of using Penghui's personal token. The feature was announced in April 2021: https://github.blog/changelog/2021-04-20-github-actions-control-permissions-for-github_token/
Penghui's personal token should then be removed from the apache/pulsar GitHub Actions workflows.

-Lari

On 2022/09/15 09:26:06 tison wrote:
> One more comment: you should take `/pulsarbot run-failure-checks` into
> consideration. It can currently be triggered by any actor and signals a
> rerun on behalf of @codelipenghui. Following your proposal, I suggest that
> this command be restricted as well. It also means that our committers
> should handle PRs more actively.
> 
> Best,
> tison.

Re: [CI] Change to be made in Pulsar CI to mitigate CI resource consumption issues

Posted by tison <wa...@gmail.com>.

One more comment: you should take `/pulsarbot run-failure-checks` into
consideration. It can currently be triggered by any actor and signals a
rerun on behalf of @codelipenghui. Following your proposal, I suggest that
this command be restricted as well. It also means that our committers
should handle PRs more actively.
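
To make the restriction concrete, here is a minimal Python sketch of how a
bot could gate its commands on the commenter's GitHub id. It is only an
illustration under assumptions: the dict mirrors the ".pulsarci.yaml" layout
from the proposal, the ids are placeholders, and the function names and the
set of restricted commands are hypothetical, not pulsarbot's actual code.

```python
# Hypothetical sketch: gate pulsarbot commands on the commenter's GitHub id.
# The dict mirrors the ".pulsarci.yaml" layout from the proposal; ids are placeholders.
CI_CONFIG = {
    "committer_github_ids": ["committer1", "committer2"],
    "ready_to_test": {"authorized_github_ids": ["userid1", "userid2"]},
}

# Assumption: these are the commands that consume CI resources.
RESTRICTED_COMMANDS = {"ready-to-test", "run-failure-checks"}


def parse_command(comment_body):
    """Return the first pulsarbot command found in a comment, or None."""
    for line in comment_body.splitlines():
        parts = line.strip().split()
        if len(parts) >= 2 and parts[0] == "/pulsarbot":
            return parts[1]
    return None


def is_authorized(github_id, command, config=CI_CONFIG):
    """Committers may run every restricted command; other users need an
    explicit per-command entry such as ready_to_test.authorized_github_ids."""
    if command not in RESTRICTED_COMMANDS:
        return True
    if github_id in config["committer_github_ids"]:
        return True
    section = config.get(command.replace("-", "_"), {})
    return github_id in section.get("authorized_github_ids", [])
```

With this layout, restricting `run-failure-checks` to committers needs no
extra configuration, while `ready-to-test` can still be opened up to
additional users through the config file.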

Best,
tison.


On Thu, Sep 15, 2022 at 17:22, tison <wa...@gmail.com> wrote:

> Hi Lari,
>
> Thanks for starting this discussion. The overall proposal looks good, and
> it's really great that you can spend some time on such significant
> infrastructure.
>
> One comment: we can start by limiting the "authorized" users who can
> trigger the CI to the committer group, instead of introducing a new
> "reviewer" concept. That would be another topic to discuss, and I generally
> prefer granting more committerships to encourage participation over a
> complicated membership structure.
>
> Besides, a quick fix for reducing traffic is to set the "Fork pull request
> workflows from outside collaborators" option [1] to "Require approval for
> all outside collaborators". This is provided out of the box by GitHub and
> requires NO development [2]. Although it doesn't restrict people who are
> already apache org members but not Pulsar committers, I believe that
> trust level is acceptable. We would need to ask an INFRA team member to
> change the setting if we want this.
>
> Best,
> tison.
>
> [1] https://github.com/apache/pulsar/settings/actions
> [2]
> https://docs.github.com/en/actions/managing-workflow-runs/approving-workflow-runs-from-public-forks
>
>

Re: [CI] Change to be made in Pulsar CI to mitigate CI resource consumption issues

Posted by tison <wa...@gmail.com>.

Hi Lari,

Thanks for starting this discussion. The overall proposal looks good, and
it's really great that you can spend some time on such significant
infrastructure.

One comment: we can start by limiting the "authorized" users who can
trigger the CI to the committer group, instead of introducing a new
"reviewer" concept. That would be another topic to discuss, and I generally
prefer granting more committerships to encourage participation over a
complicated membership structure.

Besides, a quick fix for reducing traffic is to set the "Fork pull request
workflows from outside collaborators" option [1] to "Require approval for
all outside collaborators". This is provided out of the box by GitHub and
requires NO development [2]. Although it doesn't restrict people who are
already apache org members but not Pulsar committers, I believe that
trust level is acceptable. We would need to ask an INFRA team member to
change the setting if we want this.

Best,
tison.

[1] https://github.com/apache/pulsar/settings/actions
[2]
https://docs.github.com/en/actions/managing-workflow-runs/approving-workflow-runs-from-public-forks



Re: [CI] Change to be made in Pulsar CI to mitigate CI resource consumption issues

Posted by Zixuan Liu <no...@gmail.com>.

Hi Lari,

This is a good idea; I agree with it.

Once a committer has added the "ready-to-test" label to a PR, the
contributor can run the Pulsar CI.
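
As a rough illustration of the early workflow step mentioned in the
proposal, here is a minimal Python sketch that checks the PR's labels and
returns a failing exit code when "ready-to-test" is missing. It assumes the
step runs on a pull_request event, where GitHub Actions exposes the event
payload through the GITHUB_EVENT_PATH file; the function names are
hypothetical and this is not the actual Pulsar workflow.

```python
import json
import os

REQUIRED_LABEL = "ready-to-test"


def has_label(event, label=REQUIRED_LABEL):
    """Return True if the pull_request event payload carries the given label."""
    labels = event.get("pull_request", {}).get("labels", [])
    return any(item.get("name") == label for item in labels)


def main():
    """Return 0 when the PR may be tested, 1 otherwise."""
    # GitHub Actions writes the triggering event payload to this file.
    with open(os.environ["GITHUB_EVENT_PATH"], encoding="utf-8") as f:
        event = json.load(f)
    if not has_label(event):
        print(f"PR is missing the '{REQUIRED_LABEL}' label; not running tests.")
        return 1
    return 0
```

A workflow step could call `sys.exit(main())` so that the job stops before
any expensive build or test step is scheduled.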

Thanks,
Zixuan

On Thu, Sep 15, 2022 at 23:30, Lari Hotari <lh...@apache.org> wrote:

> On 2022/09/15 15:09:59 Yubiao Feng wrote:
> > Hi Lari:
> >
> > That is really a good approach.
> > I think we could add another button to cancel the running task,
> > because after submitting a PR the user may find other problems that
> > need to be fixed. In that case, they can cancel the task themselves.
>
> Thanks for the feedback Yubiao,
>
> As explained in the proposal, we currently have a resource shortage and we
> have to cut GitHub Actions usage under the apache/pulsar project. When
> users run the majority of test runs in their own forks, it won't impact
> the apache/pulsar project. Users have full access to cancel builds in their
> own forks. There's a cancel button available.
>
> > > we can start with all "authorized" users to trigger the CI
> >
> > I think all contributors need to get permission. If only committers have
> > permission, this will hurt the enthusiasm of community contributors.
> > Almost all PRs are submitted by experienced contributors, and they will
> > follow the rules to save resources.
>
> Committers are required for reviewing and merging PRs. I think this is
> well aligned with that.
> There's no reason to be hurt. Things will be better for everyone when
> everyone uses their own fork to run tests for PRs, and we proceed to run
> tests in the apache/pulsar project only once the PR has been reviewed.
>
> -Lari
>
> >
> > Thanks
> > Yubiao Feng

Re: [CI] Change to be made in Pulsar CI to mitigate CI resource consumption issues

Posted by Lari Hotari <lh...@apache.org>.

On 2022/09/15 15:09:59 Yubiao Feng wrote:
> Hi Lari:
> 
> That is really a good approach.
> I think we could add another button to cancel the running task,
> because after submitting a PR the user may find other problems that
> need to be fixed. In that case, they can cancel the task themselves.

Thanks for the feedback Yubiao,

As explained in the proposal, we currently have a resource shortage and we have to cut GitHub Actions usage under the apache/pulsar project. When users run the majority of test runs in their own forks, it won't impact the apache/pulsar project. Users have full access to cancel builds in their own forks. There's a cancel button available.

> > we can start with all "authorized" users to trigger the CI
> 
> I think all contributors need to get permission. If only committers have
> permission, this will hurt the enthusiasm of community contributors.
> Almost all PRs are submitted by experienced contributors, and they will
> follow the rules to save resources.

Committers are required for reviewing and merging PRs. I think this is well aligned with that.
There's no reason to be hurt. Things will be better for everyone when everyone uses their own fork to run tests for PRs, and we proceed to run tests in the apache/pulsar project only once the PR has been reviewed.

-Lari

> 
> Thanks
> Yubiao Feng

Re: [CI] Change to be made in Pulsar CI to mitigate CI resource consumption issues

Posted by Yubiao Feng <yu...@streamnative.io.INVALID>.

Hi Lari:

That is a really good approach.
I think we could also add a way to cancel a running CI task, because after
submitting a PR, the author may find other problems that need to be fixed.
In that case, the author could cancel the task themselves.

Hi tison:

> we can start with all "authorized" users to trigger the CI

I think all contributors should get permission. If only committers have
permission, this will hurt the enthusiasm of community contributors. Almost
all PRs are submitted by experienced contributors, and they will follow
the rules to save resources.
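
To make the permission question concrete, here is a minimal sketch of how
the check behind "/pulsarbot ready-to-test" might look. It assumes the
".pulsarci.yaml" layout from Lari's proposal has already been parsed into a
dict. The function names are hypothetical and are not part of the existing
pulsarbot code.

```python
# Hypothetical sketch: these names are illustrative, not pulsarbot's real API.
READY_TO_TEST_COMMAND = "/pulsarbot ready-to-test"


def authorized_ids(config):
    """Collect every GitHub id allowed to trigger CI from the parsed config."""
    committers = config.get("committer_github_ids") or []
    extra = (config.get("ready_to_test") or {}).get("authorized_github_ids") or []
    # GitHub logins are case-insensitive, so compare in lower case.
    return {user.lower() for user in list(committers) + list(extra)}


def should_label_ready_to_test(comment_body, commenter, config):
    """Return True if the comment contains the command on its own line
    and the commenter is listed in the config."""
    lines = [line.strip() for line in comment_body.splitlines()]
    if READY_TO_TEST_COMMAND not in lines:
        return False
    return commenter.lower() in authorized_ids(config)


# Example config mirroring the proposed .pulsarci.yaml structure.
config = {
    "committer_github_ids": ["committer1", "committer2"],
    "ready_to_test": {"authorized_github_ids": ["userid1", "userid2"]},
}
```

Opening this up to more contributors would then only mean a longer
"authorized_github_ids" list, with no change to the check itself.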

Thanks
Yubiao Feng

On Thu, Sep 15, 2022 at 4:36 PM Lari Hotari <lh...@apache.org> wrote:

> Hi all,
>
> The GitHub Actions based Pulsar CI has been experiencing issues for
> multiple weeks. The condition is currently better, but the resource
> shortage issue remains. CI builds will take a long time to complete even
> after many optimizations have been made.
>
> There's a long email thread with some details about the past issues:
> https://lists.apache.org/thread/p7rb04vf1mt0kk3v2r7xl9dvb3zkhtxf
>
> I have filed an issue to GitHub support about the CI issues over a week
> ago, and I finally received an answer a few hours ago. However the
> GitHub support person didn't reply to my questions at all, but instead
> suggested that there's a beta program where it's possible to pay for
> more resources. That solution isn't suitable for our case, since it
> doesn't seem to be possible to assign GitHub Actions Runner VM resources
> only for a specific Apache project. I'll follow up with GitHub support, but
> I don't expect that to resolve our problems in the near term. We need
> to make changes in our CI resource consumption.
>
> In a the-asf Slack thread [1] about Pulsar CI issues, Martin Grigorov
> suggested: "Apache Spark project requires that all PRs are executed in
> the contributor's GHA quota. Maybe Pulsar can do the same ?!"
>
> The Apache Spark contributing guide contains details about this in the
> "Pull request" section, https://spark.apache.org/contributing.html .
>
> "Before creating a pull request in Apache Spark, it is important to
> check if tests can pass on your branch because our GitHub Actions
> workflows automatically run tests for your pull request/following
> commits and every run burdens the limited resources of GitHub Actions in
> Apache Spark repository. "
>
> In Pulsar, we will need to do the same. As a solution to this, Tison
> suggested that we would not run all tests for the PR unless there's a
> "ready-to-test" label on the PR.
>
> I think this is a good suggestion. We could extend the existing
> "pulsarbot" to help with the automation.
>
> A reviewer could comment "/pulsarbot ready-to-test" on the PR and
> pulsarbot would add the label and also restart the CI workflow to make
> it proceed and run the tests.
> pulsarbot would check for authorized users. One simple
> approach would be to add a file ".pulsarci.yaml" in apache/pulsar
> repository with the relevant information:
>
> committer_github_ids:
>   - committer1
>   - committer2
>   ...
>
> ready_to_test:
>   authorized_github_ids:
>     - userid1
>     - userid2
>     ...
>
> We would have a script to synchronize all Pulsar committers to this file
> periodically (manual step after there's a new committer). ASF provides
> public json files for project members at
> https://whimsy.apache.org/public/public_ldap_projects.json , however the
> mapping to github user names seems to be missing. That could be done
> with a custom script since ASF LDAP contains the github username.
>
> All Pulsar committers would have access. In addition, there could be other
> users that are authorized for using "/pulsarbot ready-to-test".
>
> This solution would also require changes in the GitHub Actions workflows
> so that the workflow is failed in an early step unless there's a
> ready-to-test label for the PR.
>
> With the above solution, we would be able to cut the amount of
> unnecessary builds and get the excessive resource consumption issue
> under control. The PR authors would be instructed to run initial PR
> builds in their own fork and the reviewer should check that this is done
> before approving the PR for testing with "/pulsarbot ready-to-test".
>
> I would suggest proceeding quickly on this matter without separate PIPs
> or votes. We could follow the Apache lazy consensus
> (https://community.apache.org/committers/lazyConsensus.html) principle
> and make this happen if there aren't objections in the next 72 hours.
> The improvement suggestions to this proposal would obviously be taken
> into account and if someone objects, we wouldn't have reached lazy
> consensus and we wouldn't proceed.
>
> -Lari
>
>
> 1 -
> https://the-asf.slack.com/archives/CBX4TSBQ8/p1661849820238809?thread_ts=1661512133.913279&cid=CBX4TSBQ8
>