You are viewing a plain text version of this content. The canonical link for it is here.

Posted to dev@zookeeper.apache.org by Enrico Olivelli <eo...@gmail.com> on 2020/10/31 16:00:33 UTC

Pull Request Validation on GitHub Actions (ZOOKEEPER-3973)

Hello,
Christopher send a patch that enabled PR validation in GitHub Actions [1]

I would like to start a discussion and explain what's going on.
I was talking with Andor about the lack of the "magic words" on Pull
Request Validation that restart the build on the new Jenkins.

I cited that in Apache BookKeeper and in Apache Pulsar we have a "github
bot" that interacts with the comments in the Pull Requests and it is able
to rerun the failure builds.
This kind of "bots" is becoming pretty common on github.

Christopher followed up our discussion and created that patch.
You can't see GH Action working on ZK repo, because we should enable them
with the help of Infra.

From my point of view using GitHub Actions will be interesting and useful
if and only if we add the 'bot' that reruns the failures.

This is not possible on the new ASF Jenkins, so only committers (I am not
sure that you also need some special additional karma) can trigger a new
build by logging into Jenkins.

The work of Christopher is not only about enabling GitHub Actions but it is
also about cleaning up the validation process and running only part of the
tests, this can be discussed on a separate thread (I would like to run all
of the tests for instance, not only a selection). So please comment on the
PR about the changes.

Here the discussion is more about
"Use GH actions and rerun the tests" (option 1)
vs
 "Use ASFK jenkins for everything + Travis for PRs on additional
architectures" (option 2, that is to keep the current situation)

Regards
Enrico
[1] https://github.com/apache/zookeeper/pull/1508

Re: Pull Request Validation on GitHub Actions (ZOOKEEPER-3973)

Posted by Christopher <ct...@apache.org>.

I apologize if I interpreted the error messages incorrectly regarding port
binding. I'm just speculating based on what I've seen in the common error
messages ("java.net.BindException: Address already in use"). When I looked
at it before, I didn't see the PortAssignment code, but I have very little
knowledge of Zookeeper's testing framework code. I wonder if the problem is
that PortAssignment doesn't "s.setReuseAddress(true)", and the `s.close()`
doesn't release it fast enough because it's running in a container?

We can add the test logs on failure. I didn't do that before because I
didn't know what other file patterns would be important to grab. So, the
initial configuration only grabs the surefire reports. The fact that they
are zipped is what GitHub Actions does by itself. The fact that they are
XML is because that's the format Surefire stores the reports in. The
surefire reports, along with the console output is what you'd get from
Jenkins by default.

If you know what you want to grab, you can add additional file patterns to
upload on failed builds to troubleshoot further. I can probably help with
that part, if you let me know what file patterns I should try to capture.

On Fri, Jan 15, 2021 at 10:25 AM Andor Molnar <an...@apache.org> wrote:

> It’s not easy to debug the failing builds:
>
> Unit tests results cannot be uploaded for some reason and we only have
> surefire-reports.zip in the artifacts which contains all the test results
> zipped in XML format.
>
> Test logs are completely missing or I cannot find them.
>
> Andor
>
>
>
>
> > On 2021. Jan 15., at 16:14, Andor Molnar <an...@apache.org> wrote:
> >
> > That’s not entirely true.
> >
> > First, we have PortAssignment class which is responsible for assigning
> unique port numbers to test instances. It also has a retry logic to
> increment port numbers which have failed to bind.
> > Second, ASF Jenkins running the same test suite and manages to get green
> builds every now and then by running tests parallel on 4 threads.
> >
> > There must be some reason for that Github builds are much more unstable.
> > Do they fail always with the same error?
> >
> > Andor
> >
> >
> >
> >> On 2021. Jan 15., at 4:18, Christopher <ct...@apache.org> wrote:
> >>
> >> On Thu, Jan 14, 2021 at 3:52 PM Andor Molnar <an...@apache.org> wrote:
> >>
> >>> Nicely done guys. I've just noticed the changes. :)
> >>>
> >>> Does it work out well so far? I can only see failing builds here: (at
> >>> least 1 test of the matrix always fails)
> >>>
> >>> https://github.com/apache/zookeeper/actions
> >>>
> >>>
> >> The CI build for PR https://github.com/apache/zookeeper/pull/1579 looks
> >> useful, because it catches new checkstyle errors as a result of the
> >> proposed change:
> >> https://github.com/apache/zookeeper/runs/1705766219
> >>
> >> The one matrix item that always fails is the one that runs all the Java
> >> tests. There are a lot of the JUnit Java tests that are flaky. Many of
> them
> >> are probably just failing because of collisions with other tests (port
> >> reuse is a common one). Many of the tests reuse the same port numbers
> for
> >> the integration test, so they can't run at the same time, but the test
> >> cases don't check for port in use or wait for it to be free, and the
> tests
> >> aren't designed to select a random or free port, either. A lot of these
> >> could be fixed with improvements to the testing code, so that port
> numbers
> >> don't collide when tests are run concurrently.
> >>
> >> There are other flaky tests, too, but I'm not that familiar with any of
> >> these tests, just reading the error messages.
> >>
> >>
> >>> I'm not saying that our ASF Jenkins build is rock solid, but at least
> >>> it's having green runs every now and then.
> >>>
> >>>
> >>>
> https://ci-hadoop.apache.org/view/ZooKeeper/job/zookeeper-multi-branch-build/
> >>>
> >>>
> >> I suspect some of these flaky Java test cases probably fail on Jenkins a
> >> lot, too. If they can be tweaked to run more reliably in GitHub
> Actions, I
> >> bet the Jenkins precommit jobs would also pass more reliably, too.
> >>
> >>
> >>> Andor
> >>>
> >>>
> >>>
> >>> On Sun, 2020-11-01 at 00:31 -0400, Christopher wrote:
> >>>> Hi Enrico, et al,
> >>>>
> >>>> I think the situation is a little more complicated than simply
> >>>> selecting between "option 1" and "option 2". So, please allow me to
> >>>> provide some additional clarification about what my PR does and does
> >>>> not do (responses inline, see below).
> >>>>
> >>>> On Sat, Oct 31, 2020 at 12:00 PM Enrico Olivelli
> >>>> <eo...@gmail.com> wrote:
> >>>>>
> >>>>> Hello,
> >>>>> Christopher send a patch that enabled PR validation in GitHub
> >>>>> Actions [1]
> >>>>>
> >>>>> I would like to start a discussion and explain what's going on.
> >>>>> I was talking with Andor about the lack of the "magic words" on
> >>>>> Pull
> >>>>> Request Validation that restart the build on the new Jenkins.
> >>>>>
> >>>>> I cited that in Apache BookKeeper and in Apache Pulsar we have a
> >>>>> "github
> >>>>> bot" that interacts with the comments in the Pull Requests and it
> >>>>> is able
> >>>>> to rerun the failure builds.
> >>>>> This kind of "bots" is becoming pretty common on github.
> >>>>>
> >>>>> Christopher followed up our discussion and created that patch.
> >>>>
> >>>> To be clear, the GitHub Actions doesn't enable any "magic words" to
> >>>> restart, but it does provide greater control over re-build from the
> >>>> GitHub UI. It's at most a few clicks away.
> >>>>
> >>>>> You can't see GH Action working on ZK repo, because we should
> >>>>> enable them
> >>>>> with the help of Infra.
> >>>>
> >>>> To clarify here as well, INFRA isn't involved in enabling GH Actions.
> >>>> It's simply a matter of adding the workflow to the repository (which
> >>>> my PR does). It is automatically enabled once the workflow is merged
> >>>> in.
> >>>>
> >>>>>
> >>>>> From my point of view using GitHub Actions will be interesting and
> >>>>> useful
> >>>>> if and only if we add the 'bot' that reruns the failures.
> >>>>
> >>>> I think there's other value to consider, also... like replacing
> >>>> Travis
> >>>> CI (which asks for too many permissions IMO), and having much greater
> >>>> control over builds, including having access to surefire reports from
> >>>> failed tests, using custom containers, and having access to thousands
> >>>> of user-generated recipes to perform custom steps in the build. There
> >>>> are multiple benefits.
> >>>>
> >>>>>
> >>>>> This is not possible on the new ASF Jenkins, so only committers (I
> >>>>> am not
> >>>>> sure that you also need some special additional karma) can trigger
> >>>>> a new
> >>>>> build by logging into Jenkins.
> >>>>
> >>>> Unfortunately, that's still true of GH Actions... only committers
> >>>> would be able to trigger a rebuild in the GitHub UI for the build
> >>>> that
> >>>> was automatically triggered when the PR was created (or updated).
> >>>> Non-committers can only retrigger by updating the PR, just like they
> >>>> do now with Jenkins. However, the one advantage that non-committers
> >>>> do
> >>>> get is that they can execute the same GH Actions builds in their own
> >>>> repo, without needing an account on Jenkins or Travis, during their
> >>>> development process, before they even issue a PR. So, they will be
> >>>> able to see if their branch is likely to get a green build before
> >>>> they
> >>>> even open a PR (if they take advantage of that option).
> >>>>
> >>>>>
> >>>>> The work of Christopher is not only about enabling GitHub Actions
> >>>>> but it is
> >>>>> also about cleaning up the validation process and running only part
> >>>>> of the
> >>>>> tests, this can be discussed on a separate thread (I would like to
> >>>>> run all
> >>>>> of the tests for instance, not only a selection). So please comment
> >>>>> on the
> >>>>> PR about the changes.
> >>>>
> >>>> Running only part of the tests was not my goal. I just did that as an
> >>>> example to show what was possible. I can change it to try to run the
> >>>> full build. However, I know that currently doesn't work (too many
> >>>> tests fail in the GH Actions runner, just like too many failed in
> >>>> Travis, which is why they were disabled there). Either way, follow-on
> >>>> work will need to be done to tweak the tests so they can build
> >>>> reliably in this environment and eventually replace the Jenkins
> >>>> precommit validation. This PR is not yet ready to replace the need
> >>>> for
> >>>> the Jenkins precommit. It only lays the groundwork for getting there
> >>>> eventually.
> >>>>
> >>>> What the PR does now is:
> >>>> 1. Replace Travis and run *some* tests (Travis ran none)
> >>>> 2. Demonstrates how to run multiple tasks in a matrix build
> >>>> 3. Lays the groundwork for incremental test improvements so the
> >>>> Jenkins precommit job can also be replaced by GH Actions validations,
> >>>> so all validations are (eventually) managed in one place.
> >>>>
> >>>> The PR can still be changed to do whatever you want. I've merely
> >>>> layed
> >>>> down the groundwork that can be built upon. If you want it to try to
> >>>> run all the tests, I can do that. If you merely want it to replace
> >>>> Travis and build no tests (yet), I can do that. If you want me to add
> >>>> jdk8 to the matrix build instead of just jdk11, I can do that. If you
> >>>> want to have a build with jdk11, but run the full ITs with jdk8, I
> >>>> can
> >>>> do that. If you want to try to run the full ITs with both jdk8 and
> >>>> jdk11, I can do that. Just let me know what you want and I will
> >>>> update
> >>>> the PR accordingly.
> >>>>
> >>>> In essence, there are *far* more options available to the project
> >>>> than
> >>>> the "option 1" and "option 2" described below. It is the enormous
> >>>> possibility space that GH Actions opens up, that I think is the real
> >>>> advantage to the project, far more than any other advantages.
> >>>>
> >>>>>
> >>>>> Here the discussion is more about
> >>>>> "Use GH actions and rerun the tests" (option 1)
> >>>>> vs
> >>>>> "Use ASFK jenkins for everything + Travis for PRs on additional
> >>>>> architectures" (option 2, that is to keep the current situation)
> >>>>>
> >>>>
> >>>> My PR does keep the Travis build for s390x builds (without tests),
> >>>> because getting that arch building in GH Actions is tricky, so I
> >>>> didn't try it on my first attempt. It may be possible to get that to
> >>>> build also.
> >>>>
> >>>>> Regards
> >>>>> Enrico
> >>>>> [1] https://github.com/apache/zookeeper/pull/1508
> >>>>
> >>>> I hope that clarifies a little bit of what I was trying to accomplish
> >>>> in my PR, since my intentions may not be identical to what others
> >>>> were
> >>>> hoping it would achieve. Feel free to ask me additional questions, or
> >>>> to direct me to make changes to my PR in any way. I realize that this
> >>>> is something that may not get merged right away (or ever)...
> >>>> especially if it doesn't achieve what you were hoping it would
> >>>> achieve. But, I still think there's advantages to using GH Actions
> >>>> that are worth considering.
> >>>>
> >>>> Thanks,
> >>>> Christopher
> >>>
> >>>
> >>>
> >
>
>

Re: Pull Request Validation on GitHub Actions (ZOOKEEPER-3973)

Posted by Andor Molnar <an...@apache.org>.

It’s not easy to debug the failing builds:

Unit tests results cannot be uploaded for some reason and we only have surefire-reports.zip in the artifacts which contains all the test results zipped in XML format.

Test logs are completely missing or I cannot find them.

Andor




> On 2021. Jan 15., at 16:14, Andor Molnar <an...@apache.org> wrote:
> 
> That’s not entirely true.
> 
> First, we have PortAssignment class which is responsible for assigning unique port numbers to test instances. It also has a retry logic to increment port numbers which have failed to bind.
> Second, ASF Jenkins running the same test suite and manages to get green builds every now and then by running tests parallel on 4 threads.
> 
> There must be some reason for that Github builds are much more unstable. 
> Do they fail always with the same error?
> 
> Andor
> 
> 
> 
>> On 2021. Jan 15., at 4:18, Christopher <ct...@apache.org> wrote:
>> 
>> On Thu, Jan 14, 2021 at 3:52 PM Andor Molnar <an...@apache.org> wrote:
>> 
>>> Nicely done guys. I've just noticed the changes. :)
>>> 
>>> Does it work out well so far? I can only see failing builds here: (at
>>> least 1 test of the matrix always fails)
>>> 
>>> https://github.com/apache/zookeeper/actions
>>> 
>>> 
>> The CI build for PR https://github.com/apache/zookeeper/pull/1579 looks
>> useful, because it catches new checkstyle errors as a result of the
>> proposed change:
>> https://github.com/apache/zookeeper/runs/1705766219
>> 
>> The one matrix item that always fails is the one that runs all the Java
>> tests. There are a lot of the JUnit Java tests that are flaky. Many of them
>> are probably just failing because of collisions with other tests (port
>> reuse is a common one). Many of the tests reuse the same port numbers for
>> the integration test, so they can't run at the same time, but the test
>> cases don't check for port in use or wait for it to be free, and the tests
>> aren't designed to select a random or free port, either. A lot of these
>> could be fixed with improvements to the testing code, so that port numbers
>> don't collide when tests are run concurrently.
>> 
>> There are other flaky tests, too, but I'm not that familiar with any of
>> these tests, just reading the error messages.
>> 
>> 
>>> I'm not saying that our ASF Jenkins build is rock solid, but at least
>>> it's having green runs every now and then.
>>> 
>>> 
>>> https://ci-hadoop.apache.org/view/ZooKeeper/job/zookeeper-multi-branch-build/
>>> 
>>> 
>> I suspect some of these flaky Java test cases probably fail on Jenkins a
>> lot, too. If they can be tweaked to run more reliably in GitHub Actions, I
>> bet the Jenkins precommit jobs would also pass more reliably, too.
>> 
>> 
>>> Andor
>>> 
>>> 
>>> 
>>> On Sun, 2020-11-01 at 00:31 -0400, Christopher wrote:
>>>> Hi Enrico, et al,
>>>> 
>>>> I think the situation is a little more complicated than simply
>>>> selecting between "option 1" and "option 2". So, please allow me to
>>>> provide some additional clarification about what my PR does and does
>>>> not do (responses inline, see below).
>>>> 
>>>> On Sat, Oct 31, 2020 at 12:00 PM Enrico Olivelli
>>>> <eo...@gmail.com> wrote:
>>>>> 
>>>>> Hello,
>>>>> Christopher send a patch that enabled PR validation in GitHub
>>>>> Actions [1]
>>>>> 
>>>>> I would like to start a discussion and explain what's going on.
>>>>> I was talking with Andor about the lack of the "magic words" on
>>>>> Pull
>>>>> Request Validation that restart the build on the new Jenkins.
>>>>> 
>>>>> I cited that in Apache BookKeeper and in Apache Pulsar we have a
>>>>> "github
>>>>> bot" that interacts with the comments in the Pull Requests and it
>>>>> is able
>>>>> to rerun the failure builds.
>>>>> This kind of "bots" is becoming pretty common on github.
>>>>> 
>>>>> Christopher followed up our discussion and created that patch.
>>>> 
>>>> To be clear, the GitHub Actions doesn't enable any "magic words" to
>>>> restart, but it does provide greater control over re-build from the
>>>> GitHub UI. It's at most a few clicks away.
>>>> 
>>>>> You can't see GH Action working on ZK repo, because we should
>>>>> enable them
>>>>> with the help of Infra.
>>>> 
>>>> To clarify here as well, INFRA isn't involved in enabling GH Actions.
>>>> It's simply a matter of adding the workflow to the repository (which
>>>> my PR does). It is automatically enabled once the workflow is merged
>>>> in.
>>>> 
>>>>> 
>>>>> From my point of view using GitHub Actions will be interesting and
>>>>> useful
>>>>> if and only if we add the 'bot' that reruns the failures.
>>>> 
>>>> I think there's other value to consider, also... like replacing
>>>> Travis
>>>> CI (which asks for too many permissions IMO), and having much greater
>>>> control over builds, including having access to surefire reports from
>>>> failed tests, using custom containers, and having access to thousands
>>>> of user-generated recipes to perform custom steps in the build. There
>>>> are multiple benefits.
>>>> 
>>>>> 
>>>>> This is not possible on the new ASF Jenkins, so only committers (I
>>>>> am not
>>>>> sure that you also need some special additional karma) can trigger
>>>>> a new
>>>>> build by logging into Jenkins.
>>>> 
>>>> Unfortunately, that's still true of GH Actions... only committers
>>>> would be able to trigger a rebuild in the GitHub UI for the build
>>>> that
>>>> was automatically triggered when the PR was created (or updated).
>>>> Non-committers can only retrigger by updating the PR, just like they
>>>> do now with Jenkins. However, the one advantage that non-committers
>>>> do
>>>> get is that they can execute the same GH Actions builds in their own
>>>> repo, without needing an account on Jenkins or Travis, during their
>>>> development process, before they even issue a PR. So, they will be
>>>> able to see if their branch is likely to get a green build before
>>>> they
>>>> even open a PR (if they take advantage of that option).
>>>> 
>>>>> 
>>>>> The work of Christopher is not only about enabling GitHub Actions
>>>>> but it is
>>>>> also about cleaning up the validation process and running only part
>>>>> of the
>>>>> tests, this can be discussed on a separate thread (I would like to
>>>>> run all
>>>>> of the tests for instance, not only a selection). So please comment
>>>>> on the
>>>>> PR about the changes.
>>>> 
>>>> Running only part of the tests was not my goal. I just did that as an
>>>> example to show what was possible. I can change it to try to run the
>>>> full build. However, I know that currently doesn't work (too many
>>>> tests fail in the GH Actions runner, just like too many failed in
>>>> Travis, which is why they were disabled there). Either way, follow-on
>>>> work will need to be done to tweak the tests so they can build
>>>> reliably in this environment and eventually replace the Jenkins
>>>> precommit validation. This PR is not yet ready to replace the need
>>>> for
>>>> the Jenkins precommit. It only lays the groundwork for getting there
>>>> eventually.
>>>> 
>>>> What the PR does now is:
>>>> 1. Replace Travis and run *some* tests (Travis ran none)
>>>> 2. Demonstrates how to run multiple tasks in a matrix build
>>>> 3. Lays the groundwork for incremental test improvements so the
>>>> Jenkins precommit job can also be replaced by GH Actions validations,
>>>> so all validations are (eventually) managed in one place.
>>>> 
>>>> The PR can still be changed to do whatever you want. I've merely
>>>> layed
>>>> down the groundwork that can be built upon. If you want it to try to
>>>> run all the tests, I can do that. If you merely want it to replace
>>>> Travis and build no tests (yet), I can do that. If you want me to add
>>>> jdk8 to the matrix build instead of just jdk11, I can do that. If you
>>>> want to have a build with jdk11, but run the full ITs with jdk8, I
>>>> can
>>>> do that. If you want to try to run the full ITs with both jdk8 and
>>>> jdk11, I can do that. Just let me know what you want and I will
>>>> update
>>>> the PR accordingly.
>>>> 
>>>> In essence, there are *far* more options available to the project
>>>> than
>>>> the "option 1" and "option 2" described below. It is the enormous
>>>> possibility space that GH Actions opens up, that I think is the real
>>>> advantage to the project, far more than any other advantages.
>>>> 
>>>>> 
>>>>> Here the discussion is more about
>>>>> "Use GH actions and rerun the tests" (option 1)
>>>>> vs
>>>>> "Use ASFK jenkins for everything + Travis for PRs on additional
>>>>> architectures" (option 2, that is to keep the current situation)
>>>>> 
>>>> 
>>>> My PR does keep the Travis build for s390x builds (without tests),
>>>> because getting that arch building in GH Actions is tricky, so I
>>>> didn't try it on my first attempt. It may be possible to get that to
>>>> build also.
>>>> 
>>>>> Regards
>>>>> Enrico
>>>>> [1] https://github.com/apache/zookeeper/pull/1508
>>>> 
>>>> I hope that clarifies a little bit of what I was trying to accomplish
>>>> in my PR, since my intentions may not be identical to what others
>>>> were
>>>> hoping it would achieve. Feel free to ask me additional questions, or
>>>> to direct me to make changes to my PR in any way. I realize that this
>>>> is something that may not get merged right away (or ever)...
>>>> especially if it doesn't achieve what you were hoping it would
>>>> achieve. But, I still think there's advantages to using GH Actions
>>>> that are worth considering.
>>>> 
>>>> Thanks,
>>>> Christopher
>>> 
>>> 
>>> 
>

Re: Pull Request Validation on GitHub Actions (ZOOKEEPER-3973)

Posted by Andor Molnar <an...@apache.org>.

That’s not entirely true.

First, we have PortAssignment class which is responsible for assigning unique port numbers to test instances. It also has a retry logic to increment port numbers which have failed to bind.
Second, ASF Jenkins running the same test suite and manages to get green builds every now and then by running tests parallel on 4 threads.

There must be some reason for that Github builds are much more unstable. 
Do they fail always with the same error?

Andor



> On 2021. Jan 15., at 4:18, Christopher <ct...@apache.org> wrote:
> 
> On Thu, Jan 14, 2021 at 3:52 PM Andor Molnar <an...@apache.org> wrote:
> 
>> Nicely done guys. I've just noticed the changes. :)
>> 
>> Does it work out well so far? I can only see failing builds here: (at
>> least 1 test of the matrix always fails)
>> 
>> https://github.com/apache/zookeeper/actions
>> 
>> 
> The CI build for PR https://github.com/apache/zookeeper/pull/1579 looks
> useful, because it catches new checkstyle errors as a result of the
> proposed change:
> https://github.com/apache/zookeeper/runs/1705766219
> 
> The one matrix item that always fails is the one that runs all the Java
> tests. There are a lot of the JUnit Java tests that are flaky. Many of them
> are probably just failing because of collisions with other tests (port
> reuse is a common one). Many of the tests reuse the same port numbers for
> the integration test, so they can't run at the same time, but the test
> cases don't check for port in use or wait for it to be free, and the tests
> aren't designed to select a random or free port, either. A lot of these
> could be fixed with improvements to the testing code, so that port numbers
> don't collide when tests are run concurrently.
> 
> There are other flaky tests, too, but I'm not that familiar with any of
> these tests, just reading the error messages.
> 
> 
>> I'm not saying that our ASF Jenkins build is rock solid, but at least
>> it's having green runs every now and then.
>> 
>> 
>> https://ci-hadoop.apache.org/view/ZooKeeper/job/zookeeper-multi-branch-build/
>> 
>> 
> I suspect some of these flaky Java test cases probably fail on Jenkins a
> lot, too. If they can be tweaked to run more reliably in GitHub Actions, I
> bet the Jenkins precommit jobs would also pass more reliably, too.
> 
> 
>> Andor
>> 
>> 
>> 
>> On Sun, 2020-11-01 at 00:31 -0400, Christopher wrote:
>>> Hi Enrico, et al,
>>> 
>>> I think the situation is a little more complicated than simply
>>> selecting between "option 1" and "option 2". So, please allow me to
>>> provide some additional clarification about what my PR does and does
>>> not do (responses inline, see below).
>>> 
>>> On Sat, Oct 31, 2020 at 12:00 PM Enrico Olivelli
>>> <eo...@gmail.com> wrote:
>>>> 
>>>> Hello,
>>>> Christopher send a patch that enabled PR validation in GitHub
>>>> Actions [1]
>>>> 
>>>> I would like to start a discussion and explain what's going on.
>>>> I was talking with Andor about the lack of the "magic words" on
>>>> Pull
>>>> Request Validation that restart the build on the new Jenkins.
>>>> 
>>>> I cited that in Apache BookKeeper and in Apache Pulsar we have a
>>>> "github
>>>> bot" that interacts with the comments in the Pull Requests and it
>>>> is able
>>>> to rerun the failure builds.
>>>> This kind of "bots" is becoming pretty common on github.
>>>> 
>>>> Christopher followed up our discussion and created that patch.
>>> 
>>> To be clear, the GitHub Actions doesn't enable any "magic words" to
>>> restart, but it does provide greater control over re-build from the
>>> GitHub UI. It's at most a few clicks away.
>>> 
>>>> You can't see GH Action working on ZK repo, because we should
>>>> enable them
>>>> with the help of Infra.
>>> 
>>> To clarify here as well, INFRA isn't involved in enabling GH Actions.
>>> It's simply a matter of adding the workflow to the repository (which
>>> my PR does). It is automatically enabled once the workflow is merged
>>> in.
>>> 
>>>> 
>>>> From my point of view using GitHub Actions will be interesting and
>>>> useful
>>>> if and only if we add the 'bot' that reruns the failures.
>>> 
>>> I think there's other value to consider, also... like replacing
>>> Travis
>>> CI (which asks for too many permissions IMO), and having much greater
>>> control over builds, including having access to surefire reports from
>>> failed tests, using custom containers, and having access to thousands
>>> of user-generated recipes to perform custom steps in the build. There
>>> are multiple benefits.
>>> 
>>>> 
>>>> This is not possible on the new ASF Jenkins, so only committers (I
>>>> am not
>>>> sure that you also need some special additional karma) can trigger
>>>> a new
>>>> build by logging into Jenkins.
>>> 
>>> Unfortunately, that's still true of GH Actions... only committers
>>> would be able to trigger a rebuild in the GitHub UI for the build
>>> that
>>> was automatically triggered when the PR was created (or updated).
>>> Non-committers can only retrigger by updating the PR, just like they
>>> do now with Jenkins. However, the one advantage that non-committers
>>> do
>>> get is that they can execute the same GH Actions builds in their own
>>> repo, without needing an account on Jenkins or Travis, during their
>>> development process, before they even issue a PR. So, they will be
>>> able to see if their branch is likely to get a green build before
>>> they
>>> even open a PR (if they take advantage of that option).
>>> 
>>>> 
>>>> The work of Christopher is not only about enabling GitHub Actions
>>>> but it is
>>>> also about cleaning up the validation process and running only part
>>>> of the
>>>> tests, this can be discussed on a separate thread (I would like to
>>>> run all
>>>> of the tests for instance, not only a selection). So please comment
>>>> on the
>>>> PR about the changes.
>>> 
>>> Running only part of the tests was not my goal. I just did that as an
>>> example to show what was possible. I can change it to try to run the
>>> full build. However, I know that currently doesn't work (too many
>>> tests fail in the GH Actions runner, just like too many failed in
>>> Travis, which is why they were disabled there). Either way, follow-on
>>> work will need to be done to tweak the tests so they can build
>>> reliably in this environment and eventually replace the Jenkins
>>> precommit validation. This PR is not yet ready to replace the need
>>> for
>>> the Jenkins precommit. It only lays the groundwork for getting there
>>> eventually.
>>> 
>>> What the PR does now is:
>>> 1. Replace Travis and run *some* tests (Travis ran none)
>>> 2. Demonstrates how to run multiple tasks in a matrix build
>>> 3. Lays the groundwork for incremental test improvements so the
>>> Jenkins precommit job can also be replaced by GH Actions validations,
>>> so all validations are (eventually) managed in one place.
>>> 
>>> The PR can still be changed to do whatever you want. I've merely
>>> layed
>>> down the groundwork that can be built upon. If you want it to try to
>>> run all the tests, I can do that. If you merely want it to replace
>>> Travis and build no tests (yet), I can do that. If you want me to add
>>> jdk8 to the matrix build instead of just jdk11, I can do that. If you
>>> want to have a build with jdk11, but run the full ITs with jdk8, I
>>> can
>>> do that. If you want to try to run the full ITs with both jdk8 and
>>> jdk11, I can do that. Just let me know what you want and I will
>>> update
>>> the PR accordingly.
>>> 
>>> In essence, there are *far* more options available to the project
>>> than
>>> the "option 1" and "option 2" described below. It is the enormous
>>> possibility space that GH Actions opens up, that I think is the real
>>> advantage to the project, far more than any other advantages.
>>> 
>>>> 
>>>> Here the discussion is more about
>>>> "Use GH actions and rerun the tests" (option 1)
>>>> vs
>>>> "Use ASFK jenkins for everything + Travis for PRs on additional
>>>> architectures" (option 2, that is to keep the current situation)
>>>> 
>>> 
>>> My PR does keep the Travis build for s390x builds (without tests),
>>> because getting that arch building in GH Actions is tricky, so I
>>> didn't try it on my first attempt. It may be possible to get that to
>>> build also.
>>> 
>>>> Regards
>>>> Enrico
>>>> [1] https://github.com/apache/zookeeper/pull/1508
>>> 
>>> I hope that clarifies a little bit of what I was trying to accomplish
>>> in my PR, since my intentions may not be identical to what others
>>> were
>>> hoping it would achieve. Feel free to ask me additional questions, or
>>> to direct me to make changes to my PR in any way. I realize that this
>>> is something that may not get merged right away (or ever)...
>>> especially if it doesn't achieve what you were hoping it would
>>> achieve. But, I still think there's advantages to using GH Actions
>>> that are worth considering.
>>> 
>>> Thanks,
>>> Christopher
>> 
>> 
>>

Re: Pull Request Validation on GitHub Actions (ZOOKEEPER-3973)

Posted by Christopher <ct...@apache.org>.

On Thu, Jan 14, 2021 at 3:52 PM Andor Molnar <an...@apache.org> wrote:

> Nicely done guys. I've just noticed the changes. :)
>
> Does it work out well so far? I can only see failing builds here: (at
> least 1 test of the matrix always fails)
>
> https://github.com/apache/zookeeper/actions
>
>
The CI build for PR https://github.com/apache/zookeeper/pull/1579 looks
useful, because it catches new checkstyle errors as a result of the
proposed change:
https://github.com/apache/zookeeper/runs/1705766219

The one matrix item that always fails is the one that runs all the Java
tests. There are a lot of the JUnit Java tests that are flaky. Many of them
are probably just failing because of collisions with other tests (port
reuse is a common one). Many of the tests reuse the same port numbers for
the integration test, so they can't run at the same time, but the test
cases don't check for port in use or wait for it to be free, and the tests
aren't designed to select a random or free port, either. A lot of these
could be fixed with improvements to the testing code, so that port numbers
don't collide when tests are run concurrently.

There are other flaky tests, too, but I'm not that familiar with any of
these tests, just reading the error messages.


> I'm not saying that our ASF Jenkins build is rock solid, but at least
> it's having green runs every now and then.
>
>
> https://ci-hadoop.apache.org/view/ZooKeeper/job/zookeeper-multi-branch-build/
>
>
I suspect some of these flaky Java test cases probably fail on Jenkins a
lot, too. If they can be tweaked to run more reliably in GitHub Actions, I
bet the Jenkins precommit jobs would also pass more reliably, too.


> Andor
>
>
>
> On Sun, 2020-11-01 at 00:31 -0400, Christopher wrote:
> > Hi Enrico, et al,
> >
> > I think the situation is a little more complicated than simply
> > selecting between "option 1" and "option 2". So, please allow me to
> > provide some additional clarification about what my PR does and does
> > not do (responses inline, see below).
> >
> > On Sat, Oct 31, 2020 at 12:00 PM Enrico Olivelli
> > <eo...@gmail.com> wrote:
> > >
> > > Hello,
> > > Christopher send a patch that enabled PR validation in GitHub
> > > Actions [1]
> > >
> > > I would like to start a discussion and explain what's going on.
> > > I was talking with Andor about the lack of the "magic words" on
> > > Pull
> > > Request Validation that restart the build on the new Jenkins.
> > >
> > > I cited that in Apache BookKeeper and in Apache Pulsar we have a
> > > "github
> > > bot" that interacts with the comments in the Pull Requests and it
> > > is able
> > > to rerun the failure builds.
> > > This kind of "bots" is becoming pretty common on github.
> > >
> > > Christopher followed up our discussion and created that patch.
> >
> > To be clear, the GitHub Actions doesn't enable any "magic words" to
> > restart, but it does provide greater control over re-build from the
> > GitHub UI. It's at most a few clicks away.
> >
> > > You can't see GH Action working on ZK repo, because we should
> > > enable them
> > > with the help of Infra.
> >
> > To clarify here as well, INFRA isn't involved in enabling GH Actions.
> > It's simply a matter of adding the workflow to the repository (which
> > my PR does). It is automatically enabled once the workflow is merged
> > in.
> >
> > >
> > > From my point of view using GitHub Actions will be interesting and
> > > useful
> > > if and only if we add the 'bot' that reruns the failures.
> >
> > I think there's other value to consider, also... like replacing
> > Travis
> > CI (which asks for too many permissions IMO), and having much greater
> > control over builds, including having access to surefire reports from
> > failed tests, using custom containers, and having access to thousands
> > of user-generated recipes to perform custom steps in the build. There
> > are multiple benefits.
> >
> > >
> > > This is not possible on the new ASF Jenkins, so only committers (I
> > > am not
> > > sure that you also need some special additional karma) can trigger
> > > a new
> > > build by logging into Jenkins.
> >
> > Unfortunately, that's still true of GH Actions... only committers
> > would be able to trigger a rebuild in the GitHub UI for the build
> > that
> > was automatically triggered when the PR was created (or updated).
> > Non-committers can only retrigger by updating the PR, just like they
> > do now with Jenkins. However, the one advantage that non-committers
> > do
> > get is that they can execute the same GH Actions builds in their own
> > repo, without needing an account on Jenkins or Travis, during their
> > development process, before they even issue a PR. So, they will be
> > able to see if their branch is likely to get a green build before
> > they
> > even open a PR (if they take advantage of that option).
> >
> > >
> > > The work of Christopher is not only about enabling GitHub Actions
> > > but it is
> > > also about cleaning up the validation process and running only part
> > > of the
> > > tests, this can be discussed on a separate thread (I would like to
> > > run all
> > > of the tests for instance, not only a selection). So please comment
> > > on the
> > > PR about the changes.
> >
> > Running only part of the tests was not my goal. I just did that as an
> > example to show what was possible. I can change it to try to run the
> > full build. However, I know that currently doesn't work (too many
> > tests fail in the GH Actions runner, just like too many failed in
> > Travis, which is why they were disabled there). Either way, follow-on
> > work will need to be done to tweak the tests so they can build
> > reliably in this environment and eventually replace the Jenkins
> > precommit validation. This PR is not yet ready to replace the need
> > for
> > the Jenkins precommit. It only lays the groundwork for getting there
> > eventually.
> >
> > What the PR does now is:
> > 1. Replace Travis and run *some* tests (Travis ran none)
> > 2. Demonstrates how to run multiple tasks in a matrix build
> > 3. Lays the groundwork for incremental test improvements so the
> > Jenkins precommit job can also be replaced by GH Actions validations,
> > so all validations are (eventually) managed in one place.
> >
> > The PR can still be changed to do whatever you want. I've merely
> > layed
> > down the groundwork that can be built upon. If you want it to try to
> > run all the tests, I can do that. If you merely want it to replace
> > Travis and build no tests (yet), I can do that. If you want me to add
> > jdk8 to the matrix build instead of just jdk11, I can do that. If you
> > want to have a build with jdk11, but run the full ITs with jdk8, I
> > can
> > do that. If you want to try to run the full ITs with both jdk8 and
> > jdk11, I can do that. Just let me know what you want and I will
> > update
> > the PR accordingly.
> >
> > In essence, there are *far* more options available to the project
> > than
> > the "option 1" and "option 2" described below. It is the enormous
> > possibility space that GH Actions opens up, that I think is the real
> > advantage to the project, far more than any other advantages.
> >
> > >
> > > Here the discussion is more about
> > > "Use GH actions and rerun the tests" (option 1)
> > > vs
> > >  "Use ASFK jenkins for everything + Travis for PRs on additional
> > > architectures" (option 2, that is to keep the current situation)
> > >
> >
> > My PR does keep the Travis build for s390x builds (without tests),
> > because getting that arch building in GH Actions is tricky, so I
> > didn't try it on my first attempt. It may be possible to get that to
> > build also.
> >
> > > Regards
> > > Enrico
> > > [1] https://github.com/apache/zookeeper/pull/1508
> >
> > I hope that clarifies a little bit of what I was trying to accomplish
> > in my PR, since my intentions may not be identical to what others
> > were
> > hoping it would achieve. Feel free to ask me additional questions, or
> > to direct me to make changes to my PR in any way. I realize that this
> > is something that may not get merged right away (or ever)...
> > especially if it doesn't achieve what you were hoping it would
> > achieve. But, I still think there's advantages to using GH Actions
> > that are worth considering.
> >
> > Thanks,
> > Christopher
>
>
>

Re: Pull Request Validation on GitHub Actions (ZOOKEEPER-3973)

Posted by Andor Molnar <an...@apache.org>.

Nicely done guys. I've just noticed the changes. :)

Does it work out well so far? I can only see failing builds here: (at
least 1 test of the matrix always fails)

https://github.com/apache/zookeeper/actions

I'm not saying that our ASF Jenkins build is rock solid, but at least
it's having green runs every now and then.

https://ci-hadoop.apache.org/view/ZooKeeper/job/zookeeper-multi-branch-build/

Andor



On Sun, 2020-11-01 at 00:31 -0400, Christopher wrote:
> Hi Enrico, et al,
> 
> I think the situation is a little more complicated than simply
> selecting between "option 1" and "option 2". So, please allow me to
> provide some additional clarification about what my PR does and does
> not do (responses inline, see below).
> 
> On Sat, Oct 31, 2020 at 12:00 PM Enrico Olivelli
> <eo...@gmail.com> wrote:
> > 
> > Hello,
> > Christopher send a patch that enabled PR validation in GitHub
> > Actions [1]
> > 
> > I would like to start a discussion and explain what's going on.
> > I was talking with Andor about the lack of the "magic words" on
> > Pull
> > Request Validation that restart the build on the new Jenkins.
> > 
> > I cited that in Apache BookKeeper and in Apache Pulsar we have a
> > "github
> > bot" that interacts with the comments in the Pull Requests and it
> > is able
> > to rerun the failure builds.
> > This kind of "bots" is becoming pretty common on github.
> > 
> > Christopher followed up our discussion and created that patch.
> 
> To be clear, the GitHub Actions doesn't enable any "magic words" to
> restart, but it does provide greater control over re-build from the
> GitHub UI. It's at most a few clicks away.
> 
> > You can't see GH Action working on ZK repo, because we should
> > enable them
> > with the help of Infra.
> 
> To clarify here as well, INFRA isn't involved in enabling GH Actions.
> It's simply a matter of adding the workflow to the repository (which
> my PR does). It is automatically enabled once the workflow is merged
> in.
> 
> > 
> > From my point of view using GitHub Actions will be interesting and
> > useful
> > if and only if we add the 'bot' that reruns the failures.
> 
> I think there's other value to consider, also... like replacing
> Travis
> CI (which asks for too many permissions IMO), and having much greater
> control over builds, including having access to surefire reports from
> failed tests, using custom containers, and having access to thousands
> of user-generated recipes to perform custom steps in the build. There
> are multiple benefits.
> 
> > 
> > This is not possible on the new ASF Jenkins, so only committers (I
> > am not
> > sure that you also need some special additional karma) can trigger
> > a new
> > build by logging into Jenkins.
> 
> Unfortunately, that's still true of GH Actions... only committers
> would be able to trigger a rebuild in the GitHub UI for the build
> that
> was automatically triggered when the PR was created (or updated).
> Non-committers can only retrigger by updating the PR, just like they
> do now with Jenkins. However, the one advantage that non-committers
> do
> get is that they can execute the same GH Actions builds in their own
> repo, without needing an account on Jenkins or Travis, during their
> development process, before they even issue a PR. So, they will be
> able to see if their branch is likely to get a green build before
> they
> even open a PR (if they take advantage of that option).
> 
> > 
> > The work of Christopher is not only about enabling GitHub Actions
> > but it is
> > also about cleaning up the validation process and running only part
> > of the
> > tests, this can be discussed on a separate thread (I would like to
> > run all
> > of the tests for instance, not only a selection). So please comment
> > on the
> > PR about the changes.
> 
> Running only part of the tests was not my goal. I just did that as an
> example to show what was possible. I can change it to try to run the
> full build. However, I know that currently doesn't work (too many
> tests fail in the GH Actions runner, just like too many failed in
> Travis, which is why they were disabled there). Either way, follow-on
> work will need to be done to tweak the tests so they can build
> reliably in this environment and eventually replace the Jenkins
> precommit validation. This PR is not yet ready to replace the need
> for
> the Jenkins precommit. It only lays the groundwork for getting there
> eventually.
> 
> What the PR does now is:
> 1. Replace Travis and run *some* tests (Travis ran none)
> 2. Demonstrates how to run multiple tasks in a matrix build
> 3. Lays the groundwork for incremental test improvements so the
> Jenkins precommit job can also be replaced by GH Actions validations,
> so all validations are (eventually) managed in one place.
> 
> The PR can still be changed to do whatever you want. I've merely
> layed
> down the groundwork that can be built upon. If you want it to try to
> run all the tests, I can do that. If you merely want it to replace
> Travis and build no tests (yet), I can do that. If you want me to add
> jdk8 to the matrix build instead of just jdk11, I can do that. If you
> want to have a build with jdk11, but run the full ITs with jdk8, I
> can
> do that. If you want to try to run the full ITs with both jdk8 and
> jdk11, I can do that. Just let me know what you want and I will
> update
> the PR accordingly.
> 
> In essence, there are *far* more options available to the project
> than
> the "option 1" and "option 2" described below. It is the enormous
> possibility space that GH Actions opens up, that I think is the real
> advantage to the project, far more than any other advantages.
> 
> > 
> > Here the discussion is more about
> > "Use GH actions and rerun the tests" (option 1)
> > vs
> >  "Use ASFK jenkins for everything + Travis for PRs on additional
> > architectures" (option 2, that is to keep the current situation)
> > 
> 
> My PR does keep the Travis build for s390x builds (without tests),
> because getting that arch building in GH Actions is tricky, so I
> didn't try it on my first attempt. It may be possible to get that to
> build also.
> 
> > Regards
> > Enrico
> > [1] https://github.com/apache/zookeeper/pull/1508
> 
> I hope that clarifies a little bit of what I was trying to accomplish
> in my PR, since my intentions may not be identical to what others
> were
> hoping it would achieve. Feel free to ask me additional questions, or
> to direct me to make changes to my PR in any way. I realize that this
> is something that may not get merged right away (or ever)...
> especially if it doesn't achieve what you were hoping it would
> achieve. But, I still think there's advantages to using GH Actions
> that are worth considering.
> 
> Thanks,
> Christopher

Re: Pull Request Validation on GitHub Actions (ZOOKEEPER-3973)

Posted by Christopher <ct...@apache.org>.

Hi Enrico, et al,

I think the situation is a little more complicated than simply
selecting between "option 1" and "option 2". So, please allow me to
provide some additional clarification about what my PR does and does
not do (responses inline, see below).

On Sat, Oct 31, 2020 at 12:00 PM Enrico Olivelli <eo...@gmail.com> wrote:
>
> Hello,
> Christopher send a patch that enabled PR validation in GitHub Actions [1]
>
> I would like to start a discussion and explain what's going on.
> I was talking with Andor about the lack of the "magic words" on Pull
> Request Validation that restart the build on the new Jenkins.
>
> I cited that in Apache BookKeeper and in Apache Pulsar we have a "github
> bot" that interacts with the comments in the Pull Requests and it is able
> to rerun the failure builds.
> This kind of "bots" is becoming pretty common on github.
>
> Christopher followed up our discussion and created that patch.

To be clear, the GitHub Actions doesn't enable any "magic words" to
restart, but it does provide greater control over re-build from the
GitHub UI. It's at most a few clicks away.

> You can't see GH Action working on ZK repo, because we should enable them
> with the help of Infra.

To clarify here as well, INFRA isn't involved in enabling GH Actions.
It's simply a matter of adding the workflow to the repository (which
my PR does). It is automatically enabled once the workflow is merged
in.

>
> From my point of view using GitHub Actions will be interesting and useful
> if and only if we add the 'bot' that reruns the failures.

I think there's other value to consider, also... like replacing Travis
CI (which asks for too many permissions IMO), and having much greater
control over builds, including having access to surefire reports from
failed tests, using custom containers, and having access to thousands
of user-generated recipes to perform custom steps in the build. There
are multiple benefits.

>
> This is not possible on the new ASF Jenkins, so only committers (I am not
> sure that you also need some special additional karma) can trigger a new
> build by logging into Jenkins.

Unfortunately, that's still true of GH Actions... only committers
would be able to trigger a rebuild in the GitHub UI for the build that
was automatically triggered when the PR was created (or updated).
Non-committers can only retrigger by updating the PR, just like they
do now with Jenkins. However, the one advantage that non-committers do
get is that they can execute the same GH Actions builds in their own
repo, without needing an account on Jenkins or Travis, during their
development process, before they even issue a PR. So, they will be
able to see if their branch is likely to get a green build before they
even open a PR (if they take advantage of that option).

>
> The work of Christopher is not only about enabling GitHub Actions but it is
> also about cleaning up the validation process and running only part of the
> tests, this can be discussed on a separate thread (I would like to run all
> of the tests for instance, not only a selection). So please comment on the
> PR about the changes.

Running only part of the tests was not my goal. I just did that as an
example to show what was possible. I can change it to try to run the
full build. However, I know that currently doesn't work (too many
tests fail in the GH Actions runner, just like too many failed in
Travis, which is why they were disabled there). Either way, follow-on
work will need to be done to tweak the tests so they can build
reliably in this environment and eventually replace the Jenkins
precommit validation. This PR is not yet ready to replace the need for
the Jenkins precommit. It only lays the groundwork for getting there
eventually.

What the PR does now is:
1. Replace Travis and run *some* tests (Travis ran none)
2. Demonstrates how to run multiple tasks in a matrix build
3. Lays the groundwork for incremental test improvements so the
Jenkins precommit job can also be replaced by GH Actions validations,
so all validations are (eventually) managed in one place.

The PR can still be changed to do whatever you want. I've merely layed
down the groundwork that can be built upon. If you want it to try to
run all the tests, I can do that. If you merely want it to replace
Travis and build no tests (yet), I can do that. If you want me to add
jdk8 to the matrix build instead of just jdk11, I can do that. If you
want to have a build with jdk11, but run the full ITs with jdk8, I can
do that. If you want to try to run the full ITs with both jdk8 and
jdk11, I can do that. Just let me know what you want and I will update
the PR accordingly.

In essence, there are *far* more options available to the project than
the "option 1" and "option 2" described below. It is the enormous
possibility space that GH Actions opens up, that I think is the real
advantage to the project, far more than any other advantages.

>
> Here the discussion is more about
> "Use GH actions and rerun the tests" (option 1)
> vs
>  "Use ASFK jenkins for everything + Travis for PRs on additional
> architectures" (option 2, that is to keep the current situation)
>

My PR does keep the Travis build for s390x builds (without tests),
because getting that arch building in GH Actions is tricky, so I
didn't try it on my first attempt. It may be possible to get that to
build also.

> Regards
> Enrico
> [1] https://github.com/apache/zookeeper/pull/1508

I hope that clarifies a little bit of what I was trying to accomplish
in my PR, since my intentions may not be identical to what others were
hoping it would achieve. Feel free to ask me additional questions, or
to direct me to make changes to my PR in any way. I realize that this
is something that may not get merged right away (or ever)...
especially if it doesn't achieve what you were hoping it would
achieve. But, I still think there's advantages to using GH Actions
that are worth considering.

Thanks,
Christopher