
Posted to dev@beam.apache.org by Brian Hulette <bh...@google.com> on 2020/09/03 18:05:33 UTC

Re-running GitHub Actions jobs

The new GitHub Actions workflows that run Java and Python tests against
different targets (macos, ubuntu, windows) are great! But just like our
Jenkins infra they flake occasionally. Should we be re-running all of these
jobs until we get green runs?

Unfortunately it's not possible to re-run an individual job in a workflow
[1]; the only option is to re-run all jobs, so flaky tests become even more
problematic.

I see two options:
1) Consider it "good enough" if just Jenkins CI passes and any GitHub
Actions failures appear to be flakes.
2) Require that all Jenkins and GitHub checks pass.

My vote is for (2). (1) risks merging legitimate breakages, and one could
argue that making flaky tests extra painful is a good thing. Also we can
always make an exception if an obvious flake is blocking a critical PR.


Also FYI - at first I thought these workflows only had the stdout
available, but the test report directory is also zipped and uploaded as an
artifact. When a failure occurs you can download it to get the full output:
[image: image.png]
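
Once the artifact has been downloaded, the failing tests can be pulled out
of it with a few lines of Python. This is only a minimal sketch: it assumes
the zipped report directory contains JUnit-style XML files (<testcase>
elements with <failure> or <error> children), which the thread does not
confirm, and the function name failed_tests is made up for illustration.

```python
import io
import zipfile
import xml.etree.ElementTree as ET


def failed_tests(artifact_zip):
    """Return (classname, name) pairs for failed or errored test cases.

    `artifact_zip` is a path or file-like object for the downloaded artifact.
    Assumption: the archive holds JUnit-style XML reports.
    """
    failures = []
    with zipfile.ZipFile(artifact_zip) as archive:
        for member in archive.namelist():
            # Skip HTML pages, logs, and anything else that is not a report.
            if not member.endswith(".xml"):
                continue
            root = ET.fromstring(archive.read(member))
            # iter() walks the whole tree, so nested <testsuite> elements work too.
            for case in root.iter("testcase"):
                if case.find("failure") is not None or case.find("error") is not None:
                    failures.append((case.get("classname"), case.get("name")))
    return failures
```

Calling failed_tests("test-report.zip") on a downloaded archive (the file
name here is hypothetical) would give a short list to compare against the
known-flake JIRAs before deciding whether a re-run is worthwhile.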


Brian

[1]
https://github.community/t/ability-to-rerun-just-a-single-job-in-a-workflow/17234

Re: Re-running GitHub Actions jobs

Posted by Valentyn Tymofieiev <va...@google.com>.

On Fri, Sep 18, 2020 at 9:53 AM Ahmet Altay <al...@google.com> wrote:

> We can ignore (or disable) macos tests for now. We have not been testing
> SDKs on macos since we transitioned out of Travis (2017?). It is possible
> that there are issues with our tests on macos. For example, we continued to
> test on Windows and regularly fixed test issues when running on Windows.
> Another possibility is that GitHub Actions has its own infra issues on
> macOS. There are some reported and fixed issues [1] as recent as last month.
>
> [1] https://github.com/actions/virtual-environments/issues/736
>
> On Fri, Sep 18, 2020 at 9:24 AM Brian Hulette <bh...@google.com> wrote:
>
>> There are P1 jiras for the frequent flakes for Python on MacOS:
>> https://issues.apache.org/jira/browse/BEAM-10768
>> https://issues.apache.org/jira/browse/BEAM-10866
>>
>> On Thu, Sep 17, 2020 at 7:17 PM Kenneth Knowles <ke...@apache.org> wrote:
>>
>>> I'd be interested in figuring out which infra we intend to use for which
>>> signals. I think I am a bit out of the loop on this. I'm pretty sure we are
>>> redundantly running a lot of stuff.
>>>
>>
> GH actions has one major advantage on the platforms it offers out of the
> box (linux, windows, macos). At least for windows, and macos it makes sense
> to use GH actions.
>
> We are still trying out GH actions, and if it works for our purposes we
> could consider removing the redundancy by moving all our testing to it.
>
>
>>> Kenn
>>>
>>> On Thu, Sep 17, 2020 at 4:50 PM Robert Bradshaw <ro...@google.com>
>>> wrote:
>>>
>>>> I rarely have a test where these don't flake out, even when I re-run
>>>> them multiple times. (The failures look completely irrelevant as well.) We
>>>> should probably file JIRAs to get them fixed (ideally) or disable them.
>>>>
>>>
> +1 to filing JIRAs. (Please cc: +Tyson Hamilton <ty...@google.com> on
> github action issues, and +Valentyn Tymofieiev <va...@google.com> on
> python test flakes. They can help.)
>

When you encounter a flake, please:
1) Open or find a JIRA.
2) Tag the flake with the currently-failing label [1].
3) Feel free to comment or vote on the JIRAs so that we can prioritize
the flakes.

Thanks

[1]
https://issues.apache.org/jira/issues/?jql=labels%20%3D%20currently-failing%20AND%20resolution%20%3D%20Unresolved
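
The link in [1] is just a URL-encoded JQL query. As a small sketch, the
snippet below decodes it and shows how a similar search URL could be built
for a different query. The helper names jql_from_url and url_for_jql are
invented for this example; only the URL itself comes from the message.

```python
from urllib.parse import parse_qs, quote, urlencode, urlsplit

SEARCH_URL = (
    "https://issues.apache.org/jira/issues/"
    "?jql=labels%20%3D%20currently-failing%20AND%20resolution%20%3D%20Unresolved"
)


def jql_from_url(url):
    """Extract and decode the `jql` query parameter from a JIRA search URL."""
    return parse_qs(urlsplit(url).query)["jql"][0]


def url_for_jql(jql):
    """Build a JIRA search URL for the given JQL, encoding spaces as %20."""
    base = "https://issues.apache.org/jira/issues/"
    return base + "?" + urlencode({"jql": jql}, quote_via=quote)


print(jql_from_url(SEARCH_URL))
```

The decoded query reads "labels = currently-failing AND resolution =
Unresolved", i.e. every open issue carrying the currently-failing label.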



Re: Re-running GitHub Actions jobs

Posted by Ahmet Altay <al...@google.com>.

We can ignore (or disable) macos tests for now. We have not been testing
SDKs on macos since we transitioned out of Travis (2017?). It is possible
that there are issues with our tests on macos. For example, we continued to
test on Windows and regularly fixed test issues when running on Windows.
Another possibility is that GitHub Actions has its own infra issues on
macOS. There are some reported and fixed issues [1] as recent as last month.

[1] https://github.com/actions/virtual-environments/issues/736

On Fri, Sep 18, 2020 at 9:24 AM Brian Hulette <bh...@google.com> wrote:

> There are P1 jiras for the frequent flakes for Python on MacOS:
> https://issues.apache.org/jira/browse/BEAM-10768
> https://issues.apache.org/jira/browse/BEAM-10866
>
> On Thu, Sep 17, 2020 at 7:17 PM Kenneth Knowles <ke...@apache.org> wrote:
>
>> I'd be interested in figuring out which infra we intend to use for which
>> signals. I think I am a bit out of the loop on this. I'm pretty sure we are
>> redundantly running a lot of stuff.
>>
>
GH Actions has one major advantage in the platforms it offers out of the
box (linux, windows, macos). At least for windows and macos, it makes sense
to use GH Actions.

We are still trying out GH actions, and if it works for our purposes we
could consider removing the redundancy by moving all our testing to it.


>> Kenn
>>
>> On Thu, Sep 17, 2020 at 4:50 PM Robert Bradshaw <ro...@google.com>
>> wrote:
>>
>>> I rarely have a test where these don't flake out, even when I re-run
>>> them multiple times. (The failures look completely irrelevant as well.) We
>>> should probably file JIRAs to get them fixed (ideally) or disable them.
>>>
>>
+1 to filing JIRAs. (Please cc: +Tyson Hamilton <ty...@google.com> on
github action issues, and +Valentyn Tymofieiev <va...@google.com> on
python test flakes. They can help.)



Re: Re-running GitHub Actions jobs

Posted by Brian Hulette <bh...@google.com>.

There are P1 jiras for the frequent flakes for Python on MacOS:
https://issues.apache.org/jira/browse/BEAM-10768
https://issues.apache.org/jira/browse/BEAM-10866

On Thu, Sep 17, 2020 at 7:17 PM Kenneth Knowles <ke...@apache.org> wrote:

> I'd be interested in figuring out which infra we intend to use for which
> signals. I think I am a bit out of the loop on this. I'm pretty sure we are
> redundantly running a lot of stuff.
>
> Kenn

Re: Re-running GitHub Actions jobs

Posted by Kenneth Knowles <ke...@apache.org>.

I'd be interested in figuring out which infra we intend to use for which
signals. I think I am a bit out of the loop on this. I'm pretty sure we are
redundantly running a lot of stuff.

Kenn

On Thu, Sep 17, 2020 at 4:50 PM Robert Bradshaw <ro...@google.com> wrote:

> I rarely have a test where these don't flake out, even when I re-run them
> multiple times. (The failures look completely irrelevant as well.) We
> should probably file JIRAs to get them fixed (ideally) or disable them.

Re: Re-running GitHub Actions jobs

Posted by Robert Bradshaw <ro...@google.com>.

I rarely have a test where these don't flake out, even when I re-run them
multiple times. (The failures look completely irrelevant as well.) We
should probably file JIRAs to get them fixed (ideally) or disable them.

On Thu, Sep 17, 2020 at 3:36 PM Brian Hulette <bh...@google.com> wrote:

> Hi everyone,
> Just wanted to solicit more opinions on this as these tests are still
> pretty flaky. I think we should be giving flakes in these workflows more
> attention, since they could represent legitimate bugs on platforms we don't
> test thoroughly.
>
> Brian

Re: Re-running GitHub Actions jobs

Posted by Brian Hulette <bh...@google.com>.

Hi everyone,
Just wanted to solicit more opinions on this as these tests are still
pretty flaky. I think we should be giving flakes in these workflows more
attention, since they could represent legitimate bugs on platforms we don't
test thoroughly.

Brian

On Thu, Sep 3, 2020 at 12:28 PM Heejong Lee <he...@google.com> wrote:

> I couldn't see that menu. Probably it needs a certain permission.
>
> If ordinary contributors could not re-run the tests by themselves,
> option (2) might slow down the merging process since someone with the
> permission should manually retrigger failed flaky tests.
>
> [image: Screen Shot 2020-09-03 at 12.20.25 PM.png]

Re: Re-running GitHub Actions jobs

Posted by Heejong Lee <he...@google.com>.

I couldn't see that menu. Probably it needs a certain permission.

If ordinary contributors cannot re-run the tests themselves, option (2)
might slow down merging, since someone with the permission would have to
manually retrigger flaky failures.

[image: Screen Shot 2020-09-03 at 12.20.25 PM.png]

On Thu, Sep 3, 2020 at 12:16 PM Brian Hulette <bh...@google.com> wrote:

> There's a "Re-run Jobs" button at the top right when you open up one of
> the jobs:
>
> [image: image.png]

Re: Re-running GitHub Actions jobs

Posted by Brian Hulette <bh...@google.com>.

There's a "Re-run Jobs" button at the top right when you open up one of the
jobs:

[image: image.png]
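
For those who would rather script the re-run than click the button, the
GitHub REST API exposes a "re-run a workflow" endpoint. The sketch below
only builds the request and does not send it. The helper name
rerun_request, the run ID, and the token are placeholders for illustration,
and the token is assumed to need write access to the repository, so this
may not help contributors who cannot see the button.

```python
import urllib.request

GITHUB_API = "https://api.github.com"


def rerun_request(owner, repo, run_id, token):
    """Build (but do not send) a POST request that re-runs a whole workflow run."""
    url = f"{GITHUB_API}/repos/{owner}/{repo}/actions/runs/{run_id}/rerun"
    return urllib.request.Request(
        url,
        method="POST",
        headers={
            "Accept": "application/vnd.github+json",
            "Authorization": f"Bearer {token}",
        },
    )


# Hypothetical run ID and token; pass the request to urllib.request.urlopen()
# to actually trigger the re-run.
request = rerun_request("apache", "beam", 123456, "YOUR_TOKEN")
print(request.get_method(), request.full_url)
```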

On Thu, Sep 3, 2020 at 12:02 PM Heejong Lee <he...@google.com> wrote:

>
>
> On Thu, Sep 3, 2020 at 11:05 AM Brian Hulette <bh...@google.com> wrote:
>
>> The new GitHub Actions workflows that run Java and Python tests against
>> different targets (macos, ubuntu, windows) are great! But just like our
>> Jenkins infra they flake occasionally. Should we be re-running all of these
>> jobs until we get green runs?
>>
>> Unfortunately it's not possible to re-run an individual job in a workflow
>> [1], the only option is to re-run all jobs, so flaky tests become even more
>> problematic.
>>
>> I see two options:
>> 1) Consider it "good enough" if just Jenkins CI passes and any GitHub
>> actions failures appear to be flakes.
>> 2) Require that all Jenkins and GitHub checks pass.
>>
>> My vote is for (2). (1) risks merging legitimate breakages, and one could
>> argue that making flaky tests extra painful is a good thing. Also we can
>> always make an exception if an obvious flake is blocking a critical PR.
>>
>
> +1 for (2) given that it might be not so easy to figure out whether the
> failure is flaky (or how critical it is).
> BTW, I see it's impossible to re-run a specific test but how do we re-run
> all tests then? Is there a menu item for it or needs to force update the
> commits?

Re: Re-running GitHub Actions jobs

Posted by Heejong Lee <he...@google.com>.

On Thu, Sep 3, 2020 at 11:05 AM Brian Hulette <bh...@google.com> wrote:

> The new GitHub Actions workflows that run Java and Python tests against
> different targets (macos, ubuntu, windows) are great! But just like our
> Jenkins infra they flake occasionally. Should we be re-running all of these
> jobs until we get green runs?
>
> Unfortunately it's not possible to re-run an individual job in a workflow
> [1], the only option is to re-run all jobs, so flaky tests become even more
> problematic.
>
> I see two options:
> 1) Consider it "good enough" if just Jenkins CI passes and any GitHub
> actions failures appear to be flakes.
> 2) Require that all Jenkins and GitHub checks pass.
>
> My vote is for (2). (1) risks merging legitimate breakages, and one could
> argue that making flaky tests extra painful is a good thing. Also we can
> always make an exception if an obvious flake is blocking a critical PR.
>

+1 for (2), given that it might not be so easy to figure out whether the
failure is a flake (or how critical it is).
BTW, I see it's impossible to re-run a specific test, but how do we re-run
all tests then? Is there a menu item for it, or do we need to force-update
the commits?


>
>
> Also FYI - at first I thought these workflows only had the stdout
> available, but the test report directory is also zipped and uploaded as an
> artifact. When a failure occurs you can download it to get the full output:
> [image: image.png]
>
>
> Brian
>
> [1]
> https://github.community/t/ability-to-rerun-just-a-single-job-in-a-workflow/17234
>