Posted to dev@beam.apache.org by Reuven Lax <re...@google.com> on 2021/08/03 17:43:28 UTC

Flaky tests in Beam

I've noticed that our precommit tests are getting flakier and flakier.
Recently I had to run Java PreCommit five times before I was able to get a
clean run. This is frustrating for us as developers, and it is also
extremely wasteful of our compute resources.

I started making a list of the flaky tests I've seen. Here are some of the
ones I've dealt with in just the past few days; this is by no means an
exhaustive list - I saw many others before I started recording them. Of
the below, failures in ElasticsearchIOTest are by far the most common!

We need to try to make these tests not flaky. Barring that, I think the
extremely flaky tests need to be excluded from our presubmit until they can
be fixed. Rerunning the precommit over and over again until it goes green
is not a good testing strategy.


   - org.apache.beam.runners.flink.ReadSourcePortableTest.testExecution[streaming: false]
     <https://ci-beam.apache.org/job/beam_PreCommit_Java_Phrase/3901/testReport/junit/org.apache.beam.runners.flink/ReadSourcePortableTest/testExecution_streaming__false_/>
   - org.apache.beam.sdk.io.jms.JmsIOTest.testCheckpointMarkSafety
     <https://ci-beam.apache.org/job/beam_PreCommit_Java_Commit/18485/testReport/junit/org.apache.beam.sdk.io.jms/JmsIOTest/testCheckpointMarkSafety/>
   - org.apache.beam.sdk.transforms.ParDoLifecycleTest.testTeardownCalledAfterExceptionInFinishBundleStateful
     <https://ci-beam.apache.org/job/beam_PreCommit_Java_Phrase/3903/testReport/junit/org.apache.beam.sdk.transforms/ParDoLifecycleTest/testTeardownCalledAfterExceptionInFinishBundleStateful/>
   - org.apache.beam.sdk.io.elasticsearch.ElasticsearchIOTest.testSplit
     <https://ci-beam.apache.org/job/beam_PreCommit_Java_Phrase/3903/testReport/junit/org.apache.beam.sdk.io.elasticsearch/ElasticsearchIOTest/testSplit/>
   - org.apache.beam.sdk.io.gcp.datastore.RampupThrottlingFnTest.testRampupThrottler
     <https://ci-beam.apache.org/job/beam_PreCommit_Java_Commit/18501/testReport/junit/org.apache.beam.sdk.io.gcp.datastore/RampupThrottlingFnTest/testRampupThrottler/>
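
For illustration, here is one way the quarantine suggested above could look.
This is only a hedged sketch using plain JUnit 4, not Beam's actual
convention; the class name and the @Ignore message are hypothetical.

    import org.junit.Ignore;
    import org.junit.Test;

    // Hypothetical stand-in for one of the flaky suites listed above.
    public class ElasticsearchIOQuarantineExampleTest {

      // @Ignore drops the test from every run (including presubmit) while
      // keeping the code compiling, so re-enabling it later is a one-line change.
      @Ignore("Flaky: quarantined until the underlying race is fixed; see the tracking JIRA")
      @Test
      public void testSplit() {
        // ... original test body unchanged ...
      }
    }

A build-level exclude (filtering the class out of the precommit Gradle task)
would work just as well and keeps the quarantine list visible in one place.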

Re: Flaky tests in Beam

Posted by Jan Lukavský <je...@seznam.cz>.
> If changes to core are causing Dataflow precommits to fail but not 
> local precommits, that suggests we lack test coverage?
What is the difference between a "Dataflow precommit" and a "local 
precommit" (besides that the latter can be run without GCP)? If the 
"local precommit" should catch _all_ regressions, then what would be the 
reason to have any other precommits? My intuition is that precommit 
checks (those run as part of CI on pull requests) should ideally be 
runnable by virtually anyone locally. Any checks that require a specific 
environment should be run optionally (e.g., like the validates-runner 
suites).

On 8/16/21 7:11 PM, Andrew Pilloud wrote:
> I can confirm the tests are passing now. Thank you.
>
> If changes to core are causing Dataflow precommits to fail but not 
> local precommits, that suggests we lack test coverage? I'm not 
> suggesting we remove the Dataflow tests entirely, just that we 
> consider removing them from the precommits where there is overlapping 
> test coverage.
>
> I would be +1 in favor of a flag as it would allow us to easily 
> disable Dataflow tests in precommits should we have another outage.
>
> On Mon, Aug 16, 2021 at 9:52 AM Jan Lukavský <je.ik@seznam.cz 
> <ma...@seznam.cz>> wrote:
>
>     The issue is with pull requests. IIRC, I didn't encounter this
>     problem, but I can imagine, that a change in core can make
>     Dataflow precommit to fail. And it would be complicated to fix
>     this without GCP credentials.
>
>     So, to answer the question, I think that no, it would not help, as
>     long as this flag would not be used in CI as well.
>
>     On 8/16/21 6:47 PM, Luke Cwik wrote:
>>     Jan, it would be possible to add a flag that says to skip any IT
>>     tests that require a cloud service of any kind. Would that work
>>     for you?
>>
>>     It turns out that the fix was rolled out and finished about 45
>>     mins ago so my prior e-mail was already out of date when I sent
>>     it. If you had a test that failed on your PR, please feel free to
>>     restart the test using the github trigger phrase associated with it.
>>
>>     I reran one of the suites that were perma-red
>>     https://ci-beam.apache.org/job/beam_PreCommit_Java_Examples_Dataflow_Cron/4059
>>     <https://ci-beam.apache.org/job/beam_PreCommit_Java_Examples_Dataflow_Cron/4059>
>>     and it passed.
>>
>>
>>     On Mon, Aug 16, 2021 at 9:29 AM Jan Lukavský <je.ik@seznam.cz
>>     <ma...@seznam.cz>> wrote:
>>
>>         Not directly related to the 'flakiness' discussion of this
>>         thread, but I think it would be good if pre-commit checks
>>         could be run locally without GCP credentials.
>>
>>         On 8/16/21 6:24 PM, Luke Cwik wrote:
>>>         The fix was inadvertently run in dry run mode so didn't make
>>>         any changes. Since the fix was taking a couple of hours or
>>>         so and it was getting late on Friday people didn't want to
>>>         start it again till today (after the weekend).
>>>
>>>         I don't think removing the few tests that run an unbounded
>>>         pipeline on Dataflow for a long term is a good idea. Sure,
>>>         we can disable them and re-enable them when there is an
>>>         issue that is blocking folks.
>>>
>>>         On Mon, Aug 16, 2021 at 9:19 AM Andrew Pilloud
>>>         <apilloud@google.com <ma...@google.com>> wrote:
>>>
>>>             The two hours to estimated fix has long passed and we
>>>             are now at 18 days since the last successful run. What
>>>             is the latest estimate?
>>>
>>>             It sounds like these tests are primarily testing
>>>             Dataflow, not Beam. They seem like good candidates to
>>>             remove from the precommit (or limit to Dataflow runner
>>>             changes) even after they are fixed.
>>>
>>>             On Fri, Aug 13, 2021 at 6:48 PM Luke Cwik
>>>             <lcwik@google.com <ma...@google.com>> wrote:
>>>
>>>                 The failure is related due to data that is
>>>                 associated with the apache-beam-testing project
>>>                 which is impacting all the Dataflow streaming tests.
>>>
>>>                 Yes, disabling the tests should have happened weeks
>>>                 ago if:
>>>                 1) The fix seemed like it was going to take a long
>>>                 time (was unknown at the time)
>>>                 2) We had confidence in test coverage minus Dataflow
>>>                 streaming test coverage (which I believe we did)
>>>
>>>
>>>
>>>                 On Fri, Aug 13, 2021 at 6:27 PM Andrew Pilloud
>>>                 <apilloud@google.com <ma...@google.com>>
>>>                 wrote:
>>>
>>>                     Or if a rollback won't fix this, can we disable
>>>                     the broken tests?
>>>
>>>                     On Fri, Aug 13, 2021 at 6:25 PM Andrew Pilloud
>>>                     <apilloud@google.com
>>>                     <ma...@google.com>> wrote:
>>>
>>>                         So you can roll back in two hours. Beam has
>>>                         been broken for two weeks. Why isn't a
>>>                         rollback appropriate?
>>>
>>>                         On Fri, Aug 13, 2021 at 6:06 PM Luke Cwik
>>>                         <lcwik@google.com <ma...@google.com>>
>>>                         wrote:
>>>
>>>                             From the test failures that I have seen
>>>                             they have been because of BEAM-12676[1]
>>>                             which is due to a bug impacting Dataflow
>>>                             streaming pipelines for the
>>>                             apache-beam-testing project. The fix is
>>>                             rolling out now from my understanding
>>>                             and should take another 2hrs or so.
>>>                             Rolling back master doesn't seem like
>>>                             what we should be doing at the moment.
>>>
>>>                             1:
>>>                             https://issues.apache.org/jira/projects/BEAM/issues/BEAM-12676
>>>                             <https://issues.apache.org/jira/projects/BEAM/issues/BEAM-12676>
>>>
>>>                             On Fri, Aug 13, 2021 at 5:51 PM Andrew
>>>                             Pilloud <apilloud@google.com
>>>                             <ma...@google.com>> wrote:
>>>
>>>                                 Both java and python precommits are
>>>                                 reporting the last successful run
>>>                                 being in July (for both Cron and
>>>                                 Precommit), so it looks like changes
>>>                                 are being submitting without
>>>                                 successful test runs. We
>>>                                 probably shouldn't be doing that?
>>>                                 https://ci-beam.apache.org/job/beam_PreCommit_Python_Cron/
>>>                                 <https://ci-beam.apache.org/job/beam_PreCommit_Python_Cron/>
>>>                                 https://ci-beam.apache.org/job/beam_PreCommit_Python_Commit/
>>>                                 <https://ci-beam.apache.org/job/beam_PreCommit_Python_Commit/>
>>>                                 https://ci-beam.apache.org/job/beam_PreCommit_Java_Examples_Dataflow_Cron/
>>>                                 <https://ci-beam.apache.org/job/beam_PreCommit_Java_Examples_Dataflow_Cron/>
>>>                                 https://ci-beam.apache.org/job/beam_PreCommit_Java_Examples_Dataflow_Commit/
>>>                                 <https://ci-beam.apache.org/job/beam_PreCommit_Java_Examples_Dataflow_Commit/>
>>>
>>>                                 Is there a plan to get this fixed?
>>>                                 Should we roll master back to July?
>>>
>>>                                 On Tue, Aug 3, 2021 at 12:24 PM
>>>                                 Tyson Hamilton <tysonjh@google.com
>>>                                 <ma...@google.com>> wrote:
>>>
>>>                                     I only realized after sending
>>>                                     that I used the IP for the link,
>>>                                     that was by accident, here is
>>>                                     the proper domain link:
>>>                                     http://metrics.beam.apache.org/d/D81lW0pmk/post-commit-test-reliability?orgId=1
>>>                                     <http://metrics.beam.apache.org/d/D81lW0pmk/post-commit-test-reliability?orgId=1>
>>>
>>>                                     On Tue, Aug 3, 2021 at 3:22 PM
>>>                                     Tyson Hamilton
>>>                                     <tysonjh@google.com
>>>                                     <ma...@google.com>> wrote:
>>>
>>>                                         The way I've investigated
>>>                                         precommit flake stability is
>>>                                         by looking at the
>>>                                         'Post-commit Test
>>>                                         Reliability' [1] dashboard
>>>                                         (hah!). There is a cron job
>>>                                         that runs precommits and
>>>                                         those results are tracked in
>>>                                         the post commit dashboard
>>>                                         confusingly. This week, Java
>>>                                         is about 50% green for the
>>>                                         pre-commit cron job, not great.
>>>
>>>                                         The plugin we installed for
>>>                                         tracking the most flaky
>>>                                         tests for a job doesn't do
>>>                                         well for the number of tests
>>>                                         present in the precommit
>>>                                         cron job. This could be an
>>>                                         area of improvement to help
>>>                                         add granularity and
>>>                                         visibility to the flakiest
>>>                                         tests over some period of time.
>>>
>>>
>>>                                         [1]:
>>>                                         http://104.154.241.245/d/D81lW0pmk/post-commit-test-reliability?orgId=1
>>>                                         <http://104.154.241.245/d/D81lW0pmk/post-commit-test-reliability?orgId=1>
>>>                                          (look for
>>>                                         "PreCommit_Java_Cron")
>>>
>>>                                         On Tue, Aug 3, 2021 at 2:24
>>>                                         PM Andrew Pilloud
>>>                                         <apilloud@google.com
>>>                                         <ma...@google.com>>
>>>                                         wrote:
>>>
>>>                                             Our metrics show java is
>>>                                             nearly free from flakes,
>>>                                             that go has significant
>>>                                             flakes, and that python
>>>                                             is effectively broken.
>>>                                             It appears they may be
>>>                                             missing coverage on the
>>>                                             Java side. The dashboard
>>>                                             is here:
>>>                                             http://104.154.241.245/d/McTAiu0ik/stability-critical-jobs-status?orgId=1
>>>                                             <http://104.154.241.245/d/McTAiu0ik/stability-critical-jobs-status?orgId=1>
>>>
>>>
>>>                                             I agree that this is
>>>                                             important to address. I
>>>                                             haven't submitted any
>>>                                             code recently but I
>>>                                             spent a significant
>>>                                             amount of time on the
>>>                                             2.31.0 release
>>>                                             investigating flakes in
>>>                                             the release
>>>                                             validation tests.
>>>
>>>                                             Andrew
>>>
>>>                                             On Tue, Aug 3, 2021 at
>>>                                             10:43 AM Reuven Lax
>>>                                             <relax@google.com
>>>                                             <ma...@google.com>>
>>>                                             wrote:
>>>
>>>                                                 I've noticed
>>>                                                 recently that our
>>>                                                 precommit tests are
>>>                                                 getting flakier and
>>>                                                 flakier. Recently I
>>>                                                 had to run Java
>>>                                                 PreCommit 5 times
>>>                                                 before I was able to
>>>                                                 get a clean run.
>>>                                                 This is frustrating
>>>                                                 for us as
>>>                                                 developers, but it
>>>                                                 also is extremely
>>>                                                 wasteful of our
>>>                                                 compute resources.
>>>
>>>                                                 I started making a
>>>                                                 list of the flaky
>>>                                                 tests I've seen.
>>>                                                 Here are some of the
>>>                                                 ones I've dealt with
>>>                                                 just the past few
>>>                                                 days; this is not
>>>                                                 nearly an exhaustive
>>>                                                 list - I've seen
>>>                                                 many others before I
>>>                                                 started recording
>>>                                                 them. Of the below,
>>>                                                 failures in
>>>                                                 ElasticsearchIOTest
>>>                                                 are by far the most
>>>                                                 common!
>>>
>>>                                                 We need to try and
>>>                                                 make these tests not
>>>                                                 flaky. Barring that,
>>>                                                 I think the
>>>                                                 extremely flaky
>>>                                                 tests need to be
>>>                                                 excluded from our
>>>                                                 presubmit until they
>>>                                                 can be fixed.
>>>                                                 Rerunning the
>>>                                                 precommit over and
>>>                                                 over again till
>>>                                                 green is not a good
>>>                                                 testing strategy.
>>>
>>>                                                  *
>>>
>>>                                                     org.apache.beam.runners.flink.ReadSourcePortableTest.testExecution[streaming:
>>>                                                     false]
>>>                                                     <https://ci-beam.apache.org/job/beam_PreCommit_Java_Phrase/3901/testReport/junit/org.apache.beam.runners.flink/ReadSourcePortableTest/testExecution_streaming__false_/>
>>>
>>>                                                  *
>>>
>>>                                                 org.apache.beam.sdk.io.jms.JmsIOTest.testCheckpointMarkSafety
>>>                                                 <https://ci-beam.apache.org/job/beam_PreCommit_Java_Commit/18485/testReport/junit/org.apache.beam.sdk.io.jms/JmsIOTest/testCheckpointMarkSafety/>
>>>
>>>                                                  *
>>>
>>>                                                     org.apache.beam.sdk.transforms.ParDoLifecycleTest.testTeardownCalledAfterExceptionInFinishBundleStateful
>>>                                                     <https://ci-beam.apache.org/job/beam_PreCommit_Java_Phrase/3903/testReport/junit/org.apache.beam.sdk.transforms/ParDoLifecycleTest/testTeardownCalledAfterExceptionInFinishBundleStateful/>
>>>
>>>                                                  *
>>>
>>>                                                     org.apache.beam.sdk.io.elasticsearch.ElasticsearchIOTest.testSplit
>>>                                                     <https://ci-beam.apache.org/job/beam_PreCommit_Java_Phrase/3903/testReport/junit/org.apache.beam.sdk.io.elasticsearch/ElasticsearchIOTest/testSplit/>
>>>
>>>                                                  *
>>>
>>>                                                     org.apache.beam.sdk.io.gcp.datastore.RampupThrottlingFnTest.testRampupThrottler
>>>                                                     <https://ci-beam.apache.org/job/beam_PreCommit_Java_Commit/18501/testReport/junit/org.apache.beam.sdk.io.gcp.datastore/RampupThrottlingFnTest/testRampupThrottler/>
>>>

Re: Flaky tests in Beam

Posted by Reuven Lax <re...@google.com>.
I don't think this follows. Dataflow (as well as other runners) has
runner-specific overrides, and there are many ways to break those via
changes in core.  We do need some e2e runner tests to actually catch these.

One note: AFAICT we have tests that run against Dataflow in Streaming
Appliance mode as well as Streaming Engine mode, and only the Streaming
Engine mode was broken. We probably could have (temporarily) disabled only
the Streaming Engine tests without losing too much test coverage. That is
a moot point now, as the issue appears to be fixed.
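
As a rough illustration of how such a selective disable could be expressed,
here is a hedged sketch using a JUnit 4 category as a marker for Streaming
Engine tests; the marker interface and test class are hypothetical, not
existing Beam code.

    import org.junit.Test;
    import org.junit.experimental.categories.Category;

    // Hypothetical marker for tests that require Dataflow Streaming Engine.
    interface UsesDataflowStreamingEngine {}

    public class StreamingEngineExampleIT {

      // Tagging the test lets the build exclude just this category during an
      // Engine-only outage, leaving Streaming Appliance coverage in place.
      @Category(UsesDataflowStreamingEngine.class)
      @Test
      public void testStreamingEnginePipeline() {
        // ... would submit a streaming pipeline with Streaming Engine enabled ...
      }
    }

The precommit task could then exclude only that category while the outage
lasts, instead of turning off all Dataflow streaming tests.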

Reuven

On Mon, Aug 16, 2021 at 10:12 AM Andrew Pilloud <ap...@google.com> wrote:

> I can confirm the tests are passing now. Thank you.
>
> If changes to core are causing Dataflow precommits to fail but not local
> precommits, that suggests we lack test coverage? I'm not suggesting we
> remove the Dataflow tests entirely, just that we consider removing them
> from the precommits where there is overlapping test coverage.
>
> I would be +1 in favor of a flag as it would allow us to easily disable
> Dataflow tests in precommits should we have another outage.
>
> On Mon, Aug 16, 2021 at 9:52 AM Jan Lukavský <je...@seznam.cz> wrote:
>
>> The issue is with pull requests. IIRC, I didn't encounter this problem,
>> but I can imagine, that a change in core can make Dataflow precommit to
>> fail. And it would be complicated to fix this without GCP credentials.
>>
>> So, to answer the question, I think that no, it would not help, as long
>> as this flag would not be used in CI as well.
>> On 8/16/21 6:47 PM, Luke Cwik wrote:
>>
>> Jan, it would be possible to add a flag that says to skip any IT tests
>> that require a cloud service of any kind. Would that work for you?
>>
>> It turns out that the fix was rolled out and finished about 45 mins ago
>> so my prior e-mail was already out of date when I sent it. If you had a
>> test that failed on your PR, please feel free to restart the test using the
>> github trigger phrase associated with it.
>>
>> I reran one of the suites that were perma-red
>> https://ci-beam.apache.org/job/beam_PreCommit_Java_Examples_Dataflow_Cron/4059
>> and it passed.
>>
>>
>> On Mon, Aug 16, 2021 at 9:29 AM Jan Lukavský <je...@seznam.cz> wrote:
>>
>>> Not directly related to the 'flakiness' discussion of this thread, but I
>>> think it would be good if pre-commit checks could be run locally without
>>> GCP credentials.
>>> On 8/16/21 6:24 PM, Luke Cwik wrote:
>>>
>>> The fix was inadvertently run in dry run mode so didn't make any
>>> changes. Since the fix was taking a couple of hours or so and it was
>>> getting late on Friday people didn't want to start it again till today
>>> (after the weekend).
>>>
>>> I don't think removing the few tests that run an unbounded pipeline on
>>> Dataflow for a long term is a good idea. Sure, we can disable them and
>>> re-enable them when there is an issue that is blocking folks.
>>>
>>> On Mon, Aug 16, 2021 at 9:19 AM Andrew Pilloud <ap...@google.com>
>>> wrote:
>>>
>>>> The two hours to estimated fix has long passed and we are now at 18
>>>> days since the last successful run. What is the latest estimate?
>>>>
>>>> It sounds like these tests are primarily testing Dataflow, not Beam.
>>>> They seem like good candidates to remove from the precommit (or limit to
>>>> Dataflow runner changes) even after they are fixed.
>>>>
>>>> On Fri, Aug 13, 2021 at 6:48 PM Luke Cwik <lc...@google.com> wrote:
>>>>
>>>>> The failure is related due to data that is associated with the
>>>>> apache-beam-testing project which is impacting all the Dataflow streaming
>>>>> tests.
>>>>>
>>>>> Yes, disabling the tests should have happened weeks ago if:
>>>>> 1) The fix seemed like it was going to take a long time (was
>>>>> unknown at the time)
>>>>> 2) We had confidence in test coverage minus Dataflow streaming test
>>>>> coverage (which I believe we did)
>>>>>
>>>>>
>>>>>
>>>>> On Fri, Aug 13, 2021 at 6:27 PM Andrew Pilloud <ap...@google.com>
>>>>> wrote:
>>>>>
>>>>>> Or if a rollback won't fix this, can we disable the broken tests?
>>>>>>
>>>>>> On Fri, Aug 13, 2021 at 6:25 PM Andrew Pilloud <ap...@google.com>
>>>>>> wrote:
>>>>>>
>>>>>>> So you can roll back in two hours. Beam has been broken for two
>>>>>>> weeks. Why isn't a rollback appropriate?
>>>>>>>
>>>>>>> On Fri, Aug 13, 2021 at 6:06 PM Luke Cwik <lc...@google.com> wrote:
>>>>>>>
>>>>>>>> From the test failures that I have seen they have been because of
>>>>>>>> BEAM-12676[1] which is due to a bug impacting Dataflow streaming pipelines
>>>>>>>> for the apache-beam-testing project. The fix is rolling out now from my
>>>>>>>> understanding and should take another 2hrs or so. Rolling back master
>>>>>>>> doesn't seem like what we should be doing at the moment.
>>>>>>>>
>>>>>>>> 1: https://issues.apache.org/jira/projects/BEAM/issues/BEAM-12676
>>>>>>>>
>>>>>>>> On Fri, Aug 13, 2021 at 5:51 PM Andrew Pilloud <ap...@google.com>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> Both java and python precommits are reporting the last successful
>>>>>>>>> run being in July (for both Cron and Precommit), so it looks like changes
>>>>>>>>> are being submitting without successful test runs. We probably shouldn't be
>>>>>>>>> doing that?
>>>>>>>>> https://ci-beam.apache.org/job/beam_PreCommit_Python_Cron/
>>>>>>>>> https://ci-beam.apache.org/job/beam_PreCommit_Python_Commit/
>>>>>>>>>
>>>>>>>>> https://ci-beam.apache.org/job/beam_PreCommit_Java_Examples_Dataflow_Cron/
>>>>>>>>>
>>>>>>>>> https://ci-beam.apache.org/job/beam_PreCommit_Java_Examples_Dataflow_Commit/
>>>>>>>>>
>>>>>>>>> Is there a plan to get this fixed? Should we roll master back to
>>>>>>>>> July?
>>>>>>>>>
>>>>>>>>> On Tue, Aug 3, 2021 at 12:24 PM Tyson Hamilton <ty...@google.com>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> I only realized after sending that I used the IP for the link,
>>>>>>>>>> that was by accident, here is the proper domain link:
>>>>>>>>>> http://metrics.beam.apache.org/d/D81lW0pmk/post-commit-test-reliability?orgId=1
>>>>>>>>>>
>>>>>>>>>> On Tue, Aug 3, 2021 at 3:22 PM Tyson Hamilton <ty...@google.com>
>>>>>>>>>> wrote:
>>>>>>>>>>
>>>>>>>>>>> The way I've investigated precommit flake stability is by
>>>>>>>>>>> looking at the 'Post-commit Test Reliability' [1] dashboard (hah!). There
>>>>>>>>>>> is a cron job that runs precommits and those results are tracked in the
>>>>>>>>>>> post commit dashboard confusingly. This week, Java is about 50% green for
>>>>>>>>>>> the pre-commit cron job, not great.
>>>>>>>>>>>
>>>>>>>>>>> The plugin we installed for tracking the most flaky tests for a
>>>>>>>>>>> job doesn't do well for the number of tests present in the precommit cron
>>>>>>>>>>> job. This could be an area of improvement to help add granularity and
>>>>>>>>>>> visibility to the flakiest tests over some period of time.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> [1]:
>>>>>>>>>>> http://104.154.241.245/d/D81lW0pmk/post-commit-test-reliability?orgId=1
>>>>>>>>>>>  (look for "PreCommit_Java_Cron")
>>>>>>>>>>>
>>>>>>>>>>> On Tue, Aug 3, 2021 at 2:24 PM Andrew Pilloud <
>>>>>>>>>>> apilloud@google.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Our metrics show java is nearly free from flakes, that go has
>>>>>>>>>>>> significant flakes, and that python is effectively broken. It appears they
>>>>>>>>>>>> may be missing coverage on the Java side. The dashboard is here:
>>>>>>>>>>>> http://104.154.241.245/d/McTAiu0ik/stability-critical-jobs-status?orgId=1
>>>>>>>>>>>>
>>>>>>>>>>>> I agree that this is important to address. I haven't submitted
>>>>>>>>>>>> any code recently but I spent a significant amount of time on the 2.31.0
>>>>>>>>>>>> release investigating flakes in the release validation tests.
>>>>>>>>>>>>
>>>>>>>>>>>> Andrew
>>>>>>>>>>>>
>>>>>>>>>>>> On Tue, Aug 3, 2021 at 10:43 AM Reuven Lax <re...@google.com>
>>>>>>>>>>>> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> I've noticed recently that our precommit tests are getting
>>>>>>>>>>>>> flakier and flakier. Recently I had to run Java PreCommit 5 times before I
>>>>>>>>>>>>> was able to get a clean run. This is frustrating for us as developers, but
>>>>>>>>>>>>> it also is extremely wasteful of our compute resources.
>>>>>>>>>>>>>
>>>>>>>>>>>>> I started making a list of the flaky tests I've seen. Here are
>>>>>>>>>>>>> some of the ones I've dealt with just the past few days; this is not nearly
>>>>>>>>>>>>> an exhaustive list - I've seen many others before I started recording them.
>>>>>>>>>>>>> Of the below, failures in ElasticsearchIOTest are by far the most common!
>>>>>>>>>>>>>
>>>>>>>>>>>>> We need to try and make these tests not flaky. Barring that, I
>>>>>>>>>>>>> think the extremely flaky tests need to be excluded from our presubmit
>>>>>>>>>>>>> until they can be fixed. Rerunning the precommit over and over again till
>>>>>>>>>>>>> green is not a good testing strategy.
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>    -
>>>>>>>>>>>>>
>>>>>>>>>>>>>    org.apache.beam.runners.flink.ReadSourcePortableTest.testExecution[streaming:
>>>>>>>>>>>>>    false]
>>>>>>>>>>>>>    <https://ci-beam.apache.org/job/beam_PreCommit_Java_Phrase/3901/testReport/junit/org.apache.beam.runners.flink/ReadSourcePortableTest/testExecution_streaming__false_/>
>>>>>>>>>>>>>    -
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> org.apache.beam.sdk.io.jms.JmsIOTest.testCheckpointMarkSafety
>>>>>>>>>>>>> <https://ci-beam.apache.org/job/beam_PreCommit_Java_Commit/18485/testReport/junit/org.apache.beam.sdk.io.jms/JmsIOTest/testCheckpointMarkSafety/>
>>>>>>>>>>>>>
>>>>>>>>>>>>>    -
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>    org.apache.beam.sdk.transforms.ParDoLifecycleTest.testTeardownCalledAfterExceptionInFinishBundleStateful
>>>>>>>>>>>>>    <https://ci-beam.apache.org/job/beam_PreCommit_Java_Phrase/3903/testReport/junit/org.apache.beam.sdk.transforms/ParDoLifecycleTest/testTeardownCalledAfterExceptionInFinishBundleStateful/>
>>>>>>>>>>>>>    -
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>    org.apache.beam.sdk.io.elasticsearch.ElasticsearchIOTest.testSplit
>>>>>>>>>>>>>    <https://ci-beam.apache.org/job/beam_PreCommit_Java_Phrase/3903/testReport/junit/org.apache.beam.sdk.io.elasticsearch/ElasticsearchIOTest/testSplit/>
>>>>>>>>>>>>>    -
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>    org.apache.beam.sdk.io.gcp.datastore.RampupThrottlingFnTest.testRampupThrottler
>>>>>>>>>>>>>    <https://ci-beam.apache.org/job/beam_PreCommit_Java_Commit/18501/testReport/junit/org.apache.beam.sdk.io.gcp.datastore/RampupThrottlingFnTest/testRampupThrottler/>
>>>>>>>>>>>>>
>>>>>>>>>>>>>

Re: Flaky tests in Beam

Posted by Andrew Pilloud <ap...@google.com>.
I can confirm the tests are passing now. Thank you.

If changes to core are causing Dataflow precommits to fail but not local
precommits, that suggests we lack test coverage? I'm not suggesting we
remove the Dataflow tests entirely, just that we consider removing them
from the precommits where there is overlapping test coverage.

I would be +1 in favor of a flag as it would allow us to easily disable
Dataflow tests in precommits should we have another outage.

On Mon, Aug 16, 2021 at 9:52 AM Jan Lukavský <je...@seznam.cz> wrote:

> The issue is with pull requests. IIRC, I didn't encounter this problem,
> but I can imagine, that a change in core can make Dataflow precommit to
> fail. And it would be complicated to fix this without GCP credentials.
>
> So, to answer the question, I think that no, it would not help, as long as
> this flag would not be used in CI as well.
> On 8/16/21 6:47 PM, Luke Cwik wrote:
>
> Jan, it would be possible to add a flag that says to skip any IT tests
> that require a cloud service of any kind. Would that work for you?
>
> It turns out that the fix was rolled out and finished about 45 mins ago so
> my prior e-mail was already out of date when I sent it. If you had a test
> that failed on your PR, please feel free to restart the test using the
> github trigger phrase associated with it.
>
> I reran one of the suites that were perma-red
> https://ci-beam.apache.org/job/beam_PreCommit_Java_Examples_Dataflow_Cron/4059
> and it passed.
>
>
> On Mon, Aug 16, 2021 at 9:29 AM Jan Lukavský <je...@seznam.cz> wrote:
>
>> Not directly related to the 'flakiness' discussion of this thread, but I
>> think it would be good if pre-commit checks could be run locally without
>> GCP credentials.
>> On 8/16/21 6:24 PM, Luke Cwik wrote:
>>
>> The fix was inadvertently run in dry run mode so didn't make any changes.
>> Since the fix was taking a couple of hours or so and it was getting late on
>> Friday people didn't want to start it again till today (after the weekend).
>>
>> I don't think removing the few tests that run an unbounded pipeline on
>> Dataflow for a long term is a good idea. Sure, we can disable them and
>> re-enable them when there is an issue that is blocking folks.
>>
>> On Mon, Aug 16, 2021 at 9:19 AM Andrew Pilloud <ap...@google.com>
>> wrote:
>>
>>> The two hours to estimated fix has long passed and we are now at 18 days
>>> since the last successful run. What is the latest estimate?
>>>
>>> It sounds like these tests are primarily testing Dataflow, not Beam.
>>> They seem like good candidates to remove from the precommit (or limit to
>>> Dataflow runner changes) even after they are fixed.
>>>
>>> On Fri, Aug 13, 2021 at 6:48 PM Luke Cwik <lc...@google.com> wrote:
>>>
>>>> The failure is related due to data that is associated with the
>>>> apache-beam-testing project which is impacting all the Dataflow streaming
>>>> tests.
>>>>
>>>> Yes, disabling the tests should have happened weeks ago if:
>>>> 1) The fix seemed like it was going to take a long time (was unknown at
>>>> the time)
>>>> 2) We had confidence in test coverage minus Dataflow streaming test
>>>> coverage (which I believe we did)
>>>>
>>>>
>>>>
>>>> On Fri, Aug 13, 2021 at 6:27 PM Andrew Pilloud <ap...@google.com>
>>>> wrote:
>>>>
>>>>> Or if a rollback won't fix this, can we disable the broken tests?
>>>>>
>>>>> On Fri, Aug 13, 2021 at 6:25 PM Andrew Pilloud <ap...@google.com>
>>>>> wrote:
>>>>>
>>>>>> So you can roll back in two hours. Beam has been broken for two
>>>>>> weeks. Why isn't a rollback appropriate?
>>>>>>
>>>>>> On Fri, Aug 13, 2021 at 6:06 PM Luke Cwik <lc...@google.com> wrote:
>>>>>>
>>>>>>> From the test failures that I have seen they have been because of
>>>>>>> BEAM-12676[1] which is due to a bug impacting Dataflow streaming pipelines
>>>>>>> for the apache-beam-testing project. The fix is rolling out now from my
>>>>>>> understanding and should take another 2hrs or so. Rolling back master
>>>>>>> doesn't seem like what we should be doing at the moment.
>>>>>>>
>>>>>>> 1: https://issues.apache.org/jira/projects/BEAM/issues/BEAM-12676
>>>>>>>
>>>>>>> On Fri, Aug 13, 2021 at 5:51 PM Andrew Pilloud <ap...@google.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Both java and python precommits are reporting the last successful
>>>>>>>> run being in July (for both Cron and Precommit), so it looks like changes
>>>>>>>> are being submitting without successful test runs. We probably shouldn't be
>>>>>>>> doing that?
>>>>>>>> https://ci-beam.apache.org/job/beam_PreCommit_Python_Cron/
>>>>>>>> https://ci-beam.apache.org/job/beam_PreCommit_Python_Commit/
>>>>>>>>
>>>>>>>> https://ci-beam.apache.org/job/beam_PreCommit_Java_Examples_Dataflow_Cron/
>>>>>>>>
>>>>>>>> https://ci-beam.apache.org/job/beam_PreCommit_Java_Examples_Dataflow_Commit/
>>>>>>>>
>>>>>>>> Is there a plan to get this fixed? Should we roll master back to
>>>>>>>> July?
>>>>>>>>
>>>>>>>> On Tue, Aug 3, 2021 at 12:24 PM Tyson Hamilton <ty...@google.com>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> I only realized after sending that I used the IP for the link,
>>>>>>>>> that was by accident, here is the proper domain link:
>>>>>>>>> http://metrics.beam.apache.org/d/D81lW0pmk/post-commit-test-reliability?orgId=1
>>>>>>>>>
>>>>>>>>> On Tue, Aug 3, 2021 at 3:22 PM Tyson Hamilton <ty...@google.com>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> The way I've investigated precommit flake stability is by looking
>>>>>>>>>> at the 'Post-commit Test Reliability' [1] dashboard (hah!). There is a cron
>>>>>>>>>> job that runs precommits and those results are tracked in the post commit
>>>>>>>>>> dashboard confusingly. This week, Java is about 50% green for the
>>>>>>>>>> pre-commit cron job, not great.
>>>>>>>>>>
>>>>>>>>>> The plugin we installed for tracking the most flaky tests for a
>>>>>>>>>> job doesn't do well for the number of tests present in the precommit cron
>>>>>>>>>> job. This could be an area of improvement to help add granularity and
>>>>>>>>>> visibility to the flakiest tests over some period of time.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> [1]:
>>>>>>>>>> http://104.154.241.245/d/D81lW0pmk/post-commit-test-reliability?orgId=1
>>>>>>>>>>  (look for "PreCommit_Java_Cron")
>>>>>>>>>>
>>>>>>>>>> On Tue, Aug 3, 2021 at 2:24 PM Andrew Pilloud <
>>>>>>>>>> apilloud@google.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> Our metrics show java is nearly free from flakes, that go has
>>>>>>>>>>> significant flakes, and that python is effectively broken. It appears they
>>>>>>>>>>> may be missing coverage on the Java side. The dashboard is here:
>>>>>>>>>>> http://104.154.241.245/d/McTAiu0ik/stability-critical-jobs-status?orgId=1
>>>>>>>>>>>
>>>>>>>>>>> I agree that this is important to address. I haven't submitted
>>>>>>>>>>> any code recently but I spent a significant amount of time on the 2.31.0
>>>>>>>>>>> release investigating flakes in the release validation tests.
>>>>>>>>>>>
>>>>>>>>>>> Andrew
>>>>>>>>>>>
>>>>>>>>>>> On Tue, Aug 3, 2021 at 10:43 AM Reuven Lax <re...@google.com>
>>>>>>>>>>> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> I've noticed recently that our precommit tests are getting
>>>>>>>>>>>> flakier and flakier. Recently I had to run Java PreCommit 5 times before I
>>>>>>>>>>>> was able to get a clean run. This is frustrating for us as developers, but
>>>>>>>>>>>> it also is extremely wasteful of our compute resources.
>>>>>>>>>>>>
>>>>>>>>>>>> I started making a list of the flaky tests I've seen. Here are
>>>>>>>>>>>> some of the ones I've dealt with just the past few days; this is not nearly
>>>>>>>>>>>> an exhaustive list - I've seen many others before I started recording them.
>>>>>>>>>>>> Of the below, failures in ElasticsearchIOTest are by far the most common!
>>>>>>>>>>>>
>>>>>>>>>>>> We need to try and make these tests not flaky. Barring that, I
>>>>>>>>>>>> think the extremely flaky tests need to be excluded from our presubmit
>>>>>>>>>>>> until they can be fixed. Rerunning the precommit over and over again till
>>>>>>>>>>>> green is not a good testing strategy.
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>    -
>>>>>>>>>>>>
>>>>>>>>>>>>    org.apache.beam.runners.flink.ReadSourcePortableTest.testExecution[streaming:
>>>>>>>>>>>>    false]
>>>>>>>>>>>>    <https://ci-beam.apache.org/job/beam_PreCommit_Java_Phrase/3901/testReport/junit/org.apache.beam.runners.flink/ReadSourcePortableTest/testExecution_streaming__false_/>
>>>>>>>>>>>>    -
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> org.apache.beam.sdk.io.jms.JmsIOTest.testCheckpointMarkSafety
>>>>>>>>>>>> <https://ci-beam.apache.org/job/beam_PreCommit_Java_Commit/18485/testReport/junit/org.apache.beam.sdk.io.jms/JmsIOTest/testCheckpointMarkSafety/>
>>>>>>>>>>>>
>>>>>>>>>>>>    -
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>    org.apache.beam.sdk.transforms.ParDoLifecycleTest.testTeardownCalledAfterExceptionInFinishBundleStateful
>>>>>>>>>>>>    <https://ci-beam.apache.org/job/beam_PreCommit_Java_Phrase/3903/testReport/junit/org.apache.beam.sdk.transforms/ParDoLifecycleTest/testTeardownCalledAfterExceptionInFinishBundleStateful/>
>>>>>>>>>>>>    -
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>    org.apache.beam.sdk.io.elasticsearch.ElasticsearchIOTest.testSplit
>>>>>>>>>>>>    <https://ci-beam.apache.org/job/beam_PreCommit_Java_Phrase/3903/testReport/junit/org.apache.beam.sdk.io.elasticsearch/ElasticsearchIOTest/testSplit/>
>>>>>>>>>>>>    -
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>    org.apache.beam.sdk.io.gcp.datastore.RampupThrottlingFnTest.testRampupThrottler
>>>>>>>>>>>>    <https://ci-beam.apache.org/job/beam_PreCommit_Java_Commit/18501/testReport/junit/org.apache.beam.sdk.io.gcp.datastore/RampupThrottlingFnTest/testRampupThrottler/>
>>>>>>>>>>>>
>>>>>>>>>>>>

Re: Flaky tests in Beam

Posted by Jan Lukavský <je...@seznam.cz>.
The issue is with pull requests. IIRC I didn't encounter this problem 
myself, but I can imagine that a change in core could make the Dataflow 
precommit fail, and it would be complicated to fix that without GCP 
credentials.

So, to answer the question: no, I don't think it would help, as long as 
the flag is not used in CI as well.

On 8/16/21 6:47 PM, Luke Cwik wrote:
> Jan, it would be possible to add a flag that says to skip any IT tests 
> that require a cloud service of any kind. Would that work for you?
>
> It turns out that the fix was rolled out and finished about 45 mins 
> ago so my prior e-mail was already out of date when I sent it. If you 
> had a test that failed on your PR, please feel free to restart the 
> test using the github trigger phrase associated with it.
>
> I reran one of the suites that were perma-red 
> https://ci-beam.apache.org/job/beam_PreCommit_Java_Examples_Dataflow_Cron/4059 
> <https://ci-beam.apache.org/job/beam_PreCommit_Java_Examples_Dataflow_Cron/4059> 
> and it passed.
>
>
> On Mon, Aug 16, 2021 at 9:29 AM Jan Lukavský <je.ik@seznam.cz 
> <ma...@seznam.cz>> wrote:
>
>     Not directly related to the 'flakiness' discussion of this thread,
>     but I think it would be good if pre-commit checks could be run
>     locally without GCP credentials.
>
>     On 8/16/21 6:24 PM, Luke Cwik wrote:
>>     The fix was inadvertently run in dry run mode so didn't make any
>>     changes. Since the fix was taking a couple of hours or so and it
>>     was getting late on Friday people didn't want to start it again
>>     till today (after the weekend).
>>
>>     I don't think removing the few tests that run an unbounded
>>     pipeline on Dataflow for a long term is a good idea. Sure, we can
>>     disable them and re-enable them when there is an issue that is
>>     blocking folks.
>>
>>     On Mon, Aug 16, 2021 at 9:19 AM Andrew Pilloud
>>     <apilloud@google.com <ma...@google.com>> wrote:
>>
>>         The two hours to estimated fix has long passed and we are now
>>         at 18 days since the last successful run. What is the latest
>>         estimate?
>>
>>         It sounds like these tests are primarily testing
>>         Dataflow, not Beam. They seem like good candidates to remove
>>         from the precommit (or limit to Dataflow runner changes) even
>>         after they are fixed.
>>
>>         On Fri, Aug 13, 2021 at 6:48 PM Luke Cwik <lcwik@google.com
>>         <ma...@google.com>> wrote:
>>
>>             The failure is related due to data that is associated
>>             with the apache-beam-testing project which is impacting
>>             all the Dataflow streaming tests.
>>
>>             Yes, disabling the tests should have happened weeks ago if:
>>             1) The fix seemed like it was going to take a long time
>>             (was unknown at the time)
>>             2) We had confidence in test coverage minus Dataflow
>>             streaming test coverage (which I believe we did)
>>
>>
>>
>>             On Fri, Aug 13, 2021 at 6:27 PM Andrew Pilloud
>>             <apilloud@google.com <ma...@google.com>> wrote:
>>
>>                 Or if a rollback won't fix this, can we disable the
>>                 broken tests?
>>
>>                 On Fri, Aug 13, 2021 at 6:25 PM Andrew Pilloud
>>                 <apilloud@google.com <ma...@google.com>> wrote:
>>
>>                     So you can roll back in two hours. Beam has been
>>                     broken for two weeks. Why isn't a rollback
>>                     appropriate?
>>
>>                     On Fri, Aug 13, 2021 at 6:06 PM Luke Cwik
>>                     <lcwik@google.com <ma...@google.com>> wrote:
>>
>>                         From the test failures that I have seen they
>>                         have been because of BEAM-12676[1] which is
>>                         due to a bug impacting Dataflow streaming
>>                         pipelines for the apache-beam-testing
>>                         project. The fix is rolling out now from my
>>                         understanding and should take another 2hrs or
>>                         so. Rolling back master doesn't seem like
>>                         what we should be doing at the moment.
>>
>>                         1:
>>                         https://issues.apache.org/jira/projects/BEAM/issues/BEAM-12676
>>                         <https://issues.apache.org/jira/projects/BEAM/issues/BEAM-12676>
>>
>>                         On Fri, Aug 13, 2021 at 5:51 PM Andrew
>>                         Pilloud <apilloud@google.com
>>                         <ma...@google.com>> wrote:
>>
>>                             Both java and python precommits are
>>                             reporting the last successful run being
>>                             in July (for both Cron and Precommit), so
>>                             it looks like changes are being
>>                             submitting without successful test runs.
>>                             We probably shouldn't be doing that?
>>                             https://ci-beam.apache.org/job/beam_PreCommit_Python_Cron/
>>                             <https://ci-beam.apache.org/job/beam_PreCommit_Python_Cron/>
>>                             https://ci-beam.apache.org/job/beam_PreCommit_Python_Commit/
>>                             <https://ci-beam.apache.org/job/beam_PreCommit_Python_Commit/>
>>                             https://ci-beam.apache.org/job/beam_PreCommit_Java_Examples_Dataflow_Cron/
>>                             <https://ci-beam.apache.org/job/beam_PreCommit_Java_Examples_Dataflow_Cron/>
>>                             https://ci-beam.apache.org/job/beam_PreCommit_Java_Examples_Dataflow_Commit/
>>                             <https://ci-beam.apache.org/job/beam_PreCommit_Java_Examples_Dataflow_Commit/>
>>
>>                             Is there a plan to get this fixed? Should
>>                             we roll master back to July?
>>
>>                             On Tue, Aug 3, 2021 at 12:24 PM Tyson
>>                             Hamilton <tysonjh@google.com
>>                             <ma...@google.com>> wrote:
>>
>>                                 I only realized after sending that I
>>                                 used the IP for the link, that was by
>>                                 accident, here is the proper domain
>>                                 link:
>>                                 http://metrics.beam.apache.org/d/D81lW0pmk/post-commit-test-reliability?orgId=1
>>                                 <http://metrics.beam.apache.org/d/D81lW0pmk/post-commit-test-reliability?orgId=1>
>>
>>                                 On Tue, Aug 3, 2021 at 3:22 PM Tyson
>>                                 Hamilton <tysonjh@google.com
>>                                 <ma...@google.com>> wrote:
>>
>>                                     The way I've investigated
>>                                     precommit flake stability is by
>>                                     looking at the 'Post-commit Test
>>                                     Reliability' [1] dashboard
>>                                     (hah!). There is a cron job that
>>                                     runs precommits and those results
>>                                     are tracked in the post commit
>>                                     dashboard confusingly. This week,
>>                                     Java is about 50% green for the
>>                                     pre-commit cron job, not great.
>>
>>                                     The plugin we installed for
>>                                     tracking the most flaky tests for
>>                                     a job doesn't do well for the
>>                                     number of tests present in the
>>                                     precommit cron job. This could be
>>                                     an area of improvement to help
>>                                     add granularity and visibility to
>>                                     the flakiest tests over some
>>                                     period of time.
>>
>>
>>                                     [1]:
>>                                     http://104.154.241.245/d/D81lW0pmk/post-commit-test-reliability?orgId=1
>>                                     <http://104.154.241.245/d/D81lW0pmk/post-commit-test-reliability?orgId=1>
>>                                      (look for "PreCommit_Java_Cron")
>>
>>                                     On Tue, Aug 3, 2021 at 2:24 PM
>>                                     Andrew Pilloud
>>                                     <apilloud@google.com
>>                                     <ma...@google.com>> wrote:
>>
>>                                         Our metrics show java is
>>                                         nearly free from flakes, that
>>                                         go has significant flakes,
>>                                         and that python is
>>                                         effectively broken. It
>>                                         appears they may be missing
>>                                         coverage on the Java side.
>>                                         The dashboard is here:
>>                                         http://104.154.241.245/d/McTAiu0ik/stability-critical-jobs-status?orgId=1
>>                                         <http://104.154.241.245/d/McTAiu0ik/stability-critical-jobs-status?orgId=1>
>>
>>
>>                                         I agree that this is
>>                                         important to address. I
>>                                         haven't submitted any code
>>                                         recently but I spent a
>>                                         significant amount of time on
>>                                         the 2.31.0 release
>>                                         investigating flakes in the
>>                                         release validation tests.
>>
>>                                         Andrew
>>
>>                                         On Tue, Aug 3, 2021 at 10:43
>>                                         AM Reuven Lax
>>                                         <relax@google.com
>>                                         <ma...@google.com>> wrote:
>>
>>                                             I've noticed recently
>>                                             that our precommit tests
>>                                             are getting flakier and
>>                                             flakier. Recently I had
>>                                             to run Java PreCommit 5
>>                                             times before I was able
>>                                             to get a clean run. This
>>                                             is frustrating for us as
>>                                             developers, but it also
>>                                             is extremely wasteful of
>>                                             our compute resources.
>>
>>                                             I started making a list
>>                                             of the flaky tests I've
>>                                             seen. Here are some of
>>                                             the ones I've dealt with
>>                                             just the past few days;
>>                                             this is not nearly an
>>                                             exhaustive list - I've
>>                                             seen many others before I
>>                                             started recording them.
>>                                             Of the below, failures in
>>                                             ElasticsearchIOTest are
>>                                             by far the most common!
>>
>>                                             We need to try and make
>>                                             these tests not flaky.
>>                                             Barring that, I think the
>>                                             extremely flaky tests
>>                                             need to be excluded from
>>                                             our presubmit until they
>>                                             can be fixed. Rerunning
>>                                             the precommit over and
>>                                             over again till green is
>>                                             not a good testing strategy.
>>
>>                                              *
>>
>>                                                 org.apache.beam.runners.flink.ReadSourcePortableTest.testExecution[streaming:
>>                                                 false]
>>                                                 <https://ci-beam.apache.org/job/beam_PreCommit_Java_Phrase/3901/testReport/junit/org.apache.beam.runners.flink/ReadSourcePortableTest/testExecution_streaming__false_/>
>>
>>                                              *
>>
>>                                             org.apache.beam.sdk.io.jms.JmsIOTest.testCheckpointMarkSafety
>>                                             <https://ci-beam.apache.org/job/beam_PreCommit_Java_Commit/18485/testReport/junit/org.apache.beam.sdk.io.jms/JmsIOTest/testCheckpointMarkSafety/>
>>
>>                                              *
>>
>>                                                 org.apache.beam.sdk.transforms.ParDoLifecycleTest.testTeardownCalledAfterExceptionInFinishBundleStateful
>>                                                 <https://ci-beam.apache.org/job/beam_PreCommit_Java_Phrase/3903/testReport/junit/org.apache.beam.sdk.transforms/ParDoLifecycleTest/testTeardownCalledAfterExceptionInFinishBundleStateful/>
>>
>>                                              *
>>
>>                                                 org.apache.beam.sdk.io.elasticsearch.ElasticsearchIOTest.testSplit
>>                                                 <https://ci-beam.apache.org/job/beam_PreCommit_Java_Phrase/3903/testReport/junit/org.apache.beam.sdk.io.elasticsearch/ElasticsearchIOTest/testSplit/>
>>
>>                                              *
>>
>>                                                 org.apache.beam.sdk.io.gcp.datastore.RampupThrottlingFnTest.testRampupThrottler
>>                                                 <https://ci-beam.apache.org/job/beam_PreCommit_Java_Commit/18501/testReport/junit/org.apache.beam.sdk.io.gcp.datastore/RampupThrottlingFnTest/testRampupThrottler/>
>>

Re: Flaky tests in Beam

Posted by Luke Cwik <lc...@google.com>.
Jan, it would be possible to add a flag that says to skip any IT tests that
require a cloud service of any kind. Would that work for you?
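
A minimal sketch of what such a flag could look like on the test side,
assuming a plain system property; the property name and test class are
hypothetical, not an existing Beam option.

    import static org.junit.Assume.assumeTrue;

    import org.junit.Before;
    import org.junit.Test;

    // Hypothetical integration test that talks to a real cloud service.
    public class CloudBackedExampleIT {

      @Before
      public void requireCloudServices() {
        // assumeTrue() skips the test (rather than failing it) when the flag is
        // off, e.g. for contributors running precommits without GCP credentials.
        assumeTrue(
            "Cloud-service ITs disabled via -Dbeam.testing.useCloudServices=false",
            Boolean.parseBoolean(System.getProperty("beam.testing.useCloudServices", "true")));
      }

      @Test
      public void testAgainstRealService() {
        // ... would exercise the real service here ...
      }
    }

The property defaults to on, so CI behavior is unchanged; someone without
credentials could flip it off locally.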

It turns out that the fix was rolled out and finished about 45 mins ago so
my prior e-mail was already out of date when I sent it. If you had a test
that failed on your PR, please feel free to restart the test using the
github trigger phrase associated with it.

I reran one of the suites that were perma-red
https://ci-beam.apache.org/job/beam_PreCommit_Java_Examples_Dataflow_Cron/4059
and it passed.



Re: Flaky tests in Beam

Posted by Jan Lukavský <je...@seznam.cz>.
Not directly related to the 'flakiness' discussion of this thread, but I 
think it would be good if pre-commit checks could be run locally without 
GCP credentials.


Re: Flaky tests in Beam

Posted by Luke Cwik <lc...@google.com>.
The fix was inadvertently run in dry-run mode, so it didn't make any changes.
Since the fix was taking a couple of hours or so and it was getting late on
Friday, people didn't want to start it again until today (after the weekend).

I don't think removing the few tests that run an unbounded pipeline on
Dataflow for the long term is a good idea. Sure, we can disable them and
re-enable them when there is an issue that is blocking folks.
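
(As a side note, a minimal sketch of what temporarily disabling such a test
usually looks like with JUnit's @Ignore — the test class here is made up,
and the ticket in the reason string is just the one discussed in this thread:)

    import org.junit.Ignore;
    import org.junit.Test;

    public class SomeDataflowStreamingIT {

      // Sickbayed rather than deleted: the reason string keeps the context,
      // and the annotation is removed once the blocking issue is resolved.
      @Ignore("Disabled while BEAM-12676 breaks Dataflow streaming tests")
      @Test
      public void testStreamingPipeline() {
        // ... pipeline under test ...
      }
    }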


Re: Flaky tests in Beam

Posted by Andrew Pilloud <ap...@google.com>.
The estimated two hours to a fix have long passed, and we are now at 18 days
since the last successful run. What is the latest estimate?

It sounds like these tests are primarily testing Dataflow, not Beam. They
seem like good candidates to remove from the precommit (or limit to
Dataflow runner changes) even after they are fixed.


Re: Flaky tests in Beam

Posted by Luke Cwik <lc...@google.com>.
The failure is due to data associated with the
apache-beam-testing project, which is impacting all the Dataflow streaming
tests.

Yes, disabling the tests should have happened weeks ago if:
1) The fix seemed like it was going to take a long time (was unknown at the
time)
2) We had confidence in test coverage minus Dataflow streaming test
coverage (which I believe we did)




Re: Flaky tests in Beam

Posted by Andrew Pilloud <ap...@google.com>.
Or if a rollback won't fix this, can we disable the broken tests?


Re: Flaky tests in Beam

Posted by Andrew Pilloud <ap...@google.com>.
So you can roll back in two hours. Beam has been broken for two weeks. Why
isn't a rollback appropriate?


Re: Flaky tests in Beam

Posted by Luke Cwik <lc...@google.com>.
From the test failures that I have seen, they have been caused by
BEAM-12676 [1], which is due to a bug impacting Dataflow streaming pipelines
for the apache-beam-testing project. From my understanding, the fix is rolling
out now and should take another 2 hrs or so. Rolling back master
doesn't seem like what we should be doing at the moment.

1: https://issues.apache.org/jira/projects/BEAM/issues/BEAM-12676


Re: Flaky tests in Beam

Posted by Andrew Pilloud <ap...@google.com>.
Both the Java and Python precommits are reporting that the last successful
run was in July (for both the Cron and Commit jobs), so it looks like
changes are being submitted without successful test runs. We probably
shouldn't be doing that?
https://ci-beam.apache.org/job/beam_PreCommit_Python_Cron/
https://ci-beam.apache.org/job/beam_PreCommit_Python_Commit/
https://ci-beam.apache.org/job/beam_PreCommit_Java_Examples_Dataflow_Cron/
https://ci-beam.apache.org/job/beam_PreCommit_Java_Examples_Dataflow_Commit/

Is there a plan to get this fixed? Should we roll master back to July?
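
For reference, the "last successful run" date for each of these jobs can be
checked programmatically through the Jenkins JSON API. The snippet below is
only a rough, untested sketch: it assumes the standard lastSuccessfulBuild
endpoint and uses the job names from the links above, so adjust as needed.

import datetime
import requests

JENKINS = "https://ci-beam.apache.org"
JOBS = [
    "beam_PreCommit_Python_Cron",
    "beam_PreCommit_Python_Commit",
    "beam_PreCommit_Java_Examples_Dataflow_Cron",
    "beam_PreCommit_Java_Examples_Dataflow_Commit",
]

for job in JOBS:
    # 'timestamp' is milliseconds since the epoch for the build start time.
    url = f"{JENKINS}/job/{job}/lastSuccessfulBuild/api/json?tree=number,timestamp"
    build = requests.get(url).json()
    when = datetime.datetime.fromtimestamp(build["timestamp"] / 1000,
                                           tz=datetime.timezone.utc)
    print(f"{job}: last green run was build {build['number']} on {when:%Y-%m-%d}")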


Re: Flaky tests in Beam

Posted by Tyson Hamilton <ty...@google.com>.
I only realized after sending that I had accidentally used the IP address
for the link; here is the proper domain link:
http://metrics.beam.apache.org/d/D81lW0pmk/post-commit-test-reliability?orgId=1


Re: Flaky tests in Beam

Posted by Tyson Hamilton <ty...@google.com>.
The way I've investigated precommit flake stability is by looking at the
'Post-commit Test Reliability' [1] dashboard (hah!). There is a cron job that
runs the precommits, and, confusingly, those results are tracked in the
post-commit dashboard. This week, Java is only about 50% green for the
precommit cron job, which is not great.

The plugin we installed for tracking the flakiest tests in a job doesn't cope
well with the number of tests present in the precommit cron job. This is an
area where better tooling would add granularity and visibility into which
tests are flakiest over a given period of time; a script along the lines of
the sketch below could be a starting point.


[1]: http://104.154.241.245/d/D81lW0pmk/post-commit-test-reliability?orgId=1
 (look for "PreCommit_Java_Cron")
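
One possible starting point is to walk a job's recent builds through the
standard Jenkins JSON API and count how often each test case fails (a crude
proxy for flakiness). This is only a rough, untested sketch, not existing
tooling; the job name, the testReport field names, and the status values
below are assumptions and may need adjusting.

import collections
import requests

JENKINS = "https://ci-beam.apache.org"
JOB = "beam_PreCommit_Java_Cron"  # assumed job name; swap in the job of interest

def recent_builds(limit=50):
    # Jenkins lists builds newest-first; take the most recent `limit` of them.
    url = f"{JENKINS}/job/{JOB}/api/json?tree=builds[number]"
    builds = requests.get(url).json().get("builds", [])
    return [b["number"] for b in builds[:limit]]

def failed_cases(build):
    # Pull just the fields we need from the build's JUnit test report, if any.
    url = (f"{JENKINS}/job/{JOB}/{build}/testReport/api/json"
           "?tree=suites[cases[className,name,status]]")
    resp = requests.get(url)
    if resp.status_code != 200:
        return []
    failed = []
    for suite in resp.json().get("suites", []):
        for case in suite.get("cases", []):
            if case.get("status") in ("FAILED", "REGRESSION"):
                failed.append(f"{case['className']}.{case['name']}")
    return failed

counts = collections.Counter()
for build in recent_builds():
    counts.update(failed_cases(build))

for test, failures in counts.most_common(20):
    print(f"{failures:4d}  {test}")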


Re: Flaky tests in Beam

Posted by Andrew Pilloud <ap...@google.com>.
Our metrics show that Java is nearly free of flakes, that Go has significant
flakes, and that Python is effectively broken. It appears the metrics may be
missing coverage on the Java side. The dashboard is here:
http://104.154.241.245/d/McTAiu0ik/stability-critical-jobs-status?orgId=1

I agree that this is important to address. I haven't submitted any code
recently, but I spent a significant amount of time investigating flakes in
the release validation tests for the 2.31.0 release.

Andrew
