Posted to dev@pulsar.apache.org by Lari Hotari <la...@sagire.fi> on 2021/01/29 18:26:33 UTC

Fixing flaky tests: help needed

Dear Pulsar community members,

In order to improve our CI, we will have to fix the flaky tests. In some
cases it might be necessary to replace an existing test with a redesigned
test.

The draft PIP "Changes to flaky test handling" document
<https://docs.google.com/document/d/10lmn4pW1IsT_8D1ZE0vMjASX0HhjdGdjB794iyScwns/edit?usp=sharing>
lists the top 10 flaky tests. A lot of them have already been addressed by
pull requests in the past week or so.

This is the list of recent PRs that fix flaky tests from the top 10 flaky
tests list:
https://github.com/apache/pulsar/pull/9286
https://github.com/apache/pulsar/pull/9243
https://github.com/apache/pulsar/pull/9258
https://github.com/apache/pulsar/pull/9356

These are the GH issues for the remaining ones in the top 10 flaky tests
list:
https://github.com/apache/pulsar/issues/6368
https://github.com/apache/pulsar/issues/9369
https://github.com/apache/pulsar/issues/9368

If you would like to help fix flaky tests, you can pick one of the open
issues above. Just add a comment on the issue when you start working on it
so that we can coordinate activities.

It is also helpful to report a flaky test when you encounter one. I've been
using this type of template for reporting a flaky test:
https://gist.github.com/lhotari/a5c67359b362b4f3d8729330d65a2298 . The
issues #9368 and #9369 have been reported using this template.
Search for the test name before reporting so that we don't end up with
duplicates.
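
As a rough illustration of the duplicate check, the search link can be built
from the test name. This is only a sketch: the helper name issue_search_url is
made up for this example, and the query qualifiers simply mirror the search
links used elsewhere in this thread.

```python
from urllib.parse import urlencode

ISSUES_URL = "https://github.com/apache/pulsar/issues"


def issue_search_url(test_name: str) -> str:
    """Build a GitHub issue search link for a test name (hypothetical helper)."""
    # Same qualifiers as the search links in this thread, plus the test name.
    query = f"{test_name} is:issue is:open sort:updated-desc"
    # urlencode turns spaces into "+" and ":" into "%3A".
    return f"{ISSUES_URL}?{urlencode({'q': query})}"


print(issue_search_url("ReplicatorTest.testReplication"))
```

Dropping the is:open qualifier from the query would also show issues that
have already been closed.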

The issues #6368, #9369 and #9368 are the next 3 important issues to fix.
I'm planning to create a more extensive list of the flaky failures so that
we can target the flakiest ones when we continue fixing the flaky tests.
I have some scripts in development to assist in mining the Pulsar GitHub
Actions workflow run logs.
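
A minimal sketch of the kind of tally such scripts could produce, assuming the
failing test method names have already been extracted from the logs, one name
per failure. The extraction step depends on the log format and is not shown
here. The function name tally_failures and the sample input are hypothetical
and are not the actual tooling.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def tally_failures(names: Iterable[str]) -> List[Tuple[str, int]]:
    """Count how often each test method name occurs, most frequent first."""
    # Ignore blank entries and surrounding whitespace.
    counts = Counter(name.strip() for name in names if name.strip())
    # Sort by descending count, then by name so that ties have a stable order.
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


# Made-up sample input, for illustration only.
sample = [
    "org.apache.pulsar.compaction.CompactionTest.cleanup",
    "org.apache.pulsar.tests.integration.SmokeTest.setup",
    "org.apache.pulsar.compaction.CompactionTest.cleanup",
]

for name, count in tally_failures(sample):
    print(f"{count}\t{name}")
```

Sorting by count puts the tests that fail most often at the top, which is the
order in which they would be worth picking up.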

This is a search to find flaky issues in Pulsar GH issues:
https://github.com/apache/pulsar/issues?q=flaky+sort%3Aupdated-desc+is%3Aopen

Looking forward to the contributions for fixing flaky tests,

BR,

Lari

Re: Fixing flaky tests: help needed

Posted by Yuva raj <uv...@gmail.com>.
This is great news!

On Fri, 12 Feb 2021 at 13:20, Lari Hotari <la...@sagire.fi> wrote:

> Hi all,
>
> There has been some great progress in fixing the flaky tests. It seems that
> there's more stability in the builds after more fixes have been merged to
> master.
> This work has an impact. Thank you for the contributions.
>
> Our work is not over. There's a lot more to fix. Please continue
> contributing to make Pulsar CI better.
>
> Here's the list of open issues:
>
> https://github.com/apache/pulsar/issues?q=is%3Aissue+is%3Aopen+Flaky-test+sort%3Aupdated-desc
>
> As usual, please comment on the issue to assign it to yourself.
> You can join Pulsar Slack's #testing channel to share tips & tricks around
> fixing the flaky tests or for asking questions.
>
> Keep up the good work!
>
> BR, Lari
>


-- 
*Thanks*

*Yuvaraj L*

Re: Fixing flaky tests: help needed

Posted by Lari Hotari <la...@sagire.fi>.
Hi all,

There has been some great progress in fixing the flaky tests. It seems that
there's more stability in the builds after more fixes have been merged to
master.
This work has an impact. Thank you for the contributions.

Our work is not over. There's a lot more to fix. Please continue
contributing to make Pulsar CI better.

Here's the list of open issues:
https://github.com/apache/pulsar/issues?q=is%3Aissue+is%3Aopen+Flaky-test+sort%3Aupdated-desc

As usual, please comment on the issue to assign it to yourself.
You can join Pulsar Slack's #testing channel to share tips & tricks on
fixing the flaky tests or to ask questions.

Keep up the good work!

BR, Lari

On Wed, Feb 3, 2021 at 9:07 PM Lari Hotari <la...@sagire.fi> wrote:

> Hi all,
>
> Here's the next batch of flaky test issues:
>
> #9459 Flaky-test: PulsarFunctionsTest.testDebeziumPostgreSqlSource
> <https://github.com/apache/pulsar/issues/9459>
>
> #9458 Flaky-test: ReplicatorTest.testReplication
> <https://github.com/apache/pulsar/issues/9458>
>
> #9457 Flaky-test: ReplicatorTest.testReplicatorOnPartitionedTopic
> <https://github.com/apache/pulsar/issues/9457>
>
> #9456 Flaky-test: TestProxy <https://github.com/apache/pulsar/issues/9456>
>
> #9455 Flaky-test: PulsarFunctionsTest.testCustomSerdeFunction
> <https://github.com/apache/pulsar/issues/9455>
>
> #9454 Flaky-test: CLITest.testCreateSubscriptionCommand
> <https://github.com/apache/pulsar/issues/9454>
>
> #9453 Flaky-test: PulsarFunctionsProcessTest.testAvroSchemaFunction
> <https://github.com/apache/pulsar/issues/9453>
>
> #9452 Flaky-test: org.apache.pulsar.tests.integration.SmokeTest.setup
> <https://github.com/apache/pulsar/issues/9452>
>
> #9451 Flaky-test:
> SimpleProducerConsumerTest.testConcurrentConsumerReceiveWhileReconnect
> <https://github.com/apache/pulsar/issues/9451>
>
> #9450 Flaky-test:
> org.apache.pulsar.broker.service.ReplicatorTest.testResetCursorNotFail
> <https://github.com/apache/pulsar/issues/9450>
>
> The ReplicatorTest (
> https://github.com/apache/pulsar/blob/master/pulsar-broker/src/test/java/org/apache/pulsar/broker/service/ReplicatorTest.java)
> is contributing to a lot of failures. Here's a complete list of example
> failures: https://gist.github.com/lhotari/ff58a94ef42bc6ed41165ed10c7d1cfd
> . Fixing it would have a really great impact. I filed
> 2 issues about ReplicatorTest.
>
> Keep up the good work in fixing flaky tests. Again, there are a lot of great
> contributions. Thank you!
>
> BR, Lari
>
>
>
>
> On Wed, Feb 3, 2021 at 6:35 AM Lari Hotari <la...@sagire.fi> wrote:
>
>> Hi all,
>>
>> There are links to recent failures of a particular flaky test in the
>> recently reported flaky test GitHub issues (
>> https://github.com/apache/pulsar/issues?q=flaky+sort%3Aupdated-desc+is%3Aissue+is%3Aopen
>> ).
>>
>> Example from https://github.com/apache/pulsar/issues/9437 :
>> example failure 2021-02-01T09:41:10.0922161Z
>> <https://github.com/apache/pulsar/runs/1804628430?check_suite_focus=true#step:6:12322>
>> example failure 2021-01-29T07:51:57.9989389Z
>> <https://github.com/apache/pulsar/runs/1789838309?check_suite_focus=true#step:6:18491>
>> example failure 2021-01-28T02:42:14.3316285Z
>> <https://github.com/apache/pulsar/runs/1781184081?check_suite_focus=true#step:6:18415>
>> example failure 2021-01-27T21:44:09.7619772Z
>> <https://github.com/apache/pulsar/runs/1778470820?check_suite_focus=true#step:6:6213>
>>
>> These links point to the exact line in the build log.
>> For example:
>> https://github.com/apache/pulsar/runs/1804628430?check_suite_focus=true#
>> *step:6:12322*
>>
>> *When opening this link, it should navigate directly to the line number
>> 12322 in step 6 of the workflow run log.*
>>
>> However, there's a bug in the GitHub UI: this doesn't work if
>> the link is clicked from a page within github.com .
>> The parameters and hash of the URL get lost and the focus doesn't go to
>> the line where the error happened.
>>
>> *The workaround is to open the "example failure" links in a new
>> tab/window by CTRL-click (Windows, Linux) or CMD-click (macOS).*
>>
>> I hope this helps investigate the flaky test failures more efficiently!
>>
>> BR,
>>
>> Lari
>>
>> On Tue, Feb 2, 2021 at 7:35 PM Lari Hotari <la...@sagire.fi> wrote:
>>
>>> The good progress continues!
>>> One way to see the issue & PR activity where "flaky" is mentioned:
>>> https://github.com/apache/pulsar/issues?q=flaky+sort%3Aupdated-desc
>>> Thank you to the contributors and PR reviewers!
>>>
>>> Here's the next flaky test for someone to fix:
>>> https://github.com/apache/pulsar/issues/6646 (reported a long time ago,
>>> I added some examples of recent failures)
>>> It's about PulsarFunctionsTest. This test class contributes to a lot of
>>> failures. I have uploaded a list of failures to
>>> https://gist.github.com/lhotari/9bae3e16674c297a6bbc2b4831515a74 .
>>> I haven't validated that all failures are from flaky test runs. It's
>>> possible that some are from a build which broke the test.
>>>
>>> 1) Who could pick up fixing the multiple issues in PulsarFunctionsTest,
>>> https://github.com/apache/pulsar/issues/6646 ? You can comment directly
>>> on issue #6646 and start working on it if you wish. It would be a really
>>> important fix to have.
>>>
>>> 2) Another one: https://github.com/apache/pulsar/issues/9431
>>>
>>> 3) The 3rd one might be a quick fix; it's an NPE in cleanup:
>>> https://github.com/apache/pulsar/issues/9432
>>>
>>> I'm looking forward to the sprint continuing. It seems that the issues get
>>> fixed sooner than I can report more of them. :)
>>>
>>> BR, Lari
>>>
>>>
>>> On Mon, Feb 1, 2021 at 8:18 PM Lari Hotari <la...@sagire.fi>
>>> wrote:
>>>
>>>> Dear Pulsar community members,
>>>>
>>>> Thanks for picking up the work so quickly! I noticed that at least
>>>> Renkai and Michael already pushed pull requests to fix the flaky tests that
>>>> were mentioned in the previous email. Some of the PRs have already been
>>>> merged.
>>>>
>>>> Here are 3 more flaky tests with links to a lot of example failures:
>>>> https://github.com/apache/pulsar/issues/9407
>>>> https://github.com/apache/pulsar/issues/9408
>>>> https://github.com/apache/pulsar/issues/9409
>>>>
>>>> I'll report more flaky tests tomorrow. Today I was working on some
>>>> tooling to mine the logs and gather some statistics.
>>>>
>>>> I parsed the logs of the last few days and these are the test methods
>>>> that fail the most:
>>>>
>>>> 273     org.apache.pulsar.tests.integration.utils.DockerUtils$2.onComplete
>>>> 102     org.apache.pulsar.compaction.CompactionTest.cleanup
>>>> 81      org.apache.pulsar.admin.cli.PulsarAdminToolTest.topics
>>>> 51      org.apache.pulsar.broker.loadbalance.LoadBalancerTest.testLeaderElection
>>>> 45      org.apache.pulsar.io.PulsarFunctionE2ETest.shutdown
>>>> 40      org.apache.pulsar.broker.service.ConsumedLedgersTrimTest.testConsumedLedgersTrimNoSubscriptions
>>>> 36      org.apache.pulsar.websocket.proxy.ProxyPublishConsumeTest.cleanup
>>>> 30      org.apache.pulsar.functions.worker.PulsarFunctionLocalRunTest.shutdown
>>>> 30      org.apache.pulsar.tests.integration.functions.PulsarFunctionsTest.testJavaExclamationTopicPatternFunction
>>>> 29      org.apache.pulsar.tests.integration.functions.PulsarFunctionsTest.testJavaExclamationFunction
>>>> 27      org.apache.pulsar.client.api.v1.V1_ProducerConsumerTest.testConcurrentConsumerReceiveWhileReconnect
>>>> 26      org.apache.pulsar.client.admin.internal.http.AsyncHttpConnector.lambda$retryOperation$3
>>>> 22      org.apache.pulsar.broker.service.ReplicatorTest.testResetCursorNotFail
>>>> 22      org.apache.pulsar.tests.integration.functions.PulsarFunctionsTest.testJavaLoggingFunction
>>>> 21      org.apache.pulsar.tests.integration.SmokeTest.setup
>>>> 20      org.apache.pulsar.client.impl.MessageIdTest.testChecksumReconnection
>>>> 20      org.apache.pulsar.client.impl.MessageIdTest.testChecksumVersionComptability
>>>> 19      org.apache.pulsar.tests.integration.functions.PulsarFunctionsTest.testPythonFunctionLocalRun
>>>> 19      org.apache.pulsar.tests.integration.functions.PulsarFunctionsTest.testAutoSchemaFunction
>>>> 14      org.apache.pulsar.client.api.SimpleProducerConsumerTest.testConcurrentConsumerReceiveWhileReconnect
>>>> 14      org.apache.pulsar.broker.service.MessagePublishBufferThrottleTest.testBlockByPublishRateLimiting
>>>> 14      org.apache.pulsar.tests.integration.functions.PulsarFunctionsTest.testSlidingCountWindowTest
>>>> 13      org.apache.pulsar.tests.integration.backwardscompatibility.ClientTest2_2.testResetCursorCompatibility
>>>> 12      org.apache.pulsar.tests.integration.functions.PulsarFunctionsTest.testPythonPublishFunction
>>>> 12      org.apache.pulsar.tests.integration.functions.PulsarFunctionsTest.testPythonExclamationTopicPatternFunction
>>>> 12      org.apache.pulsar.tests.integration.functions.PulsarFunctionsTest.testPythonExclamationFunctionWithExtraDeps
>>>> 12      org.apache.pulsar.tests.integration.functions.PulsarFunctionsTest.testPythonExclamationZipFunction
>>>> 12      org.apache.pulsar.tests.integration.functions.PulsarFunctionsTest.testPythonFunctionNegAck
>>>> 12      org.apache.pulsar.tests.integration.functions.PulsarFunctionsTest.testPythonExclamationFunction
>>>> 12      org.apache.pulsar.compaction.CompactorTest.cleanup
>>>> 12      org.apache.pulsar.broker.service.BrokerServiceAutoSubscriptionCreationTest.cleanupTest
>>>> 12      org.apache.pulsar.websocket.proxy.ProxyAuthenticationTest.cleanup
>>>> 12      org.apache.pulsar.websocket.proxy.v1.V1_ProxyAuthenticationTest.cleanup
>>>> 12      org.apache.pulsar.client.impl.BatchMessageIndexAckTest.testBatchMessageIndexAckForSharedSubscription
>>>> 11      org.apache.pulsar.tests.integration.functions.PulsarFunctionsTest.testJavaPublishFunction
>>>> 11      org.apache.pulsar.broker.loadbalance.AntiAffinityNamespaceGroupTest.testBrokerSelectionForAntiAffinityGroup
>>>>
>>>> I'll report more flaky tests after I have checked that my tooling is
>>>> producing correct results.
>>>>
>>>> For contributing to fix flaky tests, please pick a flaky test for
>>>> fixing from the reported ones:
>>>>
>>>> https://github.com/apache/pulsar/issues?q=flaky+sort%3Aupdated-desc+is%3Aopen
>>>>
>>>> We can all join the #testing channel on Pulsar Slack to share detailed
>>>> tips and tricks while working on fixing flaky tests.
>>>>
>>>> See you,
>>>>
>>>> BR, Lari
>>>>
>>>>

Re: Fixing flaky tests: help needed

Posted by Lari Hotari <la...@sagire.fi>.
Hi all,

Here's the next batch of flaky test issues:

#9459 Flaky-test: PulsarFunctionsTest.testDebeziumPostgreSqlSource
<https://github.com/apache/pulsar/issues/9459>

#9458 Flaky-test: ReplicatorTest.testReplication
<https://github.com/apache/pulsar/issues/9458>

#9457 Flaky-test: ReplicatorTest.testReplicatorOnPartitionedTopic
<https://github.com/apache/pulsar/issues/9457>

#9456 Flaky-test: TestProxy <https://github.com/apache/pulsar/issues/9456>

#9455 Flaky-test: PulsarFunctionsTest.testCustomSerdeFunction
<https://github.com/apache/pulsar/issues/9455>

#9454 Flaky-test: CLITest.testCreateSubscriptionCommand
<https://github.com/apache/pulsar/issues/9454>

#9453 Flaky-test: PulsarFunctionsProcessTest.testAvroSchemaFunction
<https://github.com/apache/pulsar/issues/9453>

#9452 Flaky-test: org.apache.pulsar.tests.integration.SmokeTest.setup
<https://github.com/apache/pulsar/issues/9452>

#9451 Flaky-test:
SimpleProducerConsumerTest.testConcurrentConsumerReceiveWhileReconnect
<https://github.com/apache/pulsar/issues/9451>

#9450 Flaky-test:
org.apache.pulsar.broker.service.ReplicatorTest.testResetCursorNotFail
<https://github.com/apache/pulsar/issues/9450>

The ReplicatorTest (
https://github.com/apache/pulsar/blob/master/pulsar-broker/src/test/java/org/apache/pulsar/broker/service/ReplicatorTest.java)
is contributing to a lot of failures. Here's a complete list of example
failures: https://gist.github.com/lhotari/ff58a94ef42bc6ed41165ed10c7d1cfd
. Fixing it would have a really great impact. I filed
2 issues about ReplicatorTest.

Keep up the good work in fixing flaky tests. Again, there are a lot of great
contributions. Thank you!

BR, Lari




On Wed, Feb 3, 2021 at 6:35 AM Lari Hotari <la...@sagire.fi> wrote:

> Hi all,
>
> There are links to recent failures of a particular flaky test in the
> recently reported flaky test GitHub issues (
> https://github.com/apache/pulsar/issues?q=flaky+sort%3Aupdated-desc+is%3Aissue+is%3Aopen
> ).
>
> Example from https://github.com/apache/pulsar/issues/9437 :
> example failure 2021-02-01T09:41:10.0922161Z
> <https://github.com/apache/pulsar/runs/1804628430?check_suite_focus=true#step:6:12322>
> example failure 2021-01-29T07:51:57.9989389Z
> <https://github.com/apache/pulsar/runs/1789838309?check_suite_focus=true#step:6:18491>
> example failure 2021-01-28T02:42:14.3316285Z
> <https://github.com/apache/pulsar/runs/1781184081?check_suite_focus=true#step:6:18415>
> example failure 2021-01-27T21:44:09.7619772Z
> <https://github.com/apache/pulsar/runs/1778470820?check_suite_focus=true#step:6:6213>
>
> These links point to the exact line in the build log.
> For example:
> https://github.com/apache/pulsar/runs/1804628430?check_suite_focus=true#
> *step:6:12322*
>
> *When opening this link, it should navigate directly to the line number
> 12322 in step 6 of the workflow run log.*
>
> However, there's a bug in the GitHub UI: this doesn't work if
> the link is clicked from a page within github.com .
> The parameters and hash of the URL get lost and the focus doesn't go to
> the line where the error happened.
>
> *The workaround is to open the "example failure" links in a new tab/window
> by CTRL-click (Windows, Linux) or CMD-click (macOS).*
>
> I hope this helps investigate the flaky test failures more efficiently!
>
> BR,
>
> Lari
>
> On Tue, Feb 2, 2021 at 7:35 PM Lari Hotari <la...@sagire.fi> wrote:
>
>> The good progress continues!
>> One way to see the issue & PR activity where "flaky" is mentioned:
>> https://github.com/apache/pulsar/issues?q=flaky+sort%3Aupdated-desc
>> Thank you to the contributors and PR reviewers!
>>
>> Here's the next flaky test for someone to fix:
>> https://github.com/apache/pulsar/issues/6646 (reported a long time ago;
>> I added some examples of recent failures)
>> It's about PulsarFunctionsTest. This test class contributes to a lot of
>> failures. I have uploaded a list of failures to
>> https://gist.github.com/lhotari/9bae3e16674c297a6bbc2b4831515a74 .
>> I haven't validated that all failures are from flaky test runs. It's
>> possible that some are from a build which broke the test.
>>
>> 1) Who could pick up fixing the multiple issues in PulsarFunctionsTest,
>> https://github.com/apache/pulsar/issues/6646 ? You can comment directly
>> on issue #6646 and start working on it if you wish. It would be a really
>> important fix to have.
>>
>> 2) Another one: https://github.com/apache/pulsar/issues/9431
>>
>> 3) The 3rd one might be a quick fix; it's an NPE in cleanup:
>> https://github.com/apache/pulsar/issues/9432
>>
>> I'm looking forward to the sprint continuing. It seems that the issues get
>> fixed sooner than I can report more of them. :)
>>
>> BR, Lari
>>
>>
>> On Mon, Feb 1, 2021 at 8:18 PM Lari Hotari <la...@sagire.fi> wrote:
>>
>>> Dear Pulsar community members,
>>>
>>> Thanks for picking up the work so quickly! I noticed that at least
>>> Renkai and Michael already pushed pull requests to fix the flaky tests that
>>> were mentioned in the previous email. Some of the PRs have already been
>>> merged.
>>>
>>> Here are 3 more flaky tests with links to a lot of example failures:
>>> https://github.com/apache/pulsar/issues/9407
>>> https://github.com/apache/pulsar/issues/9408
>>> https://github.com/apache/pulsar/issues/9409
>>>
>>> I'll report more flaky tests tomorrow. Today I was working on some
>>> tooling to mine the logs and gather some statistics.
>>>
>>> I parsed the logs of the last few days and these are the test methods
>>> that fail the most:
>>>
>>> 273     org.apache.pulsar.tests.integration.utils.DockerUtils$2.onComplete
>>> 102     org.apache.pulsar.compaction.CompactionTest.cleanup
>>> 81      org.apache.pulsar.admin.cli.PulsarAdminToolTest.topics
>>> 51      org.apache.pulsar.broker.loadbalance.LoadBalancerTest.testLeaderElection
>>> 45      org.apache.pulsar.io.PulsarFunctionE2ETest.shutdown
>>> 40      org.apache.pulsar.broker.service.ConsumedLedgersTrimTest.testConsumedLedgersTrimNoSubscriptions
>>> 36      org.apache.pulsar.websocket.proxy.ProxyPublishConsumeTest.cleanup
>>> 30      org.apache.pulsar.functions.worker.PulsarFunctionLocalRunTest.shutdown
>>> 30      org.apache.pulsar.tests.integration.functions.PulsarFunctionsTest.testJavaExclamationTopicPatternFunction
>>> 29      org.apache.pulsar.tests.integration.functions.PulsarFunctionsTest.testJavaExclamationFunction
>>> 27      org.apache.pulsar.client.api.v1.V1_ProducerConsumerTest.testConcurrentConsumerReceiveWhileReconnect
>>> 26      org.apache.pulsar.client.admin.internal.http.AsyncHttpConnector.lambda$retryOperation$3
>>> 22      org.apache.pulsar.broker.service.ReplicatorTest.testResetCursorNotFail
>>> 22      org.apache.pulsar.tests.integration.functions.PulsarFunctionsTest.testJavaLoggingFunction
>>> 21      org.apache.pulsar.tests.integration.SmokeTest.setup
>>> 20      org.apache.pulsar.client.impl.MessageIdTest.testChecksumReconnection
>>> 20      org.apache.pulsar.client.impl.MessageIdTest.testChecksumVersionComptability
>>> 19      org.apache.pulsar.tests.integration.functions.PulsarFunctionsTest.testPythonFunctionLocalRun
>>> 19      org.apache.pulsar.tests.integration.functions.PulsarFunctionsTest.testAutoSchemaFunction
>>> 14      org.apache.pulsar.client.api.SimpleProducerConsumerTest.testConcurrentConsumerReceiveWhileReconnect
>>> 14      org.apache.pulsar.broker.service.MessagePublishBufferThrottleTest.testBlockByPublishRateLimiting
>>> 14      org.apache.pulsar.tests.integration.functions.PulsarFunctionsTest.testSlidingCountWindowTest
>>> 13      org.apache.pulsar.tests.integration.backwardscompatibility.ClientTest2_2.testResetCursorCompatibility
>>> 12      org.apache.pulsar.tests.integration.functions.PulsarFunctionsTest.testPythonPublishFunction
>>> 12      org.apache.pulsar.tests.integration.functions.PulsarFunctionsTest.testPythonExclamationTopicPatternFunction
>>> 12      org.apache.pulsar.tests.integration.functions.PulsarFunctionsTest.testPythonExclamationFunctionWithExtraDeps
>>> 12      org.apache.pulsar.tests.integration.functions.PulsarFunctionsTest.testPythonExclamationZipFunction
>>> 12      org.apache.pulsar.tests.integration.functions.PulsarFunctionsTest.testPythonFunctionNegAck
>>> 12      org.apache.pulsar.tests.integration.functions.PulsarFunctionsTest.testPythonExclamationFunction
>>> 12      org.apache.pulsar.compaction.CompactorTest.cleanup
>>> 12      org.apache.pulsar.broker.service.BrokerServiceAutoSubscriptionCreationTest.cleanupTest
>>> 12      org.apache.pulsar.websocket.proxy.ProxyAuthenticationTest.cleanup
>>> 12      org.apache.pulsar.websocket.proxy.v1.V1_ProxyAuthenticationTest.cleanup
>>> 12      org.apache.pulsar.client.impl.BatchMessageIndexAckTest.testBatchMessageIndexAckForSharedSubscription
>>> 11      org.apache.pulsar.tests.integration.functions.PulsarFunctionsTest.testJavaPublishFunction
>>> 11      org.apache.pulsar.broker.loadbalance.AntiAffinityNamespaceGroupTest.testBrokerSelectionForAntiAffinityGroup
>>>
>>> I'll report more flaky tests after I have checked that my tooling is
>>> producing correct results.
>>>
>>> For contributing to fix flaky tests, please pick a flaky test for fixing
>>> from the reported ones:
>>>
>>> https://github.com/apache/pulsar/issues?q=flaky+sort%3Aupdated-desc+is%3Aopen
>>>
>>> We can all join the #testing channel on Pulsar Slack to share detailed
>>> tips and tricks while working on fixing flaky tests.
>>>
>>> See you,
>>>
>>> BR, Lari
>>>
>>>
>>> On Fri, Jan 29, 2021 at 8:26 PM Lari Hotari <la...@sagire.fi>
>>> wrote:
>>>
>>>> Dear Pulsar community members,
>>>>
>>>> In order to improve our CI, we will have to fix the flaky tests. In
>>>> some cases it might be necessary to replace an existing test with a
>>>> redesigned test.
>>>>
>>>> The draft PIP "Changes to flaky test handling" document
>>>> <https://docs.google.com/document/d/10lmn4pW1IsT_8D1ZE0vMjASX0HhjdGdjB794iyScwns/edit?usp=sharing> lists
>>>> the top 10 flaky tests. A lot of them have already been addressed by pull
>>>> requests in the past week or so.
>>>>
>>>> This is the list of recent PRs that fix flaky tests from the top 10
>>>> flaky tests list:
>>>> https://github.com/apache/pulsar/pull/9286
>>>> https://github.com/apache/pulsar/pull/9243
>>>> https://github.com/apache/pulsar/pull/9258
>>>> https://github.com/apache/pulsar/pull/9356
>>>>
>>>> These are the GH issues for the remaining ones in the top 10 flaky
>>>> tests list:
>>>> https://github.com/apache/pulsar/issues/6368
>>>> https://github.com/apache/pulsar/issues/9369
>>>> https://github.com/apache/pulsar/issues/9368
>>>>
>>>> If you would like to help to fix flaky tests you can pick one of the
>>>> open issues above. Just add a comment on the issue when you start working
>>>> on it so that we can coordinate activities.
>>>>
>>>> It is also helpful to report a flaky test when you encounter one. I've
>>>> been using this type of template for reporting a flaky test:
>>>> https://gist.github.com/lhotari/a5c67359b362b4f3d8729330d65a2298 . The
>>>> issues #9368 and #9369 have been reported using this template.
>>>> Search for the test name before reporting so that we don't end up with
>>>> duplicates.
>>>>
>>>> The issues #6368, #9369 and #9368 are the next 3 important issues to
>>>> fix. I'm planning to create a more extensive list of the flaky failures so
>>>> that we can target the most flaky ones when we continue fixing the flaky
>>>> tests. I have some scripts in development to assist in mining the Pulsar
>>>> GitHub Actions workflow run logs.
>>>>
>>>> This is a search to find flaky issues in Pulsar GH issues:
>>>>
>>>> https://github.com/apache/pulsar/issues?q=flaky+sort%3Aupdated-desc+is%3Aopen
>>>>
>>>> Looking forward to the contributions for fixing flaky tests,
>>>>
>>>> BR,
>>>>
>>>> Lari
>>>>
>>>

Re: Fixing flaky tests: help needed

Posted by Lari Hotari <la...@sagire.fi>.
Hi all,

There are links to recent failures of a particular flaky test in the
recently reported flaky test GitHub issues (
https://github.com/apache/pulsar/issues?q=flaky+sort%3Aupdated-desc+is%3Aissue+is%3Aopen
).

Example from https://github.com/apache/pulsar/issues/9437 :
example failure 2021-02-01T09:41:10.0922161Z
<https://github.com/apache/pulsar/runs/1804628430?check_suite_focus=true#step:6:12322>
example failure 2021-01-29T07:51:57.9989389Z
<https://github.com/apache/pulsar/runs/1789838309?check_suite_focus=true#step:6:18491>
example failure 2021-01-28T02:42:14.3316285Z
<https://github.com/apache/pulsar/runs/1781184081?check_suite_focus=true#step:6:18415>
example failure 2021-01-27T21:44:09.7619772Z
<https://github.com/apache/pulsar/runs/1778470820?check_suite_focus=true#step:6:6213>

These links point to the exact line in the build log.
For example:
https://github.com/apache/pulsar/runs/1804628430?check_suite_focus=true#
*step:6:12322*

*When opening this link, it should navigate directly to the line number
12322 in step 6 of the workflow run log.*

However, there's a bug in the GitHub UI: this doesn't work if the link is
clicked from a page within github.com .
The parameters and hash of the URL get lost and the focus doesn't go to the
line where the error happened.

*The workaround is to open the "example failure" links in a new tab/window
by CTRL-click (Windows, Linux) or CMD-click (macOS).*
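
As an illustration of the link format described above, here is a minimal
Python sketch that splits such a link into the run id, the step number and
the log line number. The function name parse_log_anchor is made up for this
example, and the URL layout is assumed only from the example links in this
message.

```python
from urllib.parse import urlsplit


def parse_log_anchor(url):
    """Return (run_id, step, line) for a workflow run log link.

    Assumes the layout seen in the example links:
    .../runs/<run id>?check_suite_focus=true#step:<step>:<line>
    """
    parts = urlsplit(url)
    # The last path segment is the run id.
    run_id = int(parts.path.rstrip("/").split("/")[-1])
    # The fragment carries the step number and the log line number.
    label, step, line = parts.fragment.split(":")
    if label != "step":
        raise ValueError(f"unexpected fragment: {parts.fragment!r}")
    return run_id, int(step), int(line)


link = ("https://github.com/apache/pulsar/runs/1804628430"
        "?check_suite_focus=true#step:6:12322")
print(parse_log_anchor(link))  # prints (1804628430, 6, 12322)
```

A check like this can help confirm that a copied link still carries its
query parameters and fragment before it is pasted into an issue.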

I hope this helps investigate the flaky test failures more efficiently!

BR,

Lari

On Tue, Feb 2, 2021 at 7:35 PM Lari Hotari <la...@sagire.fi> wrote:

> The good progress continues!
> One way to see the issue & PR activity where "flaky" is mentioned:
> https://github.com/apache/pulsar/issues?q=flaky+sort%3Aupdated-desc
> Thank you to the contributors and PR reviewers!
>
> Here's the next flaky test for someone to fix:
> https://github.com/apache/pulsar/issues/6646 (reported a long time ago; I
> added some examples of recent failures)
> It's about PulsarFunctionsTest. This test class contributes to a lot of
> failures. I have uploaded a list of failures to
> https://gist.github.com/lhotari/9bae3e16674c297a6bbc2b4831515a74 .
> I haven't validated that all failures are from flaky test runs. It's
> possible that some are from a build which broke the test.
>
> 1) Who could pick up fixing the multiple issues in PulsarFunctionsTest,
> https://github.com/apache/pulsar/issues/6646 ? You can comment directly
> on issue #6646 and start working on it if you wish. It would be a really
> important fix to have.
>
> 2) Another one: https://github.com/apache/pulsar/issues/9431
>
> 3) The 3rd one might be a quick fix; it's an NPE in cleanup:
> https://github.com/apache/pulsar/issues/9432
>
> I'm looking forward to the sprint continuing. It seems that the issues get
> fixed sooner than I can report more of them. :)
>
> BR, Lari
>
>
> On Mon, Feb 1, 2021 at 8:18 PM Lari Hotari <la...@sagire.fi> wrote:
>
>> Dear Pulsar community members,
>>
>> Thanks for picking up the work so quickly! I noticed that at least Renkai
>> and Michael already pushed pull requests to fix the flaky tests that were
>> mentioned in the previous email. Some of the PRs have already been merged.
>>
>> Here are 3 more flaky tests with links to a lot of example failures:
>> https://github.com/apache/pulsar/issues/9407
>> https://github.com/apache/pulsar/issues/9408
>> https://github.com/apache/pulsar/issues/9409
>>
>> I'll report more flaky tests tomorrow. Today I was working on some
>> tooling to mine the logs and gather some statistics.
>>
>> I parsed the logs of the last few days and these are the test methods
>> that fail the most:
>>
>> 273     org.apache.pulsar.tests.integration.utils.DockerUtils$2.onComplete
>> 102     org.apache.pulsar.compaction.CompactionTest.cleanup
>> 81      org.apache.pulsar.admin.cli.PulsarAdminToolTest.topics
>> 51      org.apache.pulsar.broker.loadbalance.LoadBalancerTest.testLeaderElection
>> 45      org.apache.pulsar.io.PulsarFunctionE2ETest.shutdown
>> 40      org.apache.pulsar.broker.service.ConsumedLedgersTrimTest.testConsumedLedgersTrimNoSubscriptions
>> 36      org.apache.pulsar.websocket.proxy.ProxyPublishConsumeTest.cleanup
>> 30      org.apache.pulsar.functions.worker.PulsarFunctionLocalRunTest.shutdown
>> 30      org.apache.pulsar.tests.integration.functions.PulsarFunctionsTest.testJavaExclamationTopicPatternFunction
>> 29      org.apache.pulsar.tests.integration.functions.PulsarFunctionsTest.testJavaExclamationFunction
>> 27      org.apache.pulsar.client.api.v1.V1_ProducerConsumerTest.testConcurrentConsumerReceiveWhileReconnect
>> 26      org.apache.pulsar.client.admin.internal.http.AsyncHttpConnector.lambda$retryOperation$3
>> 22      org.apache.pulsar.broker.service.ReplicatorTest.testResetCursorNotFail
>> 22      org.apache.pulsar.tests.integration.functions.PulsarFunctionsTest.testJavaLoggingFunction
>> 21      org.apache.pulsar.tests.integration.SmokeTest.setup
>> 20      org.apache.pulsar.client.impl.MessageIdTest.testChecksumReconnection
>> 20      org.apache.pulsar.client.impl.MessageIdTest.testChecksumVersionComptability
>> 19      org.apache.pulsar.tests.integration.functions.PulsarFunctionsTest.testPythonFunctionLocalRun
>> 19      org.apache.pulsar.tests.integration.functions.PulsarFunctionsTest.testAutoSchemaFunction
>> 14      org.apache.pulsar.client.api.SimpleProducerConsumerTest.testConcurrentConsumerReceiveWhileReconnect
>> 14      org.apache.pulsar.broker.service.MessagePublishBufferThrottleTest.testBlockByPublishRateLimiting
>> 14      org.apache.pulsar.tests.integration.functions.PulsarFunctionsTest.testSlidingCountWindowTest
>> 13      org.apache.pulsar.tests.integration.backwardscompatibility.ClientTest2_2.testResetCursorCompatibility
>> 12      org.apache.pulsar.tests.integration.functions.PulsarFunctionsTest.testPythonPublishFunction
>> 12      org.apache.pulsar.tests.integration.functions.PulsarFunctionsTest.testPythonExclamationTopicPatternFunction
>> 12      org.apache.pulsar.tests.integration.functions.PulsarFunctionsTest.testPythonExclamationFunctionWithExtraDeps
>> 12      org.apache.pulsar.tests.integration.functions.PulsarFunctionsTest.testPythonExclamationZipFunction
>> 12      org.apache.pulsar.tests.integration.functions.PulsarFunctionsTest.testPythonFunctionNegAck
>> 12      org.apache.pulsar.tests.integration.functions.PulsarFunctionsTest.testPythonExclamationFunction
>> 12      org.apache.pulsar.compaction.CompactorTest.cleanup
>> 12      org.apache.pulsar.broker.service.BrokerServiceAutoSubscriptionCreationTest.cleanupTest
>> 12      org.apache.pulsar.websocket.proxy.ProxyAuthenticationTest.cleanup
>> 12      org.apache.pulsar.websocket.proxy.v1.V1_ProxyAuthenticationTest.cleanup
>> 12      org.apache.pulsar.client.impl.BatchMessageIndexAckTest.testBatchMessageIndexAckForSharedSubscription
>> 11      org.apache.pulsar.tests.integration.functions.PulsarFunctionsTest.testJavaPublishFunction
>> 11      org.apache.pulsar.broker.loadbalance.AntiAffinityNamespaceGroupTest.testBrokerSelectionForAntiAffinityGroup
>>
>> I'll report more flaky tests after I have checked that my tooling is
>> producing correct results.
>>
>> For contributing to fix flaky tests, please pick a flaky test for fixing
>> from the reported ones:
>>
>> https://github.com/apache/pulsar/issues?q=flaky+sort%3Aupdated-desc+is%3Aopen
>>
>> We can all join the #testing channel on Pulsar Slack to share detailed
>> tips and tricks while working on fixing flaky tests.
>>
>> See you,
>>
>> BR, Lari
>>
>>
>> On Fri, Jan 29, 2021 at 8:26 PM Lari Hotari <la...@sagire.fi>
>> wrote:
>>
>>> Dear Pulsar community members,
>>>
>>> In order to improve our CI, we will have to fix the flaky tests. In some
>>> cases it might be necessary to replace an existing test with a redesigned
>>> test.
>>>
>>> The draft PIP "Changes to flaky test handling" document
>>> <https://docs.google.com/document/d/10lmn4pW1IsT_8D1ZE0vMjASX0HhjdGdjB794iyScwns/edit?usp=sharing> lists
>>> the top 10 flaky tests. A lot of them have already been addressed by pull
>>> requests in the past week or so.
>>>
>>> This is the list of recent PRs that fix flaky tests from the top 10
>>> flaky tests list:
>>> https://github.com/apache/pulsar/pull/9286
>>> https://github.com/apache/pulsar/pull/9243
>>> https://github.com/apache/pulsar/pull/9258
>>> https://github.com/apache/pulsar/pull/9356
>>>
>>> These are the GH issues for the remaining ones in the top 10 flaky tests
>>> list:
>>> https://github.com/apache/pulsar/issues/6368
>>> https://github.com/apache/pulsar/issues/9369
>>> https://github.com/apache/pulsar/issues/9368
>>>
>>> If you would like to help to fix flaky tests you can pick one of the
>>> open issues above. Just add a comment on the issue when you start working
>>> on it so that we can coordinate activities.
>>>
>>> It is also helpful to report a flaky test when you encounter one. I've
>>> been using this type of template for reporting a flaky test:
>>> https://gist.github.com/lhotari/a5c67359b362b4f3d8729330d65a2298 . The
>>> issues #9368 and #9369 have been reported using this template.
>>> Search for the test name before reporting so that we don't end up with
>>> duplicates.
>>>
>>> The issues #6368, #9369 and #9368 are the next 3 important issues to
>>> fix. I'm planning to create a more extensive list of the flaky failures so
>>> that we can target the most flaky ones when we continue fixing the flaky
>>> tests. I have some scripts in development to assist in mining the Pulsar
>>> GitHub Actions workflow run logs.
>>>
>>> This is a search to find flaky issues in Pulsar GH issues:
>>>
>>> https://github.com/apache/pulsar/issues?q=flaky+sort%3Aupdated-desc+is%3Aopen
>>>
>>> Looking forward to the contributions for fixing flaky tests,
>>>
>>> BR,
>>>
>>> Lari
>>>
>>

Re: Fixing flaky tests: help needed

Posted by Lari Hotari <la...@sagire.fi>.
The good progress continues!
One way to see the issue & PR activity where "flaky" is mentioned:
https://github.com/apache/pulsar/issues?q=flaky+sort%3Aupdated-desc
Thank you to the contributors and PR reviewers!

Here's the next flaky test for someone to fix:
https://github.com/apache/pulsar/issues/6646 (reported a long time ago; I
added some examples of recent failures)
It's about PulsarFunctionsTest. This test class contributes to a lot of
failures. I have uploaded a list of failures to
https://gist.github.com/lhotari/9bae3e16674c297a6bbc2b4831515a74 .
I haven't validated that all failures are from flaky test runs. It's
possible that some are from a build which broke the test.

1) Who could pick up fixing the multiple issues in PulsarFunctionsTest,
https://github.com/apache/pulsar/issues/6646 ? You can comment directly on
issue #6646 and start working on it if you wish. It would be a really
important fix to have.

2) Another one: https://github.com/apache/pulsar/issues/9431

3) The 3rd one might be a quick fix; it's an NPE in cleanup:
https://github.com/apache/pulsar/issues/9432

I'm looking forward to the sprint continuing. It seems that the issues get
fixed sooner than I can report more of them. :)

BR, Lari


On Mon, Feb 1, 2021 at 8:18 PM Lari Hotari <la...@sagire.fi> wrote:

> Dear Pulsar community members,
>
> Thanks for picking up the work so quickly! I noticed that at least Renkai
> and Michael already pushed pull requests to fix the flaky tests that were
> mentioned in the previous email. Some of the PRs have already been merged.
>
> Here are 3 more flaky tests with links to a lot of example failures:
> https://github.com/apache/pulsar/issues/9407
> https://github.com/apache/pulsar/issues/9408
> https://github.com/apache/pulsar/issues/9409
>
> I'll report more flaky tests tomorrow. Today I was working on some tooling
> to mine the logs and gather some statistics.
>
> I parsed the logs of the last few days and these are the test methods that
> fail the most:
>
> 273     org.apache.pulsar.tests.integration.utils.DockerUtils$2.onComplete
> 102     org.apache.pulsar.compaction.CompactionTest.cleanup
> 81      org.apache.pulsar.admin.cli.PulsarAdminToolTest.topics
> 51      org.apache.pulsar.broker.loadbalance.LoadBalancerTest.testLeaderElection
> 45      org.apache.pulsar.io.PulsarFunctionE2ETest.shutdown
> 40      org.apache.pulsar.broker.service.ConsumedLedgersTrimTest.testConsumedLedgersTrimNoSubscriptions
> 36      org.apache.pulsar.websocket.proxy.ProxyPublishConsumeTest.cleanup
> 30      org.apache.pulsar.functions.worker.PulsarFunctionLocalRunTest.shutdown
> 30      org.apache.pulsar.tests.integration.functions.PulsarFunctionsTest.testJavaExclamationTopicPatternFunction
> 29      org.apache.pulsar.tests.integration.functions.PulsarFunctionsTest.testJavaExclamationFunction
> 27      org.apache.pulsar.client.api.v1.V1_ProducerConsumerTest.testConcurrentConsumerReceiveWhileReconnect
> 26      org.apache.pulsar.client.admin.internal.http.AsyncHttpConnector.lambda$retryOperation$3
> 22      org.apache.pulsar.broker.service.ReplicatorTest.testResetCursorNotFail
> 22      org.apache.pulsar.tests.integration.functions.PulsarFunctionsTest.testJavaLoggingFunction
> 21      org.apache.pulsar.tests.integration.SmokeTest.setup
> 20      org.apache.pulsar.client.impl.MessageIdTest.testChecksumReconnection
> 20      org.apache.pulsar.client.impl.MessageIdTest.testChecksumVersionComptability
> 19      org.apache.pulsar.tests.integration.functions.PulsarFunctionsTest.testPythonFunctionLocalRun
> 19      org.apache.pulsar.tests.integration.functions.PulsarFunctionsTest.testAutoSchemaFunction
> 14      org.apache.pulsar.client.api.SimpleProducerConsumerTest.testConcurrentConsumerReceiveWhileReconnect
> 14      org.apache.pulsar.broker.service.MessagePublishBufferThrottleTest.testBlockByPublishRateLimiting
> 14      org.apache.pulsar.tests.integration.functions.PulsarFunctionsTest.testSlidingCountWindowTest
> 13      org.apache.pulsar.tests.integration.backwardscompatibility.ClientTest2_2.testResetCursorCompatibility
> 12      org.apache.pulsar.tests.integration.functions.PulsarFunctionsTest.testPythonPublishFunction
> 12      org.apache.pulsar.tests.integration.functions.PulsarFunctionsTest.testPythonExclamationTopicPatternFunction
> 12      org.apache.pulsar.tests.integration.functions.PulsarFunctionsTest.testPythonExclamationFunctionWithExtraDeps
> 12      org.apache.pulsar.tests.integration.functions.PulsarFunctionsTest.testPythonExclamationZipFunction
> 12      org.apache.pulsar.tests.integration.functions.PulsarFunctionsTest.testPythonFunctionNegAck
> 12      org.apache.pulsar.tests.integration.functions.PulsarFunctionsTest.testPythonExclamationFunction
> 12      org.apache.pulsar.compaction.CompactorTest.cleanup
> 12      org.apache.pulsar.broker.service.BrokerServiceAutoSubscriptionCreationTest.cleanupTest
> 12      org.apache.pulsar.websocket.proxy.ProxyAuthenticationTest.cleanup
> 12      org.apache.pulsar.websocket.proxy.v1.V1_ProxyAuthenticationTest.cleanup
> 12      org.apache.pulsar.client.impl.BatchMessageIndexAckTest.testBatchMessageIndexAckForSharedSubscription
> 11      org.apache.pulsar.tests.integration.functions.PulsarFunctionsTest.testJavaPublishFunction
> 11      org.apache.pulsar.broker.loadbalance.AntiAffinityNamespaceGroupTest.testBrokerSelectionForAntiAffinityGroup
>
> I'll report more flaky tests after I have checked that my tooling is
> producing correct results.
>
> For contributing to fix flaky tests, please pick a flaky test for fixing
> from the reported ones:
>
> https://github.com/apache/pulsar/issues?q=flaky+sort%3Aupdated-desc+is%3Aopen
>
> We can all join the #testing channel on Pulsar Slack to share detailed
> tips and tricks while working on fixing flaky tests.
>
> See you,
>
> BR, Lari
>
>
> On Fri, Jan 29, 2021 at 8:26 PM Lari Hotari <la...@sagire.fi> wrote:
>
>> Dear Pulsar community members,
>>
>> In order to improve our CI, we will have to fix the flaky tests. In some
>> cases it might be necessary to replace an existing test with a redesigned
>> test.
>>
>> The draft PIP "Changes to flaky test handling" document
>> <https://docs.google.com/document/d/10lmn4pW1IsT_8D1ZE0vMjASX0HhjdGdjB794iyScwns/edit?usp=sharing> lists
>> the top 10 flaky tests. A lot of them have already been addressed by pull
>> requests in the past week or so.
>>
>> This is the list of recent PRs that fix flaky tests from the top 10 flaky
>> tests list:
>> https://github.com/apache/pulsar/pull/9286
>> https://github.com/apache/pulsar/pull/9243
>> https://github.com/apache/pulsar/pull/9258
>> https://github.com/apache/pulsar/pull/9356
>>
>> These are the GH issues for the remaining ones in the top 10 flaky tests
>> list:
>> https://github.com/apache/pulsar/issues/6368
>> https://github.com/apache/pulsar/issues/9369
>> https://github.com/apache/pulsar/issues/9368
>>
>> If you would like to help to fix flaky tests you can pick one of the open
>> issues above. Just add a comment on the issue when you start working on it
>> so that we can coordinate activities.
>>
>> It is also helpful to report a flaky test when you encounter one. I've
>> been using this type of template for reporting a flaky test:
>> https://gist.github.com/lhotari/a5c67359b362b4f3d8729330d65a2298 . The
>> issues #9368 and #9369 have been reported using this template.
>> Search for the test name before reporting so that we don't end up with
>> duplicates.
>>
>> The issues #6368, #9369 and #9368 are the next 3 important issues to fix.
>> I'm planning to create a more extensive list of the flaky failures so that
>> we can target the most flaky ones when we continue fixing the flaky tests.
>> I have some scripts in development to assist in mining the Pulsar GitHub
>> Actions workflow run logs.
>>
>> This is a search to find flaky issues in Pulsar GH issues:
>>
>> https://github.com/apache/pulsar/issues?q=flaky+sort%3Aupdated-desc+is%3Aopen
>>
>> Looking forward to the contributions for fixing flaky tests,
>>
>> BR,
>>
>> Lari
>>
>

Re: Fixing flaky tests: help needed

Posted by Lari Hotari <la...@sagire.fi>.
Dear Pulsar community members,

Thanks for picking up the work so quickly! I noticed that at least Renkai
and Michael already pushed pull requests to fix the flaky tests that were
mentioned in the previous email. Some of the PRs have already been merged.

Here are 3 more flaky tests with links to a lot of example failures:
https://github.com/apache/pulsar/issues/9407
https://github.com/apache/pulsar/issues/9408
https://github.com/apache/pulsar/issues/9409

I'll report more flaky tests tomorrow. Today I was working on some tooling
to mine the logs and gather some statistics.

I parsed the logs of the last few days and these are the test methods that
fail the most:

273     org.apache.pulsar.tests.integration.utils.DockerUtils$2.onComplete
102     org.apache.pulsar.compaction.CompactionTest.cleanup
81      org.apache.pulsar.admin.cli.PulsarAdminToolTest.topics
51      org.apache.pulsar.broker.loadbalance.LoadBalancerTest.testLeaderElection
45      org.apache.pulsar.io.PulsarFunctionE2ETest.shutdown
40      org.apache.pulsar.broker.service.ConsumedLedgersTrimTest.testConsumedLedgersTrimNoSubscriptions
36      org.apache.pulsar.websocket.proxy.ProxyPublishConsumeTest.cleanup
30      org.apache.pulsar.functions.worker.PulsarFunctionLocalRunTest.shutdown
30      org.apache.pulsar.tests.integration.functions.PulsarFunctionsTest.testJavaExclamationTopicPatternFunction
29      org.apache.pulsar.tests.integration.functions.PulsarFunctionsTest.testJavaExclamationFunction
27      org.apache.pulsar.client.api.v1.V1_ProducerConsumerTest.testConcurrentConsumerReceiveWhileReconnect
26      org.apache.pulsar.client.admin.internal.http.AsyncHttpConnector.lambda$retryOperation$3
22      org.apache.pulsar.broker.service.ReplicatorTest.testResetCursorNotFail
22      org.apache.pulsar.tests.integration.functions.PulsarFunctionsTest.testJavaLoggingFunction
21      org.apache.pulsar.tests.integration.SmokeTest.setup
20      org.apache.pulsar.client.impl.MessageIdTest.testChecksumReconnection
20      org.apache.pulsar.client.impl.MessageIdTest.testChecksumVersionComptability
19      org.apache.pulsar.tests.integration.functions.PulsarFunctionsTest.testPythonFunctionLocalRun
19      org.apache.pulsar.tests.integration.functions.PulsarFunctionsTest.testAutoSchemaFunction
14      org.apache.pulsar.client.api.SimpleProducerConsumerTest.testConcurrentConsumerReceiveWhileReconnect
14      org.apache.pulsar.broker.service.MessagePublishBufferThrottleTest.testBlockByPublishRateLimiting
14      org.apache.pulsar.tests.integration.functions.PulsarFunctionsTest.testSlidingCountWindowTest
13      org.apache.pulsar.tests.integration.backwardscompatibility.ClientTest2_2.testResetCursorCompatibility
12      org.apache.pulsar.tests.integration.functions.PulsarFunctionsTest.testPythonPublishFunction
12      org.apache.pulsar.tests.integration.functions.PulsarFunctionsTest.testPythonExclamationTopicPatternFunction
12      org.apache.pulsar.tests.integration.functions.PulsarFunctionsTest.testPythonExclamationFunctionWithExtraDeps
12      org.apache.pulsar.tests.integration.functions.PulsarFunctionsTest.testPythonExclamationZipFunction
12      org.apache.pulsar.tests.integration.functions.PulsarFunctionsTest.testPythonFunctionNegAck
12      org.apache.pulsar.tests.integration.functions.PulsarFunctionsTest.testPythonExclamationFunction
12      org.apache.pulsar.compaction.CompactorTest.cleanup
12      org.apache.pulsar.broker.service.BrokerServiceAutoSubscriptionCreationTest.cleanupTest
12      org.apache.pulsar.websocket.proxy.ProxyAuthenticationTest.cleanup
12      org.apache.pulsar.websocket.proxy.v1.V1_ProxyAuthenticationTest.cleanup
12      org.apache.pulsar.client.impl.BatchMessageIndexAckTest.testBatchMessageIndexAckForSharedSubscription
11      org.apache.pulsar.tests.integration.functions.PulsarFunctionsTest.testJavaPublishFunction
11      org.apache.pulsar.broker.loadbalance.AntiAffinityNamespaceGroupTest.testBrokerSelectionForAntiAffinityGroup

I'll report more flaky tests after I have checked that my tooling is
producing correct results.
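
For anyone who wants to do similar counting, here is a rough Python sketch
of the idea. It is not the tooling mentioned above. It assumes that failures
show up in the log as Maven Surefire style lines such as
"[ERROR] testName(className)  Time elapsed: ... <<< FAILURE!". The regular
expression, the function name count_failures and the sample lines are
assumptions for illustration, and the real workflow logs may need a
different pattern.

```python
import re
from collections import Counter

# Assumed line format (Surefire style); adjust to the actual log output.
FAILED_TEST = re.compile(r"\[ERROR\] (\w+)\(([\w.$]+)\).*<<< (?:FAILURE|ERROR)!")


def count_failures(lines):
    """Count failures per fully qualified test method name."""
    counts = Counter()
    for line in lines:
        match = FAILED_TEST.search(line)
        if match:
            method, class_name = match.groups()
            counts[f"{class_name}.{method}"] += 1
    return counts


# Made-up sample lines, only to show the shape of the input.
sample_lines = [
    "[ERROR] testLeaderElection(org.apache.pulsar.broker.loadbalance.LoadBalancerTest)"
    "  Time elapsed: 3.2 s  <<< FAILURE!",
    "[INFO] Tests run: 5, Failures: 1, Errors: 0, Skipped: 0",
    "[ERROR] cleanup(org.apache.pulsar.compaction.CompactionTest)"
    "  Time elapsed: 0.1 s  <<< ERROR!",
    "[ERROR] cleanup(org.apache.pulsar.compaction.CompactionTest)"
    "  Time elapsed: 0.2 s  <<< ERROR!",
]

counts = count_failures(sample_lines)
# Print the most frequent failures first, count column followed by the name.
for name, count in counts.most_common():
    print(f"{count:<8}{name}")
```

Counting per method rather than per class keeps cleanup and setup failures
visible as separate entries, which matches the shape of the list above.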

To contribute, please pick one of the reported flaky tests to fix:
https://github.com/apache/pulsar/issues?q=flaky+sort%3Aupdated-desc+is%3Aopen
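
The search link above is an ordinary URL-encoded query, so variations of it
can be built with a few lines of Python. This is only a sketch; the helper
name issue_search_url is made up for this example, and it relies on nothing
more than the query terms already visible in the link.

```python
from urllib.parse import urlencode


def issue_search_url(*terms, repo="apache/pulsar"):
    """Build a GitHub issue search link from space-separated query terms."""
    query = " ".join(terms)
    # urlencode turns spaces into '+' and ':' into '%3A'.
    return f"https://github.com/{repo}/issues?" + urlencode({"q": query})


print(issue_search_url("flaky", "sort:updated-desc", "is:open"))
# prints https://github.com/apache/pulsar/issues?q=flaky+sort%3Aupdated-desc+is%3Aopen
```

Replacing "flaky" with a test class name, for example "ReplicatorTest",
gives a quick way to look for an existing report of that test.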

We can all join the #testing channel on Pulsar Slack to share detailed tips
and tricks while working on fixing flaky tests.

See you,

BR, Lari


On Fri, Jan 29, 2021 at 8:26 PM Lari Hotari <la...@sagire.fi> wrote:

> Dear Pulsar community members,
>
> In order to improve our CI, we will have to fix the flaky tests. In some
> cases it might be necessary to replace an existing test with a redesigned
> test.
>
> The draft PIP "Changes to flaky test handling" document
> <https://docs.google.com/document/d/10lmn4pW1IsT_8D1ZE0vMjASX0HhjdGdjB794iyScwns/edit?usp=sharing> lists
> the top 10 flaky tests. A lot of them have already been addressed by pull
> requests in the past week or so.
>
> This is the list of recent PRs that fix flaky tests from the top 10 flaky
> tests list:
> https://github.com/apache/pulsar/pull/9286
> https://github.com/apache/pulsar/pull/9243
> https://github.com/apache/pulsar/pull/9258
> https://github.com/apache/pulsar/pull/9356
>
> These are the GH issues for the remaining ones in the top 10 flaky tests
> list:
> https://github.com/apache/pulsar/issues/6368
> https://github.com/apache/pulsar/issues/9369
> https://github.com/apache/pulsar/issues/9368
>
> If you would like to help to fix flaky tests you can pick one of the open
> issues above. Just add a comment on the issue when you start working on it
> so that we can coordinate activities.
>
> It is also helpful to report a flaky test when you encounter one. I've
> been using this type of template for reporting a flaky test:
> https://gist.github.com/lhotari/a5c67359b362b4f3d8729330d65a2298 . The
> issues #9368 and #9369 have been reported using this template.
> Search for the test name before reporting so that we don't end up with
> duplicates.
>
> The issues #6368, #9369 and #9368 are the 3 next important issues to fix.
> I'm planning to create a more extensive list of the flaky failures so that
> we can target the most flaky ones when we continue fixing the flaky tests.
> I have some scripts in development to assist in mining the Pulsar Github
> Action workflow run logs.
>
> This is a search to find flaky issues in Pulsar GH issues:
>
> https://github.com/apache/pulsar/issues?q=flaky+sort%3Aupdated-desc+is%3Aopen
>
> Looking forward to the contributions for fixing flaky tests,
>
> BR,
>
> Lari
>

Re: Fixing flaky tests: help needed

Posted by Lari Hotari <La...@hotari.net>.
Good idea. I have created https://github.com/apache/pulsar/pull/9398 for
adding the issue template for reporting flaky tests.

BR, Lari

On Sat, Jan 30, 2021 at 8:51 PM Sijie Guo <gu...@gmail.com> wrote:

> > I've been using this type of template for reporting a flaky test:
> > https://gist.github.com/lhotari/a5c67359b362b4f3d8729330d65a2298 .
>
>
> Can you add an issue template to Pulsar?
>
> - Sijie

Re: Fixing flaky tests: help needed

Posted by Sijie Guo <gu...@gmail.com>.
> I've been using this type of template for reporting a flaky test:
> https://gist.github.com/lhotari/a5c67359b362b4f3d8729330d65a2298 .


Can you add an issue template to Pulsar?

- Sijie

On Fri, Jan 29, 2021 at 10:27 AM Lari Hotari <la...@sagire.fi> wrote:

> Dear Pulsar community members,
>
> In order to improve our CI, we will have to fix the flaky tests. In some
> cases it might be necessary to replace an existing test with a redesigned
> test.
>
> The draft PIP "Changes to flaky test handling" document
> <https://docs.google.com/document/d/10lmn4pW1IsT_8D1ZE0vMjASX0HhjdGdjB794iyScwns/edit?usp=sharing> lists
> the top 10 flaky tests. A lot of them have already been addressed by pull
> requests in the past week or so.
>
> This is the list of recent PRs that fix flaky tests from the top 10 flaky
> tests list:
> https://github.com/apache/pulsar/pull/9286
> https://github.com/apache/pulsar/pull/9243
> https://github.com/apache/pulsar/pull/9258
> https://github.com/apache/pulsar/pull/9356
>
> These are the GH issues for the remaining ones in the top 10 flaky tests
> list:
> https://github.com/apache/pulsar/issues/6368
> https://github.com/apache/pulsar/issues/9369
> https://github.com/apache/pulsar/issues/9368
>
> If you would like to help to fix flaky tests you can pick one of the open
> issues above. Just add a comment on the issue when you start working on it
> so that we can coordinate activities.
>
> It is also helpful to report a flaky test when you encounter one. I've been
> using this type of template for reporting a flaky test:
> https://gist.github.com/lhotari/a5c67359b362b4f3d8729330d65a2298 . The
> issues #9368 and #9369 have been reported using this template.
> Search for the test name before reporting so that we don't end up with
> duplicates.
>
> The issues #6368, #9369 and #9368 are the 3 next important issues to fix.
> I'm planning to create a more extensive list of the flaky failures so that
> we can target the most flaky ones when we continue fixing the flaky tests.
> I have some scripts in development to assist in mining the Pulsar Github
> Action workflow run logs.
>
> This is a search to find flaky issues in Pulsar GH issues:
>
> https://github.com/apache/pulsar/issues?q=flaky+sort%3Aupdated-desc+is%3Aopen
>
> Looking forward to the contributions for fixing flaky tests,
>
> BR,
>
> Lari
>