You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@pulsar.apache.org by GitBox <gi...@apache.org> on 2021/02/02 15:41:07 UTC

[GitHub] [pulsar] lhotari opened a new pull request #9427: Add solution for quarantining flaky tests

lhotari opened a new pull request #9427:
URL: https://github.com/apache/pulsar/pull/9427


   ### Motivation
   
   Pulsar contains a large amount of flaky tests. It will be useful to have a solution for quarantining tests.
   
   This solution takes inspiration from [Airflow's Quarantined tests solution](https://github.com/apache/airflow/blob/master/TESTING.rst#quarantined-tests). Quarantining (or categorizing) flaky tests has been suggested by @potiuk in [the draft PIP "Changes to GitHub Actions based Pulsar CI"](https://docs.google.com/document/d/1FNEWD3COdnNGMiryO9qBUW_83qtzAhqjDI5wwmPD-YE/edit?usp=sharing) and also by @dlg99 in [in the draft PIP "Changes to flaky test handling"](https://docs.google.com/document/d/10lmn4pW1IsT_8D1ZE0vMjASX0HhjdGdjB794iyScwns/edit?usp=sharing). Thank you @potiuk and @dlg99 for suggesting this.
   
   ### Modifications
   
   - Adds `@Quarantined` annotation for quarantining flaky tests
   - Implements TestNG `IAnnotationTransformer` for integrating with TestNG in `QuarantinedTestNGAnnotationTransformer`.
     - this class is configured as the TestNG listener in the maven build
     
   - Quarantined tests are not run by default.
   - Quarantined tests are executed when `runQuarantinedTests` system property is set to `true`
     - this is similar feature as Airflow's `--include-quarantined` flag
   - No other tests than Quarantined tests are executed when `runOnlyQuarantinedTests`
     system property is set to `true`.
     - this is similar feature as Airflow CI's `-m quarantined` flag
   
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [pulsar] lhotari commented on pull request #9427: Add solution for quarantining flaky tests

Posted by GitBox <gi...@apache.org>.
lhotari commented on pull request #9427:
URL: https://github.com/apache/pulsar/pull/9427#issuecomment-771734639


   > probably we could have a separate CI job, not bound to PRs that runs quarantined tests
   
   Good points @eolivelli. yes that would be required. I believe Airflow CI has such a job.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [pulsar] lhotari commented on pull request #9427: Add solution for quarantining flaky tests

Posted by GitBox <gi...@apache.org>.
lhotari commented on pull request #9427:
URL: https://github.com/apache/pulsar/pull/9427#issuecomment-771734639


   > probably we could have a separate CI job, not bound to PRs that runs quarantined tests
   
   Good points @eolivelli. yes that would be required. I believe Airflow CI has such a job.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [pulsar] potiuk commented on pull request #9427: Add solution for quarantining flaky tests

Posted by GitBox <gi...@apache.org>.
potiuk commented on pull request #9427:
URL: https://github.com/apache/pulsar/pull/9427#issuecomment-771905170


   BTW. Glad that you liked the idea :). And the name for those tests is very, very appropriate taking into account the external circumstances ;)


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [pulsar] eolivelli commented on pull request #9427: Add solution for quarantining flaky tests

Posted by GitBox <gi...@apache.org>.
eolivelli commented on pull request #9427:
URL: https://github.com/apache/pulsar/pull/9427#issuecomment-777252004


   @lhotari I see. thanks for the explanation


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [pulsar] eolivelli commented on pull request #9427: Add solution for quarantining flaky tests

Posted by GitBox <gi...@apache.org>.
eolivelli commented on pull request #9427:
URL: https://github.com/apache/pulsar/pull/9427#issuecomment-777244397


   @lhotari can you please merge with current master and Kickstart CI again?
   It would be useful to merge this patch soon and move forward with CI cleanup


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [pulsar] lhotari commented on pull request #9427: Add solution for quarantining flaky tests

Posted by GitBox <gi...@apache.org>.
lhotari commented on pull request #9427:
URL: https://github.com/apache/pulsar/pull/9427#issuecomment-777250633


   > @lhotari can you please merge with current master and Kickstart CI again?
   > It would be useful to merge this patch soon and move forward with CI cleanup
   
   I'm postponing this PR until I have time to continue with this. It doesn't make sense to merge this in it's current form. Matteo provided some detailed requirements for the quarantined tests solution in the last Pulsar Community meeting. One of the ideas was to have a solution where the retries could be kept for quarantined tests. The rational behind this is that even though some of the tests are flaky, they provide some value to the reviewer. Removing the tests completely would leave a gap and if the tests are flaky, they won't be useful without retries. 
   
   Having this kind of hybrid solution in place would require that the quarantined test results are shown on the PR to the reviewer in a certain format. GitHub Actions has this feature called "annotations" which means that you can add custom test results that are displayed in the results for the test runs all the way on the PR. It would be necessary to revisit the solution and take this into account before merging the changes in this PR.
   
   I'm making progress on the CI clean, I'll hopefully have some public updates later today or tomorrow.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [pulsar] potiuk commented on pull request #9427: Add solution for quarantining flaky tests

Posted by GitBox <gi...@apache.org>.
potiuk commented on pull request #9427:
URL: https://github.com/apache/pulsar/pull/9427#issuecomment-771902672


   > > probably we could have a separate CI job, not bound to PRs that runs quarantined tests
   > 
   > Good points @eolivelli. yes that would be required. I believe Airflow CI has such a job.
   
   Yes we have in airflow a separate job. Failure of that job does not stop anything  (it has `continue-on-error: true` set: https://github.com/apache/airflow/blob/6fd93badaa86d5fd53a8dd9858467ab7e85208a6/.github/workflows/ci.yml#L738 ). But if those Quarantined tests fail, the whole job gets "red" status, however you can clearly see that those are the quarantined tests that failed, not the "stable" ones. And when you see they are not related to a change you can still merge it.
   
   Since those tests are flaky but not "broken", they do succeed more often than they don't so we notice and can react if we see that those tests in "Quarantine" start to fail consistently. This is easy to spot actually if you are active committer who merges a number of commits a day/week. 
   
   We also have a separate workflow which is "scheduled" and only runs quarantined tests, but this did not work as well as I wanted. We were submitting status of such quarantined tests to https://github.com/apache/airflow/issues/10118 automatically (last 20 runs were kept there). This however turned out to be flaky on its own. 
   Some of the flaky tests simply hang and then they make this exercise a bit pointless. Also the flaky tests tend to be less flaky when run in isolation so there are tests that always succeed when run separately. We call them Heisentests (akin to https://en.wikipedia.org/wiki/Heisenbug) and we actually introduced another "marker" for those (@heisentests) and they are always run in isolation.
   
   Also yeah - we have a bug for every quarantined test:  https://github.com/apache/airflow/issues?q=is%3Aopen+is%3Aissue+label%3AQuarantine but - due to 2.0.0 release (2 years in the making) and follow up 2.0.1 we are working on on where we fix teething problems of 2.0.0 those are a little neglected - they are all part of the "2.0 Cleanup milestone" and I actually hope to get them fixed as soon as we release 2.0.1
   
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [pulsar] lhotari closed pull request #9427: Add solution for quarantining flaky tests

Posted by GitBox <gi...@apache.org>.
lhotari closed pull request #9427:
URL: https://github.com/apache/pulsar/pull/9427


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [pulsar] potiuk commented on pull request #9427: Add solution for quarantining flaky tests

Posted by GitBox <gi...@apache.org>.
potiuk commented on pull request #9427:
URL: https://github.com/apache/pulsar/pull/9427#issuecomment-771902672






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org