Posted to commits@pulsar.apache.org by GitBox <gi...@apache.org> on 2021/02/02 19:12:35 UTC

[GitHub] [pulsar] potiuk commented on pull request #9427: Add solution for quarantining flaky tests

potiuk commented on pull request #9427:
URL: https://github.com/apache/pulsar/pull/9427#issuecomment-771902672


   > > probably we could have a separate CI job, not bound to PRs that runs quarantined tests
   > 
   > Good points @eolivelli. yes that would be required. I believe Airflow CI has such a job.
   
   Yes, in Airflow we have a separate job. A failure of that job does not block anything (it has `continue-on-error: true` set: https://github.com/apache/airflow/blob/6fd93badaa86d5fd53a8dd9858467ab7e85208a6/.github/workflows/ci.yml#L738 ). If the quarantined tests fail, the whole job still gets a "red" status, but you can clearly see that it was the quarantined tests that failed, not the "stable" ones. And when you see that the failures are not related to a change, you can still merge it.
   
   Since those tests are flaky but not "broken", they succeed more often than they fail, so we notice and can react when tests in "Quarantine" start to fail consistently. This is actually easy to spot if you are an active committer who merges a number of commits a day/week.
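
   For anyone who prefers to automate that check rather than rely on spotting it by eye, here is a minimal sketch of one way it might be done. It is not what Airflow CI does; the function name, the window size and the failure threshold are all assumptions made up for illustration.

```python
from collections import deque
from typing import Deque, Dict, Iterable, List, Tuple

# Hypothetical defaults, not taken from any real CI configuration.
WINDOW = 20              # how many recent runs to remember per test
FAILURE_THRESHOLD = 0.5  # a flaky test is expected to pass more often than it fails


def consistently_failing(
    results: Iterable[Tuple[str, bool]],
    window: int = WINDOW,
    threshold: float = FAILURE_THRESHOLD,
) -> List[str]:
    """Return the tests whose recent failure rate exceeds `threshold`.

    `results` is a chronological stream of (test_name, passed) pairs.
    Only the last `window` results of each test are considered.
    """
    history: Dict[str, Deque[bool]] = {}
    for name, passed in results:
        # deque(maxlen=...) silently drops the oldest result once full.
        history.setdefault(name, deque(maxlen=window)).append(passed)

    flagged = []
    for name, runs in history.items():
        failure_rate = runs.count(False) / len(runs)
        if failure_rate > threshold:
            flagged.append(name)
    return sorted(flagged)
```

   A test that fails once in four runs stays below the threshold, while one that fails four times in five gets flagged as a candidate for a closer look.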
   
   We also have a separate "scheduled" workflow that runs only the quarantined tests, but this did not work as well as I wanted. We were automatically submitting the status of those quarantined tests to https://github.com/apache/airflow/issues/10118 (the last 20 runs were kept there). This, however, turned out to be flaky on its own.
   Some of the flaky tests simply hang, which makes this exercise a bit pointless. Also, flaky tests tend to be less flaky when run in isolation, so there are tests that always succeed when run separately. We call them Heisentests (akin to https://en.wikipedia.org/wiki/Heisenbug), and we introduced another "marker" for those (@heisentests); they are always run in isolation.
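
   To illustrate how such markers can drive the way tests are grouped, here is a rough sketch under stated assumptions. The `plan_runs` helper and the label strings are hypothetical and are not Airflow's actual implementation: unmarked tests run together, quarantined tests run together, and each Heisentest gets a run of its own.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical marker labels used only for this illustration.
QUARANTINED = "quarantined"
HEISENTESTS = "heisentests"


def plan_runs(tests: Dict[str, Optional[str]]) -> List[Tuple[str, List[str]]]:
    """Split tests into separate invocations based on their marker.

    `tests` maps a test name to its marker, or None for an unmarked test.
    Returns (label, test_names) batches: one batch of stable tests, one
    batch of quarantined tests, and one batch per Heisentest so that each
    of them runs in isolation. Empty batches are omitted.
    """
    stable = sorted(name for name, marker in tests.items() if marker is None)
    quarantined = sorted(name for name, marker in tests.items() if marker == QUARANTINED)
    isolated = sorted(name for name, marker in tests.items() if marker == HEISENTESTS)

    batches: List[Tuple[str, List[str]]] = []
    if stable:
        batches.append(("stable", stable))
    if quarantined:
        batches.append((QUARANTINED, quarantined))
    # Every Heisentest becomes a batch of exactly one test.
    batches.extend((HEISENTESTS, [name]) for name in isolated)
    return batches
```

   Each returned batch would then correspond to one test-runner invocation, so a Heisentest never shares a process with other tests.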
   
   Also, yes, we have a bug for every quarantined test: https://github.com/apache/airflow/issues?q=is%3Aopen+is%3Aissue+label%3AQuarantine but, due to the 2.0.0 release (2 years in the making) and the follow-up 2.0.1 release we are working on, where we fix the teething problems of 2.0.0, those are a little neglected. They are all part of the "2.0 Cleanup milestone" and I hope to get them fixed as soon as we release 2.0.1.
   
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org