Posted to dev@airflow.apache.org by Jarek Potiuk <Ja...@polidea.com> on 2020/04/20 00:30:58 UTC

CI tests idea - Quarantined tests idea

Hello everyone,

I have a proposal, very much COVID-19-inspired, on how to fix our CI tests...

After the recent problems with CI, Daniel, Tomek and I decided to make
an emergency migration to GitHub Actions. So we did.

I think overall it was a good move, but we had some problems with it.
It turns out that while we were blaming Travis for everything wrong
that happened in our builds, it was not always Travis' fault. We have
some tests that are also failing in GitHub Actions and I think it's
high time we fixed them.

I spent the better part of the weekend trying different things and
bringing numerous optimizations back to our CI configuration (a
lot of them were lost during the emergency move).

While running it I had many issues, and I think I found a good way to
handle our flaky tests. I would love to hear what others think about it.

Those interested - please take a look at the PR "Bring back CI
optimisations" https://github.com/apache/airflow/pull/8393
The corresponding GitHub Actions run is here:
https://github.com/apache/airflow/actions/runs/82410109

I implemented a lot of optimizations in this PR (some of them will
only take effect after we merge to master) but most of all I wanted to
introduce the concept of "quarantined tests" (good name, isn't it? :) )

Here is the idea:

 - tests that are marked with @pytest.mark.quarantined are skipped in
   regular runs (I identified 58 potential candidates; not all of them
   are flaky, but I wanted to be safe)
 - there is one dedicated "Quarantine" job that runs only quarantined
   tests (it's Postgres 9.6 with Python 3.6 for now)
 - those "quarantined" tests are run with a 90-second timeout each and
   are rerun up to 3 times if they fail
 - failure of any of the quarantined tests does not fail the whole CI
 - I plan to create GitHub issues for groups of those tests
   (MoveOutOfQuarantine NNNN)
 - I think it's best if we split them between committers
 - the job of the committers will be to observe the stability of those
   tests
 - once we fix the tests and observe that they are "stable", we move
   them out of Quarantine back to regular tests (by removing
   @pytest.mark.quarantined)
 - the goal is to move all our tests out of Quarantine
 - in the future we can move any flaky test to Quarantine (by adding
   @pytest.mark.quarantined), which will give us time to observe it
   and fix any flakiness.
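
For illustration, the skip/select behaviour described in the list above
could be wired up in a conftest.py roughly as follows. This is a minimal
sketch and not the code from the PR: the --run-quarantined flag is a
hypothetical name, and only the marker name "quarantined" comes from the
proposal itself.

```python
# conftest.py (illustrative sketch, not the actual Airflow configuration)
import pytest

QUARANTINED = "quarantined"  # marker name taken from the proposal


def pytest_addoption(parser):
    # Hypothetical flag name; the real CI scripts may select tests differently.
    parser.addoption(
        "--run-quarantined",
        action="store_true",
        default=False,
        help="run only tests marked as quarantined",
    )


def pytest_configure(config):
    # Register the marker so that pytest does not warn about an unknown mark.
    config.addinivalue_line(
        "markers", f"{QUARANTINED}: flaky test, runs only in the Quarantine job"
    )


def pytest_collection_modifyitems(config, items):
    run_quarantined = config.getoption("--run-quarantined")
    skip_quarantined = pytest.mark.skip(reason="quarantined; runs in the Quarantine job")
    skip_regular = pytest.mark.skip(reason="not quarantined; runs in the regular jobs")
    for item in items:
        is_quarantined = item.get_closest_marker(QUARANTINED) is not None
        if is_quarantined and not run_quarantined:
            # Regular jobs: leave the flaky tests out.
            item.add_marker(skip_quarantined)
        elif not is_quarantined and run_quarantined:
            # Quarantine job: run nothing but the quarantined tests.
            item.add_marker(skip_regular)
```

With something like this in place, a test (or a whole test class) is
quarantined by decorating it with @pytest.mark.quarantined. Assuming the
pytest-timeout and pytest-rerunfailures plugins are installed, the
Quarantine job could then be invoked as
"pytest --run-quarantined --timeout=90 --reruns 3". Keeping a failure of
that job from failing the whole CI would be handled in the workflow
definition, for example with the continue-on-error setting of a GitHub
Actions job.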

Let me know what you think of it.

J.

-- 
Jarek Potiuk
Polidea | Principal Software Engineer

M: +48 660 796 129

Re: CI tests idea - Quarantined tests idea

Posted by Jarek Potiuk <Ja...@polidea.com>.
OK. I finally have a green build on the CI optimisations PR with
Quarantined tests (and they all passed this time). I expect we might
discover a few more tests that need to be quarantined in the next few days.
I will keep an eye on that, organize a "Test Cleaning" project on
GitHub and involve others.

PR here: https://github.com/apache/airflow/pull/8393
Build here: https://github.com/apache/airflow/actions/runs/82894632

I already got an approval from Tomek, but feel free to take a look and
comment. I am still testing caching for images on my own fork, but once I
get it confirmed, I'd love to merge it.

Some optimisations that I brought back/introduced:

   - tests are not executed for doc-only changes
   - images will be (once merged) downloaded from the GitHub Registry,
   which is likely much faster
   - we have a "scheduled" nightly build that will build everything from
   scratch and check if no requirements have been broken
   - updated documentation and removed Travis references
   - coloured output where needed
   - much nicer static check output now (we have timestamps in GA, so we
   could disable verbosity)
   - improved split of static checks between two static check jobs - to
   utilise parallelism better.
   - reorganised some fast jobs (requirements, prod image) that do not
   depend on tests so that they can run earlier
   - shorter names for jobs so that they are nicer to view in the actions
   view
   - matrix definitions of the jobs so that we can manage them better
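
As an illustration of the first item (skipping tests for doc-only
changes), the decision could be made by a small helper like the one
below. This is a sketch under stated assumptions, not what the PR does:
the function names are hypothetical, and treating the docs/ directory
and .md/.rst files as documentation is an assumption made for the
example.

```python
from pathlib import PurePosixPath

# Assumptions for this sketch: what counts as "documentation".
DOC_DIRS = ("docs",)
DOC_SUFFIXES = (".md", ".rst")


def is_doc_file(path: str) -> bool:
    """Return True if a changed file looks like documentation."""
    p = PurePosixPath(path)
    in_doc_dir = bool(p.parts) and p.parts[0] in DOC_DIRS
    return in_doc_dir or p.suffix in DOC_SUFFIXES


def is_doc_only_change(changed_files) -> bool:
    """Return True only if every changed file is documentation.

    An empty change list returns False, so tests still run when the
    list of changed files could not be determined.
    """
    files = list(changed_files)
    return bool(files) and all(is_doc_file(f) for f in files)
```

A CI script could feed the list of files changed in a PR into
is_doc_only_change() and skip the test jobs when it returns True.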

What is left is to bring the Kubernetes jobs to GitHub Actions. I am
working on that next.

J.





Re: CI tests idea - Quarantined tests idea

Posted by Jarek Potiuk <Ja...@polidea.com>.
Absolutely +1! Great idea! Happy to coordinate, and I hope others would
like to join as well :)


Re: CI tests idea - Quarantined tests idea

Posted by Tomasz Urbaszek <to...@polidea.com>.
Got it!

What would you say to organizing a more coordinated effort to improve
our test suite, something like "Fridays with tests"? In a few weeks,
this should result in a much better test suite and probably fewer
problems with CI. It is also a nice way to take a look at Airflow
internals :)

Tomek





-- 

Tomasz Urbaszek
Polidea | Software Engineer

M: +48 505 628 493
E: tomasz.urbaszek@polidea.com

Unique Tech
Check out our projects!

Re: CI tests idea - Quarantined tests idea

Posted by Jarek Potiuk <Ja...@polidea.com>.
Both, depending on the tests. I think I've been a bit over-cautious for
now; after merging, once we observe a few runs in production (and in other
people's PRs), we might quickly bring down the number of quarantined tests.

I think most of the problematic tests are really "long-running" and pretty
stand-alone ones. I think part of the process should be that if we find
that they require some side effects, we fix that, so that eventually we
will only have a few quarantined "single tests" rather than
"whole classes".


Re: CI tests idea - Quarantined tests idea

Posted by Tomasz Urbaszek <to...@polidea.com>.
Thank you, Jarek, for your work!
+1 for the idea of quarantined tests. Just one question: are we marking
single tests or whole classes? This question is mostly related to
tests that require some side effects from previous tests.

Tomek

