Posted to dev@airflow.apache.org by Mateusz Nojek <ma...@gmail.com> on 2022/02/01 09:34:09 UTC

Re: [DISCUSSION] AIP-47 New design of Airflow System Tests

Hello everyone,

Thanks for the questions, Niko! I will try to address them now...


   1. The proposed solution is to have these provider-specific projects set
   up for running the system test DAGs and to use GitHub CI just as a trigger
   for them. This way we will offload the GitHub CI workers and move
   execution outside of GitHub. AFAIK Google is going to transfer some
   credits to a GCP project dedicated to community system tests. If other
   companies go the same way, we will be able to execute all system tests
   regularly and thus improve the quality of Airflow.
   2. SYSTEM_TESTS_ENV_ID is a variable that needs to be set by the user
   running the system tests. It can be the user's name, some random value or
   any other string. It can be used by scripts (e.g. in the CI process) that
   create multiple environments for testing specific operators on different
   versions of Airflow, and thanks to this variable we can avoid potential
   collisions between the data structures created by different runs (the
   sketch after this list shows how a test can use it).
   The design purposely avoids setting it automatically because different
   users can have different needs for it. If that's not the case, the user
   can just put anything there (e.g. a user name) and run the tests.
   The instructions will be documented inside the README file for system
   tests.
   3. The main idea of this AIP is that the system tests are self-contained
   and independent of any provider-specific configuration. Each provider may
   require different configuration for the system tests to work, and we don't
   want to enforce any general setup that would need to be followed by others.
   Tests need to remain simple so that one can write them without deep diving
   into complex configuration (as it was before, when specific credential
   files needed to be injected before the pytest test). In general, tests are
   independent of the configuration, but it's up to the provider to prepare
   configuration instructions for their system tests to work (preferably in
   some README file). It's up to the provider how simple (or complex) that
   configuration will be.
   4. The simplicity of system tests is based on the fact that they are
   just DAGs and they look like the (old) example dags. Thanks to that we
   don't need to maintain these DAGs (which also serve as part of the
   automatically generated documentation) and the system tests separately,
   but we have it all as one. We are aware that nobody's perfect and people
   make mistakes, but at the same time we believe that users have the best
   knowledge of how they want to use system tests. That's why it's important
   to have short but very specific documentation with instructions on how to
   run these tests.
   Regarding the parts that sound more like best practices instead of
   strict rules... Forcing users to use one and only one way to configure
   tests would be very limiting. It is very hard to predict all possible
   users' behaviors, thus we cannot predict e.g. which task will be a
   teardown and which will not - it's definitely up to the test designer.
   The good news is that we can support users by creating pre-commit hooks
   that will check whether the most significant parts are included in the
   test file (e.g. the pytest wrapper or the DebugExecutor declaration - see
   the sketch right after this list for what these parts look like). Also,
   we can prepare a script that will generate a dummy system test file with
   all these parts and some comments, so that the user will just need to
   fill in these parts according to their needs (something like an invoke
   <https://www.pyinvoke.org/> script).
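
To make points 2 and 4 above a bit more concrete, here is a minimal sketch of
what such a self-contained system test file could look like. Only the
SYSTEM_TESTS_ENV_ID variable, the DebugExecutor line and the __main__ block
correspond to the parts discussed above; the DAG id, the operator and the
pytest wrapper name are made-up placeholders, not something mandated by the
AIP:

import os
from datetime import datetime

# Switch to the DebugExecutor at the top of the file (point 4), so the DAG
# can be executed directly as a plain Python script.
os.environ["AIRFLOW__CORE__EXECUTOR"] = "DebugExecutor"

from airflow import models
from airflow.operators.bash import BashOperator

# Unique suffix provided by whoever runs the test (point 2); it is embedded
# in resource names to avoid collisions between concurrent environments.
ENV_ID = os.environ["SYSTEM_TESTS_ENV_ID"]
DAG_ID = "example_some_service"  # placeholder name

with models.DAG(
    DAG_ID,
    schedule_interval="@once",
    start_date=datetime(2022, 1, 1),
    catchup=False,
    tags=["example"],
) as dag:
    # Placeholder for the operator(s) under test; a real test would create
    # resources whose names include ENV_ID.
    run_test = BashOperator(
        task_id="run_test",
        bash_command=f"echo 'running test against bucket bucket-{ENV_ID}'",
    )
    # Teardown tasks cleaning up those resources would follow, with
    # trigger_rule="all_done" so they run even when upstream tasks fail.

if __name__ == "__main__":
    # Run the whole DAG standalone with the DebugExecutor.
    dag.clear()
    dag.run()

# Placeholder pytest wrapper so the same file can also be collected by
# pytest in CI; the exact helper is up to the final implementation.
def test_run():
    dag.clear()
    dag.run()

Such a file can then be run either directly (python path/to/the/file.py) or
collected by pytest, so the same example dag serves both purposes.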

Thanks again for the questions and I hope this clarifies things. If not,
we are happy to answer further comments and questions.

Kind regards,
Mateusz Nojek


Wed, 26 Jan 2022 at 00:05 Oliveira, Niko <on...@amazon.com.invalid>
wrote:

> Hey folks,
>
>
> Very excited about this AIP!
>
>
> I work on the AWS OSS Airflow team. Getting the AWS system tests running
> has been a pet-project of mine for the past little while. I come from a
> test automation background, so this is dear to my heart :)
>
>
> Currently I have a branch that contains the implementation of the various
> methods (mostly around environment and credential configuration) for AWS
> system tests, but I was running into very obscure runtime issues. So I'm
> glad to see one of the goals of this AIP is to simplify this process, since
> it is convoluted and clunky.
>
>
> Here are some high level thoughts after my read through the AIP:
>
>
> 1. “*CI integration can be built using GitHub CI or provider-related
> solution*“ - I'm happy (and excited) to collaborate on this front! I
> already have a proof of concept running (internally to AWS) built with AWS
> Code Pipeline and Code Build, which runs various tests each time my
> personal fork is updated from upstream. But we could easily make something
> like this public and then connect it to the Airflow repo to verify commits,
> PRs, etc.
>
>
> 2. After reading through the AIP, I still don't think I truly grok the
> SYSTEM_TESTS_ENV_ID environment variable. I understand its *use*,
> particularly for uniqueness, but who is the owner for setting this? What
> precisely does it represent? Including an example of an actual value for
> that env variable in the AIP would be helpful!
>
> 3. The AIP reads: “*Maintaining system tests requires knowledge about
> breeze, pytest and maintenance of special files like variables.env or
> credential files. Without those, we will simplify the architecture and
> improve management over tests.*”
>     The AIP talks a lot about removing this type of setup from the Airflow
> system test platform to simplify things and that "*All needed permissions
> to external services for execution of DAGs (tests) should be provided to
> the Airflow instance in advance.*" and that "*Each provider should create
> an instruction explaining how to prepare the environment to run related
> system tests so that users can do it on their own*".
>    In general the AIP reads as if it's solved this problem, but it's more
> like it has absolved itself from solving this problem, which is much
> different. I think this approach could possibly make things even worse as
> now there is no contract or interface for how to plumb configuration and
> credentials to the system test dags. The current set of methods and files
> to plumb credentials through aren't great (and as of now are quite Google
> specific) but I think this interface can be simplified and improved rather
> than just exported wholesale for each provider to re-invent a new approach.
>
>
> 4. Lastly, regarding the actual construction of the tests themselves, the
> proposed design is full of statements like:
>    - "*If a teardown task(s) has been defined, remember to add
> trigger_rule="all_done" parameter to the operator call. This will make sure
> that this task will always run even if the upstream fails*"
>    - “*Make sure to include these parameters into DAG call: ...*“
>
>    - “*Change Airflow Executor to DebugExecutor by placing this line at
> the top of the file: ...*”
>
>    - “*Try to keep tasks in the DAG body defined in an order of
> execution.*”
>
>
>    A system that relies on good intentions like "be sure to remember to do
> X otherwise bad thing Y will happen" certainly guarantees that bad thing Y
> will happen, frequently. Humans are fallible and not great at sticking to a
> contract or interface that isn't codified. And this AIP is littered with
> statements like this. We need test infrastructure that's easier to use and
> will also enforce these best practices/requirements which are needed for
> the tests to run.
> In general, it reads much more like a guideline on best practices rather
> than a new and improved system test engine.
>
>
>
> Thanks for taking the time to create this AIP! I'm very eager to get system
> testing up and running in Airflow and I'd love to collaborate further on it!
>
> Cheers,
> Niko
>
>
> ------------------------------
> *From:* Jarek Potiuk <ja...@potiuk.com>
> *Sent:* Tuesday, January 25, 2022 8:57 AM
> *To:* dev@airflow.apache.org
> *Subject:* RE: [EXTERNAL] [DISCUSSION] AIP-47 New design of Airflow
> System Tests
>
>
> Just let me add a bit more context as I see it.
>
> The AIP-47 design (following the AIP-4 idea) is targeted to handle those
> things:
>
> * streamline the execution of the System Tests (which are basically
> executable "example_dags" of ours which we already have in our
> documentation for providers)
> * make them integrated and automatically execute in a "cloud" environment
> - with real cloud services behind
> * make use of the 100s of System Tests already written in Google Provider
> (and many others which we have in other providers - for example AWS
> should be close to follow)
> * test automation of those using Google Cloud Platform donated credits
> (from what I understand Google is willing to sponsor credits to run those
> tests regularly).
> * I am quite sure that once it works, AWS will also donate credits for our CI
> that we could use for that.
> * and we should be able to extend it to other providers as well once the
> "CI environment" is in place.
>
> When this one is implemented, we should finally be able not only to run
> "unit" tests with all PRs for a lot of provider code, but also to regularly
> (not with every PR but likely daily) run the example dags against the real
> services and make sure that at least a big part of our Providers code
> (Hooks/Operators) is still "working as expected" - at least that it passes
> "smoke tests".
>
> Currently we rely mostly on the manual tests done when we release
> providers (and our community members are actually great here - as every
> month we have a great deal of help from within the community), but when we
> have system test automation, the confidence in releasing new versions of
> providers will go up significantly.
>
> I am really looking forward to discussion on this one and implementation
> and deployment for a number of providers.
>
> j.
>
>
> On Tue, Jan 25, 2022 at 2:13 PM Mateusz Nojek <ma...@gmail.com> wrote:
>
>> Hello everyone,
>>
>> I created a new AIP draft titled AIP-47 New design of Airflow System
>> Tests
>> <https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-47+New+design+of+Airflow+System+Tests>
>> .
>> It is related to the new design of system tests that improves how they
>> are written and run, which should contribute to higher quality of tests and
>> thus to a lower number of bugs related to the operators. AIP-47 is
>> located under AIP-4
>> <https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-4+Support+for+Automation+of+System+Tests+for+external+systems>
>> as its direct successor (it more deeply describes some of the concepts).
>> The basic target was to simplify the execution of the tests and enable
>> running them more regularly.
>>
>> I have been working on it with Eugene Kostieiev and Bartłomiej Hirsz for
>> some time, and soon we are also going to publish a PR with changes based on
>> the content of this AIP.
>>
>> For now, we would like to start a discussion with the developers and
>> contributors of the community of Apache Airflow. The idea was already
>> discussed with Jarek Potiuk, the original author of AIP-4, who cooperated
>> with us on some of the design details.
>>
>> Please tell me what you think about it and share your ideas, concerns
>> and comments.
>>
>> Kind regards,
>> Mateusz Nojek
>>
>

Re: [DISCUSSION] AIP-47 New design of Airflow System Tests

Posted by Giorgio Zoppi <gi...@gmail.com>.
Hello Mateusz,
it's ok for me. Good.
BR,
Giorgio

On Tue, 15 Feb 2022 at 10:51 Mateusz Nojek <ma...@gmail.com>
wrote:

> Hello again,
>
> I agree with your (Niko, Giorgio) points that the environment needs to be
> set up as easily as possible, but I will leave it to the providers, as
> mentioned before. It's also in their interest to simplify it.
>
> Regarding the "watcher" task. It's been discussed in another topic on the
> devlist (https://lists.apache.org/thread/qkbwb40tqd3gpzny5xvsvsf93r3slm2z)
> and the conclusion is that we can make use of the "tasks" attribute of the dag
> object, like this:
>
> list(dag.tasks) >> watcher
>
> No new features are needed, and by using the list conversion we protect it
> from further changes to the dag.tasks field.
> I think this is a very clever yet beautiful way to create a dependency
> between the watcher and all other tasks (thanks, Ash!).
>
> Also, I've just updated the AIP with changes proposed during this
> discussion:
> - Added point 3. in "What change do you propose to make"
> - Added point 7. in "Design details"
> - Updated "Complete test example" section
> - Added new section "Watcher task" right under "Tasks as setup and
> teardown"
>
> Please familiarize yourselves with the changes, and if there are no other strong
> opinions against this AIP, I would like to start the voting. Is that OK?
>
> Kind regards,
> Mateusz Nojek
>
>
> Thu, 10 Feb 2022 at 11:27 Jarek Potiuk <ja...@potiuk.com> wrote:
>
>>
>>> I think both these sticking points are really a trade-off of simplicity
>>> vs consistency/reliability. And to be clear I'm not arguing for things to
>>> be more complex just for the heck of it, I agree that simplicity is great!
>>> But just that there needs to be a balance and we can't get caught
>>> over-indexing on one or the other. I think the combination of test
>>> environments being a free for all and tests being simply a set of
>>> guidelines with some static analysis both will combine to be brittle. The
>>> example Mateusz just described regarding needing a watcher task to
>>> ensure tests end with the right result is a great example of how the route
>>> of kludging example dags themselves to be the test *and* the test
>>> runner can be brittle and complicated. And again, I love the idea of the
>>> example dags being the code under test, I just think having them also
>>> conduct the test execution of themselves is going to be troublesome.
>>>
>>
>> I think we should try it and see. I do agree the "watcher" case is a bit
>> not super-straightforward - and partially it comes from a lack of features in
>> Airflow DAG processing. Maybe also that means we SHOULD add a new feature
>> where we can specify a task in a DAG that is always run when the DAG ends
>> and determines the status of all tasks?
>>
>> Currently the idea we have (my proposal) is that all such "overhead" code
>> in the example dags MIGHT be automatically added by pre-commit. So the
>> "Example" dags might have an "auto-generated" part where such watcher (if
>> present in the DAG) will automatically get the right dependencies to all
>> the tasks. But maybe we actually could add such feature to Airflow. It does
>> not seem complex. We could actually have a simple new operator for DAG to
>> add such "completion" task ?
>>
>> I can imagine for example this:
>>
>> dag >>= watcher
>>
>> Which could automatically add all tasks defined currently in the DAG as
>> dependencies of the watcher :)
>>
>> WDYT? That sounds like an easy task and also usable in a number of cases
>> beyond System Testing?
>>
>> J.
>>
>>
>>
>

Re: [DISCUSSION] AIP-47 New design of Airflow System Tests

Posted by Jarek Potiuk <ja...@potiuk.com>.
For me that one looks good.

On Tue, Feb 15, 2022 at 10:51 AM Mateusz Nojek <ma...@gmail.com> wrote:
>
> Hello again,
>
> I agree with your (Niko, Giorgio) points that the environment needs to be set up as easily as possible, but I will leave it to the providers, as mentioned before. It's also in their interest to simplify it.
>
> Regarding the "watcher" task. It's been discussed in another topic on the devlist (https://lists.apache.org/thread/qkbwb40tqd3gpzny5xvsvsf93r3slm2z) and the conclusion is that we can make use of the "tasks" attribute of the dag object, like this:
>
> list(dag.tasks) >> watcher
>
> No new features are needed, and by using the list conversion we protect it from further changes to the dag.tasks field.
> I think this is a very clever yet beautiful way to create a dependency between the watcher and all other tasks (thanks, Ash!).
>
> Also, I've just updated the AIP with changes proposed during this discussion:
> - Added point 3. in "What change do you propose to make"
> - Added point 7. in "Design details"
> - Updated "Complete test example" section
> - Added new section "Watcher task" right under "Tasks as setup and teardown"
>
> Please familiarize yourselves with the changes, and if there are no other strong opinions against this AIP, I would like to start the voting. Is that OK?
>
> Kind regards,
> Mateusz Nojek
>
>
> Thu, 10 Feb 2022 at 11:27 Jarek Potiuk <ja...@potiuk.com> wrote:
>>>
>>>
>>> I think both these sticking points are really a trade-off of simplicity vs consistency/reliability. And to be clear I'm not arguing for things to be more complex just for the heck of it, I agree that simplicity is great! But just that there needs to be a balance and we can't get caught over-indexing on one or the other. I think the combination of test environments being a free for all and tests being simply a set of guidelines with some static analysis both will combine to be brittle. The example Mateusz just described regarding needing a watcher task to ensure tests end with the right result is a great example of how the route of kludging example dags themselves to be the test and the test runner can be brittle and complicated. And again, I love the idea of the example dags being the code under test, I just think having them also conduct the test execution of themselves is going to be troublesome.
>>
>>
>> I think we should try it and see. I do agree the "watcher" case is a bit not super-straightforward - and partially it comes from a lack of features in Airflow DAG processing. Maybe also that means we SHOULD add a new feature where we can specify a task in a DAG that is always run when the DAG ends and determines the status of all tasks?
>>
>> Currently the idea we have (my proposal) is that all such "overhead" code in the example dags MIGHT be automatically added by pre-commit. So the "Example" dags might have an "auto-generated" part where such a watcher (if present in the DAG) will automatically get the right dependencies to all the tasks. But maybe we actually could add such a feature to Airflow. It does not seem complex. We could actually have a simple new operator for DAG to add such a "completion" task?
>>
>> I can imagine for example this:
>>
>> dag >>= watcher
>>
>> Which could automatically add all tasks defined currently in the DAG as dependencies of the watcher :)
>>
>> WDYT? That sounds like an easy task and also usable in a number of cases beyond System Testing?
>>
>> J.
>>
>>

Re: [DISCUSSION] AIP-47 New design of Airflow System Tests

Posted by Mateusz Nojek <ma...@gmail.com>.
Hello again,

I agree with your (Niko, Giorgio) points that the environment needs to be
set up as easily as possible, but I will leave it to the providers, as
mentioned before. It's also in their interest to simplify it.

Regarding the "watcher" task. It's been discussed in another topic on the
devlist (https://lists.apache.org/thread/qkbwb40tqd3gpzny5xvsvsf93r3slm2z)
and the conclusion is that we can make use of the "tasks" attribute of the dag
object, like this:

list(dag.tasks) >> watcher

No new features are needed, and by using the list conversion we protect it
from further changes to the dag.tasks field.
I think this is a very clever yet beautiful way to create a dependency
between the watcher and all other tasks (thanks, Ash!).

Also, I've just updated the AIP with changes proposed during this
discussion:
- Added point 3. in "What change do you propose to make"
- Added point 7. in "Design details"
- Updated "Complete test example" section
- Added new section "Watcher task" right under "Tasks as setup and teardown"

Please familiarize yourselves with the changes, and if there are no other strong
opinions against this AIP, I would like to start the voting. Is that OK?

Kind regards,
Mateusz Nojek


Thu, 10 Feb 2022 at 11:27 Jarek Potiuk <ja...@potiuk.com> wrote:

>
>> I think both these sticking points are really a trade-off of simplicity
>> vs consistency/reliability. And to be clear I'm not arguing for things to
>> be more complex just for the heck of it, I agree that simplicity is great!
>> But just that there needs to be a balance and we can't get caught
>> over-indexing on one or the other. I think the combination of test
>> environments being a free for all and tests being simply a set of
>> guidelines with some static analysis both will combine to be brittle. The
>> example Mateusz just described regarding needing a watcher task to
>> ensure tests end with the right result is a great example of how the route
>> of kludging example dags themselves to be the test *and* the test runner
>> can be brittle and complicated. And again, I love the idea of the example
>> dags being the code under test, I just think having them also conduct the
>> test execution of themselves is going to be troublesome.
>>
>
> I think we should try it and see. I do agree the "watcher" case is a bit
> not super-straightforward - and partially it comes from a lack of features in
> Airflow DAG processing. Maybe also that means we SHOULD add a new feature
> where we can specify a task in a DAG that is always run when the DAG ends
> and determines the status of all tasks?
>
> Currently the idea we have (my proposal) is that all such "overhead" code
> in the example dags MIGHT be automatically added by pre-commit. So the
> "Example" dags might have an "auto-generated" part where such watcher (if
> present in the DAG) will automatically get the right dependencies to all
> the tasks. But maybe we actually could add such feature to Airflow. It does
> not seem complex. We could actually have a simple new operator for DAG to
> add such "completion" task ?
>
> I can imagine for example this:
>
> dag >>= watcher
>
> Which could automatically add all tasks defined currently in the DAG as
> dependencies of the watcher :)
>
> WDYT? That sounds like an easy task and also usable in a number of cases
> beyond System Testing?
>
> J.
>
>
>

Re: [DISCUSSION] AIP-47 New design of Airflow System Tests

Posted by Jarek Potiuk <ja...@potiuk.com>.
>
>
> I think both these sticking points are really a trade-off of simplicity vs
> consistency/reliability. And to be clear I'm not arguing for things to be
> more complex just for the heck of it, I agree that simplicity is great! But
> just that there needs to be a balance and we can't get caught over-indexing
> on one or the other. I think the combination of test environments being a
> free for all and tests being simply a set of guidelines with some static
> analysis both will combine to be brittle. The example Mateusz just
> described regarding needing a watcher task to ensure tests end with
> the right result is a great example of how the route of kludging example
> dags themselves to be the test *and* the test runner can be brittle and
> complicated. And again, I love the idea of the example dags being the code
> under test, I just think having them also conduct the test execution of
> themselves is going to be troublesome.
>

I think we should try it and see. I do agree the "watcher" case is a bit
not super-straightforward - and partially it comes from a lack of features in
Airflow DAG processing. Maybe also that means we SHOULD add a new feature
where we can specify a task in a DAG that is always run when the DAG ends
and determines the status of all tasks?

Currently the idea we have (my proposal) is that all such "overhead" code
in the example dags MIGHT be automatically added by pre-commit. So the
"Example" dags might have an "auto-generated" part where such watcher (if
present in the DAG) will automatically get the right dependencies to all
the tasks. But maybe we actually could add such feature to Airflow. It does
not seem complex. We could actually have a simple new operator for DAG to
add such "completion" task ?

I can imagine for example this:

dag >>= watcher

Which could automatically add all tasks defined currently in the DAG as
dependencies of the watcher :)
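
Under the hood it would boil down to something like this (just a rough sketch
using the existing API - the helper name is made up):

def attach_watcher(dag, watcher):
    # Equivalent of the proposed "dag >>= watcher": make every task already
    # defined in the DAG an upstream of the watcher task.
    for task in list(dag.tasks):  # snapshot of the tasks defined so far
        if task is not watcher:
            task.set_downstream(watcher)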

WDYT? That sounds like an easy task and also usable in a number of cases
beyond System Testing?

J.

Re: [DISCUSSION] AIP-47 New design of Airflow System Tests

Posted by "Oliveira, Niko" <on...@amazon.com.INVALID>.
Hey folks,


I think both these sticking points are really a trade-off of simplicity vs consistency/reliability. And to be clear I'm not arguing for things to be more complex just for the heck of it, I agree that simplicity is great! But just that there needs to be a balance and we can't get caught over-indexing on one or the other. I think the combination of test environments being a free for all and tests being simply a set of guidelines with some static analysis both will combine to be brittle. The example Mateusz just described regarding needing a watcher task to ensure tests end with the right result is a great example of how the route of kludging example dags themselves to be the test and the test runner can be brittle and complicated. And again, I love the idea of the example dags being the code under test, I just think having them also conduct the test execution of themselves is going to be troublesome.


But as always, if I'm the only one worried about this, I'm happy to disagree and commit and see how it goes :)


Cheers,
Niko

________________________________
From: Jarek Potiuk <ja...@potiuk.com>
Sent: Sunday, February 6, 2022 8:52 AM
To: dev@airflow.apache.org
Subject: RE: [EXTERNAL] [DISCUSSION] AIP-47 New design of Airflow System Tests


I think Mateusz explained the points very well. I have just a few comments to some of the points.

> 3. In general the AIP reads as if it's solved this problem, but it's more like it has absolved itself from solving this problem, which is much different. I think this approach could possibly make things even worse as now there is no contract or interface for how to plumb configuration and credentials to the system test dags. The current set of methods and files to plumb credentials through aren't great (and as of now are quite Google specific) but I think this interface can be simplified and improved rather than just exported wholesale for each provider to re-invent a new approach.

We've discussed it extensively with Mateusz (I was also of the opinion that we could do some automation here). For example we could write a "terraform" script that creates the whole environment - set up all the service accounts etc. But Mateusz convinced me it will be very hard to "mandate" a common way of doing it for multiple "services" or "groups" of services. My proposal is that we should be clear in the AIP/framework that we don't solve it in a "common way". But instead we keep a "service-specific" way of preparing the environment. We might automate it - in a service-specific way, but having it as part of the system tests is I think out-of-scope. In a way we currently have it already with our "regular" tests. To build the AMI to run our self-hosted runners, we have a separate repo: https://github.com/apache/airflow-ci-infra/ - where we have "packer" scripts which build our image. We even tried Terraform there, but well - packer is "good enough". And we can have separate "airflow-system-tests-aws" repo and "airflow-system-tests-gcp" repo, where we will - separately document and possibly automate how to build such a "runner"

> 4.  A system that relies on good intentions like "be sure to remember to do X otherwise bad thing Y will happen" certainly guarantees that bad thing Y will happen, frequently. Humans are fallible and not great at sticking to a contract or interface that isn't codified. And this AIP is littered with statements like this. We need test infrastructure that's easier to use and will also enforce these best practices/requirements which are needed for the tests to run.

Here - I wholeheartedly agree with Mateusz. This is GREAT simplification to have one example file doing everything. The previous approach we had was extremely complex - you had scripts, pytest tests, example dags and they were providing (meta)data to each other and it was not only hard to reason about them but also hard to verify that they are "ok". The idea of just making it one file is great. And the amount of "be sure" is not only small but it actually can be very easily enforced by pre-commits. We could make sure that our "example_dags" in a certain provider contain the (few) lines that need to be copied among them - having a common "header" and "footer" on example dag is super-simple pre-commit to write. We also discussed some other approaches and I think it is really powerful that the scripts can be run as "pytest" tests and as "standalone" scripts with SequentialDebugger. The level of control it gives for manual runs and the level of automation it provides by tapping into some of the great pytest features (parallel runs, status, flakiness, timeouts, and plethora of other plugins) - all of that makes it great to run multiple tests in the CI environment. This way it is very easy to run the system test locally even without employing pytest - when you need to run it standalone and run Pytest in CI or when you want to run multiple tests.


Re: [DISCUSSION] AIP-47 New design of Airflow System Tests

Posted by Mateusz Nojek <ma...@gmail.com>.
Thanks, Jarek, for your email. I think this completes the reasoning behind
this AIP.
However, I recently discovered a flaw in the design that had been overlooked.
The problem is related to the state of the test (DAG) and the trigger rules.

When the teardown task is used (e.g. to clean the resources after the test)
it has a trigger rule set to "all_done", which means it will execute every
time, no matter the states of the other tasks. This task influences the test
state (DAG Run) because if any of the tasks fail, the teardown will be
executed as the last task and its state will be propagated to the DAG Run
state (which almost always will be "success") unexpectedly "shadowing" the
failing tasks.

The cause of this situation is the fact that the DAG Run status is
determined by the "leaf nodes" (tasks that do not have any children), and
because the teardown task is a leaf node, its status can be propagated to
the whole DAG. I was thinking a lot about how we can overcome this and
discussed it extensively with Jarek Potiuk, and the only possible solution
that solves this problem and doesn't require any changes to Airflow is
to use another task (which I called the `watcher` task) that will watch
the status of the other tasks and, if any of them fail, will itself fail
and propagate its status to the DAG Run.

For that to work I needed to create a simple PythonOperator task with the
trigger rule set to "one_failed" and an exception that is raised when
this task is executed; then I attached this task to all other tasks as a
downstream task. Thanks to this, if any of the tasks in the DAG fails,
this watcher task will be triggered and fail, and because it's a leaf node,
it will also fail the whole test (DAG).
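
In a simplified sketch (the DAG id, the dummy operators and the task names
are illustrative only, not taken from the AIP), the whole pattern looks like
this:

from datetime import datetime

from airflow import models
from airflow.exceptions import AirflowException
from airflow.operators.bash import BashOperator
from airflow.operators.python import PythonOperator
from airflow.utils.trigger_rule import TriggerRule


def watcher_callable():
    # The watcher only runs when at least one upstream task failed, and then
    # it fails itself, so the failure reaches the DAG Run state.
    raise AirflowException("Failing the DAG Run because an upstream task failed.")


with models.DAG(
    "example_watcher",  # illustrative DAG id
    schedule_interval="@once",
    start_date=datetime(2022, 1, 1),
    catchup=False,
) as dag:
    create_resource = BashOperator(task_id="create_resource", bash_command="echo create")
    run_test = BashOperator(task_id="run_test", bash_command="echo test")

    # Teardown always runs so resources are cleaned up, but as a leaf node it
    # would otherwise "shadow" an upstream failure with its own success.
    teardown = BashOperator(
        task_id="teardown",
        bash_command="echo cleanup",
        trigger_rule=TriggerRule.ALL_DONE,
    )

    watcher = PythonOperator(
        task_id="watcher",
        python_callable=watcher_callable,
        trigger_rule=TriggerRule.ONE_FAILED,
    )

    create_resource >> run_test >> teardown

    # Make the watcher a downstream of every task we consider critical.
    [create_resource, run_test, teardown] >> watcher

With this wiring, a failure anywhere upstream reaches the watcher (and
therefore the DAG Run), while on a fully green run the watcher is simply
skipped.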

At first, it may look like an over-complication, but it's actually a clever
solution.
It uses only built-in Airflow features and does not require any changes or
new implementation. It also creates a "separation of concerns", where one
task (teardown) is responsible for cleanup, while the other one (watcher)
is responsible for monitoring the tasks. By using this watcher task, we
also have control over which tasks are "critical" for the test and need
to be watched.

This whole solution would not be needed if we did not have a teardown task
with `trigger_rule="all_done"`, but we don't want to drop this feature,
since not removing resources created for the execution of the test may lead
to a significant loss of credits (due to cloud computations that cost money).
Also, keeping all the temporary structures around may interfere with and
cause collisions between executions of system tests.

It's easier to grasp the concept with an example. I just prepared a PR
that includes an example DAG that shows this case. Since I had already dug
very deeply into how trigger rules work and how states are assigned to DAG
Runs and tasks, I decided to also extend the documentation. Please take a
look at PR#21382 https://github.com/apache/airflow/pull/21382 and tell me
what you think about it. Use the PR comment section to talk about the
changes proposed there, but if your comment is related to AIP-47, please
continue by responding to this email.

Kind regards,
Mateusz Nojek


Sun, 6 Feb 2022 at 17:52 Jarek Potiuk <ja...@potiuk.com> wrote:

> I think Mateusz explained the points very well. I have just a few comments
> to some of the points.
>
> > 3. In general the AIP reads as if it's solved this problem, but it's
> more like it has absolved itself from solving this problem, which is much
> different. I think this approach could possibly make things even worse as
> now there is no contract or interface for how to plumb configuration and
> credentials to the system test dags. The current set of methods and files
> to plumb credentials through aren't great (and as of now are quite Google
> specific) but I think this interface can be simplified and improved rather
> than just exported wholesale for each provider to re-invent a new approach.
>
> We've discussed it extensively with Mateusz (I was also of the opinion
> that we could do some automation here). For example we could write a
> "terraform" script that creates the whole environment - set up all the
> service accounts etc. But Mateusz convinced me it will be very hard to
> "mandate" a common way of doing it for multiple "services" or "groups" of
> services. My proposal is that we should be clear in the AIP/framework that
> we don't solve it in a "common way". But instead we keep a
> "service-specific" way of preparing the environment. We might automate it -
> in a service-specific way, but having it as part of the system tests is I
> think out-of-scope. In a way we currently have it already with our
> "regular" tests. To build the AMI to run our self-hosted runners, we have a
> separate repo: https://github.com/apache/airflow-ci-infra/ - where we
> have "packer" scripts which build our image. We even tried Terraform there,
> but well - packer is "good enough". And we can have separate
> "airflow-system-tests-aws" repo and "airlfow-system-tests-gcp" repo, where
> we will - separately document and possibly automate how to build such a
> "runner"
>
> > 4.  A system that relies on good intentions like "be sure to remember to
> do X otherwise bad thing Y will happen" certainly guarantees that bad thing
> Y will happen, frequently. Humans are fallible and not great at sticking to
> a contract or interface that isn't codified. And this AIP is littered with
> statements like this. We need test infrastructure that's easier to use and
> will also enforce these best practices/requirements which are needed for
> the tests to run.
>
> Here - I wholeheartedly agree with Mateusz. This is a GREAT simplification
> to have one example file doing everything. The previous approach we had was
> extremely complex - you had scripts, pytest tests, example dags and they
> were providing (meta)data to each other and it was not only hard to reason
> about them but also hard to verify that they are "ok". The idea of just
> making it one file is great. And the amount of "be sure" is not only small
> but it actually can be very easily enforced by pre-commits. We could make
> sure that our "example_dags" in a certain provider contain the (few) lines
> that need to be copied among them - having a common "header" and "footer"
> on an example dag is a super-simple pre-commit to write. We also discussed some
> other approaches and I think it is really powerful that the scripts can be
> run as "pytest" tests and as "standalone" scripts with SequentialDebugger.
> The level of control it gives for manual runs and the level of automation
> it provides by tapping into some of the great pytest features
> (parallel runs, status, flakiness, timeouts, and a plethora of other plugins)
> - all of that makes it great to run multiple tests in the CI environment.
> This way it is very easy to run the system test locally even without
> employing pytest - when you need to run it standalone and run Pytest in CI
> or when you want to run multiple tests.
>
>

Re: [DISCUSSION] AIP-47 New design of Airflow System Tests

Posted by Jarek Potiuk <ja...@potiuk.com>.
I think Mateusz explained the points very well. I have just a few comments
to some of the points.

> 3. In general the AIP reads as if it's solved this problem, but it's more
like it has absolved itself from solving this problem, which is much
different. I think this approach could possibly make things even worse as
now there is no contract or interface for how to plumb configuration and
credentials to the system test dags. The current set of methods and files
to plumb credentials through aren't great (and as of now are quite Google
specific) but I think this interface can be simplified and improved rather
than just exported wholesale for each provider to re-invent a new approach.

We've discussed it extensively with Mateusz (I was also of the opinion that
we could do some automation here). For example we could write a
"terraform" script that creates the whole environment - set up all the
service accounts etc. But Mateusz convinced me it will be very hard to
"mandate" a common way of doing it for multiple "services" or "groups" of
services. My proposal is that we should be clear in the AIP/framework that
we don't solve it in a "common way". But instead we keep a
"service-specific" way of preparing the environment. We might automate it -
in a service-specific way, but having it as part of the system tests is I
think out-of-scope. In a way we currently have it already with our
"regular" tests. To build the AMI to run our self-hosted runners, we have a
separate repo: https://github.com/apache/airflow-ci-infra/ - where we have
"packer" scripts which build our image. We even tried Terraform there, but
well - packer is "good enough". And we can have separate
"airflow-system-tests-aws" repo and "airlfow-system-tests-gcp" repo, where
we will - separately document and possibly automate how to build such a
"runner"

> 4.  A system that relies on good intentions like "be sure to remember to
do X otherwise bad thing Y will happen" certainly guarantees that bad thing
Y will happen, frequently. Humans are fallible and not great at sticking to
a contract or interface that isn't codified. And this AIP is littered with
statements like this. We need test infrastructure that's easier to use and
will also enforce these best practices/requirements which are needed for
the tests to run.

Here - I wholeheartedly agree with Mateusz. This is a GREAT simplification to
have one example file doing everything. The previous approach we had was
extremely complex - you had scripts, pytest tests, example dags and they
were providing (meta)data to each other and it was not only hard to reason
about them but also hard to verify that they are "ok". The idea of just
making it one file is great. And the amount of "be sure" is not only small
but it actually can be very easily enforced by pre-commits. We could make
sure that our "example_dags" in a certain provider contain the (few) lines
that need to be copied among them - having a common "header" and "footer"
on an example dag is a super-simple pre-commit to write. We also discussed some
other approaches and I think it is really powerful that the scripts can be
run as "pytest" tests and as "standalone" scripts with SequentialDebugger.
The level of control it gives for manual runs and the level of automation
it provides by tapping into some of the great pytest features
(parallel runs, status, flakiness, timeouts, and a plethora of other plugins)
- all of that makes it great to run multiple tests in the CI environment.
This way it is very easy to run the system test locally even without
employing pytest - when you need to run it standalone and run Pytest in CI
or when you want to run multiple tests.
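
Just to illustrate how simple such a pre-commit could be - a rough sketch
(the required markers and the way files are selected are made up for the
example):

#!/usr/bin/env python3
"""Rough sketch of a pre-commit check: verify that system test example dags
contain the common "header" and "footer" lines. Markers are illustrative."""
import sys
from pathlib import Path
from typing import List

REQUIRED_MARKERS = [
    'os.environ["AIRFLOW__CORE__EXECUTOR"] = "DebugExecutor"',
    "SYSTEM_TESTS_ENV_ID",
    'if __name__ == "__main__":',
]


def missing_markers(path: Path) -> List[str]:
    content = path.read_text()
    return [marker for marker in REQUIRED_MARKERS if marker not in content]


def main(files: List[str]) -> int:
    exit_code = 0
    for name in files:
        missing = missing_markers(Path(name))
        if missing:
            exit_code = 1
            print(f"{name}: missing required line(s): {missing}")
    return exit_code


if __name__ == "__main__":
    # pre-commit passes the names of the staged files as arguments.
    sys.exit(main(sys.argv[1:]))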