You are viewing a plain text version of this content. The canonical link for it is here.

Posted to commits@airflow.apache.org by GitBox <gi...@apache.org> on 2022/06/02 11:59:15 UTC

[GitHub] [airflow] potiuk commented on pull request #24058: Convert Athena Sample DAG to System Test

potiuk commented on PR #24058:
URL: https://github.com/apache/airflow/pull/24058#issuecomment-1144780452

I love the way it is implemented @ferruzzi :). I think **maybe** over time we will generalize some of that to other providers, but I think @bhirsz explained very well about some of the problems we had when we tried to generalize too early.

When we initially implemented system tests (more than 3 years ago) we went the route of "implementing common, generalizable code" and implementing system tests usign it for google -> and then it turned out that keeping the generalization working for mutliple diferent tests is kinda difficult and indeed pollutes the solution a lot.

Since then I got much bigger fan of "in order to make something reusable, it should be made usable first". So I'd say the right approach here is to "bubble up" the common utils - first let's see how reusable this approach can be for other AWS "properties" - not only Athena (we will likely learn that some of that is not as reusable as we thought) and then let's try to use similar approach in other providers and see a) whether it is needed there, b) whether the generalisation brings more problems than it solves.

I think there is no need to get an AIP or anything like that for those - working code and PRs gradually making stuff re-usable is the best approach.

There is one caveat though. I think (and there are lots of articles and blogs about it) there is a big difference between the "real application" code and test code. See for example here: https://codeshelter.wordpress.com/2011/04/07/dry-and-damp-principles-when-developing-and-unit-testing/

1) For real application code "DRY" is more important than "DAMP"
2) For test code "DAMP" is more important than "DRY"

It's never 0/1 of course - some small level of duplication (less DRY) is ok even for application code and some small level of "abstraction and reusabilitty" (less DAMP) is acceptable (pytest fixtures for example are good example of less DAMP more DRY approach). But generally speaking when I look at the test code, DAMP is far more important criteria than DRY. I think there is a very thin lline between " the test can be read and understood on its own with the common code elsewhere" vs. "I have no idea what the test is doing because far too many things happen outside of it". This is the main reason for AIP-47 at all - the problem with the generalized approach we had was that it was rather difficult to understand what the tests were doing because different parts of the tests were in different places.

There is this really nice property that test is standalone and does not require pytest to run - just setting one variable. We should keep it. Generation of the variable if not set is even cooler and if we can make it generic enough to be reused elsewhere, that would be fantastic.

However we have to remeber to not loose those properties of simplicity and readabilityh also for some simpler providers - we should see how the code in AWS can be generalized for those - and if we see a way and it does not make those tests unnecessary complex and less readable - by all means, let's do it (but not too early).

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org