You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@airflow.apache.org by GitBox <gi...@apache.org> on 2022/06/01 01:45:06 UTC

[GitHub] [airflow] ferruzzi opened a new pull request, #24058: Convert Athena Sample DAG to System Test

ferruzzi opened a new pull request, #24058:
URL: https://github.com/apache/airflow/pull/24058

   This is the first PR in our new project to convert all AWS Sample DAGs into AIP-47-compliant System Tests.  It includes the required changes to the Athena Sample DAG and docs, a few helper methods, and unit testing for said helpers.
   
   Testing:
    - Passes `breeze static-checks --all-files`
    - Passes `breeze build-docs --package-filter apache-airflow-providers-amazon`
    - Passes `./scripts/ci/testing/ci_run_airflow_testing.sh`
    - Launch Breeze and passes `pytest --system amazon tests/system/providers/amazon/aws/example_athena.py`
   
   Related: https://github.com/apache/airflow/issues/22438
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] ferruzzi commented on pull request #24058: Convert Athena Sample DAG to System Test

Posted by GitBox <gi...@apache.org>.
ferruzzi commented on PR #24058:
URL: https://github.com/apache/airflow/pull/24058#issuecomment-1143896551

   > I think it would be an easy lift too, the biggest piece would be plugging in the cloud paramater store of your choice (we used SSM but the Google folks likely want to use something else)
   
   An idea that I had initially was to have `fetch_variable` accept a list/set of callables which would be called in order they are provided, that way we could do something like: 
   ```
   var_priority = {
       os.getenv,
       _fetch_from_ssm,
   }
   
   value = utils.fetch_variable(key, var_priority, default_value)
   ```
   
   Then anyone could drop in whatever parameter storage they choose to use, in whatever priority order they desire.  We decided that env vars would take precedence on our test suite, but there's no reason for that to be a universal truth.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] potiuk merged pull request #24058: Convert Athena Sample DAG to System Test

Posted by GitBox <gi...@apache.org>.
potiuk merged PR #24058:
URL: https://github.com/apache/airflow/pull/24058


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] bhirsz commented on pull request #24058: Convert Athena Sample DAG to System Test

Posted by GitBox <gi...@apache.org>.
bhirsz commented on PR #24058:
URL: https://github.com/apache/airflow/pull/24058#issuecomment-1143202906

   Looks good - especially the utility scripts! We are checking for required env vars (https://github.com/apache/airflow/blob/1dccaad46b901189c1928cef8419f1ea1160d550/tests/system/conftest.py#L44) and it may be good idea to offer some kind of interface that would allow every provider to provide some function to retrieve/set env variable value before running the tests. That's outside of this PR - I'm just wondering what we can steal from some of the ideas from different PR to get some nice common solutions :) 
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] ferruzzi commented on pull request #24058: Convert Athena Sample DAG to System Test

Posted by GitBox <gi...@apache.org>.
ferruzzi commented on PR #24058:
URL: https://github.com/apache/airflow/pull/24058#issuecomment-1142682524

   @mnojek @potiuk @bhirsz - Other system test migrations will be relying on these helper methods so, if at all possible, we're hoping to get this approved and merged ASAP with the goal of getting most/all of the System Test migration done and implemented in the next week or two.  


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] bhirsz commented on pull request #24058: Convert Athena Sample DAG to System Test

Posted by GitBox <gi...@apache.org>.
bhirsz commented on PR #24058:
URL: https://github.com/apache/airflow/pull/24058#issuecomment-1144473114

   > ink it would be an easy lift too, t
   
   
   
   > I'd love to abstract this into a more general interface, but we'd have to get buy-in from the others who were involved in that AIP first.
   
   @o-nikolas I was also one of the people who were involved in the AIP. The communication was done through Mateusz, who then left the company - so he can chim in in the discussion since it's an open source community but the chances are we are on our own :). The general idea was to keep environment setup and running system tests separate - mostly because each provider have different needs (just for an example for running Google tests you need Google infrastructure, for AWS.. etc). At some point part of the old design was "polluted" by special needs of some providers - rather than having a uniform way of doing things. 
   I'm all for moving anything that's common into common utils - so we don't reinvent the wheel ;) And if there are provider specific parts design the interface in a "plugable" way.
   I like your utils because they cover some of the issues I've also noticed: The common env_id helps to reduce required env id (for example in Google they used dozens upon dozens env vars to ensure uniqness of the name of the resource) but 1) you need to have it defined first 2) env id need to be of the specific format. The reason for 2) is because when we're using this variable in names it needs to be a valid name for the name (for example some names cannot contain - or special signs, or start with numbers). 
   
   @ferruzzi Good idea with var_priority - it will allow to define a custom way of retrieving variables too (we could start with having only os.env as default and then allow to overwrite it). 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] ferruzzi commented on pull request #24058: Convert Athena Sample DAG to System Test

Posted by GitBox <gi...@apache.org>.
ferruzzi commented on PR #24058:
URL: https://github.com/apache/airflow/pull/24058#issuecomment-1143061508

   `Tests / Helm Chart Executor Upgrade (pull_request)` is failing, but I don't _think_ that's me.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] potiuk commented on pull request #24058: Convert Athena Sample DAG to System Test

Posted by GitBox <gi...@apache.org>.
potiuk commented on PR #24058:
URL: https://github.com/apache/airflow/pull/24058#issuecomment-1144780452

   I love the way it is implemented @ferruzzi :). I think **maybe** over time we will generalize some of that to other providers, but I think @bhirsz explained very well about some of the problems we had when we tried to generalize too early. 
   
   When we initially implemented system tests (more than 3 years ago) we went the route of "implementing common, generalizable code" and implementing system tests usign it for google -> and then it turned out that keeping the generalization working for mutliple diferent tests is kinda difficult and indeed pollutes the solution a lot.
   
   Since then I got much bigger fan of "in order to make something reusable, it should be made usable first". So I'd say the right approach here is to "bubble up" the common utils - first let's see how reusable this approach can be for other AWS "properties" - not only Athena (we will likely learn that some of that is not as reusable as we thought) and then let's try to use similar approach in other providers and see a) whether it is needed there, b) whether the generalisation brings more problems than it solves. 
   
   I think there is no need to get an AIP or anything like that for those - working code and  PRs gradually making stuff re-usable is the best approach.
   
   There is one caveat though. I think (and there are lots of articles and blogs about it) there is a big difference between the "real application" code and test code. See for example here: https://codeshelter.wordpress.com/2011/04/07/dry-and-damp-principles-when-developing-and-unit-testing/
   
   1) For real application code "DRY" is more important than "DAMP"
   2) For test code "DAMP" is more important than "DRY" 
   
   It's never 0/1 of course - some small level of duplication (less DRY) is ok even for application code and some small level of "abstraction and reusabilitty" (less DAMP) is acceptable (pytest fixtures for example are good example of less DAMP more DRY approach). But generally speaking when I look at the test code, DAMP is far more important criteria than DRY.  I think there is a very thin lline between " the test can be read and understood on its own with the common code elsewhere" vs. "I have no idea what the test is doing because far too many things happen outside of it". This is the main reason for AIP-47 at all - the problem with the generalized approach we had was that it was rather difficult to understand what the tests were doing because different parts of the tests were in different places. 
   
   There is this really nice property that test is standalone and does not require pytest to run - just setting one variable. We should keep it. Generation of the variable if not set is even cooler and if we can make it generic enough to be reused elsewhere, that would be fantastic. 
   
   However we have to remeber to not loose those properties of simplicity and readabilityh also for some simpler providers - we should see how the code in AWS can be generalized for those - and if we see a way and it does not make those tests  unnecessary complex and less readable - by all means, let's do it (but not too early).
   
   
   
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] o-nikolas commented on pull request #24058: Convert Athena Sample DAG to System Test

Posted by GitBox <gi...@apache.org>.
o-nikolas commented on PR #24058:
URL: https://github.com/apache/airflow/pull/24058#issuecomment-1143885895

   > it may be good idea to offer some kind of interface that would allow every provider to provide some function to retrieve/set env variable value before running the tests. That's outside of this PR - I'm just wondering what we can steal from some of the ideas from different PR to get some nice common solutions :)
   
   Thanks @bhirsz! While reviewing the System Test AIP the lack of a shared interface for environment setup was one of the things I brought up and I had some back and fourth about in the dev email discussion thread with Mateusz and Jarek ([link](https://lists.apache.org/thread/htd4013yn483qfhwv11vc26jpf2yvjph)). Ultimately the AIP stayed as it was and for AWS we designed this setup system for our provider tests. I'd love to abstract this into a more general interface, but we'd have to get buy-in from the others who were involved in that AIP first.
   
   I think it would be an easy lift too, the biggest piece would be plugging in the cloud paramater store of your choice (we used SSM but the Google folks likely want to use something else) and any specific requirements around the env_id format (what type of characters are allowed or not for the different provider components).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] ferruzzi closed pull request #24058: Convert Athena Sample DAG to System Test

Posted by GitBox <gi...@apache.org>.
ferruzzi closed pull request #24058: Convert Athena Sample DAG to System Test
URL: https://github.com/apache/airflow/pull/24058


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org