You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@airflow.apache.org by GitBox <gi...@apache.org> on 2021/09/24 11:28:00 UTC

[GitHub] [airflow] mnojek opened a new pull request #18494: Fix part of Google system tests

mnojek opened a new pull request #18494:
URL: https://github.com/apache/airflow/pull/18494


   This PR introduces fixes to some of the failing system tests related to Google operators.
   In case of any questions of uncertainty, I can describe why each change is needed.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] potiuk commented on a change in pull request #18494: Fix part of Google system tests

Posted by GitBox <gi...@apache.org>.
potiuk commented on a change in pull request #18494:
URL: https://github.com/apache/airflow/pull/18494#discussion_r716791580



##########
File path: airflow/providers/google/cloud/example_dags/example_cloud_sql.py
##########
@@ -48,8 +49,8 @@
 from airflow.utils.dates import days_ago
 
 GCP_PROJECT_ID = os.environ.get('GCP_PROJECT_ID', 'example-project')
-INSTANCE_NAME = os.environ.get('GCSQL_MYSQL_INSTANCE_NAME', 'test-mysql')
-INSTANCE_NAME2 = os.environ.get('GCSQL_MYSQL_INSTANCE_NAME2', 'test-mysql2')
+INSTANCE_NAME = os.environ.get('GCSQL_MYSQL_INSTANCE_NAME', 'test-mysql') + str(random.getrandbits(16))

Review comment:
       Yes. This is HIGHLY annoying with cloudsql that name cannot ber reused for a day or so even if db is deleted. 
   
   However I agree random POSTFIX generation in the DAG is a bad idea.
   
   What we used to have in the past is that we had `variables.env` file in `files/airflow-breeze-config`  where we sourced files with variables and we had a very simple script that generated the random postfix if it was missing.
   
   Then you could do step-by-step testing with keeping the randomly generated postfix even across breeze restarts.
   When you needed to change the database name you'd simply remove the file and it would be re-generated automatically at breeze entry.
   
   https://github.com/apache/airflow/blob/main/BREEZE.rst#customize-your-environment
   
   Something like that might work (writing it from memory so I am not sure if it is correct) in variables.env:
   
   ```
   if [[ ! -f /files/random.env ]]; then 
      echo "export RANDOM_POSTFIX=${RANDOM}" > /files/random.env
   fi
   source /files/random.env
   
   export GCSQL_MYSQL_INSTANCE_NAME="test-mysql-${RANDOM_POSTFIX}"
   ```
   
   This has several nice properties:
    * everyone has its own random value
    * you keep it stable between runs or even between debug sessions - for example you could ran tasks from the example DAG separately one-by-one
    * you can very easily regenerate the number by simply deleting the /files/random.env
   
   In the past we even shared the whole `airflow-breeze-config` directory was actually checked out separate repository where we kept all variables used by the team. This way you could share different variables between same users who have access to the same repo - at the same time each of the users will have different postifx as the random.env would not be part of the repo.
   
   Just an inspiration if you would like to optimize your development workflow.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] mik-laj merged pull request #18494: Fix part of Google system tests

Posted by GitBox <gi...@apache.org>.
mik-laj merged pull request #18494:
URL: https://github.com/apache/airflow/pull/18494


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] mik-laj merged pull request #18494: Fix part of Google system tests

Posted by GitBox <gi...@apache.org>.
mik-laj merged pull request #18494:
URL: https://github.com/apache/airflow/pull/18494


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] mnojek commented on a change in pull request #18494: Fix part of Google system tests

Posted by GitBox <gi...@apache.org>.
mnojek commented on a change in pull request #18494:
URL: https://github.com/apache/airflow/pull/18494#discussion_r716376092



##########
File path: airflow/providers/google/cloud/example_dags/example_cloud_sql.py
##########
@@ -48,8 +49,8 @@
 from airflow.utils.dates import days_ago
 
 GCP_PROJECT_ID = os.environ.get('GCP_PROJECT_ID', 'example-project')
-INSTANCE_NAME = os.environ.get('GCSQL_MYSQL_INSTANCE_NAME', 'test-mysql')
-INSTANCE_NAME2 = os.environ.get('GCSQL_MYSQL_INSTANCE_NAME2', 'test-mysql2')
+INSTANCE_NAME = os.environ.get('GCSQL_MYSQL_INSTANCE_NAME', 'test-mysql') + str(random.getrandbits(16))

Review comment:
       I added this because the instance name is unique only for the specific Airflow container so each time I run the test within 1 Airflow run, it's the same. The problem was that if this system test failed before deleting the instance, it couldn't be run again because it wanted to create instance with the same instance while it was still present. That's why I added it. So it works only when it passes within the first execution.
   That is a simple workaround to make each test execution independent but overall it's not the desired solution. Can you point me where else this value is used?
   
   Ideally, I would want to have independent tests and also a mechanism that will cleanup all the values created by the test if it fails in the middle. This is not the scope of this PR, but in near future I plan to work on that.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] potiuk commented on pull request #18494: Fix part of Google system tests

Posted by GitBox <gi...@apache.org>.
potiuk commented on pull request #18494:
URL: https://github.com/apache/airflow/pull/18494#issuecomment-927973836


   Some static checks failed :(


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] mnojek commented on a change in pull request #18494: Fix part of Google system tests

Posted by GitBox <gi...@apache.org>.
mnojek commented on a change in pull request #18494:
URL: https://github.com/apache/airflow/pull/18494#discussion_r716376092



##########
File path: airflow/providers/google/cloud/example_dags/example_cloud_sql.py
##########
@@ -48,8 +49,8 @@
 from airflow.utils.dates import days_ago
 
 GCP_PROJECT_ID = os.environ.get('GCP_PROJECT_ID', 'example-project')
-INSTANCE_NAME = os.environ.get('GCSQL_MYSQL_INSTANCE_NAME', 'test-mysql')
-INSTANCE_NAME2 = os.environ.get('GCSQL_MYSQL_INSTANCE_NAME2', 'test-mysql2')
+INSTANCE_NAME = os.environ.get('GCSQL_MYSQL_INSTANCE_NAME', 'test-mysql') + str(random.getrandbits(16))

Review comment:
       I added this because the instance name is unique only for the specific Airflow container so each time I run the test within 1 Airflow run, it's the same. The problem was that if this system test failed before deleting the instance, it couldn't be run again because it wanted to create instance with the same instance while ti was still present. That's why I added it. So it works only when it passes within the first execution.
   That is a simple workaround to make each test execution independent but overall it's not the desired solution. Can you point me where else this value is used?
   
   Ideally, I would want to have independent tests and also an a mechanism that will cleanup all the values created by the test if it fails in the middle. This is not the scope of this PR, but in near future I plan to work on that.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] mnojek commented on pull request #18494: Fix part of Google system tests

Posted by GitBox <gi...@apache.org>.
mnojek commented on pull request #18494:
URL: https://github.com/apache/airflow/pull/18494#issuecomment-930101319


   > Some static checks failed :(
   
   Now it works. I also finished working on this PR, so it can be merged.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] mik-laj commented on a change in pull request #18494: Fix part of Google system tests

Posted by GitBox <gi...@apache.org>.
mik-laj commented on a change in pull request #18494:
URL: https://github.com/apache/airflow/pull/18494#discussion_r715978473



##########
File path: airflow/providers/google/cloud/example_dags/example_cloud_sql.py
##########
@@ -48,8 +49,8 @@
 from airflow.utils.dates import days_ago
 
 GCP_PROJECT_ID = os.environ.get('GCP_PROJECT_ID', 'example-project')
-INSTANCE_NAME = os.environ.get('GCSQL_MYSQL_INSTANCE_NAME', 'test-mysql')
-INSTANCE_NAME2 = os.environ.get('GCSQL_MYSQL_INSTANCE_NAME2', 'test-mysql2')
+INSTANCE_NAME = os.environ.get('GCSQL_MYSQL_INSTANCE_NAME', 'test-mysql') + str(random.getrandbits(16))

Review comment:
       The unique value should be generated by the test run script, because the DAG file is loaded multiple times by Airflow, so we have to make sure that it had the same structure each time.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] mnojek commented on pull request #18494: Fix part of Google system tests

Posted by GitBox <gi...@apache.org>.
mnojek commented on pull request #18494:
URL: https://github.com/apache/airflow/pull/18494#issuecomment-930101319


   > Some static checks failed :(
   
   Now it works. I also finished working on this PR, so it can be merged.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] mnojek commented on a change in pull request #18494: Fix part of Google system tests

Posted by GitBox <gi...@apache.org>.
mnojek commented on a change in pull request #18494:
URL: https://github.com/apache/airflow/pull/18494#discussion_r718389021



##########
File path: airflow/providers/google/cloud/example_dags/example_cloud_sql.py
##########
@@ -48,8 +49,8 @@
 from airflow.utils.dates import days_ago
 
 GCP_PROJECT_ID = os.environ.get('GCP_PROJECT_ID', 'example-project')
-INSTANCE_NAME = os.environ.get('GCSQL_MYSQL_INSTANCE_NAME', 'test-mysql')
-INSTANCE_NAME2 = os.environ.get('GCSQL_MYSQL_INSTANCE_NAME2', 'test-mysql2')
+INSTANCE_NAME = os.environ.get('GCSQL_MYSQL_INSTANCE_NAME', 'test-mysql') + str(random.getrandbits(16))

Review comment:
       Thanks for the suggestions, @potiuk @mik-laj! I will not implement them now but I will definitely use your comments when designing the new approach for running system tests. Hopefully we can make it better and more reliable 😃 
   In the meantime I will remove this change and hopefully this will not be the case in the new approach.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] mik-laj commented on a change in pull request #18494: Fix part of Google system tests

Posted by GitBox <gi...@apache.org>.
mik-laj commented on a change in pull request #18494:
URL: https://github.com/apache/airflow/pull/18494#discussion_r716870133



##########
File path: airflow/providers/google/cloud/example_dags/example_cloud_sql.py
##########
@@ -48,8 +49,8 @@
 from airflow.utils.dates import days_ago
 
 GCP_PROJECT_ID = os.environ.get('GCP_PROJECT_ID', 'example-project')
-INSTANCE_NAME = os.environ.get('GCSQL_MYSQL_INSTANCE_NAME', 'test-mysql')
-INSTANCE_NAME2 = os.environ.get('GCSQL_MYSQL_INSTANCE_NAME2', 'test-mysql2')
+INSTANCE_NAME = os.environ.get('GCSQL_MYSQL_INSTANCE_NAME', 'test-mysql') + str(random.getrandbits(16))

Review comment:
       @mnojek Airflow is a distributed application, which means that one DAG file is loaded multiple times on different nodes, so we have to make sure that this instance name has the same value on all nodes. These examples are used in system tests, where this condition is not necessary because we have a common memory, but these examples are also the inspiration for novice users who can use another executor e.g. CeleryExecutor, so each DAG will be loaded on each node separately.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] turbaszek commented on a change in pull request #18494: Fix part of Google system tests

Posted by GitBox <gi...@apache.org>.
turbaszek commented on a change in pull request #18494:
URL: https://github.com/apache/airflow/pull/18494#discussion_r716170329



##########
File path: airflow/providers/google/cloud/example_dags/example_cloud_sql.py
##########
@@ -48,8 +49,8 @@
 from airflow.utils.dates import days_ago
 
 GCP_PROJECT_ID = os.environ.get('GCP_PROJECT_ID', 'example-project')
-INSTANCE_NAME = os.environ.get('GCSQL_MYSQL_INSTANCE_NAME', 'test-mysql')
-INSTANCE_NAME2 = os.environ.get('GCSQL_MYSQL_INSTANCE_NAME2', 'test-mysql2')
+INSTANCE_NAME = os.environ.get('GCSQL_MYSQL_INSTANCE_NAME', 'test-mysql') + str(random.getrandbits(16))

Review comment:
       I agree with @mik-laj - we use this value at least two times and it will be different every time. 




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] mik-laj commented on a change in pull request #18494: Fix part of Google system tests

Posted by GitBox <gi...@apache.org>.
mik-laj commented on a change in pull request #18494:
URL: https://github.com/apache/airflow/pull/18494#discussion_r716873050



##########
File path: airflow/providers/google/cloud/example_dags/example_cloud_sql.py
##########
@@ -48,8 +49,8 @@
 from airflow.utils.dates import days_ago
 
 GCP_PROJECT_ID = os.environ.get('GCP_PROJECT_ID', 'example-project')
-INSTANCE_NAME = os.environ.get('GCSQL_MYSQL_INSTANCE_NAME', 'test-mysql')
-INSTANCE_NAME2 = os.environ.get('GCSQL_MYSQL_INSTANCE_NAME2', 'test-mysql2')
+INSTANCE_NAME = os.environ.get('GCSQL_MYSQL_INSTANCE_NAME', 'test-mysql') + str(random.getrandbits(16))

Review comment:
       Here is an example project that we used to run system tests for Google Cloud on Google Cloud Build:
   
   https://github.com/politools/airflow-system-tests




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] github-actions[bot] commented on pull request #18494: Fix part of Google system tests

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on pull request #18494:
URL: https://github.com/apache/airflow/pull/18494#issuecomment-926625638


   The PR is likely OK to be merged with just subset of tests for default Python and Database versions without running the full matrix of tests, because it does not modify the core of Airflow. If the committers decide that the full tests matrix is needed, they will add the label 'full tests needed'. Then you should rebase to the latest main or amend the last commit of the PR, and push it with --force-with-lease.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] potiuk commented on pull request #18494: Fix part of Google system tests

Posted by GitBox <gi...@apache.org>.
potiuk commented on pull request #18494:
URL: https://github.com/apache/airflow/pull/18494#issuecomment-926768590


   Some static checks failed


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] mnojek commented on a change in pull request #18494: Fix part of Google system tests

Posted by GitBox <gi...@apache.org>.
mnojek commented on a change in pull request #18494:
URL: https://github.com/apache/airflow/pull/18494#discussion_r716376092



##########
File path: airflow/providers/google/cloud/example_dags/example_cloud_sql.py
##########
@@ -48,8 +49,8 @@
 from airflow.utils.dates import days_ago
 
 GCP_PROJECT_ID = os.environ.get('GCP_PROJECT_ID', 'example-project')
-INSTANCE_NAME = os.environ.get('GCSQL_MYSQL_INSTANCE_NAME', 'test-mysql')
-INSTANCE_NAME2 = os.environ.get('GCSQL_MYSQL_INSTANCE_NAME2', 'test-mysql2')
+INSTANCE_NAME = os.environ.get('GCSQL_MYSQL_INSTANCE_NAME', 'test-mysql') + str(random.getrandbits(16))

Review comment:
       I added this because the instance name is unique only for the specific Airflow container so each time I run the test within 1 Airflow run, it's the same. The problem was that if this system test failed before deleting the instance, it couldn't be run again because it wanted to create instance with the same instance while it was still present. That's why I added it. So it works only when it passes within the first execution.
   That is a simple workaround to make each test execution independent but overall it's not the desired solution. Can you point me where else this value is used?
   
   Ideally, I would want to have independent tests and also a mechanism that will cleanup all the values created by the test if it fails in the middle. This is not the scope of this PR, but in near future I plan to work on that.

##########
File path: airflow/providers/google/cloud/example_dags/example_cloud_sql.py
##########
@@ -48,8 +49,8 @@
 from airflow.utils.dates import days_ago
 
 GCP_PROJECT_ID = os.environ.get('GCP_PROJECT_ID', 'example-project')
-INSTANCE_NAME = os.environ.get('GCSQL_MYSQL_INSTANCE_NAME', 'test-mysql')
-INSTANCE_NAME2 = os.environ.get('GCSQL_MYSQL_INSTANCE_NAME2', 'test-mysql2')
+INSTANCE_NAME = os.environ.get('GCSQL_MYSQL_INSTANCE_NAME', 'test-mysql') + str(random.getrandbits(16))

Review comment:
       Thanks for the suggestions, @potiuk @mik-laj! I will not implement them now but I will definitely use your comments when designing the new approach for running system tests. Hopefully we can make it better and more reliable 😃 
   In the meantime I will remove this change and hopefully this will not be the case in the new approach.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org