You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@airflow.apache.org by GitBox <gi...@apache.org> on 2022/01/04 23:33:39 UTC

[GitHub] [airflow] yehoshuadimarsky opened a new issue #20660: Exception unfairly raised in SQL Operator when using Application Default Credentials with Workload Identity for Cloud SQL Auth proxy

yehoshuadimarsky opened a new issue #20660:
URL: https://github.com/apache/airflow/issues/20660


   ### Apache Airflow Provider(s)
   
   google
   
   ### Versions of Apache Airflow Providers
   
   ```
   apache-airflow-providers-amazon==2.3.0
   apache-airflow-providers-celery==2.1.0
   apache-airflow-providers-cncf-kubernetes==2.0.0
   apache-airflow-providers-docker==2.2.0
   apache-airflow-providers-elasticsearch==2.0.3
   apache-airflow-providers-ftp==2.0.1
   apache-airflow-providers-google==6.0.0
   apache-airflow-providers-grpc==2.0.1
   apache-airflow-providers-hashicorp==2.1.1
   apache-airflow-providers-http==2.0.1
   apache-airflow-providers-imap==2.0.1
   apache-airflow-providers-microsoft-azure==3.2.0
   apache-airflow-providers-mysql==2.1.1
   apache-airflow-providers-odbc==2.0.1
   apache-airflow-providers-postgres==2.3.0
   apache-airflow-providers-redis==2.0.1
   apache-airflow-providers-sendgrid==2.0.1
   apache-airflow-providers-sftp==2.1.1
   apache-airflow-providers-slack==4.1.0
   apache-airflow-providers-sqlite==2.0.1
   apache-airflow-providers-ssh==2.2.0
   ```
   
   ### Apache Airflow version
   
   2.2.1
   
   ### Operating System
   
   Debian Buster
   
   ### Deployment
   
   Official Apache Airflow Helm Chart
   
   ### Deployment details
   
   Running the official Airflow Helm chart on GKE. Have Workload Identity set up and working, linked the Google and Kubernetes service accounts.
   
   Created the an Airflow connection for GCP ADC, per the [instructions](https://airflow.apache.org/docs/apache-airflow-providers-google/6.2.0/connections/gcp.html#note-on-application-default-credentials)
   ```bash
   airflow connections add \
       --conn-type google_cloud_platform \
       'gcp-airflow-svc-acct-dev'
   ```
   
   The SQL connection - note, no password!
   ```bash
   airflow connections add \
       --conn-type gcpcloudsql \
       --conn-host $SQL_INSTANCE_PUBLIC_IP \
       --conn-login {GSA_NAME}@{PROJECT_ID}.iam  \  # linked to KSA via WI
       --conn-extra '{"instance": "INSTANCE_NAME", "location": "us-east1", "database_type": "postgres", "project_id": "PROJECT_ID", "use_proxy": true, "sql_proxy_use_tcp": true}' \
       'gcp-sql-ods-dev'
   ```
   
   ### What happened
   
   Getting this error when trying to run a [CloudSQLExecuteQueryOperator](https://airflow.apache.org/docs/apache-airflow-providers-google/6.0.0/_api/airflow/providers/google/cloud/operators/cloud_sql/index.html#airflow.providers.google.cloud.operators.cloud_sql.CloudSQLExecuteQueryOperator)
   
   ```
   [2022-01-04, 18:03:49 EST] {taskinstance.py:1703} ERROR - Task failed with exception
   Traceback (most recent call last):
     File "/home/airflow/.local/lib/python3.9/site-packages/airflow/models/taskinstance.py", line 1332, in _run_raw_task
       self._execute_task_with_callbacks(context)
     File "/home/airflow/.local/lib/python3.9/site-packages/airflow/models/taskinstance.py", line 1458, in _execute_task_with_callbacks
       result = self._execute_task(context, self.task)
     File "/home/airflow/.local/lib/python3.9/site-packages/airflow/models/taskinstance.py", line 1514, in _execute_task
       result = execute_callable(context=context)
     File "/home/airflow/.local/lib/python3.9/site-packages/airflow/providers/google/cloud/operators/cloud_sql.py", line 1076, in execute
       connection = hook.create_connection()
     File "/home/airflow/.local/lib/python3.9/site-packages/airflow/providers/google/cloud/hooks/cloud_sql.py", line 932, in create_connection
       uri = self._generate_connection_uri()
     File "/home/airflow/.local/lib/python3.9/site-packages/airflow/providers/google/cloud/hooks/cloud_sql.py", line 893, in _generate_connection_uri
       raise AirflowException("The password parameter needs to be set in connection")
   airflow.exceptions.AirflowException: The password parameter needs to be set in connection
   ```
   
   ### What you expected to happen
   
   I expect to be able to use the Cloud SQL Auth proxy from GKE using the IAM database authentication of the Workload Identity-provided GSA and not needing any passwords at all. However, the Airflow Hook is hardcoded to raise an error if there is no password in the SQL connection. Why? Outside of Airflow it can work without a password via Cloud SQL Auth proxy and WI etc, can we allow that in Airflow too?
   
   ### How to reproduce
   
   _No response_
   
   ### Anything else
   
   _No response_
   
   ### Are you willing to submit PR?
   
   - [X] Yes I am willing to submit a PR!
   
   ### Code of Conduct
   
   - [X] I agree to follow this project's [Code of Conduct](https://github.com/apache/airflow/blob/main/CODE_OF_CONDUCT.md)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] yehoshuadimarsky commented on issue #20660: Google SQL shouldn't require password when using Application Default Credentials and Cloud SQL Auth proxy

Posted by GitBox <gi...@apache.org>.
yehoshuadimarsky commented on issue #20660:
URL: https://github.com/apache/airflow/issues/20660#issuecomment-1006152110


   So I think I know how to fix it, but I'm at a loss how to test it. I've read all the Breeze stuff and the relevant test code at 
   https://github.com/apache/airflow/blob/68fa358c85d0ebe1aa22ace30b33867ad4e622a8/tests/providers/google/cloud/hooks/test_cloud_sql.py#L989-L1038
   and
   https://github.com/apache/airflow/blob/68fa358c85d0ebe1aa22ace30b33867ad4e622a8/tests/providers/google/cloud/hooks/test_cloud_sql.py#L1122-L1135
   But I can't figure out how to test this kind of login within old-style `unittest` patching - any suggestions?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] yehoshuadimarsky edited a comment on issue #20660: Google SQL shouldn't require password when using Application Default Credentials and Cloud SQL Auth proxy

Posted by GitBox <gi...@apache.org>.
yehoshuadimarsky edited a comment on issue #20660:
URL: https://github.com/apache/airflow/issues/20660#issuecomment-1006152110


   So I think I know how to fix it, but I'm at a loss how to test it. I've read all the Breeze stuff and the relevant test code.
   
   The fix:
   ```diff
   diff --git a/airflow/providers/google/cloud/hooks/cloud_sql.py b/airflow/providers/google/cloud/hooks/cloud_sql.py
   index 1b6766390..5275a46c3 100644
   --- a/airflow/providers/google/cloud/hooks/cloud_sql.py
   +++ b/airflow/providers/google/cloud/hooks/cloud_sql.py
   @@ -536,10 +536,10 @@ class CloudSqlProxyRunner(LoggingMixin):
                self.log.info(
                    "The credentials are not supplied by neither key_path nor "
                    "keyfile_dict of the gcp connection %s. Falling back to "
   -                "default activated account",
   +                "default activated account and using automatic IAM login",
                    self.gcp_conn_id,
                )
   -            credential_params = []
   +            credential_params = ["-enable_iam_login"]
    
            if not self.instance_specification:
                project_id = connection.extra_dejson.get('extra__google_cloud_platform__project')
   @@ -890,8 +890,6 @@ class CloudSQLDatabaseHook(BaseHook):
                raise AirflowException("The login parameter needs to be set in connection")
            if not self.public_ip:
                raise AirflowException("The location parameter needs to be set in connection")
   -        if not self.password:
   -            raise AirflowException("The password parameter needs to be set in connection")
            if not self.database:
                raise AirflowException("The database parameter needs to be set in connection")
   ```
   Ref: https://cloud.google.com/sql/docs/postgres/iam-logins#cloud-sql-auth-proxy
   
   Relevant parts of existing test code: 
   https://github.com/apache/airflow/blob/68fa358c85d0ebe1aa22ace30b33867ad4e622a8/tests/providers/google/cloud/hooks/test_cloud_sql.py#L989-L1038
   and
   https://github.com/apache/airflow/blob/68fa358c85d0ebe1aa22ace30b33867ad4e622a8/tests/providers/google/cloud/hooks/test_cloud_sql.py#L1122-L1135
   But I can't figure out how to test this kind of login within old-style `unittest` patching - any suggestions?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] potiuk commented on issue #20660: Google SQL shouldn't require password when using Application Default Credentials and Cloud SQL Auth proxy

Posted by GitBox <gi...@apache.org>.
potiuk commented on issue #20660:
URL: https://github.com/apache/airflow/issues/20660#issuecomment-1005997313


   Feel Free to add pr @yehoshuadimarsky . Cloud SQL operator was added when there was no Workload Identity (at all I tihnk not only for Cloud SQL).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] yehoshuadimarsky commented on issue #20660: Google SQL shouldn't require password when using Application Default Credentials and Cloud SQL Auth proxy

Posted by GitBox <gi...@apache.org>.
yehoshuadimarsky commented on issue #20660:
URL: https://github.com/apache/airflow/issues/20660#issuecomment-1005997602


   Happy to give it a shot, I'll try!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org