Posted to commits@airflow.apache.org by GitBox <gi...@apache.org> on 2021/12/20 15:56:10 UTC

[GitHub] [airflow] jobegrabber opened a new issue #20426: GKE Authentication not possible with user ADC and project ID set in either connection or `gcloud` config

jobegrabber opened a new issue #20426:
URL: https://github.com/apache/airflow/issues/20426


   ### Apache Airflow Provider(s)
   
   google
   
   ### Versions of Apache Airflow Providers
   
   `apache-airflow-providers-google==6.1.0`
   
   ### Apache Airflow version
   
   2.1.4
   
   ### Operating System
   
   Debian GNU/Linux 11 (bullseye)
   
   ### Deployment
   
   Docker-Compose
   
   ### Deployment details
   
   At my company we're developing our Airflow DAGs in local environments based on Docker Compose.
   To authenticate against the GCP, we don't use service accounts and their keys, but instead use our user credentials and set them up as Application Default Credentials (ADC), i.e. we run
   
   ```
   $ gcloud auth login
   $ gcloud auth application-default login
   ```
   
   We also set the default Project ID in both `gcloud` and Airflow connections, i.e.
   
   ```
   $ gcloud config set project $PROJECT
   $ # run the following inside the Airflow Docker container
   $ airflow connections delete google_cloud_default
   $ airflow connections add google_cloud_default \
         --conn-type=google_cloud_platform \
         --conn-extra="{\"extra__google_cloud_platform__project\":\"$PROJECT\"}"
   ```
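
Shell quoting of the JSON extra is easy to get wrong, so one option is to generate the value programmatically and pass it to `--conn-extra`. A minimal sketch, assuming the extra key shown above; `build_conn_extra` is a hypothetical helper name and `my-project` is a placeholder:

```python
import json


def build_conn_extra(project_id: str) -> str:
    """Serialize the connection extra as JSON (hypothetical helper)."""
    return json.dumps({"extra__google_cloud_platform__project": project_id})


if __name__ == "__main__":
    # "my-project" is a placeholder, not a real project ID.
    print(build_conn_extra("my-project"))
```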
   
   
   ### What happened
   
   It seems that due to [this part](https://github.com/apache/airflow/blob/ed604b6/airflow/providers/google/common/hooks/base_google.py#L518..L541) in `base_google.py`, when the Project ID is set in either the Airflow connections or `gcloud` config, `gcloud auth` (specifically `gcloud auth activate-refresh-token`) will not be executed.
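
To illustrate the suspected behaviour, here is a simplified model of the branching, not the provider's actual code. The function name, parameters, and step labels are all made up for illustration; the `chained` flag switches between the behaviour I think I'm seeing and the one I would expect:

```python
from typing import List, Optional


def plan_gcloud_steps(
    project_id: Optional[str],
    has_key_file: bool,
    has_adc_file: bool,
    chained: bool,
) -> List[str]:
    """Return the gcloud steps a hook would run (simplified, hypothetical model)."""
    steps: List[str] = []
    if chained:
        # Suspected behaviour: project and auth share one if/elif chain,
        # so a configured project short-circuits every auth branch.
        if project_id:
            steps.append("config set core/project")
        elif has_key_file:
            steps.append("auth activate-service-account")
        elif has_adc_file:
            steps.append("auth activate-refresh-token")
    else:
        # Expected behaviour: authenticate first, then set the project independently.
        if has_key_file:
            steps.append("auth activate-service-account")
        elif has_adc_file:
            steps.append("auth activate-refresh-token")
        if project_id:
            steps.append("config set core/project")
    return steps
```

With a project ID and only user ADC present, the chained variant yields just the project step, which would match the missing `activate-refresh-token` call.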
   
   This causes e.g. `gcloud container clusters get-credentials` in the `GKEStartPodOperator` to fail with `You do not currently have an active account selected`:
   
   ```
   [2021-12-20 15:21:12,059] {credentials_provider.py:295} INFO - Getting connection using `google.auth.default()` since no key file is defined for hook.
   [2021-12-20 15:21:12,073] {logging_mixin.py:109} WARNING - /usr/local/lib/python3.8/site-packages/google/auth/_default.py:70 UserWarning: Your application has authenticated using end user credentials from Google Cloud SDK without a quota project. You might receive a "quota exceeded" or "API not enabled" error. We recommend you rerun `gcloud auth application-default login` and make sure a quota project is added. Or you can use service accounts instead. For more information about service accounts, see https://cloud.google.com/docs/authentication/
   [2021-12-20 15:21:13,863] {process_utils.py:135} INFO - Executing cmd: gcloud container clusters get-credentials REDACTED --zone europe-west1-b --project REDACTED
   [2021-12-20 15:21:13,875] {process_utils.py:139} INFO - Output:
   [2021-12-20 15:21:14,522] {process_utils.py:143} INFO - ERROR: (gcloud.container.clusters.get-credentials) You do not currently have an active account selected.
   [2021-12-20 15:21:14,522] {process_utils.py:143} INFO - Please run:
   [2021-12-20 15:21:14,523] {process_utils.py:143} INFO - 
   [2021-12-20 15:21:14,523] {process_utils.py:143} INFO -   $ gcloud auth login
   [2021-12-20 15:21:14,523] {process_utils.py:143} INFO - 
   [2021-12-20 15:21:14,523] {process_utils.py:143} INFO - to obtain new credentials.
   [2021-12-20 15:21:14,523] {process_utils.py:143} INFO - 
   [2021-12-20 15:21:14,523] {process_utils.py:143} INFO - If you have already logged in with a different account:
   [2021-12-20 15:21:14,523] {process_utils.py:143} INFO - 
   [2021-12-20 15:21:14,523] {process_utils.py:143} INFO -     $ gcloud config set account ACCOUNT
   [2021-12-20 15:21:14,523] {process_utils.py:143} INFO - 
   [2021-12-20 15:21:14,523] {process_utils.py:143} INFO - to select an already authenticated account to use.
   [2021-12-20 15:21:14,618] {taskinstance.py:1463} ERROR - Task failed with exception
   Traceback (most recent call last):
     File "/usr/local/lib/python3.8/site-packages/airflow/models/taskinstance.py", line 1165, in _run_raw_task
       self._prepare_and_execute_task_with_callbacks(context, task)
     File "/usr/local/lib/python3.8/site-packages/airflow/models/taskinstance.py", line 1283, in _prepare_and_execute_task_with_callbacks
       result = self._execute_task(context, task_copy)
     File "/usr/local/lib/python3.8/site-packages/airflow/models/taskinstance.py", line 1313, in _execute_task
       result = task_copy.execute(context=context)
     File "/usr/local/lib/python3.8/site-packages/airflow/providers/google/cloud/operators/kubernetes_engine.py", line 355, in execute
       execute_in_subprocess(cmd)
     File "/usr/local/lib/python3.8/site-packages/airflow/utils/process_utils.py", line 147, in execute_in_subprocess
       raise subprocess.CalledProcessError(exit_code, cmd)
   subprocess.CalledProcessError: Command '['gcloud', 'container', 'clusters', 'get-credentials', 'REDACTED', '--zone', 'europe-west1-b', '--project', 'REDACTED']' returned non-zero exit status 1.
   ```
   
   If we set the environment variable `GOOGLE_APPLICATION_CREDENTIALS`, `gcloud auth activate-service-account` is run, which only works with proper service account credentials, not user credentials.
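
One way a fix might tell the two kinds of credential file apart is the `type` field in the JSON: service account keys carry `"service_account"`, while files written by `gcloud auth application-default login` carry `"authorized_user"`. A minimal sketch under that assumption; the function names are hypothetical and this is not what the provider currently does:

```python
import json
from pathlib import Path


def credential_type_from_text(text: str) -> str:
    """Return the "type" field of a credentials JSON document, or "unknown"."""
    data = json.loads(text)
    return data.get("type", "unknown")


def credential_type(path: str) -> str:
    """Read a credentials file (e.g. the one named by GOOGLE_APPLICATION_CREDENTIALS)."""
    return credential_type_from_text(Path(path).read_text())
```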
   
   
   ### What you expected to happen
   
   From my POV, it should work to
   1. have the Project ID set in the `gcloud` config and/or Airflow variables and still be able to use user credentials with GCP Operators, and
   2. set `GOOGLE_APPLICATION_CREDENTIALS` to a file containing user credentials and be able to use these credentials with GCP Operators.
   
   ### How to reproduce
   
   See Deployment Details. In essence:
   - Run Airflow within Docker Compose (but it's not only Docker Compose that is affected, as far as I can see).
   - Use user credentials with `gcloud`; `gcloud auth login`, `gcloud auth application-default login`
   - Configure project ID in `gcloud` config (mounted in the Docker container) and/or Airflow connection
   - Run `GKEStartPodOperator`
   
   ### Anything else
   
   Currently, the only workaround (apart from using service accounts) seems to be to not set a default project in either the `gcloud` config or `google_cloud_platform` connections.
   
   ### Are you willing to submit PR?
   
   - [X] Yes I am willing to submit a PR!
   
   ### Code of Conduct
   
   - [X] I agree to follow this project's [Code of Conduct](https://github.com/apache/airflow/blob/main/CODE_OF_CONDUCT.md)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] jobegrabber edited a comment on issue #20426: GKE Authentication not possible with user ADC and project ID set in either connection or `gcloud` config

Posted by GitBox <gi...@apache.org>.
jobegrabber edited a comment on issue #20426:
URL: https://github.com/apache/airflow/issues/20426#issuecomment-998854218


   [Some updates..](https://github.com/apache/airflow/pull/20428#issuecomment-998714275)
   
   Seems like certain tests are broken:
   
   > For some reason, `has_calls` is used instead of `assert_has_calls`. I couldn't find any documentation of `has_calls`; it does not appear to be part of the `unittest.mock` API, so a plain mock simply auto-creates it as a child attribute, and calling it returns another (truthy) mock instead of asserting anything. This means the affected tests don't actually assert anything, making them pass although they shouldn't.
   > 
   > There are a couple of other places where `has_calls` is used:
   > 
   > ```
   > (airflow-env) ➜  airflow git:(main) ✗ egrep -ir '[^_]has_calls' .
   > ./tests/providers/google/common/hooks/test_base_google.py:        mock_check_output.has_calls(
   > ./tests/providers/google/common/hooks/test_base_google.py:        mock_check_output.has_calls(
   > ./tests/providers/google/cloud/transfers/test_sheets_to_gcs.py:        mock_sheet_hook.return_value.get_values.has_calls(calls)
   > ./tests/providers/google/cloud/transfers/test_sheets_to_gcs.py:        mock_upload_data.has_calls(calls)
   > ./tests/providers/google/cloud/hooks/test_bigquery.py:        mock_poll_job_complete.has_calls(mock.call(running_job_id), mock.call(running_job_id))
   > ./tests/providers/google/cloud/hooks/test_bigquery.py:        mock_schema.has_calls([mock.call(x, "") for x in ["field_1", "field_2"]])
   > ./tests/providers/google/cloud/hooks/test_bigquery.py:        assert mock_insert.has_calls(
   > ./tests/providers/google/cloud/hooks/test_pubsub.py:        publish_method.has_calls(calls)
   > ./tests/providers/google/cloud/hooks/test_cloud_memorystore.py:        mock_get_conn.return_value.get_instance.has_calls(
   > ./tests/providers/google/cloud/hooks/test_cloud_memorystore.py:        mock_get_conn.return_value.get_instance.has_calls(
   > ./tests/providers/google/cloud/hooks/test_cloud_memorystore.py:        mock_get_conn.return_value.get_instance.has_calls(
   > ./tests/providers/google/cloud/hooks/test_dataproc.py:        mock_get_job.has_calls(calls)
   > ./tests/providers/google/cloud/hooks/test_dataproc.py:            mock_get_job.has_calls(calls)
   > ./tests/providers/google/suite/operators/test_sheets.py:        mock_xcom.has_calls(calls)
   > ./tests/providers/http/operators/test_http.py:            mock_info.has_calls(calls)
   > ./tests/providers/airbyte/hooks/test_airbyte.py:        assert mock_get_job.has_calls(calls)
   > ./tests/providers/airbyte/hooks/test_airbyte.py:        assert mock_get_job.has_calls(calls)
   > ./tests/providers/airbyte/hooks/test_airbyte.py:        assert mock_get_job.has_calls(calls)
   > ./tests/providers/airbyte/hooks/test_airbyte.py:        assert mock_get_job.has_calls(calls)
   > ./tests/providers/airbyte/hooks/test_airbyte.py:        assert mock_get_job.has_calls(calls)
   > ```
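
The difference can be shown with a small standalone script. This is only an illustration, not code from the Airflow test suite; note that recent Python versions (3.12+, to my knowledge) reject the misspelled name with an `AttributeError`, so the sketch handles both cases:

```python
from unittest import mock

m = mock.Mock()
m("actual")

# `has_calls` is not a Mock method. On older Python versions, attribute access
# auto-creates a child mock, and calling it returns yet another (truthy) Mock.
# Newer Python versions raise AttributeError for this name instead.
try:
    silently_passed = bool(m.has_calls([mock.call("never happened")]))
except AttributeError:
    silently_passed = False

# `assert_has_calls` is the real assertion and raises on a mismatch.
try:
    m.assert_has_calls([mock.call("never happened")])
    raised = False
except AssertionError:
    raised = True

print("has_calls silently passed:", silently_passed)
print("assert_has_calls raised:", raised)
```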
   





[GitHub] [airflow] potiuk commented on issue #20426: GKE Authentication not possible with user ADC and project ID set in either connection or `gcloud` config

Posted by GitBox <gi...@apache.org>.
potiuk commented on issue #20426:
URL: https://github.com/apache/airflow/issues/20426#issuecomment-998117134


   Good spot!





[GitHub] [airflow] jobegrabber edited a comment on issue #20426: GKE Authentication not possible with user ADC and project ID set in either connection or `gcloud` config

Posted by GitBox <gi...@apache.org>.
jobegrabber edited a comment on issue #20426:
URL: https://github.com/apache/airflow/issues/20426#issuecomment-998054053









[GitHub] [airflow] jobegrabber commented on issue #20426: GKE Authentication not possible with user ADC and project ID set in either connection or `gcloud` config

Posted by GitBox <gi...@apache.org>.
jobegrabber commented on issue #20426:
URL: https://github.com/apache/airflow/issues/20426#issuecomment-998054053


   It seems like https://github.com/apache/airflow/commit/2fadf3c3cf6e8a5d26953ebce6401ab5059ee05f#diff-3d70cf5dda479898625f0a9cbf515b926789b1476301ae398d83082f08d07c13 introduced the change by not only changing the "order of activating service account and setting project", but also changing the evaluation logic.





[GitHub] [airflow] kaxil closed issue #20426: GKE Authentication not possible with user ADC and project ID set in either connection or `gcloud` config

Posted by GitBox <gi...@apache.org>.
kaxil closed issue #20426:
URL: https://github.com/apache/airflow/issues/20426


   





[GitHub] [airflow] boring-cyborg[bot] commented on issue #20426: GKE Authentication not possible with user ADC and project ID set in either connection or `gcloud` config

Posted by GitBox <gi...@apache.org>.
boring-cyborg[bot] commented on issue #20426:
URL: https://github.com/apache/airflow/issues/20426#issuecomment-998051128


   Thanks for opening your first issue here! Be sure to follow the issue template!
   




