Posted to commits@airflow.apache.org by GitBox <gi...@apache.org> on 2022/02/08 22:19:27 UTC

[GitHub] [airflow] jceresini opened a new issue #21440: Dataflow Operator fails with Application Default credentials

jceresini opened a new issue #21440:
URL: https://github.com/apache/airflow/issues/21440


   ### Apache Airflow Provider(s)
   
   google
   
   ### Versions of Apache Airflow Providers
   
   $ pip freeze | grep airflow-providers
   apache-airflow-providers-apache-beam==3.1.0
   apache-airflow-providers-ftp==2.0.1
   apache-airflow-providers-google==6.3.0
   apache-airflow-providers-http==2.0.2
   apache-airflow-providers-imap==2.1.0
   apache-airflow-providers-sqlite==2.0.1
   
   ### Apache Airflow version
   
   2.2.3 (latest released)
   
   ### Operating System
   
   MacOS 11.6.1 (20G224)
   
   ### Deployment
   
   Virtualenv installation
   
   ### Deployment details
   
   The only python packages explicitly installed were `apache-airflow-providers-google` and `apache-airflow-providers-apache-beam`.
   
   ### What happened
   
   ```
   [2022-02-08 16:44:23,610] {credentials_provider.py:330} INFO - Getting connection using `google.auth.default()` since no key file is defined for hook.
   [2022-02-08 16:44:25,409] {http.py:123} WARNING - Invalid JSON content from response: b'{\n  "error": {\n    "code": 403,\n    "message": "User must be authenticated when user project is provided",\n    "status": "PERMISSION_DENIED",\n    "details": [\n      {\n        "@type": "type.googleapis.com/google.rpc.ErrorInfo",\n        "reason": "USER_PROJECT_DENIED",\n        "domain": "googleapis.com",\n        "metadata": {\n          "service": "dataflow.googleapis.com",\n          "consumer": "projects/cd-np-test1"\n        }\n      }\n    ]\n  }\n}\n'
   ```
   
   The first line is expected: the operator should call `google.auth.default()` to obtain my credentials. Some quick debugging confirmed that it does, and that it gets the correct credentials.
   
   After that, retrieving the discovery document fails with a 403. If I force the API client to use a cached copy of the discovery document, the task completes successfully, so authentication works for the calls that create the Dataflow job. For some reason, however, the Dataflow API's discovery document URL rejects the request. Note that the discovery document can be fetched with no authentication at all: https://www.googleapis.com/discovery/v1/apis/dataflow/v1b3/rest
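
The 403 body in the log is hard to read as an escaped byte string. A minimal stdlib sketch (the helper name `summarize_google_api_error` is hypothetical) decodes it and pulls out the HTTP code, status, and the `google.rpc.ErrorInfo` reason and consumer. It assumes the standard Google API error envelope seen in the log; the `consumer` field appears to name the project the request was attributed to.

```python
import json


def summarize_google_api_error(body: bytes) -> dict:
    """Extract code, status, message, reason and consumer from a Google API error body."""
    error = json.loads(body)["error"]
    summary = {
        "code": error.get("code"),
        "status": error.get("status"),
        "message": error.get("message"),
        "reason": None,
        "consumer": None,
    }
    # ErrorInfo entries carry a machine-readable reason plus key/value metadata.
    for detail in error.get("details", []):
        if detail.get("@type") == "type.googleapis.com/google.rpc.ErrorInfo":
            summary["reason"] = detail.get("reason")
            summary["consumer"] = detail.get("metadata", {}).get("consumer")
            break
    return summary


# The error body from the WARNING line, with the escaped newlines removed.
body = (
    b'{"error": {"code": 403, "message": "User must be authenticated when user project is provided",'
    b' "status": "PERMISSION_DENIED", "details": [{"@type": "type.googleapis.com/google.rpc.ErrorInfo",'
    b' "reason": "USER_PROJECT_DENIED", "domain": "googleapis.com",'
    b' "metadata": {"service": "dataflow.googleapis.com", "consumer": "projects/cd-np-test1"}}]}}'
)
print(summarize_google_api_error(body))
```

Reading `reason` rather than the free-text `message` is the more robust way to tell this failure apart from an ordinary missing-permission 403.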
   
   ### What you expected to happen
   
   When running Airflow locally, I should be able to use my local Application Default Credentials to trigger the Dataflow operator.
   
   ### How to reproduce
   
   Ran the following in a new venv:
   
   ```shell
   gcloud auth application-default login # successfully authenticated as my user
   
   python3 -m pip install apache-airflow-providers-google apache-airflow-providers-apache-beam
   export AIRFLOW__CORE__LOAD_EXAMPLES=false
   export AIRFLOW_HOME=$(pwd)/airflowhome
   airflow db init
   
   # Copy the Python file shown below into the ./airflowhome/dags directory
   
   airflow tasks test test_dag dataflow_task $(date +%Y-%m-%dT%H:%M:%S)
   ```
   Contents of the python file:
   
   ```python
   from airflow.providers.google.cloud.operators.dataflow import (
       DataflowStartFlexTemplateOperator,
   )
   from datetime import datetime
   from airflow import DAG
   
   
   # Use a fixed start_date: datetime.now() changes on every DAG parse.
   with DAG(dag_id="test_dag", schedule_interval=None, start_date=datetime(2022, 1, 1)) as dag:
   
       dataflow_task = DataflowStartFlexTemplateOperator(
           task_id="dataflow_task",
           body={
               "launchParameter": {
                   "containerSpecGcsPath": "gs://foo/bar/baz.json",
                   "jobName": "test",
                   "environment": {
                       "serviceAccountEmail": "some-user@some-project.iam.gserviceaccount.com",
                   },
               }
           },
           wait_until_finished=True,
           location="us-central1",
           project_id="some-project",
       )
   ```
   
   ### Anything else
   
   The `google-api-python-client` dependency is pinned to `<2.0`. Starting with 2.0, the client ships discovery documents with the package, so the network lookup is rarely used. As long as upgrading that dependency doesn't break anything else, that might be a simple fix.
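
To check which side of that boundary a given environment is on, a small stdlib sketch can read the installed version. It assumes, per the paragraph above, that 2.0 is the cut-over to bundled discovery documents; the helper names are hypothetical and the version parsing only looks at the leading major number.

```python
from importlib.metadata import PackageNotFoundError, version


def major_version(version_string: str) -> int:
    """Return the leading integer of a version string such as '1.12.8'."""
    return int(version_string.split(".")[0])


def bundles_discovery_documents(version_string: str) -> bool:
    """True for google-api-python-client 2.x and later, which ship discovery documents."""
    return major_version(version_string) >= 2


def check_installed_client() -> str:
    """Describe how the installed client obtains discovery documents."""
    try:
        installed = version("google-api-python-client")
    except PackageNotFoundError:
        return "google-api-python-client is not installed"
    if bundles_discovery_documents(installed):
        return f"{installed}: discovery documents are bundled with the client"
    return f"{installed}: discovery documents are retrieved at runtime"


print(check_installed_client())
```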
   
   ### Are you willing to submit PR?
   
   - [ ] Yes I am willing to submit a PR!
   
   ### Code of Conduct
   
   - [X] I agree to follow this project's [Code of Conduct](https://github.com/apache/airflow/blob/main/CODE_OF_CONDUCT.md)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] potiuk commented on issue #21440: Dataflow Operator fails with Application Default credentials

Posted by GitBox <gi...@apache.org>.
potiuk commented on issue #21440:
URL: https://github.com/apache/airflow/issues/21440#issuecomment-1040545323


   > Is it worth looking into why the discovery document is throwing a 403 with the current authentication logic? It may be impacting google API calls other than just the dataflow discovery document. Or is it possible it's an issue on the dataflow discovery url's side?
   
   Sure. Feel free to look at it!



[GitHub] [airflow] potiuk edited a comment on issue #21440: Dataflow Operator fails with Application Default credentials

Posted by GitBox <gi...@apache.org>.
potiuk edited a comment on issue #21440:
URL: https://github.com/apache/airflow/issues/21440#issuecomment-1033162767


   Just a proposal: feel free to attempt the fix. There is an open issue, #18371, to upgrade all Google clients to >= 2.0. Maybe this is a good chance to become one of the more than 1900 contributors and upgrade it yourself?



[GitHub] [airflow] potiuk commented on issue #21440: Dataflow Operator fails with Application Default credentials

Posted by GitBox <gi...@apache.org>.
potiuk commented on issue #21440:
URL: https://github.com/apache/airflow/issues/21440#issuecomment-1033162767


   Just a proposal: feel free to attempt the fix. There is an open issue, #18371, to upgrade all clients to >= 2.0. Maybe this is a good chance to become one of the more than 1900 contributors and upgrade it yourself?



[GitHub] [airflow] pbhuss commented on issue #21440: Dataflow Operator fails with Application Default credentials

Posted by GitBox <gi...@apache.org>.
pbhuss commented on issue #21440:
URL: https://github.com/apache/airflow/issues/21440#issuecomment-1083482032


   https://stackoverflow.com/questions/59858003/using-airflow-with-bigquery-and-cloud-sdk-gives-error-user-must-be-authenticate
   
   This might be related. People noticed that removing the `quota_project_id` from the credentials file appeared to resolve the issue.
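
The workaround from that thread can be scripted. This sketch assumes the default ADC location on Linux and macOS (`~/.config/gcloud/application_default_credentials.json`); on Windows the file lives under `%APPDATA%\gcloud` instead. `strip_quota_project` is a hypothetical helper, not part of gcloud or google-auth. It keeps a `.bak` copy so the original file can be restored, since without `quota_project_id` requests no longer name that project for quota and billing.

```python
import json
import shutil
from pathlib import Path

# Assumed default ADC path written by `gcloud auth application-default login` on Linux/macOS.
DEFAULT_ADC_PATH = Path.home() / ".config" / "gcloud" / "application_default_credentials.json"


def strip_quota_project(path: Path = DEFAULT_ADC_PATH) -> bool:
    """Remove quota_project_id from an ADC file, keeping a .bak copy of the original.

    Returns True if the key was present and removed, False if there was nothing to do.
    """
    data = json.loads(path.read_text())
    if "quota_project_id" not in data:
        return False
    # Back up the untouched file next to the original before rewriting it.
    shutil.copy2(path, path.with_name(path.name + ".bak"))
    del data["quota_project_id"]
    path.write_text(json.dumps(data, indent=2))
    return True
```

Calling `strip_quota_project()` with no argument edits the default file; rerunning it is harmless because it returns `False` once the key is gone.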




[GitHub] [airflow] jceresini commented on issue #21440: Dataflow Operator fails with Application Default credentials

Posted by GitBox <gi...@apache.org>.
jceresini commented on issue #21440:
URL: https://github.com/apache/airflow/issues/21440#issuecomment-1035102075


   Is it worth looking into why the discovery document is throwing a 403 with the current authentication logic? It may be impacting google API calls other than just the dataflow discovery document. Or is it possible it's an issue on the dataflow discovery url's side?
   
   Regarding bumping the package to 2.x: there are some comments about changing the version requirements of other Google clients, but none of them seem to affect simply allowing `google-api-python-client` 2.x (as long as it isn't changed to require 2+). Nothing appears to affect code that uses the client either. That wouldn't necessarily solve the issue, though: I would be pinning it to `google-api-python-client>=1.6.0,<3.0.0`, so users may still end up with version `1.x` if other packages they install have more restrictive dependencies.
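
The point about `1.x` still being possible can be checked mechanically. This is a simplified stand-in for PEP 440 specifier matching: it handles only purely numeric versions and the six basic comparison operators (no pre-releases, wildcards, or `~=`), and `satisfies` is a hypothetical helper. Real tooling would use `packaging.specifiers.SpecifierSet`.

```python
import operator

# Two-character operators are listed first so ">=" is not mistaken for ">".
_OPERATORS = [
    (">=", operator.ge),
    ("<=", operator.le),
    ("==", operator.eq),
    ("!=", operator.ne),
    (">", operator.gt),
    ("<", operator.lt),
]


def parse_version(text: str) -> tuple:
    """Parse a purely numeric version such as '1.12.8' into (1, 12, 8)."""
    return tuple(int(part) for part in text.strip().split("."))


def _compare(op, left: tuple, right: tuple) -> bool:
    # Pad with zeros so that "1.6" and "1.6.0" compare as equal.
    width = max(len(left), len(right))
    left = left + (0,) * (width - len(left))
    right = right + (0,) * (width - len(right))
    return op(left, right)


def satisfies(version: str, spec: str) -> bool:
    """Return True if version meets every clause of a spec like '>=1.6.0,<3.0.0'."""
    candidate = parse_version(version)
    for clause in spec.split(","):
        clause = clause.strip()
        for symbol, op in _OPERATORS:
            if clause.startswith(symbol):
                if not _compare(op, candidate, parse_version(clause[len(symbol):])):
                    return False
                break
        else:
            raise ValueError(f"unsupported clause: {clause!r}")
    return True


# The proposed range admits both major versions.
print(satisfies("1.12.8", ">=1.6.0,<3.0.0"))  # True
print(satisfies("2.37.0", ">=1.6.0,<3.0.0"))  # True
```

If another installed package required, say, `<2.0.0` (a hypothetical constraint), `1.12.8` would satisfy both ranges, so a resolver could legitimately settle on the 1.x client.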

