Posted to commits@airflow.apache.org by GitBox <gi...@apache.org> on 2020/05/01 05:45:39 UTC

[GitHub] [airflow] jtmiclat opened a new issue #8659: Setting num_retries for a google-cloud-platform using env variables breaks break running of google tasks

jtmiclat opened a new issue #8659:
URL: https://github.com/apache/airflow/issues/8659


   **Apache Airflow version**: 1.10.10
   
   **Environment**:
   
   - **Cloud provider or hardware configuration**:
   - **OS** (e.g. from /etc/os-release): Pop!\_OS 18.04 LTS
   - **Kernel** (e.g. `uname -a`): Linux pop-os 5.3.0-7648-generic
   - **Install tools**: `pyenv` and `pyenv-virtualenv` for installing Python and the virtualenv
   - **Others**:
   
   **What happened**:
   I created a task that uses a `google_cloud_platform` connection, with the connection set through environment variables
   ```
   export AIRFLOW_CONN_GOOGLE_CLOUD_DEFAULT="google-cloud-platform://?extra__google_cloud_platform__key_path=test.json&extra__google_cloud_platform__project=test&extra__google_cloud_platform__num_retries=3"
   export AIRFLOW_HOME=$(pwd)
   ```
   and received the following error when running a `BigQueryOperator`
   
   ```
    headers=self.headers,
     File "/home/jt/.pyenv/versions/3.7.7/envs/airflow-example/lib/python3.7/site-packages/googleapiclient/http.py", line 159, in _retry_request
       for retry_num in range(num_retries + 1):
   TypeError: can only concatenate str (not "int") to str
   ```
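
The failing expression can be reproduced on its own, without Airflow or the Google client. This is only a minimal sketch: `try_retry_loop` is a hypothetical helper that mimics the `range(num_retries + 1)` line from the traceback and is not part of either library.

```python
def try_retry_loop(num_retries):
    """Mimic the failing expression from the traceback: range(num_retries + 1)."""
    try:
        return list(range(num_retries + 1))
    except TypeError as exc:
        # A str retry count cannot be added to the int 1.
        return "TypeError: {}".format(exc)


print(try_retry_loop(3))    # int retry count: the loop range is built normally
print(try_retry_loop("3"))  # str retry count: the addition raises TypeError
```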
   
   **What you expected to happen**:
   
   The task should run without raising an error.
   
   **How to reproduce it**:
   Create test folder
   
   ```
   mkdir test-airflow
   cd test-airflow
   ```
   
   Install airflow with gcp
   
   ```
   pip install apache-airflow[gcp]==1.10.10
   ```
   
   Set the following environment variables
   
   ```
   export AIRFLOW_CONN_GOOGLE_CLOUD_DEFAULT="google-cloud-platform://?extra__google_cloud_platform__key_path=test.json&extra__google_cloud_platform__project=test&extra__google_cloud_platform__num_retries=3"
   export AIRFLOW_HOME=$(pwd)
   ```
   
   Initialize db
   
   ```
   airflow initdb
   ```
   
   Create a Google credentials file and move it to `test-airflow/test.json`. The code doesn't reach the error state unless `test.json` has a valid Google credentials structure.
   
   Create a file containing the following in the dags folder
   
   ```
   from airflow import DAG
   from airflow.contrib.operators.bigquery_operator import BigQueryOperator
   from airflow.utils import timezone

   default_args = {
       "start_date": timezone.datetime(2020, 5, 1),
   }

   # No schedule: the DAG is only triggered manually or via `airflow test`.
   dag = DAG(
       "test.dag",
       default_args=default_args,
       schedule_interval=None,
   )

   # Uses the connection defined by AIRFLOW_CONN_GOOGLE_CLOUD_DEFAULT.
   BigQueryOperator(
       dag=dag,
       task_id="test.task",
       sql="Select * from test.test",
       destination_dataset_table="test.result",
       write_disposition="WRITE_TRUNCATE",
       bigquery_conn_id="google_cloud_default",
       use_legacy_sql=False,
   )
   ```
   
   Run the task
   
   ```
   airflow test test.dag test.task 2020-05-01
   ```
   Running the command generates the following traceback
   
   <details><summary>Traceback</summary> 
   Traceback (most recent call last):
     File "/home/jt/.pyenv/versions/airflow-example/bin/airflow", line 37, in <module>
       args.func(args)
     File "/home/jt/.pyenv/versions/3.7.7/envs/airflow-example/lib/python3.7/site-packages/airflow/utils/cli.py", line 75, in wrapper
       return f(*args, **kwargs)
     File "/home/jt/.pyenv/versions/3.7.7/envs/airflow-example/lib/python3.7/site-packages/airflow/bin/cli.py", line 682, in test
       ti.run(ignore_task_deps=True, ignore_ti_state=True, test_mode=True)
     File "/home/jt/.pyenv/versions/3.7.7/envs/airflow-example/lib/python3.7/site-packages/airflow/utils/db.py", line 74, in wrapper
       return func(*args, **kwargs)
     File "/home/jt/.pyenv/versions/3.7.7/envs/airflow-example/lib/python3.7/site-packages/airflow/models/taskinstance.py", line 1102, in run
       session=session)
     File "/home/jt/.pyenv/versions/3.7.7/envs/airflow-example/lib/python3.7/site-packages/airflow/utils/db.py", line 70, in wrapper
       return func(*args, **kwargs)
     File "/home/jt/.pyenv/versions/3.7.7/envs/airflow-example/lib/python3.7/site-packages/airflow/models/taskinstance.py", line 983, in _run_raw_task
       result = task_copy.execute(context=context)
     File "/home/jt/.pyenv/versions/3.7.7/envs/airflow-example/lib/python3.7/site-packages/airflow/contrib/operators/bigquery_operator.py", line 279, in execute
       encryption_configuration=self.encryption_configuration
     File "/home/jt/.pyenv/versions/3.7.7/envs/airflow-example/lib/python3.7/site-packages/airflow/contrib/hooks/bigquery_hook.py", line 910, in run_query
       return self.run_with_configuration(configuration)
     File "/home/jt/.pyenv/versions/3.7.7/envs/airflow-example/lib/python3.7/site-packages/airflow/contrib/hooks/bigquery_hook.py", line 1318, in run_with_configuration
       .execute(num_retries=self.num_retries)
     File "/home/jt/.pyenv/versions/3.7.7/envs/airflow-example/lib/python3.7/site-packages/googleapiclient/_helpers.py", line 134, in positional_wrapper
       return wrapped(*args, **kwargs)
     File "/home/jt/.pyenv/versions/3.7.7/envs/airflow-example/lib/python3.7/site-packages/googleapiclient/http.py", line 901, in execute
       headers=self.headers,
     File "/home/jt/.pyenv/versions/3.7.7/envs/airflow-example/lib/python3.7/site-packages/googleapiclient/http.py", line 159, in _retry_request
       for retry_num in range(num_retries + 1):
   TypeError: can only concatenate str (not "int") to str
   </details>
   
   **Anything else we need to know**:
   I've determined that the bug happens when `num_retries` for a `google_cloud_platform` connection is set through an environment variable. The code expects an int, but the hook's `num_retries` property returns a string.
   
   ```
   >>> from airflow.contrib.hooks.gcp_api_base_hook import GoogleCloudBaseHook
   >>> GoogleCloudBaseHook().num_retries
   [2020-05-01 02:21:03,325] {crypto.py:85} WARNING - empty cryptography key - values will not be stored encrypted.
   '3'
   >>> type(GoogleCloudBaseHook().num_retries)
   <class 'str'>
   ```
   Removing `num_retries` from the env var is a quick fix, but it means the retry count can no longer be configured through environment variables.
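
As an illustration of where the string may come from and how it could be coerced, here is a minimal sketch that uses only the standard library. It assumes the extras are read from the URI query string, which would explain the `'3'` seen above. `parse_extras` and `coerce_num_retries` are hypothetical helpers, not Airflow APIs, and the fallback of 5 is an assumption chosen for the example.

```python
from urllib.parse import parse_qsl, urlsplit

# Connection URI from the report above.
CONN_URI = (
    "google-cloud-platform://?"
    "extra__google_cloud_platform__key_path=test.json"
    "&extra__google_cloud_platform__project=test"
    "&extra__google_cloud_platform__num_retries=3"
)


def parse_extras(uri):
    """Return the query-string parameters of a connection URI as a dict."""
    return dict(parse_qsl(urlsplit(uri).query))


def coerce_num_retries(value, default=5):
    """Hypothetical helper: turn a possibly-string retry count into an int.

    The default of 5 is an assumption for illustration only.
    """
    if value is None or (isinstance(value, str) and not value.strip()):
        return default
    # Raises ValueError for non-numeric strings such as "abc".
    return int(value)


extras = parse_extras(CONN_URI)
raw = extras["extra__google_cloud_platform__num_retries"]
print(type(raw).__name__, repr(raw))  # query-string values are always str
print(coerce_num_retries(raw))        # coerced to an int before use
```

Coercing at the point where the value is read would keep the env-var configuration usable, whereas removing `num_retries` from the URI only sidesteps the problem.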


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] mik-laj commented on issue #8659: Setting num_retries for a google-cloud-platform using env variables breaks google tasks

Posted by GitBox <gi...@apache.org>.
mik-laj commented on issue #8659:
URL: https://github.com/apache/airflow/issues/8659#issuecomment-623213202


   @michalslowikowski00 Can you help with it?





[GitHub] [airflow] michalslowikowski00 commented on issue #8659: Setting num_retries for a google-cloud-platform using env variables breaks google tasks

Posted by GitBox <gi...@apache.org>.
michalslowikowski00 commented on issue #8659:
URL: https://github.com/apache/airflow/issues/8659#issuecomment-623309946


   @mik-laj  I know how to fix it. I didn't have time for it. :(





[GitHub] [airflow] boring-cyborg[bot] commented on issue #8659: Setting num_retries for a google-cloud-platform using env variables breaks break running of google tasks

Posted by GitBox <gi...@apache.org>.
boring-cyborg[bot] commented on issue #8659:
URL: https://github.com/apache/airflow/issues/8659#issuecomment-622259238


   Thanks for opening your first issue here! Be sure to follow the issue template!
   

