Posted to commits@airflow.apache.org by GitBox <gi...@apache.org> on 2020/05/01 05:45:39 UTC
[GitHub] [airflow] jtmiclat opened a new issue #8659: Setting num_retries for a google-cloud-platform connection using env variables breaks running of Google tasks
jtmiclat opened a new issue #8659:
URL: https://github.com/apache/airflow/issues/8659
**Apache Airflow version**: 1.10.10
**Environment**:
- **Cloud provider or hardware configuration**:
- **OS** (e.g. from /etc/os-release): Pop!_OS 18.04 LTS
- **Kernel** (e.g. `uname -a`): Linux pop-os 5.3.0-7648-generic
- **Install tools**: `pyenv` and `pyenv-virtualenv` for installing python and virtualenv
- **Others**:
**What happened**:
I created a task that uses a google_cloud_platform connection, configuring the connection through environment variables:
```
export AIRFLOW_CONN_GOOGLE_CLOUD_DEFAULT="google-cloud-platform://?extra__google_cloud_platform__key_path=test.json&extra__google_cloud_platform__project=test&extra__google_cloud_platform__num_retries=3"
export AIRFLOW_HOME=$(pwd)
```
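For context, Airflow builds connection extras by parsing the query string of the connection URI, so every extra arrives as a string, including numeric-looking ones like `num_retries=3`. A rough sketch of that parsing using only the standard library (not Airflow's actual code):

```python
from urllib.parse import urlparse, parse_qs

uri = ("google-cloud-platform://?extra__google_cloud_platform__key_path=test.json"
       "&extra__google_cloud_platform__project=test"
       "&extra__google_cloud_platform__num_retries=3")

# parse_qs returns every query value as a string
extras = {k: v[0] for k, v in parse_qs(urlparse(uri).query).items()}

print(extras["extra__google_cloud_platform__num_retries"])        # '3'
print(type(extras["extra__google_cloud_platform__num_retries"]))  # <class 'str'>
```

This is why the hook ends up holding `'3'` rather than `3`.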
and received the following error when running a `BigQueryOperator`
```
    headers=self.headers,
  File "/home/jt/.pyenv/versions/3.7.7/envs/airflow-example/lib/python3.7/site-packages/googleapiclient/http.py", line 159, in _retry_request
    for retry_num in range(num_retries + 1):
TypeError: can only concatenate str (not "int") to str
```
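The `TypeError` can be reproduced without Airflow at all: `num_retries` arrives as the string `'3'`, and `range(num_retries + 1)` then tries to add an `int` to a `str`:

```python
# Minimal reproduction of the underlying TypeError, outside Airflow.
num_retries = "3"  # extras parsed from the connection URI are strings

try:
    for retry_num in range(num_retries + 1):  # "3" + 1 -> str + int
        pass
except TypeError as exc:
    message = str(exc)

print(message)  # can only concatenate str (not "int") to str
```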
**What you expected to happen**:
The task should run without raising a `TypeError`.
**How to reproduce it**:
Create test folder
```
mkdir test-airflow
cd test-airflow
```
Install airflow with gcp
```
pip install apache-airflow[gcp]==1.10.10
```
Set the following environment variables
```
export AIRFLOW_CONN_GOOGLE_CLOUD_DEFAULT="google-cloud-platform://?extra__google_cloud_platform__key_path=test.json&extra__google_cloud_platform__project=test&extra__google_cloud_platform__num_retries=3"
export AIRFLOW_HOME=$(pwd)
```
Initialize the database
```
airflow initdb
```
Create a Google credentials file and move it to `test-airflow/test.json`. The code doesn't reach the error state unless `test.json` has a valid Google credentials structure.
Create a file in the `dags` folder containing the following:
```
from airflow import DAG
from airflow.contrib.operators.bigquery_operator import BigQueryOperator
from airflow.utils import timezone
default_args = {
    "start_date": timezone.datetime(2020, 5, 1),
}

dag = DAG(
    "test.dag",
    default_args=default_args,
    schedule_interval=None,
)

BigQueryOperator(
    dag=dag,
    task_id="test.task",
    sql="SELECT * FROM test.test",
    destination_dataset_table="test.result",
    write_disposition="WRITE_TRUNCATE",
    bigquery_conn_id="google_cloud_default",
    use_legacy_sql=False,
)
```
Run the task
```
airflow test test.dag test.task 2020-05-01
```
Running the command generates the following traceback
<details><summary>Traceback</summary>
Traceback (most recent call last):
  File "/home/jt/.pyenv/versions/airflow-example/bin/airflow", line 37, in <module>
    args.func(args)
  File "/home/jt/.pyenv/versions/3.7.7/envs/airflow-example/lib/python3.7/site-packages/airflow/utils/cli.py", line 75, in wrapper
    return f(*args, **kwargs)
  File "/home/jt/.pyenv/versions/3.7.7/envs/airflow-example/lib/python3.7/site-packages/airflow/bin/cli.py", line 682, in test
    ti.run(ignore_task_deps=True, ignore_ti_state=True, test_mode=True)
  File "/home/jt/.pyenv/versions/3.7.7/envs/airflow-example/lib/python3.7/site-packages/airflow/utils/db.py", line 74, in wrapper
    return func(*args, **kwargs)
  File "/home/jt/.pyenv/versions/3.7.7/envs/airflow-example/lib/python3.7/site-packages/airflow/models/taskinstance.py", line 1102, in run
    session=session)
  File "/home/jt/.pyenv/versions/3.7.7/envs/airflow-example/lib/python3.7/site-packages/airflow/utils/db.py", line 70, in wrapper
    return func(*args, **kwargs)
  File "/home/jt/.pyenv/versions/3.7.7/envs/airflow-example/lib/python3.7/site-packages/airflow/models/taskinstance.py", line 983, in _run_raw_task
    result = task_copy.execute(context=context)
  File "/home/jt/.pyenv/versions/3.7.7/envs/airflow-example/lib/python3.7/site-packages/airflow/contrib/operators/bigquery_operator.py", line 279, in execute
    encryption_configuration=self.encryption_configuration
  File "/home/jt/.pyenv/versions/3.7.7/envs/airflow-example/lib/python3.7/site-packages/airflow/contrib/hooks/bigquery_hook.py", line 910, in run_query
    return self.run_with_configuration(configuration)
  File "/home/jt/.pyenv/versions/3.7.7/envs/airflow-example/lib/python3.7/site-packages/airflow/contrib/hooks/bigquery_hook.py", line 1318, in run_with_configuration
    .execute(num_retries=self.num_retries)
  File "/home/jt/.pyenv/versions/3.7.7/envs/airflow-example/lib/python3.7/site-packages/googleapiclient/_helpers.py", line 134, in positional_wrapper
    return wrapped(*args, **kwargs)
  File "/home/jt/.pyenv/versions/3.7.7/envs/airflow-example/lib/python3.7/site-packages/googleapiclient/http.py", line 901, in execute
    headers=self.headers,
  File "/home/jt/.pyenv/versions/3.7.7/envs/airflow-example/lib/python3.7/site-packages/googleapiclient/http.py", line 159, in _retry_request
    for retry_num in range(num_retries + 1):
TypeError: can only concatenate str (not "int") to str
</details>
**Anything else we need to know**:
I've determined that the bug occurs when `num_retries` is set for a google_cloud_platform connection via an environment variable. The Google API client expects an int, but the hook's `num_retries` property returns a string:
```
>>> from airflow.contrib.hooks.gcp_api_base_hook import GoogleCloudBaseHook
>>> GoogleCloudBaseHook().num_retries
[2020-05-01 02:21:03,325] {crypto.py:85} WARNING - empty cryptography key - values will not be stored encrypted.
'3'
>>> type(GoogleCloudBaseHook().num_retries)
<class 'str'>
```
Removing `num_retries` from the env var is a quick workaround, but it means the retry count can no longer be configured through environment variables.
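A straightforward fix would be to coerce the extra to `int` where the hook reads it, falling back to a default when it is unset. This is only a sketch of the idea, not the actual patch; the function name and default value here are hypothetical:

```python
DEFAULT_NUM_RETRIES = 5  # hypothetical default, for illustration only


def get_num_retries(extras: dict) -> int:
    """Read num_retries from connection extras, coercing the string to int."""
    raw = extras.get("extra__google_cloud_platform__num_retries")
    if raw is None or raw == "":
        return DEFAULT_NUM_RETRIES
    return int(raw)  # '3' -> 3, so range(num_retries + 1) works again


print(get_num_retries({"extra__google_cloud_platform__num_retries": "3"}))  # 3
print(get_num_retries({}))  # 5
```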
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [airflow] mik-laj commented on issue #8659: Setting num_retries for a google-cloud-platform using env variables breaks google tasks
mik-laj commented on issue #8659:
URL: https://github.com/apache/airflow/issues/8659#issuecomment-623213202
@michalslowikowski00 Can you help with it?
[GitHub] [airflow] michalslowikowski00 commented on issue #8659: Setting num_retries for a google-cloud-platform using env variables breaks google tasks
michalslowikowski00 commented on issue #8659:
URL: https://github.com/apache/airflow/issues/8659#issuecomment-623309946
@mik-laj I know how to fix it. I didn't have time for it. :(
[GitHub] [airflow] boring-cyborg[bot] commented on issue #8659: Setting num_retries for a google-cloud-platform connection using env variables breaks running of Google tasks
boring-cyborg[bot] commented on issue #8659:
URL: https://github.com/apache/airflow/issues/8659#issuecomment-622259238
Thanks for opening your first issue here! Be sure to follow the issue template!