Posted to commits@airflow.apache.org by GitBox <gi...@apache.org> on 2021/08/03 13:32:48 UTC
[GitHub] [airflow] rodrigoechaide commented on issue #11379: Temporary failure in name resolution while running tasks using KubernetesExecutor
rodrigoechaide commented on issue #11379:
URL: https://github.com/apache/airflow/issues/11379#issuecomment-891850328
Hi @Siddharthk, are you still facing this issue? I am running into the same error with a DAG that runs 500 tasks in parallel, which I use to stress test Airflow. Each task takes an iterator parameter that controls how long it runs, and the task duration does not seem to matter: I have seen the failure in tasks lasting anywhere from a few seconds to more than 20 minutes. I am using the KubernetesExecutor, and listing the pods with kubectl shows the following:
```
k get pods -n airflow | grep Error
performancetest500tasksinparallel20taskperformancetest500tasksd.04c772dbda6c47b79b017c90b73055af 0/1 Error 0 8m27s
performancetest500tasksinparallel20taskperformancetest500tasksd.05269b668c8043c7b7ac32c0e06ce2bc 0/1 Error 0 6m31s
performancetest500tasksinparallel20taskperformancetest500tasksd.0819b03e3fda475abfd3893dc7598ffb 0/1 Error 0 8m56s
performancetest500tasksinparallel20taskperformancetest500tasksd.09f9fa1367194deabead2b7d6de72c83 0/1 Error 0 8m2s
performancetest500tasksinparallel20taskperformancetest500tasksd.0c61b81e7dfc4d17846c89d78eefac0c 0/1 Error 0 5m59s
performancetest500tasksinparallel20taskperformancetest500tasksd.0d0b39ea912a48c898d13b5392c0ee7e 0/1 Error 0 8m41s
performancetest500tasksinparallel20taskperformancetest500tasksd.0d1e17539b934616a0f72a05b530d88e 0/1 Error 0 8m33s
performancetest500tasksinparallel20taskperformancetest500tasksd.12e3fd2a030340589e251c987652c61e 0/1 Error 0 9m16s
performancetest500tasksinparallel20taskperformancetest500tasksd.1312a64638e34ee488d5f8839a29c0e6 0/1 Error 0 7m25s
performancetest500tasksinparallel20taskperformancetest500tasksd.1508cf02371d4dff8c925a3855a60911 0/1 Error 0 7m31s
performancetest500tasksinparallel20taskperformancetest500tasksd.1d3c9140a24e42c29fe5def938832759 0/1 Error 0 7m17s
performancetest500tasksinparallel20taskperformancetest500tasksd.1e5cee28a93b4f62bc1c06d1bb6ed785 0/1 Error 0 8m30s
performancetest500tasksinparallel20taskperformancetest500tasksd.214e5df400c24764b9104e5e324dc314 0/1 Error 0 8m55s
performancetest500tasksinparallel20taskperformancetest500tasksd.272b9e6502ce49078c68741731aa8144 0/1 Error 0 7m39s
performancetest500tasksinparallel20taskperformancetest500tasksd.2840867f20a34a4fae6ad71ff1ef2803 0/1 Error 0 6m3s
performancetest500tasksinparallel20taskperformancetest500tasksd.2aca869d190d4a17a60653788d73e090 0/1 Error 0 7m22s
performancetest500tasksinparallel20taskperformancetest500tasksd.2d6f588cba464f2c9aec0f75eff105a5 0/1 Error 0 6m32s
performancetest500tasksinparallel20taskperformancetest500tasksd.31513adf9a4d4faa910b8eeedf53b960 0/1 Error 0 8m48s
performancetest500tasksinparallel20taskperformancetest500tasksd.3600857bd1784617b4322ec304924870 0/1 Error 0 8m58s
performancetest500tasksinparallel20taskperformancetest500tasksd.3659ef7cbcb345e99ba557e6ca6b881d 0/1 Error 0 9m1s
```
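To see how the failed pods are spread over time, a listing like the one above can be summarised with a small Python helper. This is only a sketch: it assumes the default `kubectl get pods` column layout (NAME, READY, STATUS, RESTARTS, AGE), and `error_pods` and `age_to_seconds` are hypothetical helper names, not part of kubectl or Airflow.

```python
import re
from typing import List, Tuple

# Multipliers for the unit suffixes kubectl uses in the AGE column.
_AGE_UNITS = {"d": 86400, "h": 3600, "m": 60, "s": 1}


def age_to_seconds(age: str) -> int:
    """Convert a kubectl AGE value such as '8m27s' or '2d3h' to seconds."""
    parts = re.findall(r"(\d+)([dhms])", age)
    # Reject values that contain anything besides <number><unit> pairs.
    if not parts or "".join(n + u for n, u in parts) != age:
        raise ValueError(f"unrecognised age: {age!r}")
    return sum(int(n) * _AGE_UNITS[u] for n, u in parts)


def error_pods(kubectl_output: str) -> List[Tuple[str, int]]:
    """Return (pod name, age in seconds) for every pod whose STATUS is Error."""
    result = []
    for line in kubectl_output.splitlines():
        fields = line.split()
        # NAME READY STATUS RESTARTS AGE; AGE is taken from the last column
        # because newer kubectl versions may print RESTARTS as "2 (5m ago)".
        if len(fields) >= 5 and fields[2] == "Error":
            result.append((fields[0], age_to_seconds(fields[-1])))
    return result
```

For example, piping the output of `kubectl get pods -n airflow` into `error_pods` and taking the minimum and maximum ages gives the window in which the failures occurred.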
The logs of one of the failed task pods show this error:
```
k logs performancetest500tasksinparallel20taskperformancetest500tasksd.6426f08f727c4f15b2c041ce98f163d5 -n airflow
[2021-08-03 12:44:59,468] {cli_action_loggers.py:105} WARNING - Failed to log action with (psycopg2.OperationalError) could not translate host name "qa-airflow.carnijjbfa3r.eu-west-1.rds.amazonaws.com" to address: Temporary failure in name resolution
(Background on this error at: http://sqlalche.me/e/13/e3q8)
[2021-08-03 12:44:59,469] {dagbag.py:496} INFO - Filling up the DagBag from /opt/airflow/dags/git/performance_test_500_tasks_in_parallel_2_0.py
Traceback (most recent call last):
File "/usr/local/lib/python3.9/site-packages/sqlalchemy/engine/base.py", line 2336, in _wrap_pool_connect
return fn()
File "/usr/local/lib/python3.9/site-packages/sqlalchemy/pool/base.py", line 364, in connect
return _ConnectionFairy._checkout(self)
File "/usr/local/lib/python3.9/site-packages/sqlalchemy/pool/base.py", line 778, in _checkout
fairy = _ConnectionRecord.checkout(pool)
File "/usr/local/lib/python3.9/site-packages/sqlalchemy/pool/base.py", line 495, in checkout
rec = pool._do_get()
File "/usr/local/lib/python3.9/site-packages/sqlalchemy/pool/impl.py", line 241, in _do_get
return self._create_connection()
File "/usr/local/lib/python3.9/site-packages/sqlalchemy/pool/base.py", line 309, in _create_connection
return _ConnectionRecord(self)
File "/usr/local/lib/python3.9/site-packages/sqlalchemy/pool/base.py", line 440, in __init__
self.__connect(first_connect_check=True)
File "/usr/local/lib/python3.9/site-packages/sqlalchemy/pool/base.py", line 661, in __connect
pool.logger.debug("Error on connect(): %s", e)
File "/usr/local/lib/python3.9/site-packages/sqlalchemy/util/langhelpers.py", line 68, in __exit__
compat.raise_(
File "/usr/local/lib/python3.9/site-packages/sqlalchemy/util/compat.py", line 182, in raise_
raise exception
File "/usr/local/lib/python3.9/site-packages/sqlalchemy/pool/base.py", line 656, in __connect
connection = pool._invoke_creator(self)
File "/usr/local/lib/python3.9/site-packages/sqlalchemy/engine/strategies.py", line 114, in connect
return dialect.connect(*cargs, **cparams)
File "/usr/local/lib/python3.9/site-packages/sqlalchemy/engine/default.py", line 508, in connect
return self.dbapi.connect(*cargs, **cparams)
File "/usr/local/lib/python3.9/site-packages/psycopg2/__init__.py", line 122, in connect
conn = _connect(dsn, connection_factory=connection_factory, **kwasync)
psycopg2.OperationalError: could not translate host name "qa-airflow.carnijjbfa3r.eu-west-1.rds.amazonaws.com" to address: Temporary failure in name resolution
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/usr/local/bin/airflow", line 8, in <module>
sys.exit(main())
File "/usr/local/lib/python3.9/site-packages/airflow/__main__.py", line 40, in main
args.func(args)
File "/usr/local/lib/python3.9/site-packages/airflow/cli/cli_parser.py", line 48, in command
return func(*args, **kwargs)
File "/usr/local/lib/python3.9/site-packages/airflow/utils/cli.py", line 91, in wrapper
return f(*args, **kwargs)
File "/usr/local/lib/python3.9/site-packages/airflow/cli/commands/task_command.py", line 227, in task_run
ti.refresh_from_db()
File "/usr/local/lib/python3.9/site-packages/airflow/utils/session.py", line 70, in wrapper
return func(*args, session=session, **kwargs)
File "/usr/local/lib/python3.9/site-packages/airflow/models/taskinstance.py", line 625, in refresh_from_db
ti = qry.first()
File "/usr/local/lib/python3.9/site-packages/sqlalchemy/orm/query.py", line 3429, in first
ret = list(self[0:1])
File "/usr/local/lib/python3.9/site-packages/sqlalchemy/orm/query.py", line 3203, in __getitem__
return list(res)
File "/usr/local/lib/python3.9/site-packages/sqlalchemy/orm/query.py", line 3535, in __iter__
return self._execute_and_instances(context)
File "/usr/local/lib/python3.9/site-packages/sqlalchemy/orm/query.py", line 3556, in _execute_and_instances
conn = self._get_bind_args(
File "/usr/local/lib/python3.9/site-packages/sqlalchemy/orm/query.py", line 3571, in _get_bind_args
return fn(
File "/usr/local/lib/python3.9/site-packages/sqlalchemy/orm/query.py", line 3550, in _connection_from_session
conn = self.session.connection(**kw)
File "/usr/local/lib/python3.9/site-packages/sqlalchemy/orm/session.py", line 1142, in connection
return self._connection_for_bind(
File "/usr/local/lib/python3.9/site-packages/sqlalchemy/orm/session.py", line 1150, in _connection_for_bind
return self.transaction._connection_for_bind(
File "/usr/local/lib/python3.9/site-packages/sqlalchemy/orm/session.py", line 433, in _connection_for_bind
conn = bind._contextual_connect()
File "/usr/local/lib/python3.9/site-packages/sqlalchemy/engine/base.py", line 2302, in _contextual_connect
self._wrap_pool_connect(self.pool.connect, None),
File "/usr/local/lib/python3.9/site-packages/sqlalchemy/engine/base.py", line 2339, in _wrap_pool_connect
Connection._handle_dbapi_exception_noconnection(
File "/usr/local/lib/python3.9/site-packages/sqlalchemy/engine/base.py", line 1583, in _handle_dbapi_exception_noconnection
util.raise_(
File "/usr/local/lib/python3.9/site-packages/sqlalchemy/util/compat.py", line 182, in raise_
raise exception
File "/usr/local/lib/python3.9/site-packages/sqlalchemy/engine/base.py", line 2336, in _wrap_pool_connect
return fn()
File "/usr/local/lib/python3.9/site-packages/sqlalchemy/pool/base.py", line 364, in connect
return _ConnectionFairy._checkout(self)
File "/usr/local/lib/python3.9/site-packages/sqlalchemy/pool/base.py", line 778, in _checkout
fairy = _ConnectionRecord.checkout(pool)
File "/usr/local/lib/python3.9/site-packages/sqlalchemy/pool/base.py", line 495, in checkout
rec = pool._do_get()
File "/usr/local/lib/python3.9/site-packages/sqlalchemy/pool/impl.py", line 241, in _do_get
return self._create_connection()
File "/usr/local/lib/python3.9/site-packages/sqlalchemy/pool/base.py", line 309, in _create_connection
return _ConnectionRecord(self)
File "/usr/local/lib/python3.9/site-packages/sqlalchemy/pool/base.py", line 440, in __init__
self.__connect(first_connect_check=True)
File "/usr/local/lib/python3.9/site-packages/sqlalchemy/pool/base.py", line 661, in __connect
pool.logger.debug("Error on connect(): %s", e)
File "/usr/local/lib/python3.9/site-packages/sqlalchemy/util/langhelpers.py", line 68, in __exit__
compat.raise_(
File "/usr/local/lib/python3.9/site-packages/sqlalchemy/util/compat.py", line 182, in raise_
raise exception
File "/usr/local/lib/python3.9/site-packages/sqlalchemy/pool/base.py", line 656, in __connect
connection = pool._invoke_creator(self)
File "/usr/local/lib/python3.9/site-packages/sqlalchemy/engine/strategies.py", line 114, in connect
return dialect.connect(*cargs, **cparams)
File "/usr/local/lib/python3.9/site-packages/sqlalchemy/engine/default.py", line 508, in connect
return self.dbapi.connect(*cargs, **cparams)
File "/usr/local/lib/python3.9/site-packages/psycopg2/__init__.py", line 122, in connect
conn = _connect(dsn, connection_factory=connection_factory, **kwasync)
sqlalchemy.exc.OperationalError: (psycopg2.OperationalError) could not translate host name "qa-airflow.carnijjbfa3r.eu-west-1.rds.amazonaws.com" to address: Temporary failure in name resolution
```
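The failure happens while resolving the RDS hostname, before any query runs: "Temporary failure in name resolution" is the resolver message for `EAI_AGAIN`. One way to check whether resolution is only intermittently failing is to run a small probe from inside a worker pod. The sketch below retries `socket.getaddrinfo` with exponential backoff; `resolve_with_retry`, the port, the retry count and the delays are illustrative assumptions, not Airflow or SQLAlchemy settings.

```python
import socket
import time
from typing import Callable, List


def resolve_with_retry(
    host: str,
    port: int = 5432,  # assumed PostgreSQL default port
    attempts: int = 5,
    base_delay: float = 0.5,
    resolver: Callable = socket.getaddrinfo,
    sleep: Callable[[float], None] = time.sleep,
) -> List[str]:
    """Resolve host to its IP addresses, retrying transient DNS failures."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for attempt in range(attempts):
        try:
            infos = resolver(host, port, proto=socket.IPPROTO_TCP)
            # Each entry is (family, type, proto, canonname, sockaddr);
            # sockaddr[0] is the IP address.
            return sorted({info[4][0] for info in infos})
        except socket.gaierror as exc:
            # Only EAI_AGAIN is treated as transient; anything else is re-raised.
            if exc.errno != socket.EAI_AGAIN:
                raise
            last_error = exc
            if attempt < attempts - 1:
                sleep(base_delay * 2 ** attempt)  # 0.5s, 1s, 2s, ...
    raise last_error
```

Run from a worker pod, `resolve_with_retry("qa-airflow.carnijjbfa3r.eu-west-1.rds.amazonaws.com")` would either return the instance's addresses or raise the last `EAI_AGAIN` error after all attempts, which helps separate a flaky resolver from a genuinely wrong hostname.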
These are some of the configuration variables of my Airflow cluster:
```
AIRFLOW_HOME: "/opt/airflow"
AIRFLOW__CORE__DAGS_FOLDER: "/opt/airflow/dags/git"
AIRFLOW__LOGGING__BASE_LOG_FOLDER: "/opt/airflow/logs"
AIRFLOW__LOGGING__LOGGING_LEVEL: "INFO" # DEBUG, INFO, WARNING, ERROR or CRITICAL.
AIRFLOW__LOGGING__FAB_LOGGING_LEVEL: "WARNING"
AIRFLOW__LOGGING__LOG_FILENAME_TEMPLATE: "{{ ti.dag_id }}/{{ ti.task_id }}/{{ ts }}/{{ try_number }}.log"
AIRFLOW__LOGGING__LOG_FORMAT: "%(message)s"
AIRFLOW__SCHEDULER__DAG_DIR_LIST_INTERVAL: "60"
AIRFLOW__CORE__DAG_CONCURRENCY: "500"
AIRFLOW__CORE__PARALLELISM: "500"
AIRFLOW__CORE__SQL_ALCHEMY_POOL_SIZE: "0"
AIRFLOW__CORE__EXECUTOR: "KubernetesExecutor"
AIRFLOW__API__AUTH_BACKEND: "airflow.api.auth.backend.default"
AIRFLOW__CORE__LOAD_EXAMPLES: "False"
AIRFLOW__KUBERNETES__WORKER_CONTAINER_TAG: "1.1"
AIRFLOW__KUBERNETES__DELETE_WORKER_PODS: "True"
AIRFLOW__KUBERNETES__NAMESPACE: "airflow"
AIRFLOW__KUBERNETES__WORKER_PODS_CREATION_BATCH_SIZE: "1"
AIRFLOW__KUBERNETES__POD_TEMPLATE_FILE: "/opt/airflow/template.yaml"
AIRFLOW__SCHEDULER__SCHEDULE_AFTER_TASK_EXECUTION: "False"
```
In addition to that config, I have set the `default_pool` size to 500 slots so that all 500 tasks can run in parallel.
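To double-check which settings these environment variables override and how they combine with the pool size, here is a small Python sketch. It relies on Airflow's `AIRFLOW__{SECTION}__{KEY}` naming convention; `parse_airflow_env` and `max_parallel_tasks` are hypothetical helpers, and treating the cap for a single DAG as the minimum of `parallelism`, `dag_concurrency` and the pool's slots is a simplification that ignores per-task `pool_slots` and other limits.

```python
from typing import Dict, Mapping, Tuple

PREFIX = "AIRFLOW__"


def parse_airflow_env(env: Mapping[str, str]) -> Dict[Tuple[str, str], str]:
    """Map AIRFLOW__{SECTION}__{KEY} variables to {(section, key): value}."""
    config: Dict[Tuple[str, str], str] = {}
    for name, value in env.items():
        # AIRFLOW_HOME has a single underscore, so it is not a config override.
        if not name.startswith(PREFIX):
            continue
        section, sep, key = name[len(PREFIX):].partition("__")
        if not sep or not key:
            continue
        # Suffixes such as _CMD or _SECRET are not handled in this sketch.
        config[(section.lower(), key.lower())] = value
    return config


def max_parallel_tasks(config: Mapping[Tuple[str, str], str], pool_slots: int) -> int:
    """Simplified upper bound on concurrently running 1-slot tasks of one DAG."""
    parallelism = int(config[("core", "parallelism")])
    dag_concurrency = int(config[("core", "dag_concurrency")])
    return min(parallelism, dag_concurrency, pool_slots)
```

With the values above (`PARALLELISM` and `DAG_CONCURRENCY` both 500, and a 500-slot `default_pool`), the simplified bound is 500, so none of these three limits should be holding tasks back.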