Posted to users@airflow.apache.org by "Shaw, Damian P. " <da...@credit-suisse.com> on 2020/02/24 16:17:30 UTC

Airflow Worker settings for retrying to connect to Metadata DB?

Hi all,

Is there a way to get the Airflow Worker (started by Celery) to retry connecting to the Metadata DB by default when the connection times out?
Relatedly, what does the worker_precheck setting do when set to True? Will it retry connections if they fail?
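For context, these are the settings in the 1.10 default config that look related to me. I'm not sure whether any of them actually retries a failed initial connect, so please treat the names and values below as my best guess rather than a known-good setup:

    [core]
    # Recycle pooled connections before the MySQL server can time them out
    sql_alchemy_pool_recycle = 1800
    # Test pooled connections with a lightweight ping before handing them out
    sql_alchemy_pool_pre_ping = True
    # Validate the Metadata DB connection before the worker takes on tasks
    # (this is the precheck setting I'm asking about above)
    worker_precheck = True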


Details of the issue:

I'm currently on Airflow 1.10.6 using the CeleryExecutor with Redis and a MySQL Metadata DB, and recently a few tasks have been failing before they start. Airflow sends out an email that says:
Executor reports task instance finished (failed) although the task says its queued. Was the task killed externally?

Digging into the Airflow Worker stderr, I see this exception:
[2020-02-24 06:08:14,718: INFO/ForkPoolWorker-9] Executing command in Celery: ['airflow', 'run', 'my_dag_id', 'my_task_id', '2020-02-23T10:00:00+00:00', '--local', '--pool', 'default_pool', '-sd', '.../dag_creator.py']
[2020-02-24 06:08:26,986: ERROR/ForkPoolWorker-9] execute_command encountered a CalledProcessError
Traceback (most recent call last):
  File ".../lib/python3.7/site-packages/airflow/executors/celery_executor.py", line 67, in execute_command
    close_fds=True, env=env)
  File ".../lib/python3.7/subprocess.py", line 363, in check_call
    raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['airflow', 'run', 'my_dag_id', 'my_task_id', '2020-02-23T10:00:00+00:00', '--local', '--pool', 'default_pool', '-sd', '.../dag_creator.py']' returned non-zero exit status 1.


And digging into the Airflow Worker stdout at the same time, I see a failed connection to the Metadata DB:
[2020-02-24 06:08:16,620] {cli.py:545} INFO - Running <TaskInstance: my_dag_id.my_task_id 2020-02-23T10:00:00+00:00 [queued]> on host my_app_host.my.internal_domain.net
Traceback (most recent call last):
  File "/home/qthft/.conda/envs/qt_data_airflow_106/lib/python3.7/site-packages/pymysql/connections.py", line 583, in connect
    **kwargs)
  File "/home/qthft/.conda/envs/qt_data_airflow_106/lib/python3.7/socket.py", line 727, in create_connection
    raise err
  File "/home/qthft/.conda/envs/qt_data_airflow_106/lib/python3.7/socket.py", line 716, in create_connection
    sock.connect(sa)
socket.timeout: timed out

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/qthft/.conda/envs/qt_data_airflow_106/lib/python3.7/site-packages/sqlalchemy/engine/base.py", line 2276, in _wrap_pool_connect
    return fn()
  File "/home/qthft/.conda/envs/qt_data_airflow_106/lib/python3.7/site-packages/sqlalchemy/pool/base.py", line 363, in connect
    return _ConnectionFairy._checkout(self)
  File "/home/qthft/.conda/envs/qt_data_airflow_106/lib/python3.7/site-packages/sqlalchemy/pool/base.py", line 760, in _checkout
    fairy = _ConnectionRecord.checkout(pool)
  File "/home/qthft/.conda/envs/qt_data_airflow_106/lib/python3.7/site-packages/sqlalchemy/pool/base.py", line 492, in checkout
    rec = pool._do_get()
  File "/home/qthft/.conda/envs/qt_data_airflow_106/lib/python3.7/site-packages/sqlalchemy/pool/impl.py", line 238, in _do_get
    return self._create_connection()
  File "/home/qthft/.conda/envs/qt_data_airflow_106/lib/python3.7/site-packages/sqlalchemy/pool/base.py", line 308, in _create_connection
    return _ConnectionRecord(self)
  File "/home/qthft/.conda/envs/qt_data_airflow_106/lib/python3.7/site-packages/sqlalchemy/pool/base.py", line 437, in __init__
    self.__connect(first_connect_check=True)
  File "/home/qthft/.conda/envs/qt_data_airflow_106/lib/python3.7/site-packages/sqlalchemy/pool/base.py", line 639, in __connect
    connection = pool._invoke_creator(self)
  File "/home/qthft/.conda/envs/qt_data_airflow_106/lib/python3.7/site-packages/sqlalchemy/engine/strategies.py", line 114, in connect
    return dialect.connect(*cargs, **cparams)
  File "/home/qthft/.conda/envs/qt_data_airflow_106/lib/python3.7/site-packages/sqlalchemy/engine/default.py", line 482, in connect
    return self.dbapi.connect(*cargs, **cparams)
  File "/home/qthft/.conda/envs/qt_data_airflow_106/lib/python3.7/site-packages/pymysql/__init__.py", line 94, in Connect
    return Connection(*args, **kwargs)
  File "/home/qthft/.conda/envs/qt_data_airflow_106/lib/python3.7/site-packages/pymysql/connections.py", line 325, in __init__
    self.connect()
  File "/home/qthft/.conda/envs/qt_data_airflow_106/lib/python3.7/site-packages/pymysql/connections.py", line 630, in connect
    raise exc
pymysql.err.OperationalError: (2003, "Can't connect to MySQL server on 'my_db_host.my.internal_domain.net' (timed out)")
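
To make the question concrete, this sketch (plain SQLAlchemy, not Airflow code, with made-up names and values) shows the behaviour I'd like the worker to have, i.e. retrying the initial connect with a pause instead of failing the task outright:

    import time

    import sqlalchemy
    from sqlalchemy.exc import OperationalError

    def connect_with_retry(url, attempts=5, delay=5):
        """Retry the initial Metadata DB connect instead of failing fast."""
        engine = sqlalchemy.create_engine(url)
        for attempt in range(1, attempts + 1):
            try:
                return engine.connect()
            except OperationalError:
                if attempt == attempts:
                    raise  # out of attempts, surface the original error
                time.sleep(delay)  # fixed pause between attempts

    conn = connect_with_retry("mysql+pymysql://user:pass@my_db_host/airflow")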

Any help is appreciated.

Regards
Damian

