Posted to commits@airflow.apache.org by "ASF subversion and git services (JIRA)" <ji...@apache.org> on 2017/12/02 15:24:01 UTC

[jira] [Commented] (AIRFLOW-1665) Airflow webserver/scheduler don't handle database disconnects (mysql)

    [ https://issues.apache.org/jira/browse/AIRFLOW-1665?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16275604#comment-16275604 ] 

ASF subversion and git services commented on AIRFLOW-1665:
----------------------------------------------------------

Commit 94deac34eca869a0accbc6affe7640b09dab1530 in incubator-airflow's branch refs/heads/master from [~StephanErb]
[ https://git-wip-us.apache.org/repos/asf?p=incubator-airflow.git;h=94deac3 ]

[AIRFLOW-1665] Reconnect on database errors

This change enables the scheduler to recover from temporary database
errors and downtimes. The same holds true for the webserver if run
without its regular worker refresh.

The reconnect logic is based on a truncated binary exponential backoff
to ensure reconnect attempts don't overload the database.
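The following is a minimal Python sketch of truncated binary exponential backoff around a database call. It is not the commit's actual code. The helpers `truncated_binary_backoff` and `call_with_backoff` and their parameters are hypothetical. In Airflow the retried exception would presumably be `sqlalchemy.exc.OperationalError`, but the sketch assumes the caller passes it in.

```python
import random
import time


def truncated_binary_backoff(attempt, slot_seconds=1.0, max_exponent=6, rng=random):
    """Hypothetical helper: delay before retry number `attempt` (1-based).

    Picks a random number of slots in [0, 2**k - 1], where the exponent
    k = min(attempt, max_exponent) is truncated so the wait stays bounded.
    """
    k = min(attempt, max_exponent)
    slots = rng.randint(0, 2 ** k - 1)
    return slots * slot_seconds


def call_with_backoff(fn, retry_on, max_attempts=8, sleep=time.sleep, **backoff_kwargs):
    """Hypothetical helper: call fn(), retrying on `retry_on` with backoff.

    Re-raises the last error once max_attempts calls have failed.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except retry_on:
            if attempt == max_attempts:
                raise
            # Wait a randomized, exponentially growing (but capped) time.
            sleep(truncated_binary_backoff(attempt, **backoff_kwargs))
```

Capping the exponent bounds the worst-case wait. The randomization spreads out reconnect attempts from several processes so they do not all hit a recovering database at the same moment.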

Included changes:

* Switch to the recommended pessimistic disconnect handling for engines:
  http://docs.sqlalchemy.org/en/rel_1_1/core/pooling.html#disconnect-handling-pessimistic
* Remove legacy pool-based disconnect handling.
* Ensure event handlers are registered for each newly created engine.
  Engines are re-initialized in child processes so this is crucial for
  correctness.
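Pessimistic disconnect handling means testing a connection with a cheap `SELECT 1` before handing it out and reconnecting if that probe fails. The sketch below illustrates the idea with the standard-library `sqlite3` module and a hypothetical `PessimisticConnection` wrapper. It is not Airflow or SQLAlchemy code. The SQLAlchemy 1.1 recipe linked above does the same job with an `engine_connect` event listener, and SQLAlchemy 1.2 later added a `pool_pre_ping=True` option to `create_engine`.

```python
import sqlite3


class PessimisticConnection:
    """Hypothetical wrapper: ping the connection before each use and
    transparently reconnect if the ping fails."""

    def __init__(self, connect):
        self._connect = connect  # factory returning a new DB-API connection
        self._conn = connect()
        self.reconnects = 0

    def checkout(self):
        try:
            self._conn.execute("SELECT 1")  # cheap liveness probe
        except sqlite3.Error:
            # The old connection is unusable: discard it and open a fresh one.
            try:
                self._conn.close()
            except sqlite3.Error:
                pass
            self._conn = self._connect()
            self.reconnects += 1
        return self._conn
```

The liveness check lives in each wrapper instance, so every newly created wrapper (for example, one rebuilt in a child process) carries it automatically. This loosely mirrors the point about registering event handlers on every new engine.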

This commit is based on a contribution by @vklogin
https://github.com/apache/incubator-airflow/pull/2744


> Airflow webserver/scheduler don't handle database disconnects (mysql)
> ---------------------------------------------------------------------
>
>                 Key: AIRFLOW-1665
>                 URL: https://issues.apache.org/jira/browse/AIRFLOW-1665
>             Project: Apache Airflow
>          Issue Type: Bug
>    Affects Versions: Airflow 1.8
>            Reporter: Vasanth Kumar
>            Assignee: Vasanth Kumar
>              Labels: database, reconnect
>             Fix For: 1.9.1
>
>
> Airflow webserver & scheduler don't handle database disconnects. The processes appear to error out and then either exit or are left in a non-working state. This was observed when using mysql.
> I don't see any database reconnect configuration or code.
> Stack trace for scheduler:
>   File "...../MySQLdb/connections.py", line 204, in __init__
>     super(Connection, self).__init__(*args, **kwargs2)
> sqlalchemy.exc.OperationalError: (_mysql_exceptions.OperationalError) (2002, "Can't connect to local MySQL server through socket '/tmp/mysql.sock' (2)")



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)