You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@airflow.apache.org by GitBox <gi...@apache.org> on 2021/06/04 16:19:28 UTC

[GitHub] [airflow] coolacid opened a new issue #16271: Celery Executor Dangling MySQL Connection

coolacid opened a new issue #16271:
URL: https://github.com/apache/airflow/issues/16271


   **Apache Airflow version**: apache/airflow:2.0.2-python3.6
   
   
   **Kubernetes version (if you are using kubernetes)** (use `kubectl version`): None
   
   **Environment**: Docker
   
   - **Cloud provider or hardware configuration**: Desktop machine, docker instance
   - **OS** (e.g. from /etc/os-release): 
   - **Kernel** (e.g. `uname -a`):
   - **Install tools**:
   - **Others**:
   
   **What happened**:
   
   Dangling database connection within the celery executor. 
   
   ```
   mysql_1      | 2021-06-04T16:02:07.342024Z 3016 [Note] Got an error reading communication packets
   ```
   and 
   
   ```
   mysql_1      | 2021-06-04T02:20:19.124785Z 487 [Note] Aborted connection 487 to db: 'airflow' user: 'root' host: '10.10.4.6' (Got an error reading communication packets)
   ```
   
   **What you expected to happen**:
   
   All connections closed. 
   
   **How to reproduce it**:
   
   Very simple Docker-compose setup, monitor mysql log, restart the celery worker, warning/error displays in MySQL logs. 
   
   **Anything else we need to know**:
   
   First, I added a debug message (`log.debug(engine.pool.status())`) to the `dispose_orm()` function in airflow.settings within the `if engine` block, this shows us the state of pools during the atexit ORM disposal. The result of this shows the following:
   
   ```
   af_worker    | [2021-06-04 15:42:03,284: DEBUG/MainProcess] Disposing DB connection pool (PID 7)
   af_worker    | [2021-06-04 15:42:03,285: WARNING/MainProcess] Pool size: 5  Connections in pool: 1 Current Overflow: -4 Current Checked out connections: 0
   ```
   This shows that after the `Session.remove()` call, there still exists one connection in the pool. Calling `engine.dopose()` does not release this connection since it's currently "active", it just unlinks it from the engine pool. 
   
   To investigate further, we can look at the active process list within MySQL, here we see one connection still coming from our worker docker container: (Note, removed other connections not related)
   
   ```
   mysql> show processlist;
   +------+------+-----------------+---------+---------+------+----------+------------------+
   | Id   | User | Host            | db      | Command | Time | State    | Info             |
   +------+------+-----------------+---------+---------+------+----------+------------------+
   | 3017 | root | 10.10.4.6:40242 | airflow | Sleep   |   12 |          | NULL             |
   ```
   
   We can see what queries are run by any connection by enabling the `general_log` in mysql with `SET GLOBAL general_log = 'ON';`
   
   This yields the following query: 
   
   ```
   2021-06-04T16:02:09.215732Z      3017 Connect   root@10.10.4.6 on airflow using TCP/IP
   2021-06-04T16:02:09.215907Z      3017 Query     set autocommit=0
   2021-06-04T16:02:09.216057Z      3017 Query     SET NAMES utf8mb4
   2021-06-04T16:02:09.216276Z      3017 Query     SHOW VARIABLES LIKE 'sql_mode'
   2021-06-04T16:02:09.217518Z      3017 Query     SHOW VARIABLES LIKE 'lower_case_table_names'
   2021-06-04T16:02:09.218483Z      3017 Query     SELECT VERSION()
   2021-06-04T16:02:09.218672Z      3017 Query     SELECT DATABASE()
   2021-06-04T16:02:09.218833Z      3017 Query     SELECT @@transaction_isolation
   2021-06-04T16:02:09.219010Z      3017 Query     show collation where `Charset` = 'utf8mb4' and `Collation` = 'utf8mb4_bin'
   2021-06-04T16:02:09.220013Z      3017 Query     SELECT CAST('test plain returns' AS CHAR(60)) AS anon_1
   2021-06-04T16:02:09.220295Z      3017 Query     SELECT CAST('test unicode returns' AS CHAR(60)) AS anon_1
   2021-06-04T16:02:09.220652Z      3017 Query     SELECT CAST('test collated returns' AS CHAR CHARACTER SET utf8mb4) COLLATE utf8mb4_bin AS anon_1
   2021-06-04T16:02:09.220778Z      3017 Query     rollback
   2021-06-04T16:02:09.220941Z      3017 Query     SET NAMES utf8mb4
   2021-06-04T16:02:09.221058Z      3017 Query     SET time_zone = '+00:00'
   2021-06-04T16:02:09.222799Z      3017 Query     INSERT INTO log (dttm, dag_id, task_id, event, execution_date, owner, extra) VALUES ('2021-06-04 16:02:09.210482', NULL, NULL, 'cli_worker', NULL, 'airflow', '{\"host_name\": \"469064a5afb7\", \"full_command\": \"[\'/home/airflow/.local/bin/airflow\', \'celery\', \'worker\']\"}')
   2021-06-04T16:02:09.223199Z      3017 Query     commit
   2021-06-04T16:02:09.226052Z      3017 Query     rollback
   ```
   
   This log insert matches with pcaps I previously captured showing the log insert connection without a matching `Quit` MySQL command. 
   
   My suspicion is, the addition of this log is not releasing the DB session. 
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] coolacid commented on issue #16271: Celery Executor Dangling MySQL Connection

Posted by GitBox <gi...@apache.org>.
coolacid commented on issue #16271:
URL: https://github.com/apache/airflow/issues/16271#issuecomment-854864127


   If I remove the `register_pre_exec_callback(default_action_log)` from `utils.cli_action_loggers` I no longer have the dangling open connection. 
   
   ```
   af_worker    | [2021-06-04 16:27:44,517: DEBUG/MainProcess] Disposing DB connection pool (PID 6)
   af_worker    | [2021-06-04 16:27:44,518: WARNING/MainProcess] Pool size: 5  Connections in pool: 0 Current Overflow: -5 Current Checked out connections: 0
   ```
   However, I still get `Got an error reading communication packets`.
   
   ```
   mysql_1      | 2021-06-04T16:35:55.204039Z 3213 [Note] Got an error reading communication packets
   mysql_1      | 2021-06-04T16:35:55.264724Z 3214 [Note] Got an error reading communication packets
   ```
   Enabling the query log again, and checking for those connection IDs show no actual queries. 
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] coolacid commented on issue #16271: MySQL Connection Warnings/Errors (Entrypoint, Celery Executor Dangling)

Posted by GitBox <gi...@apache.org>.
coolacid commented on issue #16271:
URL: https://github.com/apache/airflow/issues/16271#issuecomment-854887676


   While using the fork celery method: 
   ```
   mysql_1      | 2021-06-04T17:20:15.267685Z 3441 [Note] Aborted connection 3441 to db: 'airflow' user: 'root' host: '10.10.4.4' (Got an error reading communication packets)
   ```
   Query log for connection:
   ```
   2021-06-04T17:20:15.007816Z      3441 Connect   root@10.10.4.4 on airflow using TCP/IP
   2021-06-04T17:20:15.007974Z      3441 Query     set autocommit=0
   2021-06-04T17:20:15.008137Z      3441 Query     SET NAMES utf8mb4
   2021-06-04T17:20:15.008259Z      3441 Query     SET NAMES utf8mb4
   2021-06-04T17:20:15.008446Z      3441 Query     SET time_zone = '+00:00'
   2021-06-04T17:20:15.009399Z      3441 Query     INSERT INTO log (dttm, dag_id, task_id, event, execution_date, owner, extra) VALUES ('2021-06-04 17:20:15.006164', 'Print-Context', 'print_the_context', 'cli_task_run', '2021-06-04 17:20:13.695935', 'airflow', '{\"host_name\": \"bec9dffdf454\", \"full_command\": \"[\'/home/airflow/.local/bin/airflow\', \'celery\', \'worker\']\"}')
   2021-06-04T17:20:15.009893Z      3441 Query     commit
   2021-06-04T17:20:15.014422Z      3441 Query     rollback
   ```
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] coolacid commented on issue #16271: Celery Executor Dangling MySQL Connection

Posted by GitBox <gi...@apache.org>.
coolacid commented on issue #16271:
URL: https://github.com/apache/airflow/issues/16271#issuecomment-854878015


   Docker entrypoint uses netcat to check for an open port:
   
   https://github.com/apache/airflow/blob/main/scripts/in_container/prod/entrypoint_prod.sh#L128
   
   This is where the extra error reading communication packets are coming from. This should be changed to a mysql command similar to: 
   
   `echo "SHOW STATUS" | mysql -u $MYSQL_USER -p$MYSQL_PASSWORD -P $MYSQL_PORT -h $MYSQL_HOST -r -N  $MYSQL_DATABASE 1>/dev/null`


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] coolacid commented on issue #16271: MySQL Connection Warnings/Errors (Entrypoint, Celery Executor Dangling)

Posted by GitBox <gi...@apache.org>.
coolacid commented on issue #16271:
URL: https://github.com/apache/airflow/issues/16271#issuecomment-855956154


   I'm not confident I could correctly get the PR for the connection check - it's what I use in other projects, but pulling out the correct connection details, and knowing the difference between the different database types is a little beyond my shell scripting capabilities. 
   
   I'm still confused on the dangling connection, since the log message happens in a connection context, and has a `session.close()`. There is no reason for this to be happening. 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] coolacid edited a comment on issue #16271: Celery Executor Dangling MySQL Connection

Posted by GitBox <gi...@apache.org>.
coolacid edited a comment on issue #16271:
URL: https://github.com/apache/airflow/issues/16271#issuecomment-854864127


   If I remove the `register_pre_exec_callback(default_action_log)` from `utils.cli_action_loggers` I no longer have the dangling open connection. 
   
   ```
   af_worker    | [2021-06-04 16:27:44,517: DEBUG/MainProcess] Disposing DB connection pool (PID 6)
   af_worker    | [2021-06-04 16:27:44,518: WARNING/MainProcess] Pool size: 5  Connections in pool: 0 Current Overflow: -5 Current Checked out connections: 0
   ```
   However, I still get `Got an error reading communication packets`.
   
   ```
   mysql_1      | 2021-06-04T16:35:55.204039Z 3213 [Note] Got an error reading communication packets
   mysql_1      | 2021-06-04T16:35:55.264724Z 3214 [Note] Got an error reading communication packets
   ```
   Enabling the query log again, and checking for those connection IDs show no actual queries. 
   
   After turning off the webserver, and scheduler (to reduce mysql query logs), I can confirm the worker is making connections, but not actually a full connection: 
   
   ```
   mysql_1      | 2021-06-04T16:48:26.400006Z 3251 [Note] Got an error reading communication packets
   mysql_1      | 2021-06-04T16:48:26.458383Z 3252 [Note] Got an error reading communication packets
   ```
   
   ```
   mysqld, Version: 5.7.34 (MySQL Community Server (GPL)). started with:
   Tcp port: 3306  Unix socket: /var/run/mysqld/mysqld.sock
   Time                 Id Command    Argument
   2021-06-04T16:48:19.979034Z      3250 Quit
   2021-06-04T16:48:27.734718Z      3253 Connect   root@localhost on  using Socket
   2021-06-04T16:48:27.734815Z      3253 Query     select @@version_comment limit 1
   2021-06-04T16:48:31.000005Z      3253 Query     SET GLOBAL general_log = 'OFF'
   ```
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] potiuk edited a comment on issue #16271: MySQL Connection Warnings/Errors (Entrypoint, Celery Executor Dangling)

Posted by GitBox <gi...@apache.org>.
potiuk edited a comment on issue #16271:
URL: https://github.com/apache/airflow/issues/16271#issuecomment-855229979


   Nice analysis. @coolacid  Would you be willing to submit PRs fixing the "communication" broken (seems you have the right fix) ? 
   I am not sure if you already got the right fix for the dangling connection (looking from my phone).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] coolacid commented on issue #16271: MySQL Connection Warnings/Errors (Entrypoint, Celery Executor Dangling)

Posted by GitBox <gi...@apache.org>.
coolacid commented on issue #16271:
URL: https://github.com/apache/airflow/issues/16271#issuecomment-854886061


   Work around: Set `CONNECTION_CHECK_MAX_COUNT=0` environment variable. 
   
   There is still a DB connection abort when using the `fork` celery method. This can be worked around by setting: `AIRFLOW__CORE__EXECUTE_TASKS_NEW_PYTHON_INTERPRETER=True`
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] potiuk commented on issue #16271: MySQL Connection Warnings/Errors (Entrypoint, Celery Executor Dangling)

Posted by GitBox <gi...@apache.org>.
potiuk commented on issue #16271:
URL: https://github.com/apache/airflow/issues/16271#issuecomment-855229979


   Nice analysis. Would you be willing to submit PRs fixing the "communication" broken (seems you have the right fix) ? 
   I am not sure if you already got the right fix for the dangling connection (looking from my phone).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] coolacid edited a comment on issue #16271: Celery Executor Dangling MySQL Connection

Posted by GitBox <gi...@apache.org>.
coolacid edited a comment on issue #16271:
URL: https://github.com/apache/airflow/issues/16271#issuecomment-854864127


   If I remove the `register_pre_exec_callback(default_action_log)` from `utils.cli_action_loggers` I no longer have the dangling open connection. 
   
   ```
   af_worker    | [2021-06-04 16:27:44,517: DEBUG/MainProcess] Disposing DB connection pool (PID 6)
   af_worker    | [2021-06-04 16:27:44,518: WARNING/MainProcess] Pool size: 5  Connections in pool: 0 Current Overflow: -5 Current Checked out connections: 0
   ```
   However, I still get `Got an error reading communication packets`.
   
   ```
   mysql_1      | 2021-06-04T16:35:55.204039Z 3213 [Note] Got an error reading communication packets
   mysql_1      | 2021-06-04T16:35:55.264724Z 3214 [Note] Got an error reading communication packets
   ```
   Enabling the query log again, and checking for those connection IDs show no actual queries. 
   
   After turning off the webserver, and scheduler (to reduce mysql query logs), I can confirm the worker is making connections, but not actually a full connection: 
   
   ```
   mysql_1      | 2021-06-04T16:48:26.400006Z 3251 [Note] Got an error reading communication packets
   mysql_1      | 2021-06-04T16:48:26.458383Z 3252 [Note] Got an error reading communication packets
   ```
   Here I set the general_log on, and restart the worker. Then turn off the general_log. 
   ```
   mysqld, Version: 5.7.34 (MySQL Community Server (GPL)). started with:
   Tcp port: 3306  Unix socket: /var/run/mysqld/mysqld.sock
   Time                 Id Command    Argument
   2021-06-04T16:48:19.979034Z      3250 Quit
   2021-06-04T16:48:27.734718Z      3253 Connect   root@localhost on  using Socket
   2021-06-04T16:48:27.734815Z      3253 Query     select @@version_comment limit 1
   2021-06-04T16:48:31.000005Z      3253 Query     SET GLOBAL general_log = 'OFF'
   ```
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org