Posted to commits@airflow.apache.org by GitBox <gi...@apache.org> on 2020/03/27 23:54:10 UTC

[GitHub] [airflow] dimberman opened a new issue #7937: MySQL backend connection management bug (related to PythonOperator)

URL: https://github.com/apache/airflow/issues/7937
 
 
   
   
   **Apache Airflow version**: null
   
   
   **Kubernetes version (if you are using kubernetes)** (use `kubectl version`):
   
   **Environment**:
   
   - **Cloud provider or hardware configuration**:
   - **OS** (e.g. from /etc/os-release):
   - **Kernel** (e.g. `uname -a`):
   - **Install tools**:
   - **Others**:
   **What happened**:
   
   Environment setup

   We are running Airflow 1.7.1.2 with MySQL 5.6. MySQL's `wait_timeout` is set to 300 seconds, so the server drops any connection that has been idle for 300 seconds. To account for this, we set SQLAlchemy's `pool_recycle` in `airflow.cfg` to 290 seconds, which should force Airflow/SQLAlchemy to recycle/discard connections after 290 seconds. Airflow should therefore never try to use an already-dead connection.

   Symptom

   When a PythonOperator takes more than 300 seconds to execute, the task finishes running the Python callable but then fails with this error: https://gist.github.com/garthcn/cd7bcdec12748406506f2b0710655c8b
   It seems that after the Python callable finishes, Airflow tries to push its return value to XCom. By that point the SQL connection has gone away, while Airflow/SQLAlchemy still think it is there.

   Hypothesis

   I did some investigation and think the error might be caused by not calling `session.commit()` or `session.close()` for the DB operations that run before the XCom push. As far as I know, SQLAlchemy's connection pool does not recycle a connection that is never closed and stays in the `checked-out` state, so SQLAlchemy will try to use it again after more than 300 seconds (the `wait_timeout` for MySQL in our case). This results in a "MySQL connection has gone away" error.
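   The hypothesis can be illustrated with a small stdlib-only simulation. This is a sketch under stated assumptions, not Airflow or SQLAlchemy code: `FakeConnection`, `RecyclingPool` and `run_task` are hypothetical names, the clock is simulated, and the pool is assumed to compare a connection's age against the recycle limit only at checkout time, which is how `pool_recycle` is documented to behave.

```python
class FakeConnection:
    """Hypothetical stand-in for a MySQL connection that the server
    drops once it has been idle for longer than wait_timeout seconds."""

    def __init__(self, clock, wait_timeout):
        self._clock = clock
        self._wait_timeout = wait_timeout
        self.created_at = clock()
        self._last_used = clock()

    def execute(self, statement):
        now = self._clock()
        if now - self._last_used > self._wait_timeout:
            raise ConnectionError("MySQL connection has gone away")
        self._last_used = now
        return statement


class RecyclingPool:
    """Hypothetical single-slot pool. Connection age is checked only in
    checkout(), so a connection that is never checked in is never replaced."""

    def __init__(self, clock, wait_timeout, recycle):
        self._clock = clock
        self._wait_timeout = wait_timeout
        self._recycle = recycle
        self._idle = None

    def checkout(self):
        conn, self._idle = self._idle, None
        too_old = conn is not None and self._clock() - conn.created_at > self._recycle
        if conn is None or too_old:
            conn = FakeConnection(self._clock, self._wait_timeout)
        return conn

    def checkin(self, conn):
        self._idle = conn


def run_task(release_before_callable, callable_seconds=400):
    """Simulate: one DB write, a long Python callable, then the XCom push."""
    now = [0]  # simulated clock, in seconds
    pool = RecyclingPool(lambda: now[0], wait_timeout=300, recycle=290)

    conn = pool.checkout()
    conn.execute("mark task as running")
    if release_before_callable:
        pool.checkin(conn)  # what session.close() would do

    now[0] += callable_seconds  # the callable runs; no DB traffic meanwhile

    if release_before_callable:
        conn = pool.checkout()  # age check happens here, stale connection replaced
    try:
        conn.execute("push return value to XCom")
        return "ok"
    except ConnectionError as exc:
        return str(exc)
```

   In this model, `run_task(release_before_callable=False)` fails because the held connection sat idle past the 300-second limit, while `run_task(release_before_callable=True)` succeeds because the pool gets a chance to replace the connection at the next checkout.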
   
   
   
   It seems that the Airflow codebase uses the `@provide_session` decorator to handle opening and closing sessions, and my hunch is that some functions are either not using it or misusing it.
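   For reference, a session-managing decorator of the kind described above might look roughly like the following. This is a minimal sketch and not Airflow's actual implementation: `provide_session_sketch`, `FakeSession`, `make_session` and `mark_running` are hypothetical names introduced for illustration, and the assumed contract is that the decorator commits and closes only sessions it created itself.

```python
import functools


class FakeSession:
    """Hypothetical stand-in for a SQLAlchemy session; records lifecycle calls."""

    def __init__(self):
        self.calls = []

    def commit(self):
        self.calls.append("commit")

    def rollback(self):
        self.calls.append("rollback")

    def close(self):
        self.calls.append("close")


def provide_session_sketch(session_factory):
    """Inject a session when the caller did not pass one, and always
    commit (or roll back) and close the session this decorator created."""

    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            if kwargs.get("session") is not None:
                # The caller owns this session and is responsible for closing it.
                return func(*args, **kwargs)
            session = session_factory()
            kwargs["session"] = session
            try:
                result = func(*args, **kwargs)
                session.commit()
                return result
            except Exception:
                session.rollback()
                raise
            finally:
                # Closing returns the connection to the pool, so it is not
                # left checked out while a long-running callable executes.
                session.close()

        return wrapper

    return decorator


created = []  # sessions opened by the decorator, kept for inspection


def make_session():
    session = FakeSession()
    created.append(session)
    return session


@provide_session_sketch(make_session)
def mark_running(task_id, session=None):
    return f"{task_id} marked running"
```

   A function that opens its own session without going through such a wrapper, or that is handed a session nobody closes, would leave the connection checked out, which matches the failure mode suspected here.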
   
   **What you expected to happen**:
   
   
   **How to reproduce it**:
   
   
   **Anything else we need to know**:
   
   Moved here from https://issues.apache.org/jira/browse/AIRFLOW-405
       
