Posted to commits@airflow.apache.org by GitBox <gi...@apache.org> on 2020/11/30 12:56:57 UTC

[GitHub] [airflow] shivanshs9 opened a new issue #11899: Scheduler deadlock with max_threads > 1

shivanshs9 opened a new issue #11899:
URL: https://github.com/apache/airflow/issues/11899


   
   **Apache Airflow version**: v2.0.0a2
   
   **Environment**:
   
   - **Database**: MariaDB
   
   **What happened**:
   
   
   The scheduler main process crashed repeatedly (7 crashes observed in just 4 minutes). The crashes happen only if the `max_threads` option in the `[scheduler]` section is set greater than 1 (in this case, 2) together with `use_row_level_locking = True`. Setting either `max_threads = 1` or `use_row_level_locking = False` fixed the issue, but both are more of a hack than a fix.
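   
   For context, the relevant scheduler settings looked roughly like this (an illustrative airflow.cfg excerpt, not a copy of the real config):
   
   ```
   [scheduler]
   # combination under which the crashes were observed
   max_threads = 2
   use_row_level_locking = True
   
   # either of these workarounds avoided the crashes, at the cost of
   # scheduler parallelism / multi-scheduler safety:
   # max_threads = 1
   # use_row_level_locking = False
   ```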
   
   **What you expected to happen**:
   
   Scheduler process to run normally.
   
   
   **How to reproduce it**:
   
   
   **Anything else we need to know**:
   
   
   <details>
   <summary>Scheduler logs</summary>
   
   ```
   [2020-10-26 09:03:54,608] {{settings.py:49}} INFO - Configured default timezone Timezone('UTC')
   [2020-10-26 09:04:05,467] {{scheduler_job.py:1327}} ERROR - Exception when executing SchedulerJob._run_scheduler_loop
   Traceback (most recent call last):
     File "/home/airflow/.local/lib/python3.8/site-packages/sqlalchemy/engine/base.py", line 1277, in _execute_context
       self.dialect.do_execute(
     File "/home/airflow/.local/lib/python3.8/site-packages/sqlalchemy/engine/default.py", line 593, in do_execute
       cursor.execute(statement, parameters)
     File "/home/airflow/.local/lib/python3.8/site-packages/MySQLdb/cursors.py", line 255, in execute
       self.errorhandler(self, exc, value)
     File "/home/airflow/.local/lib/python3.8/site-packages/MySQLdb/connections.py", line 50, in defaulterrorhandler
       raise errorvalue
     File "/home/airflow/.local/lib/python3.8/site-packages/MySQLdb/cursors.py", line 252, in execute
       res = self._query(query)
     File "/home/airflow/.local/lib/python3.8/site-packages/MySQLdb/cursors.py", line 379, in _query
       self._do_get_result(db)
     File "/home/airflow/.local/lib/python3.8/site-packages/MySQLdb/cursors.py", line 182, in _do_get_result
       self._result = result = self._get_result()
     File "/home/airflow/.local/lib/python3.8/site-packages/MySQLdb/cursors.py", line 411, in _get_result
       return self._get_db().store_result()
   _mysql_exceptions.OperationalError: (1213, 'Deadlock found when trying to get lock; try restarting transaction')
   
   The above exception was the direct cause of the following exception:
   
   Traceback (most recent call last):
     File "/home/airflow/.local/lib/python3.8/site-packages/airflow/jobs/scheduler_job.py", line 1308, in _execute
       self._run_scheduler_loop()
     File "/home/airflow/.local/lib/python3.8/site-packages/airflow/jobs/scheduler_job.py", line 1379, in _run_scheduler_loop
       num_queued_tis = self._do_scheduling(session)
     File "/home/airflow/.local/lib/python3.8/site-packages/airflow/jobs/scheduler_job.py", line 1451, in _do_scheduling
       self._create_dag_runs(query.all(), session)
     File "/home/airflow/.local/lib/python3.8/site-packages/sqlalchemy/orm/query.py", line 3341, in all
       return list(self)
     File "/home/airflow/.local/lib/python3.8/site-packages/sqlalchemy/orm/query.py", line 3503, in __iter__
       return self._execute_and_instances(context)
     File "/home/airflow/.local/lib/python3.8/site-packages/sqlalchemy/orm/query.py", line 3528, in _execute_and_instances
       result = conn.execute(querycontext.statement, self._params)
     File "/home/airflow/.local/lib/python3.8/site-packages/sqlalchemy/engine/base.py", line 1014, in execute
       return meth(self, multiparams, params)
     File "/home/airflow/.local/lib/python3.8/site-packages/sqlalchemy/sql/elements.py", line 298, in _execute_on_connection
       return connection._execute_clauseelement(self, multiparams, params)
     File "/home/airflow/.local/lib/python3.8/site-packages/sqlalchemy/engine/base.py", line 1127, in _execute_clauseelement
       ret = self._execute_context(
     File "/home/airflow/.local/lib/python3.8/site-packages/sqlalchemy/engine/base.py", line 1317, in _execute_context
       self._handle_dbapi_exception(
     File "/home/airflow/.local/lib/python3.8/site-packages/sqlalchemy/engine/base.py", line 1511, in _handle_dbapi_exception
       util.raise_(
     File "/home/airflow/.local/lib/python3.8/site-packages/sqlalchemy/util/compat.py", line 178, in raise_
       raise exception
     File "/home/airflow/.local/lib/python3.8/site-packages/sqlalchemy/engine/base.py", line 1277, in _execute_context
       self.dialect.do_execute(
     File "/home/airflow/.local/lib/python3.8/site-packages/sqlalchemy/engine/default.py", line 593, in do_execute
       cursor.execute(statement, parameters)
     File "/home/airflow/.local/lib/python3.8/site-packages/MySQLdb/cursors.py", line 255, in execute
       self.errorhandler(self, exc, value)
     File "/home/airflow/.local/lib/python3.8/site-packages/MySQLdb/connections.py", line 50, in defaulterrorhandler
       raise errorvalue
     File "/home/airflow/.local/lib/python3.8/site-packages/MySQLdb/cursors.py", line 252, in execute
       res = self._query(query)
     File "/home/airflow/.local/lib/python3.8/site-packages/MySQLdb/cursors.py", line 379, in _query
       self._do_get_result(db)
     File "/home/airflow/.local/lib/python3.8/site-packages/MySQLdb/cursors.py", line 182, in _do_get_result
       self._result = result = self._get_result()
     File "/home/airflow/.local/lib/python3.8/site-packages/MySQLdb/cursors.py", line 411, in _get_result
       return self._get_db().store_result()
   sqlalchemy.exc.OperationalError: (_mysql_exceptions.OperationalError) (1213, 'Deadlock found when trying to get lock; try restarting transaction')
   [SQL: SELECT dag.dag_id AS dag_dag_id, dag.root_dag_id AS dag_root_dag_id, dag.is_paused AS dag_is_paused, dag.is_subdag AS dag_is_subdag, dag.is_active AS dag_is_active, dag.last_scheduler_run AS dag_last_scheduler_run, dag.last_pickled AS dag_last_pickled, dag.last_expired AS dag_last_expired, dag.scheduler_lock AS dag_scheduler_lock, dag.pickle_id AS dag_pickle_id, dag.fileloc AS dag_fileloc, dag.owners AS dag_owners, dag.description AS dag_description, dag.default_view AS dag_default_view, dag.schedule_interval AS dag_schedule_interval, dag.concurrency AS dag_concurrency, dag.has_task_concurrency_limits AS dag_has_task_concurrency_limits, dag.next_dagrun AS dag_next_dagrun, dag.next_dagrun_create_after AS dag_next_dagrun_create_after
   FROM dag
   WHERE dag.is_paused IS false AND dag.is_active IS true AND dag.next_dagrun_create_after <= now() ORDER BY dag.next_dagrun_create_after
    LIMIT %s FOR UPDATE]
   [parameters: (10,)]
   (Background on this error at: http://sqlalche.me/e/13/e3q8)
   [2020-10-26 09:04:06,512] {{process_utils.py:102}} INFO - Sending Signals.SIGTERM to GPID 7437
   [2020-10-26 09:04:07,029] {{process_utils.py:68}} INFO - Process psutil.Process(pid=7762, status='terminated', started='09:04:05') (7762) terminated with exit code None
   [2020-10-26 09:04:07,122] {{process_utils.py:219}} INFO - Waiting up to 5 seconds for processes to exit...
   [2020-10-26 09:04:07,126] {{process_utils.py:68}} INFO - Process psutil.Process(pid=7774, status='terminated', started='09:04:05') (7774) terminated with exit code None
   [2020-10-26 09:04:07,128] {{process_utils.py:68}} INFO - Process psutil.Process(pid=7437, status='terminated', exitcode=0, started='09:03:54') (7437) terminated with exit code 0
   [2020-10-26 09:04:07,128] {{process_utils.py:68}} INFO - Process psutil.Process(pid=7775, status='terminated', started='09:04:05') (7775) terminated with exit code None
   [2020-10-26 09:04:07,129] {{process_utils.py:68}} INFO - Process psutil.Process(pid=7773, status='terminated', started='09:04:05') (7773) terminated with exit code None
   [2020-10-26 09:04:07,129] {{process_utils.py:68}} INFO - Process psutil.Process(pid=7782, status='terminated', started='09:04:05') (7782) terminated with exit code None
   ```
   </details>





[GitHub] [airflow] ashb commented on issue #11899: Scheduler deadlock with max_threads > 1

Posted by GitBox <gi...@apache.org>.
ashb commented on issue #11899:
URL: https://github.com/apache/airflow/issues/11899#issuecomment-718653453


   > > or this is just a case of "something else has my lock I should wait/retry".
   > 
   > @ashb in that case, shouldn't it throw lock wait timeout exceeded? :thinking:
   
   That would have been nice.





[GitHub] [airflow] kaxil commented on issue #11899: [MariaDB] Scheduler deadlock with max_threads > 1

Posted by GitBox <gi...@apache.org>.
kaxil commented on issue #11899:
URL: https://github.com/apache/airflow/issues/11899#issuecomment-771722612


   I was not able to reproduce this. @shivanshs9, can you provide reproduction steps if this is still occurring for you?





[GitHub] [airflow] ashb commented on issue #11899: Scheduler deadlock with max_threads > 1

Posted by GitBox <gi...@apache.org>.
ashb commented on issue #11899:
URL: https://github.com/apache/airflow/issues/11899#issuecomment-717590816


   You know what bugs me most about this? That MySQL/MariaDB calls this a deadlock, but it isn't, not formally. A deadlock needs two (or more) locks taken out in conflicting orders.
   
   Here we just have incompatible locks that would resolve fine if one side waited a few milliseconds.
   
   Rant over.
   
   I'll take a look.





[GitHub] [airflow] potiuk commented on issue #11899: Scheduler deadlock with max_threads > 1

Posted by GitBox <gi...@apache.org>.
potiuk commented on issue #11899:
URL: https://github.com/apache/airflow/issues/11899#issuecomment-718639140


   > Btw, it's indeed difficult to find a versioned image for Airflow 2. 
   
   Maybe because we have not released one yet. I will add an issue to automate image building when the next release is tagged. It should be very easy; I might even add it today.
   





[GitHub] [airflow] shivanshs9 commented on issue #11899: Scheduler deadlock with max_threads > 1

Posted by GitBox <gi...@apache.org>.
shivanshs9 commented on issue #11899:
URL: https://github.com/apache/airflow/issues/11899#issuecomment-736521240


   @ashb 
   There's another deadlock case with different Traceback:
   ```
   [2020-12-01 12:23:36,996] {{scheduler_job.py:1301}} ERROR - Exception when executing SchedulerJob._run_scheduler_loop
   Traceback (most recent call last):
     File "/home/airflow/.local/lib/python3.8/site-packages/sqlalchemy/engine/base.py", line 1277, in _execute_context
       self.dialect.do_execute(
     File "/home/airflow/.local/lib/python3.8/site-packages/sqlalchemy/engine/default.py", line 593, in do_execute
       cursor.execute(statement, parameters)
     File "/home/airflow/.local/lib/python3.8/site-packages/MySQLdb/cursors.py", line 255, in execute
       self.errorhandler(self, exc, value)
     File "/home/airflow/.local/lib/python3.8/site-packages/MySQLdb/connections.py", line 50, in defaulterrorhandler
       raise errorvalue
     File "/home/airflow/.local/lib/python3.8/site-packages/MySQLdb/cursors.py", line 252, in execute
       res = self._query(query)
     File "/home/airflow/.local/lib/python3.8/site-packages/MySQLdb/cursors.py", line 379, in _query
       self._do_get_result(db)
     File "/home/airflow/.local/lib/python3.8/site-packages/MySQLdb/cursors.py", line 182, in _do_get_result
       self._result = result = self._get_result()
     File "/home/airflow/.local/lib/python3.8/site-packages/MySQLdb/cursors.py", line 411, in _get_result
       return self._get_db().store_result()
   _mysql_exceptions.OperationalError: (1213, 'Deadlock found when trying to get lock; try restarting transaction')
   
   The above exception was the direct cause of the following exception:
   
   Traceback (most recent call last):
     File "/home/airflow/.local/lib/python3.8/site-packages/airflow/jobs/scheduler_job.py", line 1283, in _execute
       self._run_scheduler_loop()
     File "/home/airflow/.local/lib/python3.8/site-packages/airflow/jobs/scheduler_job.py", line 1385, in _run_scheduler_loop
       num_queued_tis = self._do_scheduling(session)
     File "/home/airflow/.local/lib/python3.8/site-packages/airflow/jobs/scheduler_job.py", line 1484, in _do_scheduling
       self._create_dag_runs(query.all(), session)
     File "/home/airflow/.local/lib/python3.8/site-packages/sqlalchemy/orm/query.py", line 3341, in all
       return list(self)
     File "/home/airflow/.local/lib/python3.8/site-packages/sqlalchemy/orm/query.py", line 3503, in __iter__
       return self._execute_and_instances(context)
     File "/home/airflow/.local/lib/python3.8/site-packages/sqlalchemy/orm/query.py", line 3528, in _execute_and_instances
       result = conn.execute(querycontext.statement, self._params)
     File "/home/airflow/.local/lib/python3.8/site-packages/sqlalchemy/engine/base.py", line 1014, in execute
       return meth(self, multiparams, params)
     File "/home/airflow/.local/lib/python3.8/site-packages/sqlalchemy/sql/elements.py", line 298, in _execute_on_connection
       return connection._execute_clauseelement(self, multiparams, params)
     File "/home/airflow/.local/lib/python3.8/site-packages/sqlalchemy/engine/base.py", line 1127, in _execute_clauseelement
       ret = self._execute_context(
     File "/home/airflow/.local/lib/python3.8/site-packages/sqlalchemy/engine/base.py", line 1317, in _execute_context
       self._handle_dbapi_exception(
     File "/home/airflow/.local/lib/python3.8/site-packages/sqlalchemy/engine/base.py", line 1511, in _handle_dbapi_exception
       util.raise_(
     File "/home/airflow/.local/lib/python3.8/site-packages/sqlalchemy/util/compat.py", line 178, in raise_
       raise exception
     File "/home/airflow/.local/lib/python3.8/site-packages/sqlalchemy/engine/base.py", line 1277, in _execute_context
       self.dialect.do_execute(
     File "/home/airflow/.local/lib/python3.8/site-packages/sqlalchemy/engine/default.py", line 593, in do_execute
       cursor.execute(statement, parameters)
     File "/home/airflow/.local/lib/python3.8/site-packages/MySQLdb/cursors.py", line 255, in execute
       self.errorhandler(self, exc, value)
     File "/home/airflow/.local/lib/python3.8/site-packages/MySQLdb/connections.py", line 50, in defaulterrorhandler
       raise errorvalue
     File "/home/airflow/.local/lib/python3.8/site-packages/MySQLdb/cursors.py", line 252, in execute
       res = self._query(query)
     File "/home/airflow/.local/lib/python3.8/site-packages/MySQLdb/cursors.py", line 379, in _query
       self._do_get_result(db)
     File "/home/airflow/.local/lib/python3.8/site-packages/MySQLdb/cursors.py", line 182, in _do_get_result
       self._result = result = self._get_result()
     File "/home/airflow/.local/lib/python3.8/site-packages/MySQLdb/cursors.py", line 411, in _get_result
       return self._get_db().store_result()
   sqlalchemy.exc.OperationalError: (_mysql_exceptions.OperationalError) (1213, 'Deadlock found when trying to get lock; try restarting transaction')
   [SQL: SELECT dag.dag_id AS dag_dag_id, dag.root_dag_id AS dag_root_dag_id, dag.is_paused AS dag_is_paused, dag.is_subdag AS dag_is_subdag, dag.is_active AS dag_is_active, dag.last_scheduler_run AS dag_last_scheduler_run, dag.last_pickled AS dag_last_pickled, dag.last_expired AS dag_last_expired, dag.scheduler_lock AS dag_scheduler_lock, dag.pickle_id AS dag_pickle_id, dag.fileloc AS dag_fileloc, dag.owners AS dag_owners, dag.description AS dag_description, dag.default_view AS dag_default_view, dag.schedule_interval AS dag_schedule_interval, dag.concurrency AS dag_concurrency, dag.has_task_concurrency_limits AS dag_has_task_concurrency_limits, dag.next_dagrun AS dag_next_dagrun, dag.next_dagrun_create_after AS dag_next_dagrun_create_after
   FROM dag
   WHERE dag.is_paused IS false AND dag.is_active IS true AND dag.next_dagrun_create_after <= now() ORDER BY dag.next_dagrun_create_after
    LIMIT %s FOR UPDATE]
   [parameters: (10,)]
   (Background on this error at: http://sqlalche.me/e/13/e3q8)
   [2020-12-01 12:23:38,016] {{process_utils.py:95}} INFO - Sending Signals.SIGTERM to GPID 36
   [2020-12-01 12:23:38,389] {{process_utils.py:61}} INFO - Process psutil.Process(pid=522, status='terminated', started='12:23:36') (522) terminated with exit code None
   [2020-12-01 12:23:38,405] {{process_utils.py:201}} INFO - Waiting up to 5 seconds for processes to exit...
   [2020-12-01 12:23:38,415] {{process_utils.py:61}} INFO - Process psutil.Process(pid=36, status='terminated', exitcode=0, started='12:22:07') (36) terminated with exit code 0
   [2020-12-01 12:23:38,416] {{process_utils.py:61}} INFO - Process psutil.Process(pid=526, status='terminated', started='12:23:36') (526) terminated with exit code None
   [2020-12-01 12:23:38,416] {{scheduler_job.py:1304}} INFO - Exited execute loop
   ```





[GitHub] [airflow] ashb commented on issue #11899: Scheduler deadlock with max_threads > 1

Posted by GitBox <gi...@apache.org>.
ashb commented on issue #11899:
URL: https://github.com/apache/airflow/issues/11899#issuecomment-732275267


   @shivanshs9 Could you re-test this with beta3, just to confirm whether this problem still exists?



[GitHub] [airflow] potiuk commented on issue #11899: [MariaDB] Scheduler deadlock with max_threads > 1

Posted by GitBox <gi...@apache.org>.
potiuk commented on issue #11899:
URL: https://github.com/apache/airflow/issues/11899#issuecomment-752941198


   Changed to 2.0.1 as this seems to be blocking people from upgrading to 2.0





[GitHub] [airflow] shivanshs9 commented on issue #11899: Scheduler deadlock with max_threads > 1

Posted by GitBox <gi...@apache.org>.
shivanshs9 commented on issue #11899:
URL: https://github.com/apache/airflow/issues/11899#issuecomment-717560851


   > Setting either `max_threads = 1`
   
   It seems the issue still occurs even with `max_threads = 1`, but at a reduced frequency: 5 crashes in an hour.





[GitHub] [airflow] ashb commented on issue #11899: Scheduler deadlock with max_threads > 1

Posted by GitBox <gi...@apache.org>.
ashb commented on issue #11899:
URL: https://github.com/apache/airflow/issues/11899#issuecomment-718606469


   @shivanshs9 Do you use SubDagOperator at all?





[GitHub] [airflow] potiuk commented on issue #11899: Scheduler deadlock with max_threads > 1

Posted by GitBox <gi...@apache.org>.
potiuk commented on issue #11899:
URL: https://github.com/apache/airflow/issues/11899#issuecomment-718659802


   The `apache/airflow:2.0.0a2-python3.6` image is there. Pushing the other two.





[GitHub] [airflow] ashb commented on issue #11899: Scheduler deadlock with max_threads > 1

Posted by GitBox <gi...@apache.org>.
ashb commented on issue #11899:
URL: https://github.com/apache/airflow/issues/11899#issuecomment-732281749


   https://github.com/apache/airflow/pull/11797 is the PR that _may_ have fixed this.





[GitHub] [airflow] totalhack commented on issue #11899: [MariaDB] Scheduler deadlock with max_threads > 1

Posted by GitBox <gi...@apache.org>.
totalhack commented on issue #11899:
URL: https://github.com/apache/airflow/issues/11899#issuecomment-752768500


   Hey there, I'm running into the same issue and it's preventing me from getting an initial deployment of Airflow done. Same case as [this comment](https://github.com/apache/airflow/issues/11899#issuecomment-736521240).
   
   Are there any known workarounds or an ETA on a fix?
   
   I'm using MySQL 5.7.12, and `airflow version` shows `2.0.0` (using the `apache/airflow:2.0.0` Docker image). Setting `max_threads = 1` did not solve the issue.
   
   Thanks for all your work, looking forward to the improvements in v2. 





[GitHub] [airflow] shivanshs9 commented on issue #11899: Scheduler deadlock with max_threads > 1

Posted by GitBox <gi...@apache.org>.
shivanshs9 commented on issue #11899:
URL: https://github.com/apache/airflow/issues/11899#issuecomment-718621990


   @ashb 
   
   > @shivanshs9 Which version of MariaDB are you on? And is this a single node, or a Galera cluster?
   
   Engine version is 10.3.8 and it is a single node.
   
   > Something doesn't add up with your stack trace -- you say you are on 2.0.0a2, but Line 1451 from this stack trace
   
   Ah, it seems I'm still using an old base image - [docker.pkg.github.com/apache/airflow/master-python3.8:feda2338a06e0cc2409943794a8cdf2e9a2e2625](https://github.com/apache/airflow/packages/256807?version=feda2338a06e0cc2409943794a8cdf2e9a2e2625). Just confirmed it's `2.0.0a1` (ran `airflow version` in the container). Will try on the latest image.
   Btw, it's indeed difficult to find a versioned image for Airflow 2. :sweat_smile: 
   
   > Do you use SubDagOperator at all?
   
   Nope.





[GitHub] [airflow] ashb commented on issue #11899: Scheduler deadlock with max_threads > 1

Posted by GitBox <gi...@apache.org>.
ashb commented on issue #11899:
URL: https://github.com/apache/airflow/issues/11899#issuecomment-718621333


   So transaction (1) from the innodb status is this one:
   
   https://github.com/apache/airflow/blob/79d71cc5873527eef3661ff1dad134bf0ec4f385/airflow/models/dag.py#L2160-L2180
   
   Transaction (2) is harder to track down exactly, but since it's from the dag parsing process, it'll be 
   
   https://github.com/apache/airflow/blob/79d71cc5873527eef3661ff1dad134bf0ec4f385/airflow/models/dag.py#L2160-L2180
   
   Which was selected with a lock earlier on in the same function https://github.com/apache/airflow/blob/79d71cc5873527eef3661ff1dad134bf0ec4f385/airflow/models/dag.py#L1767-L1772
   
   Can I just reiterate how _wrong_ MariaDB is to call this a _DEAD_ lock? A deadlock is when process A holds lock 1 and then tries to get lock 2, while process B holds lock 2 and then tries to get lock 1. In the DB status logs _there is only one lock involved_ -- so either the logs aren't useful, or this is just a case of "something else has my lock; I should wait/retry".
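   
   To make that concrete, here is a purely illustrative two-session sequence (hypothetical rows, not taken from the status output above) that *would* be a deadlock in the formal sense, because the lock waits form a cycle:
   
   ```
   -- Session A:
   BEGIN;
   UPDATE dag SET is_paused = 0 WHERE dag_id = 'dag_a';  -- A locks row dag_a
   
   -- Session B:
   BEGIN;
   UPDATE dag SET is_paused = 0 WHERE dag_id = 'dag_b';  -- B locks row dag_b
   
   -- Session A:
   UPDATE dag SET is_paused = 0 WHERE dag_id = 'dag_b';  -- A now waits for B
   
   -- Session B:
   UPDATE dag SET is_paused = 0 WHERE dag_id = 'dag_a';  -- B waits for A: a cycle, so
                                                         -- InnoDB rolls one back (error 1213)
   ```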





[GitHub] [airflow] ashb commented on issue #11899: Scheduler deadlock with max_threads > 1

Posted by GitBox <gi...@apache.org>.
ashb commented on issue #11899:
URL: https://github.com/apache/airflow/issues/11899#issuecomment-735769977


   > Weirdly, I think the process is being terminated (as in the logs) but it's not exactly crashing the enclosing pod. So the container is not being restarted either, which leaves the scheduler not working indefinitely.
   
   Do you have any sidecars running in the pod?
   
   Either way, this is a separate issue.





[GitHub] [airflow] potiuk commented on issue #11899: Scheduler deadlock with max_threads > 1

Posted by GitBox <gi...@apache.org>.
potiuk commented on issue #11899:
URL: https://github.com/apache/airflow/issues/11899#issuecomment-717550711


   cc: @ashb @kaxil - seems that this can happen quite often.
   
   @shivanshs9, is it possible for you to turn on logging for all DB locks and provide your DB server logs?
   
   Here is some info on that:
   
   https://dba.stackexchange.com/questions/87350/view-last-several-innodb-deadlocks
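   
   For reference, a minimal way to get at this on MySQL/MariaDB (assuming sufficient privileges):
   
   ```
   -- prints a "LATEST DETECTED DEADLOCK" section with both transactions involved
   SHOW ENGINE INNODB STATUS;
   
   -- optionally record every detected deadlock in the server error log,
   -- not just the most recent one
   SET GLOBAL innodb_print_all_deadlocks = ON;
   ```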





[GitHub] [airflow] shivanshs9 commented on issue #11899: Scheduler deadlock with max_threads > 1

Posted by GitBox <gi...@apache.org>.
shivanshs9 commented on issue #11899:
URL: https://github.com/apache/airflow/issues/11899#issuecomment-718646034


   >  or this is just a case of "something else has my lock I should wait/retry".
   
   @ashb in that case, shouldn't it throw lock wait timeout exceeded? :thinking: 





[GitHub] [airflow] shivanshs9 commented on issue #11899: Scheduler deadlock with max_threads > 1

Posted by GitBox <gi...@apache.org>.
shivanshs9 commented on issue #11899:
URL: https://github.com/apache/airflow/issues/11899#issuecomment-717560284


   @potiuk this was observed in RDS, so it's difficult to provide all the DB logs. I'll spin up a local MySQL pod and try to replicate the issue under a similar load.
   
   For now, I managed to get the latest deadlock (ran `SHOW ENGINE INNODB STATUS;`):
   ```
   ------------------------
   LATEST DETECTED DEADLOCK
   ------------------------
   2020-10-27 21:40:59 0x2ad0ba744700
   *** (1) TRANSACTION:
   TRANSACTION 4308769990, ACTIVE 0 sec fetching rows
   mysql tables in use 1, locked 1
   LOCK WAIT 4 lock struct(s), heap size 1136, 4 row lock(s)
   MySQL thread id 19920923, OS thread handle 47071625484032, query id [truncated].compute-1.amazonaws.com [truncated] root Sending data
   SELECT dag.dag_id AS dag_dag_id, dag.root_dag_id AS dag_root_dag_id, dag.is_paused AS dag_is_paused, dag.is_subdag AS dag_is_subdag, dag.is_active AS dag_is_active, dag.last_scheduler_run AS dag_last_scheduler_run, dag.last_pickled AS dag_last_pickled, dag.last_expired AS dag_last_expired, dag.scheduler_lock AS dag_scheduler_lock, dag.pickle_id AS dag_pickle_id, dag.fileloc AS dag_fileloc, dag.owners AS dag_owners, dag.description AS dag_description, dag.default_view AS dag_default_view, dag.schedule_interval AS dag_schedule_interval, dag.concurrency AS dag_concurrency, dag.has_task_concurrency_limits AS dag_has_task_concurrency_limits, dag.next_dagrun AS dag_next_dagrun, dag.next_dagrun_create_after AS dag_next_dagrun_create_after
   FROM dag
   WHERE dag.is_paused IS false AND dag.is_active IS true AND dag.next_dagrun_create_after <= now() ORDER BY dag.next_dagr
   *** (1) WAITING FOR THIS LOCK TO BE GRANTED:
   RECORD LOCKS space id 5563 page no 3 n bits 168 index PRIMARY of table `chronos_airflow`.`dag` trx id 4308769990 lock_mode X locks rec but not gap waiting
   Record lock, heap no 97 PHYSICAL RECORD: n_fields 21; compact format; info bits 0
    0: len 18; hex 6572676f5f6a6f625f636f6c6c6563746f72; asc ergo_job_collector;;
    1: len 6; hex 000100d29cc5; asc       ;;
    2: len 7; hex 420006001209ca; asc B      ;;
    3: len 1; hex 80; asc  ;;
    4: len 1; hex 80; asc  ;;
    5: len 1; hex 81; asc  ;;
    6: SQL NULL;
    7: SQL NULL;
    8: SQL NULL;
    9: SQL NULL;
    10: SQL NULL;
    11: len 29; hex 2f6f70742f616972666c6f772f646167732f6461675f6572676f2e7079; asc /opt/airflow/dags/dag_ergo.py;;
    12: len 7; hex 616972666c6f77; asc airflow;;
    13: SQL NULL;
    14: len 4; hex 74726565; asc tree;;
    15: len 30; hex 7b2274797065223a202274696d6564656c7461222c20226174747273223a; asc {"type": "timedelta", "attrs":; (total 77 bytes);
    16: SQL NULL;
    17: len 7; hex 5f9893e102c57c; asc _     |;;
    18: len 7; hex 5f9893eb02c57c; asc _     |;;
    19: len 4; hex 80000010; asc     ;;
    20: len 1; hex 80; asc  ;;
   
   *** (2) TRANSACTION:
   TRANSACTION 4308769989, ACTIVE 0 sec updating or deleting
   mysql tables in use 2, locked 2
   7 lock struct(s), heap size 1136, 5 row lock(s), undo log entries 1
   MySQL thread id 19927787, OS thread handle 47075969746688, query id [truncated].compute-1.amazonaws.com [truncated] root Updating
   UPDATE dag SET next_dagrun='2020-10-27 21:40:49.181628', next_dagrun_create_after='2020-10-27 21:40:59.181628' WHERE dag.dag_id = 'ergo_job_collector'
   *** (2) HOLDS THE LOCK(S):
   RECORD LOCKS space id 5563 page no 3 n bits 168 index PRIMARY of table `chronos_airflow`.`dag` trx id 4308769989 lock_mode X locks rec but not gap
   Record lock, heap no 95 PHYSICAL RECORD: n_fields 21; compact format; info bits 0
    0: len 16; hex 6572676f5f7461736b5f717565756572; asc ergo_task_queuer;;
    1: len 6; hex 000000000000; asc       ;;
    2: len 7; hex 80000000000000; asc        ;;
    3: len 1; hex 80; asc  ;;
    4: len 1; hex 80; asc  ;;
    5: len 1; hex 81; asc  ;;
    6: SQL NULL;
    7: SQL NULL;
    8: SQL NULL;
    9: SQL NULL;
    10: SQL NULL;
    11: len 29; hex 2f6f70742f616972666c6f772f646167732f6461675f6572676f2e7079; asc /opt/airflow/dags/dag_ergo.py;;
    12: len 7; hex 616972666c6f77; asc airflow;;
    13: SQL NULL;
    14: len 4; hex 74726565; asc tree;;
    15: len 30; hex 7b2274797065223a202274696d6564656c7461222c20226174747273223a; asc {"type": "timedelta", "attrs":; (total 77 bytes);
    16: SQL NULL;
    17: len 7; hex 5f9893e60e6fa0; asc _    o ;;
    18: SQL NULL;
    19: len 4; hex 80000010; asc     ;;
    20: len 1; hex 80; asc  ;;
   
   Record lock, heap no 97 PHYSICAL RECORD: n_fields 21; compact format; info bits 0
    0: len 18; hex 6572676f5f6a6f625f636f6c6c6563746f72; asc ergo_job_collector;;
    1: len 6; hex 000100d29cc5; asc       ;;
    2: len 7; hex 420006001209ca; asc B      ;;
    3: len 1; hex 80; asc  ;;
    4: len 1; hex 80; asc  ;;
    5: len 1; hex 81; asc  ;;
    6: SQL NULL;
    7: SQL NULL;
    8: SQL NULL;
    9: SQL NULL;
    10: SQL NULL;
    11: len 29; hex 2f6f70742f616972666c6f772f646167732f6461675f6572676f2e7079; asc /opt/airflow/dags/dag_ergo.py;;
    12: len 7; hex 616972666c6f77; asc airflow;;
    13: SQL NULL;
    14: len 4; hex 74726565; asc tree;;
    15: len 30; hex 7b2274797065223a202274696d6564656c7461222c20226174747273223a; asc {"type": "timedelta", "attrs":; (total 77 bytes);
    16: SQL NULL;
    17: len 7; hex 5f9893e102c57c; asc _     |;;
    18: len 7; hex 5f9893eb02c57c; asc _     |;;
    19: len 4; hex 80000010; asc     ;;
    20: len 1; hex 80; asc  ;;
   
   *** (2) WAITING FOR THIS LOCK TO BE GRANTED:
   RECORD LOCKS space id 5563 page no 5 n bits 168 index idx_next_dagrun_create_after of table `chronos_airflow`.`dag` trx id 4308769989 lock_mode X locks rec but not gap waiting
   Record lock, heap no 101 PHYSICAL RECORD: n_fields 2; compact format; info bits 0
    0: len 7; hex 5f9893ea058c2f; asc _     /;;
    1: len 18; hex 6572676f5f6a6f625f636f6c6c6563746f72; asc ergo_job_collector;;
   
   *** WE ROLL BACK TRANSACTION (1)
   ```





[GitHub] [airflow] kaxil closed issue #11899: [MariaDB] Scheduler deadlock with max_threads > 1

Posted by GitBox <gi...@apache.org>.
kaxil closed issue #11899:
URL: https://github.com/apache/airflow/issues/11899


   





[GitHub] [airflow] shivanshs9 commented on issue #11899: Scheduler deadlock with max_threads > 1

Posted by GitBox <gi...@apache.org>.
shivanshs9 commented on issue #11899:
URL: https://github.com/apache/airflow/issues/11899#issuecomment-718640942


   > Can I just re-iterate again how wrong MariaDB is to call this a DEAD lock. A dead lock is when process A holds locks 1, then tries to get lock 2, and process B holds lock 2 then tries to get lock 1. In the DB status logs there is only one lock involved -- so either the logs aren't useful, or this is just a case of "something else has my lock I should wait/retry".
   
   Yeah, it does seem fishy.
   From what I'm able to understand from the InnoDB output, the SELECT transaction (1) doesn't even hold any lock but is waiting for a lock held by the UPDATE transaction (2). And transaction (2) is waiting for some other lock. :shrug:
   
   > Could you try with AIRFLOW__CORE__STORE_DAG_CODE=False @shivanshs9 ? If that helps I have an idea.
   
   @ashb Nope, still facing the same issue:
   ```
   $ airflow config get-value core store_dag_code
   False
   ```
   
   





[GitHub] [airflow] ashb commented on issue #11899: Scheduler deadlock with max_threads > 1

Posted by GitBox <gi...@apache.org>.
ashb commented on issue #11899:
URL: https://github.com/apache/airflow/issues/11899#issuecomment-735753823


   I'm going to close this now -- if this is still a problem, someone can reply and we'll re-open it.





[GitHub] [airflow] ashb commented on issue #11899: Scheduler deadlock with max_threads > 1

Posted by GitBox <gi...@apache.org>.
ashb commented on issue #11899:
URL: https://github.com/apache/airflow/issues/11899#issuecomment-718625532


   Could you try with `AIRFLOW__CORE__STORE_DAG_CODE=False`, @shivanshs9? If that helps, I have an idea.
   
   If not, I think our last resort is to simply catch this error and retry later (either on the next loop, or after a timeout) -- I don't see what else we can change.
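   
   A rough sketch of what that catch-and-retry could look like (just an illustration of the idea, not an actual patch; `run_critical_section` is a hypothetical stand-in for the row-locking scheduler query):
   
   ```
   import time
   
   from sqlalchemy.exc import OperationalError
   
   MYSQL_DEADLOCK = 1213  # "Deadlock found when trying to get lock"
   
   
   def run_with_deadlock_retry(run_critical_section, session, retries=3, backoff=1.0):
       """Re-run the critical section if MySQL/MariaDB reports error 1213."""
       for attempt in range(1, retries + 1):
           try:
               return run_critical_section(session)
           except OperationalError as err:
               # err.orig is the underlying DB-API error; for MySQLdb its first
               # argument is the numeric MySQL error code.
               code = err.orig.args[0] if getattr(err.orig, "args", None) else None
               if code != MYSQL_DEADLOCK or attempt == retries:
                   raise
               session.rollback()             # the failed transaction must be rolled back
               time.sleep(backoff * attempt)  # back off a little, then try again
   ```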





[GitHub] [airflow] potiuk commented on issue #11899: Scheduler deadlock with max_threads > 1

Posted by GitBox <gi...@apache.org>.
potiuk commented on issue #11899:
URL: https://github.com/apache/airflow/issues/11899#issuecomment-718675674


   Also:
   * `docker pull apache/airflow:2.0.0a2-python3.8`
   * `docker pull apache/airflow:2.0.0a2-python3.7`
   
   Issue to automate it: https://github.com/apache/airflow/issues/11937 scheduled for beta1
   





[GitHub] [airflow] shivanshs9 commented on issue #11899: Scheduler deadlock with max_threads > 1

Posted by GitBox <gi...@apache.org>.
shivanshs9 commented on issue #11899:
URL: https://github.com/apache/airflow/issues/11899#issuecomment-718642137


   > I will push the tagged images shortly.
   
   Yep, that will be very helpful for testing! :raised_hands: 





[GitHub] [airflow] ashb commented on issue #11899: Scheduler deadlock with max_threads > 1

Posted by GitBox <gi...@apache.org>.
ashb commented on issue #11899:
URL: https://github.com/apache/airflow/issues/11899#issuecomment-718580026


   @shivanshs9 Which version of MariaDB are you on? And is this a single node, or a Galera cluster?





[GitHub] [airflow] ashb closed issue #11899: Scheduler deadlock with max_threads > 1

Posted by GitBox <gi...@apache.org>.
ashb closed issue #11899:
URL: https://github.com/apache/airflow/issues/11899


   





[GitHub] [airflow] ashb commented on issue #11899: Scheduler deadlock with max_threads > 1

Posted by GitBox <gi...@apache.org>.
ashb commented on issue #11899:
URL: https://github.com/apache/airflow/issues/11899#issuecomment-718583745


   @shivanshs9 Something doesn't add up with your stack trace -- you say you are on 2.0.0a2, but Line 1451 from this stack trace:
   
   ```  File "/home/airflow/.local/lib/python3.8/site-packages/airflow/jobs/scheduler_job.py", line 1308, in _execute
       self._run_scheduler_loop()
     File "/home/airflow/.local/lib/python3.8/site-packages/airflow/jobs/scheduler_job.py", line 1379, in _run_scheduler_loop
       num_queued_tis = self._do_scheduling(session)
     File "/home/airflow/.local/lib/python3.8/site-packages/airflow/jobs/scheduler_job.py", line 1451, in _do_scheduling
       self._create_dag_runs(query.all(), session)
   ```
   
   is a blank line.
   
   https://github.com/apache/airflow/blob/2.0.0a2/airflow/jobs/scheduler_job.py#L1451





[GitHub] [airflow] potiuk commented on issue #11899: Scheduler deadlock with max_threads > 1

Posted by GitBox <gi...@apache.org>.
potiuk commented on issue #11899:
URL: https://github.com/apache/airflow/issues/11899#issuecomment-718641111


   I will push the tagged images shortly.





[GitHub] [airflow] shivanshs9 commented on issue #11899: Scheduler deadlock with max_threads > 1

Posted by GitBox <gi...@apache.org>.
shivanshs9 commented on issue #11899:
URL: https://github.com/apache/airflow/issues/11899#issuecomment-718698218


   > apache/airflow:2.0.0a2-python3.8
   
   Thanks @potiuk! Using this tag now.





[GitHub] [airflow] shivanshs9 commented on issue #11899: Scheduler deadlock with max_threads > 1

Posted by GitBox <gi...@apache.org>.
shivanshs9 commented on issue #11899:
URL: https://github.com/apache/airflow/issues/11899#issuecomment-735866934


   > Do you have any sidecars running in the pod?
   
   Nope, but yeah, it's an unrelated issue.





[GitHub] [airflow] shivanshs9 commented on issue #11899: Scheduler deadlock with max_threads > 1

Posted by GitBox <gi...@apache.org>.
shivanshs9 commented on issue #11899:
URL: https://github.com/apache/airflow/issues/11899#issuecomment-735769434


   @ashb Ah, sorry for the delayed response. The issue is still occurring, unfortunately.
   <details>
   <summary>Scheduler logs</summary>
   
   ```
   [2020-11-30 12:11:01,752] {{scheduler_job.py:1301}} ERROR - Exception when executing SchedulerJob._run_scheduler_loop
   Traceback (most recent call last):
     File "/home/airflow/.local/lib/python3.8/site-packages/sqlalchemy/engine/base.py", line 1277, in _execute_context
       self.dialect.do_execute(
     File "/home/airflow/.local/lib/python3.8/site-packages/sqlalchemy/engine/default.py", line 593, in do_execute
       cursor.execute(statement, parameters)
     File "/home/airflow/.local/lib/python3.8/site-packages/MySQLdb/cursors.py", line 255, in execute
       self.errorhandler(self, exc, value)
     File "/home/airflow/.local/lib/python3.8/site-packages/MySQLdb/connections.py", line 50, in defaulterrorhandler
       raise errorvalue
     File "/home/airflow/.local/lib/python3.8/site-packages/MySQLdb/cursors.py", line 252, in execute
       res = self._query(query)
     File "/home/airflow/.local/lib/python3.8/site-packages/MySQLdb/cursors.py", line 378, in _query
       db.query(q)
     File "/home/airflow/.local/lib/python3.8/site-packages/MySQLdb/connections.py", line 280, in query
       _mysql.connection.query(self, query)
   _mysql_exceptions.OperationalError: (1213, 'Deadlock found when trying to get lock; try restarting transaction')
   
   The above exception was the direct cause of the following exception:
   
   Traceback (most recent call last):
     File "/home/airflow/.local/lib/python3.8/site-packages/airflow/jobs/scheduler_job.py", line 1283, in _execute
       self._run_scheduler_loop()
     File "/home/airflow/.local/lib/python3.8/site-packages/airflow/jobs/scheduler_job.py", line 1385, in _run_scheduler_loop
       num_queued_tis = self._do_scheduling(session)
     File "/home/airflow/.local/lib/python3.8/site-packages/airflow/jobs/scheduler_job.py", line 1543, in _do_scheduling
       num_queued_tis = self._critical_section_execute_task_instances(session=session)
     File "/home/airflow/.local/lib/python3.8/site-packages/airflow/jobs/scheduler_job.py", line 1140, in _critical_section_execute_task_instances
       queued_tis = self._executable_task_instances_to_queued(max_tis, session=session)
     File "/home/airflow/.local/lib/python3.8/site-packages/airflow/utils/session.py", line 59, in wrapper
       return func(*args, **kwargs)
     File "/home/airflow/.local/lib/python3.8/site-packages/airflow/jobs/scheduler_job.py", line 932, in _executable_task_instances_to_queued
       task_instances_to_examine: List[TI] = with_row_locks(
     File "/home/airflow/.local/lib/python3.8/site-packages/sqlalchemy/orm/query.py", line 3341, in all
       return list(self)
     File "/home/airflow/.local/lib/python3.8/site-packages/sqlalchemy/orm/query.py", line 3503, in __iter__
       return self._execute_and_instances(context)
     File "/home/airflow/.local/lib/python3.8/site-packages/sqlalchemy/orm/query.py", line 3528, in _execute_and_instances
       result = conn.execute(querycontext.statement, self._params)
     File "/home/airflow/.local/lib/python3.8/site-packages/sqlalchemy/engine/base.py", line 1014, in execute
       return meth(self, multiparams, params)
     File "/home/airflow/.local/lib/python3.8/site-packages/sqlalchemy/sql/elements.py", line 298, in _execute_on_connection
       return connection._execute_clauseelement(self, multiparams, params)
     File "/home/airflow/.local/lib/python3.8/site-packages/sqlalchemy/engine/base.py", line 1127, in _execute_clauseelement
       ret = self._execute_context(
     File "/home/airflow/.local/lib/python3.8/site-packages/sqlalchemy/engine/base.py", line 1317, in _execute_context
       self._handle_dbapi_exception(
     File "/home/airflow/.local/lib/python3.8/site-packages/sqlalchemy/engine/base.py", line 1511, in _handle_dbapi_exception
       util.raise_(
     File "/home/airflow/.local/lib/python3.8/site-packages/sqlalchemy/util/compat.py", line 178, in raise_
       raise exception
     File "/home/airflow/.local/lib/python3.8/site-packages/sqlalchemy/engine/base.py", line 1277, in _execute_context
       self.dialect.do_execute(
     File "/home/airflow/.local/lib/python3.8/site-packages/sqlalchemy/engine/default.py", line 593, in do_execute
       cursor.execute(statement, parameters)
     File "/home/airflow/.local/lib/python3.8/site-packages/MySQLdb/cursors.py", line 255, in execute
       self.errorhandler(self, exc, value)
     File "/home/airflow/.local/lib/python3.8/site-packages/MySQLdb/connections.py", line 50, in defaulterrorhandler
       raise errorvalue
     File "/home/airflow/.local/lib/python3.8/site-packages/MySQLdb/cursors.py", line 252, in execute
       res = self._query(query)
     File "/home/airflow/.local/lib/python3.8/site-packages/MySQLdb/cursors.py", line 378, in _query
       db.query(q)
     File "/home/airflow/.local/lib/python3.8/site-packages/MySQLdb/connections.py", line 280, in query
       _mysql.connection.query(self, query)
   sqlalchemy.exc.OperationalError: (_mysql_exceptions.OperationalError) (1213, 'Deadlock found when trying to get lock; try restarting transaction')
   [SQL: SELECT task_instance.try_number AS task_instance_try_number, task_instance.task_id AS task_instance_task_id, task_instance.dag_id AS task_instance_dag_id, task_instance.execution_date AS task_instance_execution_date, task_instance.start_date AS task_instance_start_date, task_instance.end_date AS task_instance_end_date, task_instance.duration AS task_instance_duration, task_instance.state AS task_instance_state, task_instance.max_tries AS task_instance_max_tries, task_instance.hostname AS task_instance_hostname, task_instance.unixname AS task_instance_unixname, task_instance.job_id AS task_instance_job_id, task_instance.pool AS task_instance_pool, task_instance.pool_slots AS task_instance_pool_slots, task_instance.queue AS task_instance_queue, task_instance.priority_weight AS task_instance_priority_weight, task_instance.operator AS task_instance_operator, task_instance.queued_dttm AS task_instance_queued_dttm, task_instance.queued_by_job_id AS task_instance_queued_by_job_id, 
 task_instance.pid AS task_instance_pid, task_instance.executor_config AS task_instance_executor_config, task_instance.external_executor_id AS task_instance_external_executor_id
   FROM task_instance LEFT OUTER JOIN dag_run ON task_instance.dag_id = dag_run.dag_id AND task_instance.execution_date = dag_run.execution_date INNER JOIN dag ON task_instance.dag_id = dag.dag_id
   WHERE (dag_run.run_id IS NULL OR dag_run.run_type != %s) AND dag.is_paused = 0 AND task_instance.state = %s
    LIMIT %s FOR UPDATE]
   [parameters: (<DagRunType.BACKFILL_JOB: 'backfill'>, 'scheduled', 29)]
   (Background on this error at: http://sqlalche.me/e/13/e3q8)
   [2020-11-30 12:11:02,774] {{process_utils.py:95}} INFO - Sending Signals.SIGTERM to GPID 50
   [2020-11-30 12:11:12,964] {{process_utils.py:198}} INFO - Terminating child PID: 335
   [2020-11-30 12:11:12,964] {{process_utils.py:198}} INFO - Terminating child PID: 336
   [2020-11-30 12:11:12,964] {{process_utils.py:201}} INFO - Waiting up to 5 seconds for processes to exit...
   [2020-11-30 12:11:17,974] {{process_utils.py:214}} INFO - SIGKILL processes that did not terminate gracefully
   [2020-11-30 12:11:17,975] {{process_utils.py:216}} INFO - Killing child PID: 335
   [2020-11-30 12:11:17,979] {{process_utils.py:216}} INFO - Killing child PID: 336
   [2020-11-30 12:11:18,015] {{process_utils.py:61}} INFO - Process psutil.Process(pid=335, status='terminated', started='12:10:59') (335) terminated with exit code None
   [2020-11-30 12:11:18,440] {{process_utils.py:61}} INFO - Process psutil.Process(pid=336, status='terminated', started='12:11:00') (336) terminated with exit code None
   [2020-11-30 12:12:02,785] {{process_utils.py:108}} WARNING - process psutil.Process(pid=334, name='airflow schedul', status='sleeping', started='12:10:59') did not respond to SIGTERM. Trying SIGKILL
   [2020-11-30 12:12:02,786] {{process_utils.py:108}} WARNING - process psutil.Process(pid=50, name='airflow scheduler -- DagFileProcessorManager', status='sleeping', started='12:09:58') did not respond to SIGTERM. Trying SIGKILL
   [2020-11-30 12:12:02,787] {{process_utils.py:108}} WARNING - process psutil.Process(pid=331, name='airflow schedul', status='sleeping', started='12:10:58') did not respond to SIGTERM. Trying SIGKILL
   [2020-11-30 12:12:02,801] {{process_utils.py:61}} INFO - Process psutil.Process(pid=334, name='airflow schedul', status='terminated', started='12:10:59') (334) terminated with exit code None
   [2020-11-30 12:12:02,801] {{process_utils.py:61}} INFO - Process psutil.Process(pid=50, name='airflow scheduler -- DagFileProcessorManager', status='terminated', exitcode=<Negsignal.SIGKILL: -9>, started='12:09:58') (50) terminated with exit code Negsignal.SIGKILL
   [2020-11-30 12:12:02,802] {{process_utils.py:61}} INFO - Process psutil.Process(pid=331, name='airflow schedul', status='terminated', started='12:10:58') (331) terminated with exit code None
   [2020-11-30 12:12:02,802] {{scheduler_job.py:1304}} INFO - Exited execute loop
   ```
   </details>
   
   Airflow version:
   ```
   airflow@ergo-chronos-scheduler-695d46c8d6-qgnvv:/opt/airflow$ airflow version
   [2020-11-30 12:52:49,519] {{plugins_manager.py:283}} INFO - Loading 2 plugin(s) took 0.86 seconds
   2.0.0b3
   ```
   
   Weirdly, I think the process is being terminated (as in the logs) but it's not exactly crashing the enclosing pod. So the container is not being restarted either, which leaves the scheduler not working indefinitely.

