Posted to commits@airflow.apache.org by GitBox <gi...@apache.org> on 2022/01/23 18:19:57 UTC

[GitHub] [airflow] stablum opened a new issue #19957: Airflow crashes with a psycopg2.errors.DeadlockDetected exception

stablum opened a new issue #19957:
URL: https://github.com/apache/airflow/issues/19957


   ### Apache Airflow version
   
   2.2.2 (latest released)
   
   ### Operating System
   
   Ubuntu 21.04 on a VM
   
   ### Versions of Apache Airflow Providers
   
   root@AI-Research:~/learning_sets/airflow# pip freeze | grep apache-airflow-providers
   apache-airflow-providers-ftp==2.0.1
   apache-airflow-providers-http==2.0.1
   apache-airflow-providers-imap==2.0.1
   apache-airflow-providers-sqlite==2.0.1
   
   
   ### Deployment
   
   Other
   
   ### Deployment details
   
   Airflow is at version 2.2.2
   psql (PostgreSQL) 13.5 (Ubuntu 13.5-0ubuntu0.21.04.1)
   
   The DAG contains thousands of tasks for data download, preprocessing, and preparation, with the results destined for a MongoDB database (so I'm not using the PostgreSQL database inside my tasks).
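   
   For orientation, a minimal sketch of a DAG with that shape (the task count, callable body, and MongoDB details are illustrative placeholders, not the actual project code; it assumes Airflow 2.2.x and dynamically generated `PythonOperator` tasks, which matches the `operator=PythonOperator` entries in the logs below):
   
   ```python
   # Minimal sketch only: dynamically generated download/preprocess tasks whose
   # results go to MongoDB. Names, counts, and bodies are placeholders.
   from datetime import datetime
   
   from airflow import DAG
   from airflow.operators.python import PythonOperator
   
   
   def download(set_id: int) -> None:
       # Download one data set and store it in MongoDB (details omitted here).
       ...
   
   
   with DAG(
       dag_id="download_and_preprocess_sets",
       start_date=datetime(2021, 11, 1),
       schedule_interval=None,  # triggered manually, as in the logs in this issue
       catchup=False,
   ) as dag:
       for set_id in range(1500):  # thousands of similar tasks in the real DAG
           PythonOperator(
               task_id=f"download_{set_id}",
               python_callable=download,
               op_kwargs={"set_id": set_id},
           )
   ```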
   
   ### What happened
   
   [2021-12-01 19:41:57,556] {scheduler_job.py:644} ERROR - Exception when executing SchedulerJob._run_scheduler_loop
   Traceback (most recent call last):
     File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/engine/base.py", line 1276, in _execute_context
       self.dialect.do_execute(
     File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/engine/default.py", line 608, in do_execute
       cursor.execute(statement, parameters)
   psycopg2.errors.DeadlockDetected: deadlock detected
   DETAIL:  Process 322086 waits for ShareLock on transaction 2391367; blocked by process 340345.
   Process 340345 waits for AccessExclusiveLock on tuple (0,26) of relation 19255 of database 19096; blocked by process 340300.
   Process 340300 waits for ShareLock on transaction 2391361; blocked by process 322086.
   HINT:  See server log for query details.
   CONTEXT:  while updating tuple (1335,10) in relation "task_instance"
   
   
   The above exception was the direct cause of the following exception:
   
   Traceback (most recent call last):
     File "/usr/local/lib/python3.9/dist-packages/airflow/jobs/scheduler_job.py", line 628, in _execute
       self._run_scheduler_loop()
     File "/usr/local/lib/python3.9/dist-packages/airflow/jobs/scheduler_job.py", line 709, in _run_scheduler_loop
       num_queued_tis = self._do_scheduling(session)
     File "/usr/local/lib/python3.9/dist-packages/airflow/jobs/scheduler_job.py", line 792, in _do_scheduling
       callback_to_run = self._schedule_dag_run(dag_run, session)
     File "/usr/local/lib/python3.9/dist-packages/airflow/jobs/scheduler_job.py", line 1049, in _schedule_dag_run
       dag_run.schedule_tis(schedulable_tis, session)
     File "/usr/local/lib/python3.9/dist-packages/airflow/utils/session.py", line 67, in wrapper
       return func(*args, **kwargs)
     File "/usr/local/lib/python3.9/dist-packages/airflow/models/dagrun.py", line 898, in schedule_tis
       session.query(TI)
     File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/orm/query.py", line 4063, in update
       update_op.exec_()
     File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/orm/persistence.py", line 1697, in exec_
       self._do_exec()
     File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/orm/persistence.py", line 1895, in _do_exec
       self._execute_stmt(update_stmt)
     File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/orm/persistence.py", line 1702, in _execute_stmt
       self.result = self.query._execute_crud(stmt, self.mapper)
     File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/orm/query.py", line 3568, in _execute_crud
       return conn.execute(stmt, self._params)
     File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/engine/base.py", line 1011, in execute
       return meth(self, multiparams, params)
     File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/sql/elements.py", line 298, in _execute_on_connection
       return connection._execute_clauseelement(self, multiparams, params)
     File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/engine/base.py", line 1124, in _execute_clauseelement
       ret = self._execute_context(
     File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/engine/base.py", line 1316, in _execute_context
       self._handle_dbapi_exception(
     File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/engine/base.py", line 1510, in _handle_dbapi_exception
       util.raise_(
     File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/util/compat.py", line 182, in raise_
       raise exception
     File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/engine/base.py", line 1276, in _execute_context
       self.dialect.do_execute(
     File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/engine/default.py", line 608, in do_execute
       cursor.execute(statement, parameters)
   sqlalchemy.exc.OperationalError: (psycopg2.errors.DeadlockDetected) deadlock detected
   DETAIL:  Process 322086 waits for ShareLock on transaction 2391367; blocked by process 340345.
   Process 340345 waits for AccessExclusiveLock on tuple (0,26) of relation 19255 of database 19096; blocked by process 340300.
   Process 340300 waits for ShareLock on transaction 2391361; blocked by process 322086.
   HINT:  See server log for query details.
   CONTEXT:  while updating tuple (1335,10) in relation "task_instance"
   
   [SQL: UPDATE task_instance SET state=%(state)s WHERE task_instance.dag_id = %(dag_id_1)s AND task_instance.run_id = %(run_id_1)s AND task_instance.task_id IN (%(task_id_1)s, %(task_id_2)s, %(task_id_3)s, %(task_id_4)s, %(task_id_5)s, %(task_id_6)s, %(task_id_7)s, %(task_id_8)s, %(task_id_9)s, %(task_id_10)s, %(task_id_11)s, %(task_id_12)s, %(task_id_13)s, %(task_id_14)s, %(task_id_15)s, %(task_id_16)s, %(task_id_17)s, %(task_id_18)s, %(task_id_19)s, %(task_id_20)s)]
   [parameters: {'state': <TaskInstanceState.SCHEDULED: 'scheduled'>, 'dag_id_1': 'download_and_preprocess_sets', 'run_id_1': 'manual__2021-12-01T17:31:23.684597+00:00', 'task_id_1': 'download_1379', 'task_id_2': 'download_1438', 'task_id_3': 'download_1363', 'task_id_4': 'download_1368', 'task_id_5': 'download_138', 'task_id_6': 'download_1432', 'task_id_7': 'download_1435', 'task_id_8': 'download_1437', 'task_id_9': 'download_1439', 'task_id_10': 'download_1457', 'task_id_11': 'download_168', 'task_id_12': 'download_203', 'task_id_13': 'download_782', 'task_id_14': 'download_1430', 'task_id_15': 'download_1431', 'task_id_16': 'download_1436', 'task_id_17': 'download_167', 'task_id_18': 'download_174', 'task_id_19': 'download_205', 'task_id_20': 'download_1434'}]
   (Background on this error at: http://sqlalche.me/e/13/e3q8)
   [2021-12-01 19:41:57,566] {local_executor.py:388} INFO - Shutting down LocalExecutor; waiting for running tasks to finish.  Signal again if you don't want to wait.
   [2021-12-01 19:42:18,013] {process_utils.py:100} INFO - Sending Signals.SIGTERM to GPID 285470
   [2021-12-01 19:42:18,105] {process_utils.py:66} INFO - Process psutil.Process(pid=285470, status='terminated', exitcode=0, started='18:56:21') (285470) terminated with exit code 0
   [2021-12-01 19:42:18,106] {scheduler_job.py:655} INFO - Exited execute loop
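   
   For context, this is a generic illustration rather than the scheduler's own retry logic: Postgres rolls back the deadlock victim's transaction, so a statement like the `UPDATE task_instance ...` above is normally safe to retry in a fresh transaction. A minimal sketch with SQLAlchemy and tenacity (connection details and function name are placeholders):
   
   ```python
   # Generic sketch only, not Airflow internals: retry a short transaction when
   # Postgres aborts it as a deadlock victim (surfaced as OperationalError).
   from sqlalchemy import create_engine, text
   from sqlalchemy.exc import OperationalError
   from tenacity import retry, retry_if_exception_type, stop_after_attempt, wait_random_exponential
   
   # Placeholder DSN for the Airflow metadata database.
   engine = create_engine("postgresql+psycopg2://airflow:airflow@localhost/airflow")
   
   
   @retry(
       retry=retry_if_exception_type(OperationalError),
       stop=stop_after_attempt(3),
       wait=wait_random_exponential(multiplier=0.5, max=5),
       reraise=True,
   )
   def mark_scheduled(dag_id, run_id, task_ids):
       # Each attempt runs in its own short transaction, so a rolled-back
       # attempt can simply be repeated.
       with engine.begin() as conn:
           conn.execute(
               text(
                   "UPDATE task_instance SET state = 'scheduled' "
                   "WHERE dag_id = :dag_id AND run_id = :run_id "
                   "AND task_id = ANY(:task_ids)"
               ),
               {"dag_id": dag_id, "run_id": run_id, "task_ids": task_ids},
           )
   ```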
   
   
   ### What you expected to happen
   
   I expected the scheduler not to crash. Maybe 24 concurrent processes/tasks are too many?
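   
   For reference, these are the knobs that typically bound that number in a LocalExecutor setup (a minimal check, assuming Airflow 2.2.x option names):
   
   ```python
   # Sketch: read the effective concurrency limits the scheduler is using.
   from airflow.configuration import conf
   
   print(conf.getint("core", "parallelism"))               # executor-wide task slots
   print(conf.getint("core", "max_active_tasks_per_dag"))  # default per-DAG cap
   ```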
   
   ### How to reproduce
   
   Reproducibility is challenging, but maybe the exception provides enough information for a fix.
   
   ### Anything else
   
   It occurs all the time, some time after the DAG run has started.
   
   ### Are you willing to submit PR?
   
   - [ ] Yes I am willing to submit a PR!
   
   ### Code of Conduct
   
   - [X] I agree to follow this project's [Code of Conduct](https://github.com/apache/airflow/blob/main/CODE_OF_CONDUCT.md)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] stablum edited a comment on issue #19957: Airflow crashes with a psycopg2.errors.DeadlockDetected exception

Posted by GitBox <gi...@apache.org>.
stablum edited a comment on issue #19957:
URL: https://github.com/apache/airflow/issues/19957#issuecomment-994343254


   Unfortunately, this keeps happening (after a couple of weeks during which it was running smoothly):
   
   ```
   [2021-12-15 01:54:30,915] {dagbag.py:500} INFO - Filling up the DagBag from /root/learning_sets/models/
   dag_bag <airflow.models.dagbag.DagBag object at 0x7f56aa88cf70>
   Running <TaskInstance: download_and_preprocess_sets.download_1466 manual__2021-12-14T06:28:19.872227+00:00 [queued]> on host AI-Research
   Running <TaskInstance: download_and_preprocess_sets.download_952 manual__2021-12-14T06:28:19.872227+00:00 [queued]> on host AI-Research
   [2021-12-15 01:54:43,539] {scheduler_job.py:644} ERROR - Exception when executing SchedulerJob._run_scheduler_loop
   Traceback (most recent call last):
     File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/engine/base.py", line 1276, in _execute_context
       self.dialect.do_execute(
     File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/engine/default.py", line 608, in do_execute
       cursor.execute(statement, parameters)
   psycopg2.errors.DeadlockDetected: deadlock detected
   DETAIL:  Process 1117623 waits for ShareLock on transaction 4526903; blocked by process 1206850.
   Process 1206850 waits for AccessExclusiveLock on tuple (1,17) of relation 19255 of database 19096; blocked by process 1206469.
   Process 1206469 waits for ShareLock on transaction 4526895; blocked by process 1117623.
   HINT:  See server log for query details.
   CONTEXT:  while updating tuple (1899,3) in relation "task_instance"
   
   
   The above exception was the direct cause of the following exception:
   
   Traceback (most recent call last):
     File "/usr/local/lib/python3.9/dist-packages/airflow/jobs/scheduler_job.py", line 628, in _execute
       self._run_scheduler_loop()
     File "/usr/local/lib/python3.9/dist-packages/airflow/jobs/scheduler_job.py", line 709, in _run_scheduler_loop
       num_queued_tis = self._do_scheduling(session)
     File "/usr/local/lib/python3.9/dist-packages/airflow/jobs/scheduler_job.py", line 792, in _do_scheduling
       callback_to_run = self._schedule_dag_run(dag_run, session)
     File "/usr/local/lib/python3.9/dist-packages/airflow/jobs/scheduler_job.py", line 1049, in _schedule_dag_run
       dag_run.schedule_tis(schedulable_tis, session)
     File "/usr/local/lib/python3.9/dist-packages/airflow/utils/session.py", line 67, in wrapper
       return func(*args, **kwargs)
     File "/usr/local/lib/python3.9/dist-packages/airflow/models/dagrun.py", line 898, in schedule_tis
       session.query(TI)
     File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/orm/query.py", line 4063, in update
       update_op.exec_()
     File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/orm/persistence.py", line 1697, in exec_
       self._do_exec()
     File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/orm/persistence.py", line 1895, in _do_exec
       self._execute_stmt(update_stmt)
     File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/orm/persistence.py", line 1702, in _execute_stmt
       self.result = self.query._execute_crud(stmt, self.mapper)
     File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/orm/query.py", line 3568, in _execute_crud
       return conn.execute(stmt, self._params)
     File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/engine/base.py", line 1011, in execute
       return meth(self, multiparams, params)
     File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/sql/elements.py", line 298, in _execute_on_connection
       return connection._execute_clauseelement(self, multiparams, params)
     File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/engine/base.py", line 1124, in _execute_clauseelement
       ret = self._execute_context(
     File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/engine/base.py", line 1316, in _execute_context
       self._handle_dbapi_exception(
     File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/engine/base.py", line 1510, in _handle_dbapi_exception
       util.raise_(
     File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/util/compat.py", line 182, in raise_
       raise exception
     File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/engine/base.py", line 1276, in _execute_context
       self.dialect.do_execute(
     File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/engine/default.py", line 608, in do_execute
       cursor.execute(statement, parameters)
   sqlalchemy.exc.OperationalError: (psycopg2.errors.DeadlockDetected) deadlock detected
   DETAIL:  Process 1117623 waits for ShareLock on transaction 4526903; blocked by process 1206850.
   Process 1206850 waits for AccessExclusiveLock on tuple (1,17) of relation 19255 of database 19096; blocked by process 1206469.
   Process 1206469 waits for ShareLock on transaction 4526895; blocked by process 1117623.
   HINT:  See server log for query details.
   CONTEXT:  while updating tuple (1899,3) in relation "task_instance"
   
   [SQL: UPDATE task_instance SET state=%(state)s WHERE task_instance.dag_id = %(dag_id_1)s AND task_instance.run_id = %(run_id_1)s AND task_instance.task_id IN (%(task_id_1)s, %(task_id_2)s, %(task_id_3)s, %(task_id_4)s, %(task_id_5)s, %(task_id_6)s, %(task_id_7)s)]
   [parameters: {'state': <TaskInstanceState.SCHEDULED: 'scheduled'>, 'dag_id_1': 'download_and_preprocess_sets', 'run_id_1': 'manual__2021-12-14T06:28:19.872227+00:00', 'task_id_1': 'download_936', 'task_id_2': 'download_937', 'task_id_3': 'download_938', 'task_id_4': 'download_939', 'task_id_5': 'download_944', 'task_id_6': 'download_946', 'task_id_7': 'download_950'}]
   (Background on this error at: http://sqlalche.me/e/13/e3q8)
   [2021-12-15 01:54:43,544] {local_executor.py:388} INFO - Shutting down LocalExecutor; waiting for running tasks to finish.  Signal again if you don't want to wait.
   [2021-12-15 01:54:53,885] {process_utils.py:100} INFO - Sending Signals.SIGTERM to GPID 1117612
   [2021-12-15 01:54:54,098] {process_utils.py:66} INFO - Process psutil.Process(pid=1117612, status='terminated', exitcode=0, started='01:38:54') (1117612) terminated with exit code 0
   [2021-12-15 01:54:54,098] {scheduler_job.py:655} INFO - Exited execute loop
   [2021-12-15 01:54:54 +0100] [1117543] [INFO] Handling signal: term
   [2021-12-15 01:54:54 +0100] [1117568] [INFO] Worker exiting (pid: 1117568)
   [2021-12-15 01:54:54 +0100] [1117549] [INFO] Worker exiting (pid: 1117549)
   Traceback (most recent call last):
     File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/engine/base.py", line 1276, in _execute_context
       self.dialect.do_execute(
     File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/engine/default.py", line 608, in do_execute
       cursor.execute(statement, parameters)
   psycopg2.errors.DeadlockDetected: deadlock detected
   DETAIL:  Process 1117623 waits for ShareLock on transaction 4526903; blocked by process 1206850.
   Process 1206850 waits for AccessExclusiveLock on tuple (1,17) of relation 19255 of database 19096; blocked by process 1206469.
   Process 1206469 waits for ShareLock on transaction 4526895; blocked by process 1117623.
   HINT:  See server log for query details.
   CONTEXT:  while updating tuple (1899,3) in relation "task_instance"
   
   
   The above exception was the direct cause of the following exception:
   
   Traceback (most recent call last):
     File "/usr/local/bin/airflow", line 8, in <module>
       sys.exit(main())
     File "/usr/local/lib/python3.9/dist-packages/airflow/__main__.py", line 48, in main
       args.func(args)
     File "/usr/local/lib/python3.9/dist-packages/airflow/cli/cli_parser.py", line 48, in command
       return func(*args, **kwargs)
     File "/usr/local/lib/python3.9/dist-packages/airflow/utils/cli.py", line 92, in wrapper
       return f(*args, **kwargs)
     File "/usr/local/lib/python3.9/dist-packages/airflow/cli/commands/scheduler_command.py", line 75, in scheduler
       _run_scheduler_job(args=args)
     File "/usr/local/lib/python3.9/dist-packages/airflow/cli/commands/scheduler_command.py", line 46, in _run_scheduler_job
       job.run()
     File "/usr/local/lib/python3.9/dist-packages/airflow/jobs/base_job.py", line 245, in run
       self._execute()
     File "/usr/local/lib/python3.9/dist-packages/airflow/jobs/scheduler_job.py", line 628, in _execute
       self._run_scheduler_loop()
     File "/usr/local/lib/python3.9/dist-packages/airflow/jobs/scheduler_job.py", line 709, in _run_scheduler_loop
       num_queued_tis = self._do_scheduling(session)
     File "/usr/local/lib/python3.9/dist-packages/airflow/jobs/scheduler_job.py", line 792, in _do_scheduling
       callback_to_run = self._schedule_dag_run(dag_run, session)
     File "/usr/local/lib/python3.9/dist-packages/airflow/jobs/scheduler_job.py", line 1049, in _schedule_dag_run
       dag_run.schedule_tis(schedulable_tis, session)
     File "/usr/local/lib/python3.9/dist-packages/airflow/utils/session.py", line 67, in wrapper
       return func(*args, **kwargs)
     File "/usr/local/lib/python3.9/dist-packages/airflow/models/dagrun.py", line 898, in schedule_tis
       session.query(TI)
     File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/orm/query.py", line 4063, in update
       update_op.exec_()
     File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/orm/persistence.py", line 1697, in exec_
       self._do_exec()
     File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/orm/persistence.py", line 1895, in _do_exec
       self._execute_stmt(update_stmt)
     File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/orm/persistence.py", line 1702, in _execute_stmt
       self.result = self.query._execute_crud(stmt, self.mapper)
     File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/orm/query.py", line 3568, in _execute_crud
       return conn.execute(stmt, self._params)
     File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/engine/base.py", line 1011, in execute
       return meth(self, multiparams, params)
     File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/sql/elements.py", line 298, in _execute_on_connection
       return connection._execute_clauseelement(self, multiparams, params)
     File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/engine/base.py", line 1124, in _execute_clauseelement
       ret = self._execute_context(
     File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/engine/base.py", line 1316, in _execute_context
       self._handle_dbapi_exception(
     File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/engine/base.py", line 1510, in _handle_dbapi_exception
       util.raise_(
     File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/util/compat.py", line 182, in raise_
       raise exception
     File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/engine/base.py", line 1276, in _execute_context
       self.dialect.do_execute(
     File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/engine/default.py", line 608, in do_execute
       cursor.execute(statement, parameters)
   sqlalchemy.exc.OperationalError: (psycopg2.errors.DeadlockDetected) deadlock detected
   DETAIL:  Process 1117623 waits for ShareLock on transaction 4526903; blocked by process 1206850.
   Process 1206850 waits for AccessExclusiveLock on tuple (1,17) of relation 19255 of database 19096; blocked by process 1206469.
   Process 1206469 waits for ShareLock on transaction 4526895; blocked by process 1117623.
   HINT:  See server log for query details.
   CONTEXT:  while updating tuple (1899,3) in relation "task_instance"
   
   [SQL: UPDATE task_instance SET state=%(state)s WHERE task_instance.dag_id = %(dag_id_1)s AND task_instance.run_id = %(run_id_1)s AND task_instance.task_id IN (%(task_id_1)s, %(task_id_2)s, %(task_id_3)s, %(task_id_4)s, %(task_id_5)s, %(task_id_6)s, %(task_id_7)s)]
   [parameters: {'state': <TaskInstanceState.SCHEDULED: 'scheduled'>, 'dag_id_1': 'download_and_preprocess_sets', 'run_id_1': 'manual__2021-12-14T06:28:19.872227+00:00', 'task_id_1': 'download_936', 'task_id_2': 'download_937', 'task_id_3': 'download_938', 'task_id_4': 'download_939', 'task_id_5': 'download_944', 'task_id_6': 'download_946', 'task_id_7': 'download_950'}]
   (Background on this error at: http://sqlalche.me/e/13/e3q8)
   [2021-12-15 01:54:54 +0100] [1117543] [INFO] Shutting down: Master
   ```
   
   I just restarted it, and these are the scheduler processes now running (I launched it with `screen`):
   ```
   root@AI-Research:~# ps aux | grep airflow | grep sched | grep -v grep
   root     3166838  0.0  0.0   9024  2460 ?        Ss   07:17   0:00 SCREEN -L -Logfile logs/airflow_scheduler_20211215_071716.log -S airflow_scheduler -d -m airflow scheduler
   root     3166845 56.1  1.0 212516 179704 pts/66  Rs+  07:17   0:16 /usr/bin/python3 /usr/local/bin/airflow scheduler
   root     3167171  0.5  0.4 111728 74756 pts/66   S    07:17   0:00 airflow scheduler -- DagFileProcessorManager
   ```
   
   And this is the launching script:
   ```
   root@AI-Research:~/learning_sets/airflow# cat launch_airflow.sh 
   #!/bin/bash
   
   SCRIPT_DIR="$( cd -- "$( dirname -- "${BASH_SOURCE[0]}" )" &> /dev/null && pwd )"
   echo SCRIPT_DIR: $SCRIPT_DIR
   LEARNING_SETS_DIR="$(readlink -f $SCRIPT_DIR/..)"
   echo LEARNING_SETS_DIR: $LEARNING_SETS_DIR
   
   cd $LEARNING_SETS_DIR
   
   TS=$(date +%Y%m%d_%H%M%S)
   
   screen -X -S airflow_scheduler quit
   screen -X -S airflow_webserver quit
   
   sleep 1
   
   ps aux | grep airflow | grep -v launch | awk '{print $2}' | xargs kill
   
   sleep 1
   
   screen -L -Logfile logs/airflow_scheduler_${TS}.log -S airflow_scheduler -d -m airflow scheduler
   screen -L -Logfile logs/airflow_webserver_${TS}.log -S airflow_webserver -d -m airflow webserver
   ```





[GitHub] [airflow] stablum edited a comment on issue #19957: Airflow crashes with a psycopg2.errors.DeadlockDetected exception

Posted by GitBox <gi...@apache.org>.
stablum edited a comment on issue #19957:
URL: https://github.com/apache/airflow/issues/19957#issuecomment-1079965417


   Unfortunately I'm still experiencing this bug with Airflow 2.2.4:
   
   ```
   [2022-03-27 16:14:12,804] {scheduler_job.py:433} INFO - Setting the following tasks to queued state:
   	
   [2022-03-27 16:14:12,805] {scheduler_job.py:527} INFO - Executor reports execution of download_and_preprocess_sets.download_1544 run_id=manual__2022-03-24T13:43:29.617461+00:00 exited with status success for try_number 2
   [2022-03-27 16:14:12,809] {scheduler_job.py:570} INFO - TaskInstance Finished: dag_id=download_and_preprocess_sets, task_id=download_1544, run_id=manual__2022-03-24T13:43:29.617461+00:00, run_start_date=2022-03-27 16:13:00.591762+00:00, run_end_date=2022-03-27 16:13:37.437128+00:00, run_duration=36.845366, state=success, executor_state=success, try_number=2, max_tries=2, job_id=10383, pool=default_pool, queue=default, priority_weight=59, operator=PythonOperator
   [2022-03-27 16:14:12,829] {scheduler_job.py:1137} INFO - Resetting orphaned tasks for active dag runs
   [2022-03-27 16:14:12,830] {scheduler_job.py:1160} INFO - Marked 1 SchedulerJob instances as failed
   [2022-03-27 16:14:12,835] {scheduler_job.py:1201} INFO - Reset the following 1 orphaned TaskInstances:
   	<TaskInstance: download_and_preprocess_sets.download_1546 manual__2022-03-24T13:43:29.617461+00:00 [running]>
   [2022-03-27 16:14:45,853] {scheduler_job.py:667} ERROR - Exception when executing SchedulerJob._run_scheduler_loop
   Traceback (most recent call last):
     File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/engine/base.py", line 1276, in _execute_context
       self.dialect.do_execute(
     File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/engine/default.py", line 608, in do_execute
       cursor.execute(statement, parameters)
   psycopg2.errors.DeadlockDetected: deadlock detected
   DETAIL:  Process 309686 waits for ShareLock on transaction 12660839; blocked by process 314699.
   Process 314699 waits for ShareLock on transaction 12660838; blocked by process 309686.
   HINT:  See server log for query details.
   CONTEXT:  while updating tuple (4928,21) in relation "task_instance"
   
   
   The above exception was the direct cause of the following exception:
   
   Traceback (most recent call last):
     File "/usr/local/lib/python3.9/dist-packages/airflow/jobs/scheduler_job.py", line 651, in _execute
       self._run_scheduler_loop()
     File "/usr/local/lib/python3.9/dist-packages/airflow/jobs/scheduler_job.py", line 732, in _run_scheduler_loop
       num_queued_tis = self._do_scheduling(session)
     File "/usr/local/lib/python3.9/dist-packages/airflow/jobs/scheduler_job.py", line 815, in _do_scheduling
       callback_to_run = self._schedule_dag_run(dag_run, session)
     File "/usr/local/lib/python3.9/dist-packages/airflow/jobs/scheduler_job.py", line 1072, in _schedule_dag_run
       dag_run.schedule_tis(schedulable_tis, session)
     File "/usr/local/lib/python3.9/dist-packages/airflow/utils/session.py", line 67, in wrapper
       return func(*args, **kwargs)
     File "/usr/local/lib/python3.9/dist-packages/airflow/models/dagrun.py", line 901, in schedule_tis
       session.query(TI)
     File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/orm/query.py", line 4063, in update
       update_op.exec_()
     File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/orm/persistence.py", line 1697, in exec_
       self._do_exec()
     File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/orm/persistence.py", line 1895, in _do_exec
       self._execute_stmt(update_stmt)
     File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/orm/persistence.py", line 1702, in _execute_stmt
       self.result = self.query._execute_crud(stmt, self.mapper)
     File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/orm/query.py", line 3568, in _execute_crud
       return conn.execute(stmt, self._params)
     File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/engine/base.py", line 1011, in execute
       return meth(self, multiparams, params)
     File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/sql/elements.py", line 298, in _execute_on_connection
       return connection._execute_clauseelement(self, multiparams, params)
     File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/engine/base.py", line 1124, in _execute_clauseelement
       ret = self._execute_context(
     File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/engine/base.py", line 1316, in _execute_context
       self._handle_dbapi_exception(
     File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/engine/base.py", line 1510, in _handle_dbapi_exception
       util.raise_(
     File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/util/compat.py", line 182, in raise_
       raise exception
     File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/engine/base.py", line 1276, in _execute_context
       self.dialect.do_execute(
     File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/engine/default.py", line 608, in do_execute
       cursor.execute(statement, parameters)
   sqlalchemy.exc.OperationalError: (psycopg2.errors.DeadlockDetected) deadlock detected
   DETAIL:  Process 309686 waits for ShareLock on transaction 12660839; blocked by process 314699.
   Process 314699 waits for ShareLock on transaction 12660838; blocked by process 309686.
   HINT:  See server log for query details.
   CONTEXT:  while updating tuple (4928,21) in relation "task_instance"
   
   [SQL: UPDATE task_instance SET state=%(state)s WHERE task_instance.dag_id = %(dag_id_1)s AND task_instance.run_id = %(run_id_1)s AND task_instance.task_id IN (%(task_id_1)s)]
   [parameters: {'state': <TaskInstanceState.SCHEDULED: 'scheduled'>, 'dag_id_1': 'download_and_preprocess_sets', 'run_id_1': 'manual__2022-03-24T13:43:29.617461+00:00', 'task_id_1': 'download_1546'}]
   (Background on this error at: http://sqlalche.me/e/13/e3q8)
   [2022-03-27 16:14:45,858] {local_executor.py:388} INFO - Shutting down LocalExecutor; waiting for running tasks to finish.  Signal again if you don't want to wait.
   [2022-03-27 16:14:46,970] {process_utils.py:120} INFO - Sending Signals.SIGTERM to group 309683. PIDs of all processes in the group: [309683]
   [2022-03-27 16:14:46,970] {process_utils.py:75} INFO - Sending the signal Signals.SIGTERM to group 309683
   [2022-03-27 16:14:47,022] {process_utils.py:70} INFO - Process psutil.Process(pid=309683, status='terminated', exitcode=0, started='16:08:59') (309683) terminated with exit code 0
   [2022-03-27 16:14:47,022] {scheduler_job.py:678} INFO - Exited execute loop
   Traceback (most recent call last):
     File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/engine/base.py", line 1276, in _execute_context
       self.dialect.do_execute(
     File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/engine/default.py", line 608, in do_execute
       cursor.execute(statement, parameters)
   psycopg2.errors.DeadlockDetected: deadlock detected
   DETAIL:  Process 309686 waits for ShareLock on transaction 12660839; blocked by process 314699.
   Process 314699 waits for ShareLock on transaction 12660838; blocked by process 309686.
   HINT:  See server log for query details.
   CONTEXT:  while updating tuple (4928,21) in relation "task_instance"
   
   
   The above exception was the direct cause of the following exception:
   
   Traceback (most recent call last):
     File "/usr/local/bin/airflow", line 8, in <module>
       sys.exit(main())
     File "/usr/local/lib/python3.9/dist-packages/airflow/__main__.py", line 48, in main
       args.func(args)
     File "/usr/local/lib/python3.9/dist-packages/airflow/cli/cli_parser.py", line 48, in command
       return func(*args, **kwargs)
     File "/usr/local/lib/python3.9/dist-packages/airflow/utils/cli.py", line 92, in wrapper
       return f(*args, **kwargs)
     File "/usr/local/lib/python3.9/dist-packages/airflow/cli/commands/scheduler_command.py", line 75, in scheduler
       _run_scheduler_job(args=args)
     File "/usr/local/lib/python3.9/dist-packages/airflow/cli/commands/scheduler_command.py", line 46, in _run_scheduler_job
       job.run()
     File "/usr/local/lib/python3.9/dist-packages/airflow/jobs/base_job.py", line 246, in run
       self._execute()
     File "/usr/local/lib/python3.9/dist-packages/airflow/jobs/scheduler_job.py", line 651, in _execute
       self._run_scheduler_loop()
     File "/usr/local/lib/python3.9/dist-packages/airflow/jobs/scheduler_job.py", line 732, in _run_scheduler_loop
       num_queued_tis = self._do_scheduling(session)
     File "/usr/local/lib/python3.9/dist-packages/airflow/jobs/scheduler_job.py", line 815, in _do_scheduling
       callback_to_run = self._schedule_dag_run(dag_run, session)
     File "/usr/local/lib/python3.9/dist-packages/airflow/jobs/scheduler_job.py", line 1072, in _schedule_dag_run
       dag_run.schedule_tis(schedulable_tis, session)
     File "/usr/local/lib/python3.9/dist-packages/airflow/utils/session.py", line 67, in wrapper
       return func(*args, **kwargs)
     File "/usr/local/lib/python3.9/dist-packages/airflow/models/dagrun.py", line 901, in schedule_tis
       session.query(TI)
     File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/orm/query.py", line 4063, in update
       update_op.exec_()
     File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/orm/persistence.py", line 1697, in exec_
       self._do_exec()
     File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/orm/persistence.py", line 1895, in _do_exec
       self._execute_stmt(update_stmt)
     File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/orm/persistence.py", line 1702, in _execute_stmt
       self.result = self.query._execute_crud(stmt, self.mapper)
     File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/orm/query.py", line 3568, in _execute_crud
       return conn.execute(stmt, self._params)
     File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/engine/base.py", line 1011, in execute
       return meth(self, multiparams, params)
     File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/sql/elements.py", line 298, in _execute_on_connection
       return connection._execute_clauseelement(self, multiparams, params)
     File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/engine/base.py", line 1124, in _execute_clauseelement
       ret = self._execute_context(
     File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/engine/base.py", line 1316, in _execute_context
       self._handle_dbapi_exception(
     File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/engine/base.py", line 1510, in _handle_dbapi_exception
       util.raise_(
     File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/util/compat.py", line 182, in raise_
       raise exception
     File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/engine/base.py", line 1276, in _execute_context
       self.dialect.do_execute(
     File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/engine/default.py", line 608, in do_execute
       cursor.execute(statement, parameters)
   sqlalchemy.exc.OperationalError: (psycopg2.errors.DeadlockDetected) deadlock detected
   DETAIL:  Process 309686 waits for ShareLock on transaction 12660839; blocked by process 314699.
   Process 314699 waits for ShareLock on transaction 12660838; blocked by process 309686.
   HINT:  See server log for query details.
   CONTEXT:  while updating tuple (4928,21) in relation "task_instance"
   
   [SQL: UPDATE task_instance SET state=%(state)s WHERE task_instance.dag_id = %(dag_id_1)s AND task_instance.run_id = %(run_id_1)s AND task_instance.task_id IN (%(task_id_1)s)]
   [parameters: {'state': <TaskInstanceState.SCHEDULED: 'scheduled'>, 'dag_id_1': 'download_and_preprocess_sets', 'run_id_1': 'manual__2022-03-24T13:43:29.617461+00:00', 'task_id_1': 'download_1546'}]
   (Background on this error at: http://sqlalche.me/e/13/e3q8)
   ```
   I also set `concurrency=1` and `max_active_tasks=1`, but the bug keeps reappearing.
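   
   For reference, a minimal sketch of where those two limits are typically passed (Airflow 2.2.x names; this is not the actual DAG file, and `concurrency` is the older alias that `max_active_tasks` replaced):
   
   ```python
   # Sketch only: DAG-level limits as mentioned above.
   from datetime import datetime
   
   from airflow import DAG
   
   with DAG(
       dag_id="download_and_preprocess_sets",
       start_date=datetime(2021, 11, 1),
       schedule_interval=None,
       max_active_tasks=1,  # caps concurrently running task instances of this DAG
       max_active_runs=1,   # optionally also cap concurrent DAG runs
   ) as dag:
       ...
   ```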





[GitHub] [airflow] dwiajik commented on issue #19957: Airflow crashes with a psycopg2.errors.DeadlockDetected exception

Posted by GitBox <gi...@apache.org>.
dwiajik commented on issue #19957:
URL: https://github.com/apache/airflow/issues/19957#issuecomment-1011683020


   I am experiencing the same problem and have set `schedule_after_task_execution` to `False`. The issue still persists. Do you have any suggestions? Thanks
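   
   For reference, one way to confirm the running environment actually picks that setting up (a minimal sketch, assuming a standard Airflow 2.x install where it can also be set via the `AIRFLOW__SCHEDULER__SCHEDULE_AFTER_TASK_EXECUTION` environment variable):
   
   ```python
   # Sketch: read the effective value the same way Airflow does.
   from airflow.configuration import conf
   
   print(conf.getboolean("scheduler", "schedule_after_task_execution"))
   ```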





[GitHub] [airflow] potiuk edited a comment on issue #19957: Airflow crashes with a psycopg2.errors.DeadlockDetected exception

Posted by GitBox <gi...@apache.org>.
potiuk edited a comment on issue #19957:
URL: https://github.com/apache/airflow/issues/19957#issuecomment-994816151


   Thanks @stablum ! Great to see all the detailed information (and the pastebin)
   
   @ashb - I think this one might need a closer look from both of us :) and some hypotheses about where it could come from.
   I plan to take a look later this week and maybe we can figure out where it comes from.





[GitHub] [airflow] stablum edited a comment on issue #19957: Airflow crashes with a psycopg2.errors.DeadlockDetected exception

Posted by GitBox <gi...@apache.org>.
stablum edited a comment on issue #19957:
URL: https://github.com/apache/airflow/issues/19957#issuecomment-994343254


   Unfortunately, this keeps happening (after a couple of weeks during which it was running smoothly):
   
   ```
   [2021-12-15 01:54:30,915] {dagbag.py:500} INFO - Filling up the DagBag from /root/learning_sets/models/
   dag_bag <airflow.models.dagbag.DagBag object at 0x7f56aa88cf70>
   Running <TaskInstance: download_and_preprocess_sets.download_1466 manual__2021-12-14T06:28:19.872227+00:00 [queued]> on host AI-Research
   Running <TaskInstance: download_and_preprocess_sets.download_952 manual__2021-12-14T06:28:19.872227+00:00 [queued]> on host AI-Research
   [2021-12-15 01:54:43,539] {scheduler_job.py:644} ERROR - Exception when executing SchedulerJob._run_scheduler_loop
   Traceback (most recent call last):
     File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/engine/base.py", line 1276, in _execute_context
       self.dialect.do_execute(
     File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/engine/default.py", line 608, in do_execute
       cursor.execute(statement, parameters)
   psycopg2.errors.DeadlockDetected: deadlock detected
   DETAIL:  Process 1117623 waits for ShareLock on transaction 4526903; blocked by process 1206850.
   Process 1206850 waits for AccessExclusiveLock on tuple (1,17) of relation 19255 of database 19096; blocked by process 1206469.
   Process 1206469 waits for ShareLock on transaction 4526895; blocked by process 1117623.
   HINT:  See server log for query details.
   CONTEXT:  while updating tuple (1899,3) in relation "task_instance"
   
   
   The above exception was the direct cause of the following exception:
   
   Traceback (most recent call last):
     File "/usr/local/lib/python3.9/dist-packages/airflow/jobs/scheduler_job.py", line 628, in _execute
       self._run_scheduler_loop()
     File "/usr/local/lib/python3.9/dist-packages/airflow/jobs/scheduler_job.py", line 709, in _run_scheduler_loop
       num_queued_tis = self._do_scheduling(session)
     File "/usr/local/lib/python3.9/dist-packages/airflow/jobs/scheduler_job.py", line 792, in _do_scheduling
       callback_to_run = self._schedule_dag_run(dag_run, session)
     File "/usr/local/lib/python3.9/dist-packages/airflow/jobs/scheduler_job.py", line 1049, in _schedule_dag_run
       dag_run.schedule_tis(schedulable_tis, session)
     File "/usr/local/lib/python3.9/dist-packages/airflow/utils/session.py", line 67, in wrapper
       return func(*args, **kwargs)
     File "/usr/local/lib/python3.9/dist-packages/airflow/models/dagrun.py", line 898, in schedule_tis
       session.query(TI)
     File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/orm/query.py", line 4063, in update
       update_op.exec_()
     File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/orm/persistence.py", line 1697, in exec_
       self._do_exec()
     File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/orm/persistence.py", line 1895, in _do_exec
       self._execute_stmt(update_stmt)
     File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/orm/persistence.py", line 1702, in _execute_stmt
       self.result = self.query._execute_crud(stmt, self.mapper)
     File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/orm/query.py", line 3568, in _execute_crud
       return conn.execute(stmt, self._params)
     File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/engine/base.py", line 1011, in execute
       return meth(self, multiparams, params)
     File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/sql/elements.py", line 298, in _execute_on_connection
       return connection._execute_clauseelement(self, multiparams, params)
     File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/engine/base.py", line 1124, in _execute_clauseelement
       ret = self._execute_context(
     File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/engine/base.py", line 1316, in _execute_context
       self._handle_dbapi_exception(
     File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/engine/base.py", line 1510, in _handle_dbapi_exception
       util.raise_(
     File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/util/compat.py", line 182, in raise_
       raise exception
     File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/engine/base.py", line 1276, in _execute_context
       self.dialect.do_execute(
     File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/engine/default.py", line 608, in do_execute
       cursor.execute(statement, parameters)
   sqlalchemy.exc.OperationalError: (psycopg2.errors.DeadlockDetected) deadlock detected
   DETAIL:  Process 1117623 waits for ShareLock on transaction 4526903; blocked by process 1206850.
   Process 1206850 waits for AccessExclusiveLock on tuple (1,17) of relation 19255 of database 19096; blocked by process 1206469.
   Process 1206469 waits for ShareLock on transaction 4526895; blocked by process 1117623.
   HINT:  See server log for query details.
   CONTEXT:  while updating tuple (1899,3) in relation "task_instance"
   
   [SQL: UPDATE task_instance SET state=%(state)s WHERE task_instance.dag_id = %(dag_id_1)s AND task_instance.run_id = %(run_id_1)s AND task_instance.task_id IN (%(task_id_1)s, %(task_id_2)s, %(task_id_3)s, %(task_id_4)s, %(task_id_5)s, %(task_id_6)s, %(task_id_7)s)]
   [parameters: {'state': <TaskInstanceState.SCHEDULED: 'scheduled'>, 'dag_id_1': 'download_and_preprocess_sets', 'run_id_1': 'manual__2021-12-14T06:28:19.872227+00:00', 'task_id_1': 'download_936', 'task_id_2': 'download_937', 'task_id_3': 'download_938', 'task_id_4': 'download_939', 'task_id_5': 'download_944', 'task_id_6': 'download_946', 'task_id_7': 'download_950'}]
   (Background on this error at: http://sqlalche.me/e/13/e3q8)
   [2021-12-15 01:54:43,544] {local_executor.py:388} INFO - Shutting down LocalExecutor; waiting for running tasks to finish.  Signal again if you don't want to wait.
   [2021-12-15 01:54:53,885] {process_utils.py:100} INFO - Sending Signals.SIGTERM to GPID 1117612
   [2021-12-15 01:54:54,098] {process_utils.py:66} INFO - Process psutil.Process(pid=1117612, status='terminated', exitcode=0, started='01:38:54') (1117612) terminated with exit code 0
   [2021-12-15 01:54:54,098] {scheduler_job.py:655} INFO - Exited execute loop
   [2021-12-15 01:54:54 +0100] [1117543] [INFO] Handling signal: term
   [2021-12-15 01:54:54 +0100] [1117568] [INFO] Worker exiting (pid: 1117568)
   [2021-12-15 01:54:54 +0100] [1117549] [INFO] Worker exiting (pid: 1117549)
   Traceback (most recent call last):
     File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/engine/base.py", line 1276, in _execute_context
       self.dialect.do_execute(
     File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/engine/default.py", line 608, in do_execute
       cursor.execute(statement, parameters)
   psycopg2.errors.DeadlockDetected: deadlock detected
   DETAIL:  Process 1117623 waits for ShareLock on transaction 4526903; blocked by process 1206850.
   Process 1206850 waits for AccessExclusiveLock on tuple (1,17) of relation 19255 of database 19096; blocked by process 1206469.
   Process 1206469 waits for ShareLock on transaction 4526895; blocked by process 1117623.
   HINT:  See server log for query details.
   CONTEXT:  while updating tuple (1899,3) in relation "task_instance"
   
   
   The above exception was the direct cause of the following exception:
   
   Traceback (most recent call last):
     File "/usr/local/bin/airflow", line 8, in <module>
       sys.exit(main())
     File "/usr/local/lib/python3.9/dist-packages/airflow/__main__.py", line 48, in main
       args.func(args)
     File "/usr/local/lib/python3.9/dist-packages/airflow/cli/cli_parser.py", line 48, in command
       return func(*args, **kwargs)
     File "/usr/local/lib/python3.9/dist-packages/airflow/utils/cli.py", line 92, in wrapper
       return f(*args, **kwargs)
     File "/usr/local/lib/python3.9/dist-packages/airflow/cli/commands/scheduler_command.py", line 75, in scheduler
       _run_scheduler_job(args=args)
     File "/usr/local/lib/python3.9/dist-packages/airflow/cli/commands/scheduler_command.py", line 46, in _run_scheduler_job
       job.run()
     File "/usr/local/lib/python3.9/dist-packages/airflow/jobs/base_job.py", line 245, in run
       self._execute()
     File "/usr/local/lib/python3.9/dist-packages/airflow/jobs/scheduler_job.py", line 628, in _execute
       self._run_scheduler_loop()
     File "/usr/local/lib/python3.9/dist-packages/airflow/jobs/scheduler_job.py", line 709, in _run_scheduler_loop
       num_queued_tis = self._do_scheduling(session)
     File "/usr/local/lib/python3.9/dist-packages/airflow/jobs/scheduler_job.py", line 792, in _do_scheduling
       callback_to_run = self._schedule_dag_run(dag_run, session)
     File "/usr/local/lib/python3.9/dist-packages/airflow/jobs/scheduler_job.py", line 1049, in _schedule_dag_run
       dag_run.schedule_tis(schedulable_tis, session)
     File "/usr/local/lib/python3.9/dist-packages/airflow/utils/session.py", line 67, in wrapper
       return func(*args, **kwargs)
     File "/usr/local/lib/python3.9/dist-packages/airflow/models/dagrun.py", line 898, in schedule_tis
       session.query(TI)
     File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/orm/query.py", line 4063, in update
       update_op.exec_()
     File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/orm/persistence.py", line 1697, in exec_
       self._do_exec()
     File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/orm/persistence.py", line 1895, in _do_exec
       self._execute_stmt(update_stmt)
     File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/orm/persistence.py", line 1702, in _execute_stmt
       self.result = self.query._execute_crud(stmt, self.mapper)
     File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/orm/query.py", line 3568, in _execute_crud
       return conn.execute(stmt, self._params)
     File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/engine/base.py", line 1011, in execute
       return meth(self, multiparams, params)
     File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/sql/elements.py", line 298, in _execute_on_connection
       return connection._execute_clauseelement(self, multiparams, params)
     File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/engine/base.py", line 1124, in _execute_clauseelement
       ret = self._execute_context(
     File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/engine/base.py", line 1316, in _execute_context
       self._handle_dbapi_exception(
     File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/engine/base.py", line 1510, in _handle_dbapi_exception
       util.raise_(
     File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/util/compat.py", line 182, in raise_
       raise exception
     File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/engine/base.py", line 1276, in _execute_context
       self.dialect.do_execute(
     File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/engine/default.py", line 608, in do_execute
       cursor.execute(statement, parameters)
   sqlalchemy.exc.OperationalError: (psycopg2.errors.DeadlockDetected) deadlock detected
   DETAIL:  Process 1117623 waits for ShareLock on transaction 4526903; blocked by process 1206850.
   Process 1206850 waits for AccessExclusiveLock on tuple (1,17) of relation 19255 of database 19096; blocked by process 1206469.
   Process 1206469 waits for ShareLock on transaction 4526895; blocked by process 1117623.
   HINT:  See server log for query details.
   CONTEXT:  while updating tuple (1899,3) in relation "task_instance"
   
   [SQL: UPDATE task_instance SET state=%(state)s WHERE task_instance.dag_id = %(dag_id_1)s AND task_instance.run_id = %(run_id_1)s AND task_instance.task_id IN (%(task_id_1)s, %(task_id_2)s, %(task_id_3)s, %(task_id_4)s, %(task_id_5)s, %(task_id_6)s, %(task_id_7)s)]
   [parameters: {'state': <TaskInstanceState.SCHEDULED: 'scheduled'>, 'dag_id_1': 'download_and_preprocess_sets', 'run_id_1': 'manual__2021-12-14T06:28:19.872227+00:00', 'task_id_1': 'download_936', 'task_id_2': 'download_937', 'task_id_3': 'download_938', 'task_id_4': 'download_939', 'task_id_5': 'download_944', 'task_id_6': 'download_946', 'task_id_7': 'download_950'}]
   (Background on this error at: http://sqlalche.me/e/13/e3q8)
   [2021-12-15 01:54:54 +0100] [1117543] [INFO] Shutting down: Master
   ```
   
   I just restarted it, and these are the scheduler processes now running (I launched it with `screen`):
   ```
   root@AI-Research:~# ps aux | grep airflow | grep sched | grep -v grep
   root     3166838  0.0  0.0   9024  2460 ?        Ss   07:17   0:00 SCREEN -L -Logfile logs/airflow_scheduler_20211215_071716.log -S airflow_scheduler -d -m airflow scheduler
   root     3166845 56.1  1.0 212516 179704 pts/66  Rs+  07:17   0:16 /usr/bin/python3 /usr/local/bin/airflow scheduler
   root     3167171  0.5  0.4 111728 74756 pts/66   S    07:17   0:00 airflow scheduler -- DagFileProcessorManager
   ```
   
   And this is the launching script:
   ```
   root@AI-Research:~/learning_sets/airflow# cat launch_airflow.sh 
   #!/bin/bash
   
   TS=$(date +%Y%m%d_%H%M%S)
   
   screen -X -S airflow_scheduler quit
   screen -X -S airflow_webserver quit
   
   sleep 1
   
   ps aux | grep airflow | grep -v launch | awk '{print $2}' | xargs kill
   
   sleep 1
   
   screen -L -Logfile logs/airflow_scheduler_${TS}.log -S airflow_scheduler -d -m airflow scheduler
   screen -L -Logfile logs/airflow_webserver_${TS}.log -S airflow_webserver -d -m airflow webserver
   ```
   
   Responding to your questions:
   * no other process or script is using the PostgreSQL db at the moment (one way to double-check this is sketched below)
   * The last two (very long) queries can be read here: [https://nopaste.net/nMCqRrrUR1](https://nopaste.net/nMCqRrrUR1)
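   
   A minimal sketch of that double-check (it assumes direct `psycopg2` access to the metadata database; the DSN is a placeholder): list the sessions connected to the database and what, if anything, they are waiting on:
   
   ```python
   # Sketch only: show which sessions touch the metadata DB and whether they
   # are waiting on locks. The DSN below is a placeholder.
   import psycopg2
   
   conn = psycopg2.connect("dbname=airflow user=airflow host=localhost")
   with conn, conn.cursor() as cur:
       cur.execute(
           """
           SELECT pid, state, wait_event_type, wait_event, left(query, 80) AS query
           FROM pg_stat_activity
           WHERE datname = current_database()
           ORDER BY pid
           """
       )
       for row in cur.fetchall():
           print(row)
   conn.close()
   ```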





[GitHub] [airflow] AmarEL edited a comment on issue #19957: Airflow crashes with a psycopg2.errors.DeadlockDetected exception

Posted by GitBox <gi...@apache.org>.
AmarEL edited a comment on issue #19957:
URL: https://github.com/apache/airflow/issues/19957#issuecomment-999902054


   We are facing the very same problem with Postgres.
   
   Even though it is a different database, the stack trace shows the same line/method being called in the Airflow "layer" before moving on to the concrete database class as in #19832:
   ```
   File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/engine/default.py", line 608, in do_execute
       cursor.execute(statement, parameters)
   ```
   
   I didn't have time to test, but there is a chance that #20030 "fixes" this one together with #19832.





     File "/usr/local/lib/python3.9/dist-packages/airflow/cli/commands/scheduler_command.py", line 75, in scheduler
       _run_scheduler_job(args=args)
     File "/usr/local/lib/python3.9/dist-packages/airflow/cli/commands/scheduler_command.py", line 46, in _run_scheduler_job
       job.run()
     File "/usr/local/lib/python3.9/dist-packages/airflow/jobs/base_job.py", line 245, in run
       self._execute()
     File "/usr/local/lib/python3.9/dist-packages/airflow/jobs/scheduler_job.py", line 628, in _execute
       self._run_scheduler_loop()
     File "/usr/local/lib/python3.9/dist-packages/airflow/jobs/scheduler_job.py", line 709, in _run_scheduler_loop
       num_queued_tis = self._do_scheduling(session)
     File "/usr/local/lib/python3.9/dist-packages/airflow/jobs/scheduler_job.py", line 792, in _do_scheduling
       callback_to_run = self._schedule_dag_run(dag_run, session)
     File "/usr/local/lib/python3.9/dist-packages/airflow/jobs/scheduler_job.py", line 1049, in _schedule_dag_run
       dag_run.schedule_tis(schedulable_tis, session)
     File "/usr/local/lib/python3.9/dist-packages/airflow/utils/session.py", line 67, in wrapper
       return func(*args, **kwargs)
     File "/usr/local/lib/python3.9/dist-packages/airflow/models/dagrun.py", line 898, in schedule_tis
       session.query(TI)
     File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/orm/query.py", line 4063, in update
       update_op.exec_()
     File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/orm/persistence.py", line 1697, in exec_
       self._do_exec()
     File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/orm/persistence.py", line 1895, in _do_exec
       self._execute_stmt(update_stmt)
     File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/orm/persistence.py", line 1702, in _execute_stmt
       self.result = self.query._execute_crud(stmt, self.mapper)
     File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/orm/query.py", line 3568, in _execute_crud
       return conn.execute(stmt, self._params)
     File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/engine/base.py", line 1011, in execute
       return meth(self, multiparams, params)
     File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/sql/elements.py", line 298, in _execute_on_connection
       return connection._execute_clauseelement(self, multiparams, params)
     File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/engine/base.py", line 1124, in _execute_clauseelement
       ret = self._execute_context(
     File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/engine/base.py", line 1316, in _execute_context
       self._handle_dbapi_exception(
     File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/engine/base.py", line 1510, in _handle_dbapi_exception
       util.raise_(
     File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/util/compat.py", line 182, in raise_
       raise exception
     File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/engine/base.py", line 1276, in _execute_context
       self.dialect.do_execute(
     File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/engine/default.py", line 608, in do_execute
       cursor.execute(statement, parameters)
   sqlalchemy.exc.OperationalError: (psycopg2.errors.DeadlockDetected) deadlock detected
   DETAIL:  Process 1117623 waits for ShareLock on transaction 4526903; blocked by process 1206850.
   Process 1206850 waits for AccessExclusiveLock on tuple (1,17) of relation 19255 of database 19096; blocked by process 1206469.
   Process 1206469 waits for ShareLock on transaction 4526895; blocked by process 1117623.
   HINT:  See server log for query details.
   CONTEXT:  while updating tuple (1899,3) in relation "task_instance"
   
   [SQL: UPDATE task_instance SET state=%(state)s WHERE task_instance.dag_id = %(dag_id_1)s AND task_instance.run_id = %(run_id_1)s AND task_instance.task_id IN (%(task_id_1)s, %(task_id_2)s, %(task_id_3)s, %(task_id_4)s, %(task_id_5)s, %(task_id_6)s, %(task_id_7)s)]
   [parameters: {'state': <TaskInstanceState.SCHEDULED: 'scheduled'>, 'dag_id_1': 'download_and_preprocess_sets', 'run_id_1': 'manual__2021-12-14T06:28:19.872227+00:00', 'task_id_1': 'download_936', 'task_id_2': 'download_937', 'task_id_3': 'download_938', 'task_id_4': 'download_939', 'task_id_5': 'download_944', 'task_id_6': 'download_946', 'task_id_7': 'download_950'}]
   (Background on this error at: http://sqlalche.me/e/13/e3q8)
   [2021-12-15 01:54:54 +0100] [1117543] [INFO] Shutting down: Master
   ```
   
   I just restarted it, and these are the scheduler processes running (I launched it with `screen`):
   ```
   root@AI-Research:~# ps aux | grep airflow | grep sched | grep -v grep
   root     3166838  0.0  0.0   9024  2460 ?        Ss   07:17   0:00 SCREEN -L -Logfile logs/airflow_scheduler_20211215_071716.log -S airflow_scheduler -d -m airflow scheduler
   root     3166845 56.1  1.0 212516 179704 pts/66  Rs+  07:17   0:16 /usr/bin/python3 /usr/local/bin/airflow scheduler
   root     3167171  0.5  0.4 111728 74756 pts/66   S    07:17   0:00 airflow scheduler -- DagFileProcessorManager
   ```
   
   And this is the launching script:
   ```
   root@AI-Research:~/learning_sets/airflow# cat launch_airflow.sh 
   #!/bin/bash
   
   TS=$(date +%Y%m%d_%H%M%S)
   
   screen -X -S airflow_scheduler quit
   screen -X -S airflow_webserver quit
   
   sleep 1
   
   ps aux | grep airflow | grep -v launch | awk '{print $2}' | xargs kill
   
   sleep 1
   
   screen -L -Logfile logs/airflow_scheduler_${TS}.log -S airflow_scheduler -d -m airflow scheduler
   screen -L -Logfile logs/airflow_webserver_${TS}.log -S airflow_webserver -d -m airflow webserver
   ```
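
   (Side note on the kill line above: `ps aux | grep airflow` also matches the short-lived `grep` process itself, so `xargs kill` can complain about PIDs that are already gone. A sketch of a more targeted variant, assuming procps `pkill` is available:)
   ```
   # Signal only the airflow scheduler/webserver processes by command line.
   pkill -f 'airflow (scheduler|webserver)'
   ```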
   
   Responding to your questions:
   * No other process or script is using the PostgreSQL DB at the moment (see the check sketched below)
   * The last two queries will be added in my next comment
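
   For reference, this is the kind of check that confirms it (a sketch; it assumes the metadata database is called `airflow_db` and the role `airflow_user`, as in the Postgres logs later in this thread, so adjust the connection options to your setup):
   ```
   # List every client currently connected to the Airflow metadata DB.
   psql -U airflow_user -d airflow_db -c \
     "SELECT pid, usename, application_name, client_addr, state
      FROM pg_stat_activity
      WHERE datname = 'airflow_db';"
   ```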
   





[GitHub] [airflow] ephraimbuddy commented on issue #19957: Airflow crashes with a psycopg2.errors.DeadlockDetected exception

Posted by GitBox <gi...@apache.org>.
ephraimbuddy commented on issue #19957:
URL: https://github.com/apache/airflow/issues/19957#issuecomment-1019554113


   > @dwiajik @stablum - is there any chance you have some customisations (plugins ? ) or users running DB operations (backfill? API calls? UI modifications) that might have caused the deadlock? Looking at the code, my intuition tells me that this must have been something external. Having the server logs could help to pin-point it.
   
   I once experienced this while running a DAG with many tasks. Parallelism was 64 and I had 4 DAGs with no concurrency limit; the tasks finish in about 20s, and at one point the scheduler crashed. I will try to see if I can reproduce it again.
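
   For context, these are the knobs I am referring to (a sketch using Airflow's environment-variable overrides; the values are illustrative, not a recommendation):
   ```
   # Scheduler-wide cap on concurrently running task instances.
   export AIRFLOW__CORE__PARALLELISM=64
   # Per-DAG cap on concurrently running task instances (the Airflow 2.2 name;
   # earlier versions call this dag_concurrency).
   export AIRFLOW__CORE__MAX_ACTIVE_TASKS_PER_DAG=16
   airflow scheduler
   ```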





[GitHub] [airflow] ephraimbuddy closed issue #19957: Airflow crashes with a psycopg2.errors.DeadlockDetected exception

Posted by GitBox <gi...@apache.org>.
ephraimbuddy closed issue #19957:
URL: https://github.com/apache/airflow/issues/19957


   





[GitHub] [airflow] potiuk edited a comment on issue #19957: Airflow crashes with a psycopg2.errors.DeadlockDetected exception

Posted by GitBox <gi...@apache.org>.
potiuk edited a comment on issue #19957:
URL: https://github.com/apache/airflow/issues/19957#issuecomment-1011857056


   I actually spent some time a few days ago looking at the mini-scheduler code, but I could not really find a flaw there. The fact that it did not help you indicates that my hypothesis was unfounded, unfortunately, and that the reason may be different (and the fact that it worked for @stablum was mainly a coincidence or some side effect of that change).
   
   @dwiajik - it might also be that your case is a bit different. Could you please report some of the logs of your deadlocks (maybe create a gist with a few examples)? Ideally, send us both the logs of the failing scheduler and the corresponding logs of the Postgres server from the same time. I believe it will be much easier to investigate if we see a few examples, and the server logs should tell us exactly which two queries deadlocked, which should help us a lot.
   
   What we really need is something in /var/lib/pgsql/data/pg_log/*.log; there should be entries at the time the deadlock happens that look like this:
   
   ```
   ERROR:  deadlock detected
   DETAIL:  Process 21535 waits for AccessExclusiveLock on relation 342640 of database 41454; blocked by process 21506.
   Process 21506 waits for AccessExclusiveLock on relation 342637 of database 41454; blocked by process 21535.
   HINT:  See server log for query details.
   CONTEXT:  SQL statement "UPDATE ..."
   ```
   
   Ideally, we need those entries and some of the log lines around them, if possible.
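
   If those entries are hard to spot, here is a sketch of settings that make Postgres report lock waits more verbosely (assumes superuser access via `psql`; these are standard PostgreSQL settings, but double-check the values against your installation):
   ```
   # Log a report whenever a session waits longer than deadlock_timeout for a lock,
   # then reload the configuration without restarting the server.
   psql -U postgres -c "ALTER SYSTEM SET log_lock_waits = on;"
   psql -U postgres -c "ALTER SYSTEM SET deadlock_timeout = '1s';"
   psql -U postgres -c "SELECT pg_reload_conf();"
   ```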
   





[GitHub] [airflow] stablum commented on issue #19957: Airflow crashes with a psycopg2.errors.DeadlockDetected exception

Posted by GitBox <gi...@apache.org>.
stablum commented on issue #19957:
URL: https://github.com/apache/airflow/issues/19957#issuecomment-1079965417


   Unfortunately I'm still experiencing this bug with Airflow 2.2.4:
   
   ```
   [2022-03-27 16:14:12,804] {scheduler_job.py:433} INFO - Setting the following tasks to queued state:
   	
   [2022-03-27 16:14:12,805] {scheduler_job.py:527} INFO - Executor reports execution of download_and_preprocess_sets.download_1544 run_id=manual__2022-03-24T13:43:29.617461+00:00 exited with status success for try_number 2
   [2022-03-27 16:14:12,809] {scheduler_job.py:570} INFO - TaskInstance Finished: dag_id=download_and_preprocess_sets, task_id=download_1544, run_id=manual__2022-03-24T13:43:29.617461+00:00, run_start_date=2022-03-27 16:13:00.591762+00:00, run_end_date=2022-03-27 16:13:37.437128+00:00, run_duration=36.845366, state=success, executor_state=success, try_number=2, max_tries=2, job_id=10383, pool=default_pool, queue=default, priority_weight=59, operator=PythonOperator
   [2022-03-27 16:14:12,829] {scheduler_job.py:1137} INFO - Resetting orphaned tasks for active dag runs
   [2022-03-27 16:14:12,830] {scheduler_job.py:1160} INFO - Marked 1 SchedulerJob instances as failed
   [2022-03-27 16:14:12,835] {scheduler_job.py:1201} INFO - Reset the following 1 orphaned TaskInstances:
   	<TaskInstance: download_and_preprocess_sets.download_1546 manual__2022-03-24T13:43:29.617461+00:00 [running]>
   [2022-03-27 16:14:45,853] {scheduler_job.py:667} ERROR - Exception when executing SchedulerJob._run_scheduler_loop
   Traceback (most recent call last):
     File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/engine/base.py", line 1276, in _execute_context
       self.dialect.do_execute(
     File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/engine/default.py", line 608, in do_execute
       cursor.execute(statement, parameters)
   psycopg2.errors.DeadlockDetected: deadlock detected
   DETAIL:  Process 309686 waits for ShareLock on transaction 12660839; blocked by process 314699.
   Process 314699 waits for ShareLock on transaction 12660838; blocked by process 309686.
   HINT:  See server log for query details.
   CONTEXT:  while updating tuple (4928,21) in relation "task_instance"
   
   
   The above exception was the direct cause of the following exception:
   
   Traceback (most recent call last):
     File "/usr/local/lib/python3.9/dist-packages/airflow/jobs/scheduler_job.py", line 651, in _execute
       self._run_scheduler_loop()
     File "/usr/local/lib/python3.9/dist-packages/airflow/jobs/scheduler_job.py", line 732, in _run_scheduler_loop
       num_queued_tis = self._do_scheduling(session)
     File "/usr/local/lib/python3.9/dist-packages/airflow/jobs/scheduler_job.py", line 815, in _do_scheduling
       callback_to_run = self._schedule_dag_run(dag_run, session)
     File "/usr/local/lib/python3.9/dist-packages/airflow/jobs/scheduler_job.py", line 1072, in _schedule_dag_run
       dag_run.schedule_tis(schedulable_tis, session)
     File "/usr/local/lib/python3.9/dist-packages/airflow/utils/session.py", line 67, in wrapper
       return func(*args, **kwargs)
     File "/usr/local/lib/python3.9/dist-packages/airflow/models/dagrun.py", line 901, in schedule_tis
       session.query(TI)
     File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/orm/query.py", line 4063, in update
       update_op.exec_()
     File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/orm/persistence.py", line 1697, in exec_
       self._do_exec()
     File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/orm/persistence.py", line 1895, in _do_exec
       self._execute_stmt(update_stmt)
     File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/orm/persistence.py", line 1702, in _execute_stmt
       self.result = self.query._execute_crud(stmt, self.mapper)
     File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/orm/query.py", line 3568, in _execute_crud
       return conn.execute(stmt, self._params)
     File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/engine/base.py", line 1011, in execute
       return meth(self, multiparams, params)
     File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/sql/elements.py", line 298, in _execute_on_connection
       return connection._execute_clauseelement(self, multiparams, params)
     File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/engine/base.py", line 1124, in _execute_clauseelement
       ret = self._execute_context(
     File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/engine/base.py", line 1316, in _execute_context
       self._handle_dbapi_exception(
     File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/engine/base.py", line 1510, in _handle_dbapi_exception
       util.raise_(
     File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/util/compat.py", line 182, in raise_
       raise exception
     File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/engine/base.py", line 1276, in _execute_context
       self.dialect.do_execute(
     File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/engine/default.py", line 608, in do_execute
       cursor.execute(statement, parameters)
   sqlalchemy.exc.OperationalError: (psycopg2.errors.DeadlockDetected) deadlock detected
   DETAIL:  Process 309686 waits for ShareLock on transaction 12660839; blocked by process 314699.
   Process 314699 waits for ShareLock on transaction 12660838; blocked by process 309686.
   HINT:  See server log for query details.
   CONTEXT:  while updating tuple (4928,21) in relation "task_instance"
   
   [SQL: UPDATE task_instance SET state=%(state)s WHERE task_instance.dag_id = %(dag_id_1)s AND task_instance.run_id = %(run_id_1)s AND task_instance.task_id IN (%(task_id_1)s)]
   [parameters: {'state': <TaskInstanceState.SCHEDULED: 'scheduled'>, 'dag_id_1': 'download_and_preprocess_sets', 'run_id_1': 'manual__2022-03-24T13:43:29.617461+00:00', 'task_id_1': 'download_1546'}]
   (Background on this error at: http://sqlalche.me/e/13/e3q8)
   [2022-03-27 16:14:45,858] {local_executor.py:388} INFO - Shutting down LocalExecutor; waiting for running tasks to finish.  Signal again if you don't want to wait.
   [2022-03-27 16:14:46,970] {process_utils.py:120} INFO - Sending Signals.SIGTERM to group 309683. PIDs of all processes in the group: [309683]
   [2022-03-27 16:14:46,970] {process_utils.py:75} INFO - Sending the signal Signals.SIGTERM to group 309683
   [2022-03-27 16:14:47,022] {process_utils.py:70} INFO - Process psutil.Process(pid=309683, status='terminated', exitcode=0, started='16:08:59') (309683) terminated with exit code 0
   [2022-03-27 16:14:47,022] {scheduler_job.py:678} INFO - Exited execute loop
   Traceback (most recent call last):
     File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/engine/base.py", line 1276, in _execute_context
       self.dialect.do_execute(
     File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/engine/default.py", line 608, in do_execute
       cursor.execute(statement, parameters)
   psycopg2.errors.DeadlockDetected: deadlock detected
   DETAIL:  Process 309686 waits for ShareLock on transaction 12660839; blocked by process 314699.
   Process 314699 waits for ShareLock on transaction 12660838; blocked by process 309686.
   HINT:  See server log for query details.
   CONTEXT:  while updating tuple (4928,21) in relation "task_instance"
   
   
   The above exception was the direct cause of the following exception:
   
   Traceback (most recent call last):
     File "/usr/local/bin/airflow", line 8, in <module>
       sys.exit(main())
     File "/usr/local/lib/python3.9/dist-packages/airflow/__main__.py", line 48, in main
       args.func(args)
     File "/usr/local/lib/python3.9/dist-packages/airflow/cli/cli_parser.py", line 48, in command
       return func(*args, **kwargs)
     File "/usr/local/lib/python3.9/dist-packages/airflow/utils/cli.py", line 92, in wrapper
       return f(*args, **kwargs)
     File "/usr/local/lib/python3.9/dist-packages/airflow/cli/commands/scheduler_command.py", line 75, in scheduler
       _run_scheduler_job(args=args)
     File "/usr/local/lib/python3.9/dist-packages/airflow/cli/commands/scheduler_command.py", line 46, in _run_scheduler_job
       job.run()
     File "/usr/local/lib/python3.9/dist-packages/airflow/jobs/base_job.py", line 246, in run
       self._execute()
     File "/usr/local/lib/python3.9/dist-packages/airflow/jobs/scheduler_job.py", line 651, in _execute
       self._run_scheduler_loop()
     File "/usr/local/lib/python3.9/dist-packages/airflow/jobs/scheduler_job.py", line 732, in _run_scheduler_loop
       num_queued_tis = self._do_scheduling(session)
     File "/usr/local/lib/python3.9/dist-packages/airflow/jobs/scheduler_job.py", line 815, in _do_scheduling
       callback_to_run = self._schedule_dag_run(dag_run, session)
     File "/usr/local/lib/python3.9/dist-packages/airflow/jobs/scheduler_job.py", line 1072, in _schedule_dag_run
       dag_run.schedule_tis(schedulable_tis, session)
     File "/usr/local/lib/python3.9/dist-packages/airflow/utils/session.py", line 67, in wrapper
       return func(*args, **kwargs)
     File "/usr/local/lib/python3.9/dist-packages/airflow/models/dagrun.py", line 901, in schedule_tis
       session.query(TI)
     File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/orm/query.py", line 4063, in update
       update_op.exec_()
     File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/orm/persistence.py", line 1697, in exec_
       self._do_exec()
     File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/orm/persistence.py", line 1895, in _do_exec
       self._execute_stmt(update_stmt)
     File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/orm/persistence.py", line 1702, in _execute_stmt
       self.result = self.query._execute_crud(stmt, self.mapper)
     File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/orm/query.py", line 3568, in _execute_crud
       return conn.execute(stmt, self._params)
     File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/engine/base.py", line 1011, in execute
       return meth(self, multiparams, params)
     File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/sql/elements.py", line 298, in _execute_on_connection
       return connection._execute_clauseelement(self, multiparams, params)
     File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/engine/base.py", line 1124, in _execute_clauseelement
       ret = self._execute_context(
     File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/engine/base.py", line 1316, in _execute_context
       self._handle_dbapi_exception(
     File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/engine/base.py", line 1510, in _handle_dbapi_exception
       util.raise_(
     File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/util/compat.py", line 182, in raise_
       raise exception
     File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/engine/base.py", line 1276, in _execute_context
       self.dialect.do_execute(
     File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/engine/default.py", line 608, in do_execute
       cursor.execute(statement, parameters)
   sqlalchemy.exc.OperationalError: (psycopg2.errors.DeadlockDetected) deadlock detected
   DETAIL:  Process 309686 waits for ShareLock on transaction 12660839; blocked by process 314699.
   Process 314699 waits for ShareLock on transaction 12660838; blocked by process 309686.
   HINT:  See server log for query details.
   CONTEXT:  while updating tuple (4928,21) in relation "task_instance"
   
   [SQL: UPDATE task_instance SET state=%(state)s WHERE task_instance.dag_id = %(dag_id_1)s AND task_instance.run_id = %(run_id_1)s AND task_instance.task_id IN (%(task_id_1)s)]
   [parameters: {'state': <TaskInstanceState.SCHEDULED: 'scheduled'>, 'dag_id_1': 'download_and_preprocess_sets', 'run_id_1': 'manual__2022-03-24T13:43:29.617461+00:00', 'task_id_1': 'download_1546'}]
   (Background on this error at: http://sqlalche.me/e/13/e3q8)
   ```





[GitHub] [airflow] stablum edited a comment on issue #19957: Airflow crashes with a psycopg2.errors.DeadlockDetected exception

Posted by GitBox <gi...@apache.org>.
stablum edited a comment on issue #19957:
URL: https://github.com/apache/airflow/issues/19957#issuecomment-1079965417


   Unfortunately I'm still experiencing this bug with Airflow 2.2.4:
   
   ```
   [2022-03-27 16:14:12,804] {scheduler_job.py:433} INFO - Setting the following tasks to queued state:
   	
   [2022-03-27 16:14:12,805] {scheduler_job.py:527} INFO - Executor reports execution of download_and_preprocess_sets.download_1544 run_id=manual__2022-03-24T13:43:29.617461+00:00 exited with status success for try_number 2
   [2022-03-27 16:14:12,809] {scheduler_job.py:570} INFO - TaskInstance Finished: dag_id=download_and_preprocess_sets, task_id=download_1544, run_id=manual__2022-03-24T13:43:29.617461+00:00, run_start_date=2022-03-27 16:13:00.591762+00:00, run_end_date=2022-03-27 16:13:37.437128+00:00, run_duration=36.845366, state=success, executor_state=success, try_number=2, max_tries=2, job_id=10383, pool=default_pool, queue=default, priority_weight=59, operator=PythonOperator
   [2022-03-27 16:14:12,829] {scheduler_job.py:1137} INFO - Resetting orphaned tasks for active dag runs
   [2022-03-27 16:14:12,830] {scheduler_job.py:1160} INFO - Marked 1 SchedulerJob instances as failed
   [2022-03-27 16:14:12,835] {scheduler_job.py:1201} INFO - Reset the following 1 orphaned TaskInstances:
   	<TaskInstance: download_and_preprocess_sets.download_1546 manual__2022-03-24T13:43:29.617461+00:00 [running]>
   [2022-03-27 16:14:45,853] {scheduler_job.py:667} ERROR - Exception when executing SchedulerJob._run_scheduler_loop
   Traceback (most recent call last):
     File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/engine/base.py", line 1276, in _execute_context
       self.dialect.do_execute(
     File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/engine/default.py", line 608, in do_execute
       cursor.execute(statement, parameters)
   psycopg2.errors.DeadlockDetected: deadlock detected
   DETAIL:  Process 309686 waits for ShareLock on transaction 12660839; blocked by process 314699.
   Process 314699 waits for ShareLock on transaction 12660838; blocked by process 309686.
   HINT:  See server log for query details.
   CONTEXT:  while updating tuple (4928,21) in relation "task_instance"
   
   
   The above exception was the direct cause of the following exception:
   
   Traceback (most recent call last):
     File "/usr/local/lib/python3.9/dist-packages/airflow/jobs/scheduler_job.py", line 651, in _execute
       self._run_scheduler_loop()
     File "/usr/local/lib/python3.9/dist-packages/airflow/jobs/scheduler_job.py", line 732, in _run_scheduler_loop
       num_queued_tis = self._do_scheduling(session)
     File "/usr/local/lib/python3.9/dist-packages/airflow/jobs/scheduler_job.py", line 815, in _do_scheduling
       callback_to_run = self._schedule_dag_run(dag_run, session)
     File "/usr/local/lib/python3.9/dist-packages/airflow/jobs/scheduler_job.py", line 1072, in _schedule_dag_run
       dag_run.schedule_tis(schedulable_tis, session)
     File "/usr/local/lib/python3.9/dist-packages/airflow/utils/session.py", line 67, in wrapper
       return func(*args, **kwargs)
     File "/usr/local/lib/python3.9/dist-packages/airflow/models/dagrun.py", line 901, in schedule_tis
       session.query(TI)
     File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/orm/query.py", line 4063, in update
       update_op.exec_()
     File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/orm/persistence.py", line 1697, in exec_
       self._do_exec()
     File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/orm/persistence.py", line 1895, in _do_exec
       self._execute_stmt(update_stmt)
     File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/orm/persistence.py", line 1702, in _execute_stmt
       self.result = self.query._execute_crud(stmt, self.mapper)
     File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/orm/query.py", line 3568, in _execute_crud
       return conn.execute(stmt, self._params)
     File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/engine/base.py", line 1011, in execute
       return meth(self, multiparams, params)
     File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/sql/elements.py", line 298, in _execute_on_connection
       return connection._execute_clauseelement(self, multiparams, params)
     File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/engine/base.py", line 1124, in _execute_clauseelement
       ret = self._execute_context(
     File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/engine/base.py", line 1316, in _execute_context
       self._handle_dbapi_exception(
     File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/engine/base.py", line 1510, in _handle_dbapi_exception
       util.raise_(
     File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/util/compat.py", line 182, in raise_
       raise exception
     File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/engine/base.py", line 1276, in _execute_context
       self.dialect.do_execute(
     File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/engine/default.py", line 608, in do_execute
       cursor.execute(statement, parameters)
   sqlalchemy.exc.OperationalError: (psycopg2.errors.DeadlockDetected) deadlock detected
   DETAIL:  Process 309686 waits for ShareLock on transaction 12660839; blocked by process 314699.
   Process 314699 waits for ShareLock on transaction 12660838; blocked by process 309686.
   HINT:  See server log for query details.
   CONTEXT:  while updating tuple (4928,21) in relation "task_instance"
   
   [SQL: UPDATE task_instance SET state=%(state)s WHERE task_instance.dag_id = %(dag_id_1)s AND task_instance.run_id = %(run_id_1)s AND task_instance.task_id IN (%(task_id_1)s)]
   [parameters: {'state': <TaskInstanceState.SCHEDULED: 'scheduled'>, 'dag_id_1': 'download_and_preprocess_sets', 'run_id_1': 'manual__2022-03-24T13:43:29.617461+00:00', 'task_id_1': 'download_1546'}]
   (Background on this error at: http://sqlalche.me/e/13/e3q8)
   [2022-03-27 16:14:45,858] {local_executor.py:388} INFO - Shutting down LocalExecutor; waiting for running tasks to finish.  Signal again if you don't want to wait.
   [2022-03-27 16:14:46,970] {process_utils.py:120} INFO - Sending Signals.SIGTERM to group 309683. PIDs of all processes in the group: [309683]
   [2022-03-27 16:14:46,970] {process_utils.py:75} INFO - Sending the signal Signals.SIGTERM to group 309683
   [2022-03-27 16:14:47,022] {process_utils.py:70} INFO - Process psutil.Process(pid=309683, status='terminated', exitcode=0, started='16:08:59') (309683) terminated with exit code 0
   [2022-03-27 16:14:47,022] {scheduler_job.py:678} INFO - Exited execute loop
   Traceback (most recent call last):
     File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/engine/base.py", line 1276, in _execute_context
       self.dialect.do_execute(
     File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/engine/default.py", line 608, in do_execute
       cursor.execute(statement, parameters)
   psycopg2.errors.DeadlockDetected: deadlock detected
   DETAIL:  Process 309686 waits for ShareLock on transaction 12660839; blocked by process 314699.
   Process 314699 waits for ShareLock on transaction 12660838; blocked by process 309686.
   HINT:  See server log for query details.
   CONTEXT:  while updating tuple (4928,21) in relation "task_instance"
   
   
   The above exception was the direct cause of the following exception:
   
   Traceback (most recent call last):
     File "/usr/local/bin/airflow", line 8, in <module>
       sys.exit(main())
     File "/usr/local/lib/python3.9/dist-packages/airflow/__main__.py", line 48, in main
       args.func(args)
     File "/usr/local/lib/python3.9/dist-packages/airflow/cli/cli_parser.py", line 48, in command
       return func(*args, **kwargs)
     File "/usr/local/lib/python3.9/dist-packages/airflow/utils/cli.py", line 92, in wrapper
       return f(*args, **kwargs)
     File "/usr/local/lib/python3.9/dist-packages/airflow/cli/commands/scheduler_command.py", line 75, in scheduler
       _run_scheduler_job(args=args)
     File "/usr/local/lib/python3.9/dist-packages/airflow/cli/commands/scheduler_command.py", line 46, in _run_scheduler_job
       job.run()
     File "/usr/local/lib/python3.9/dist-packages/airflow/jobs/base_job.py", line 246, in run
       self._execute()
     File "/usr/local/lib/python3.9/dist-packages/airflow/jobs/scheduler_job.py", line 651, in _execute
       self._run_scheduler_loop()
     File "/usr/local/lib/python3.9/dist-packages/airflow/jobs/scheduler_job.py", line 732, in _run_scheduler_loop
       num_queued_tis = self._do_scheduling(session)
     File "/usr/local/lib/python3.9/dist-packages/airflow/jobs/scheduler_job.py", line 815, in _do_scheduling
       callback_to_run = self._schedule_dag_run(dag_run, session)
     File "/usr/local/lib/python3.9/dist-packages/airflow/jobs/scheduler_job.py", line 1072, in _schedule_dag_run
       dag_run.schedule_tis(schedulable_tis, session)
     File "/usr/local/lib/python3.9/dist-packages/airflow/utils/session.py", line 67, in wrapper
       return func(*args, **kwargs)
     File "/usr/local/lib/python3.9/dist-packages/airflow/models/dagrun.py", line 901, in schedule_tis
       session.query(TI)
     File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/orm/query.py", line 4063, in update
       update_op.exec_()
     File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/orm/persistence.py", line 1697, in exec_
       self._do_exec()
     File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/orm/persistence.py", line 1895, in _do_exec
       self._execute_stmt(update_stmt)
     File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/orm/persistence.py", line 1702, in _execute_stmt
       self.result = self.query._execute_crud(stmt, self.mapper)
     File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/orm/query.py", line 3568, in _execute_crud
       return conn.execute(stmt, self._params)
     File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/engine/base.py", line 1011, in execute
       return meth(self, multiparams, params)
     File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/sql/elements.py", line 298, in _execute_on_connection
       return connection._execute_clauseelement(self, multiparams, params)
     File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/engine/base.py", line 1124, in _execute_clauseelement
       ret = self._execute_context(
     File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/engine/base.py", line 1316, in _execute_context
       self._handle_dbapi_exception(
     File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/engine/base.py", line 1510, in _handle_dbapi_exception
       util.raise_(
     File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/util/compat.py", line 182, in raise_
       raise exception
     File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/engine/base.py", line 1276, in _execute_context
       self.dialect.do_execute(
     File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/engine/default.py", line 608, in do_execute
       cursor.execute(statement, parameters)
   sqlalchemy.exc.OperationalError: (psycopg2.errors.DeadlockDetected) deadlock detected
   DETAIL:  Process 309686 waits for ShareLock on transaction 12660839; blocked by process 314699.
   Process 314699 waits for ShareLock on transaction 12660838; blocked by process 309686.
   HINT:  See server log for query details.
   CONTEXT:  while updating tuple (4928,21) in relation "task_instance"
   
   [SQL: UPDATE task_instance SET state=%(state)s WHERE task_instance.dag_id = %(dag_id_1)s AND task_instance.run_id = %(run_id_1)s AND task_instance.task_id IN (%(task_id_1)s)]
   [parameters: {'state': <TaskInstanceState.SCHEDULED: 'scheduled'>, 'dag_id_1': 'download_and_preprocess_sets', 'run_id_1': 'manual__2022-03-24T13:43:29.617461+00:00', 'task_id_1': 'download_1546'}]
   (Background on this error at: http://sqlalche.me/e/13/e3q8)
   ```
   I also set `concurrency=1` and `max_active_tasks=1`, but the bug keeps reappearing.
   
   This is my PostgreSQL log:
   ```
   2022-03-27 15:37:25.192 UTC [273303] airflow_user@airflow_db LOG:  could not receive data from client: Connection reset by peer
   2022-03-27 15:37:25.201 UTC [300223] airflow_user@airflow_db LOG:  could not receive data from client: Connection reset by peer
   2022-03-27 15:37:25.203 UTC [298660] airflow_user@airflow_db LOG:  could not receive data from client: Connection reset by peer
   2022-03-27 15:37:25.205 UTC [297039] airflow_user@airflow_db LOG:  could not receive data from client: Connection reset by peer
   2022-03-27 15:37:25.209 UTC [301829] airflow_user@airflow_db LOG:  could not receive data from client: Connection reset by peer
   2022-03-27 15:37:26.193 UTC [276786] airflow_user@airflow_db FATAL:  terminating connection due to administrator command
   2022-03-27 15:39:22.382 UTC [304375] airflow_user@airflow_db LOG:  could not receive data from client: Connection reset by peer
   2022-03-27 15:40:31.194 UTC [305340] airflow_user@airflow_db LOG:  could not receive data from client: Connection reset by peer
   2022-03-27 15:41:38.554 UTC [306673] airflow_user@airflow_db LOG:  could not receive data from client: Connection reset by peer
   2022-03-27 15:42:51.378 UTC [307629] airflow_user@airflow_db LOG:  could not receive data from client: Connection reset by peer
   2022-03-27 15:43:23.546 UTC [303336] airflow_user@airflow_db ERROR:  deadlock detected
   2022-03-27 15:43:23.546 UTC [303336] airflow_user@airflow_db DETAIL:  Process 303336 waits for ShareLock on transaction 12660394; blocked by process 309021.
   	Process 309021 waits for ShareLock on transaction 12660383; blocked by process 303336.
   	Process 303336: UPDATE task_instance SET state='scheduled' WHERE task_instance.dag_id = 'download_and_preprocess_sets' AND task_instance.run_id = 'manual__2022-03-24T13:43:29.617461+00:00' AND task_instance.task_id IN ('parse_1381', 'download_1385')
   	Process 309021: SELECT task_instance.try_number AS task_instance_try_number, task_instance.task_id AS task_instance_task_id, task_instance.dag_id AS task_instance_dag_id, task_instance.run_id AS task_instance_run_id, task_instance.start_date AS task_instance_start_date, task_instance.end_date AS task_instance_end_date, task_instance.duration AS task_instance_duration, task_instance.state AS task_instance_state, task_instance.max_tries AS task_instance_max_tries, task_instance.hostname AS task_instance_hostname, task_instance.unixname AS task_instance_unixname, task_instance.job_id AS task_instance_job_id, task_instance.pool AS task_instance_pool, task_instance.pool_slots AS task_instance_pool_slots, task_instance.queue AS task_instance_queue, task_instance.priority_weight AS task_instance_priority_weight, task_instance.operator AS task_instance_operator, task_instance.queued_dttm AS task_instance_queued_dttm, task_instance.queued_by_job_id AS task_instance_queued_by_job_id, task_instance.pid AS task_insta
   2022-03-27 15:43:23.546 UTC [303336] airflow_user@airflow_db HINT:  See server log for query details.
   2022-03-27 15:43:23.546 UTC [303336] airflow_user@airflow_db CONTEXT:  while updating tuple (4903,44) in relation "task_instance"
   2022-03-27 15:43:23.546 UTC [303336] airflow_user@airflow_db STATEMENT:  UPDATE task_instance SET state='scheduled' WHERE task_instance.dag_id = 'download_and_preprocess_sets' AND task_instance.run_id = 'manual__2022-03-24T13:43:29.617461+00:00' AND task_instance.task_id IN ('parse_1381', 'download_1385')
   2022-03-27 15:43:32.520 UTC [308612] airflow_user@airflow_db LOG:  could not receive data from client: Connection reset by peer
   2022-03-27 15:43:33.596 UTC [303337] airflow_user@airflow_db LOG:  could not receive data from client: Connection reset by peer
   2022-03-27 16:10:41.435 UTC [310498] airflow_user@airflow_db LOG:  could not receive data from client: Connection reset by peer
   2022-03-27 16:11:54.228 UTC [311405] airflow_user@airflow_db LOG:  could not receive data from client: Connection reset by peer
   2022-03-27 16:12:26.266 UTC [312643] airflow_user@airflow_db LOG:  could not receive data from client: Connection reset by peer
   2022-03-27 16:13:37.542 UTC [312957] airflow_user@airflow_db LOG:  could not receive data from client: Connection reset by peer
   2022-03-27 16:14:45.851 UTC [309686] airflow_user@airflow_db ERROR:  deadlock detected
   2022-03-27 16:14:45.851 UTC [309686] airflow_user@airflow_db DETAIL:  Process 309686 waits for ShareLock on transaction 12660839; blocked by process 314699.
   	Process 314699 waits for ShareLock on transaction 12660838; blocked by process 309686.
   	Process 309686: UPDATE task_instance SET state='scheduled' WHERE task_instance.dag_id = 'download_and_preprocess_sets' AND task_instance.run_id = 'manual__2022-03-24T13:43:29.617461+00:00' AND task_instance.task_id IN ('download_1546')
   	Process 314699: SELECT task_instance.try_number AS task_instance_try_number, task_instance.task_id AS task_instance_task_id, task_instance.dag_id AS task_instance_dag_id, task_instance.run_id AS task_instance_run_id, task_instance.start_date AS task_instance_start_date, task_instance.end_date AS task_instance_end_date, task_instance.duration AS task_instance_duration, task_instance.state AS task_instance_state, task_instance.max_tries AS task_instance_max_tries, task_instance.hostname AS task_instance_hostname, task_instance.unixname AS task_instance_unixname, task_instance.job_id AS task_instance_job_id, task_instance.pool AS task_instance_pool, task_instance.pool_slots AS task_instance_pool_slots, task_instance.queue AS task_instance_queue, task_instance.priority_weight AS task_instance_priority_weight, task_instance.operator AS task_instance_operator, task_instance.queued_dttm AS task_instance_queued_dttm, task_instance.queued_by_job_id AS task_instance_queued_by_job_id, task_instance.pid AS task_insta
   2022-03-27 16:14:45.851 UTC [309686] airflow_user@airflow_db HINT:  See server log for query details.
   2022-03-27 16:14:45.851 UTC [309686] airflow_user@airflow_db CONTEXT:  while updating tuple (4928,21) in relation "task_instance"
   2022-03-27 16:14:45.851 UTC [309686] airflow_user@airflow_db STATEMENT:  UPDATE task_instance SET state='scheduled' WHERE task_instance.dag_id = 'download_and_preprocess_sets' AND task_instance.run_id = 'manual__2022-03-24T13:43:29.617461+00:00' AND task_instance.task_id IN ('download_1546')
   2022-03-27 16:14:45.958 UTC [314289] airflow_user@airflow_db LOG:  could not receive data from client: Connection reset by peer
   2022-03-27 16:14:47.017 UTC [309687] airflow_user@airflow_db LOG:  could not receive data from client: Connection reset by peer
   ```
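
   In case it helps with the next occurrence, a sketch of a query for watching blocked sessions while the DAG is running (assumes PostgreSQL 9.6+ for `pg_blocking_pids`; the connection options are placeholders for my setup):
   ```
   # Show every backend that is currently blocked, together with the PIDs blocking it.
   psql -U airflow_user -d airflow_db -c \
     "SELECT pid, pg_blocking_pids(pid) AS blocked_by, state, left(query, 80) AS query
      FROM pg_stat_activity
      WHERE cardinality(pg_blocking_pids(pid)) > 0;"
   ```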





[GitHub] [airflow] potiuk closed issue #19957: Airflow crashes with a psycopg2.errors.DeadlockDetected exception

Posted by GitBox <gi...@apache.org>.
potiuk closed issue #19957:
URL: https://github.com/apache/airflow/issues/19957


   





[GitHub] [airflow] potiuk commented on issue #19957: Airflow crashes with a psycopg2.errors.DeadlockDetected exception

Posted by GitBox <gi...@apache.org>.
potiuk commented on issue #19957:
URL: https://github.com/apache/airflow/issues/19957#issuecomment-986332738


   Do you happen to have one or multiple schedulers? 
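
   For reference, a sketch of one way to check this against the metadata DB (it assumes the `airflow_db`/`airflow_user` names from the Postgres logs quoted elsewhere in this thread and the stock Airflow `job` table):
   ```
   # List scheduler jobs the metadata DB currently considers running.
   psql -U airflow_user -d airflow_db -c \
     "SELECT id, hostname, state, latest_heartbeat
      FROM job
      WHERE job_type = 'SchedulerJob' AND state = 'running';"
   ```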





[GitHub] [airflow] stablum commented on issue #19957: Airflow crashes with a psycopg2.errors.DeadlockDetected exception

Posted by GitBox <gi...@apache.org>.
stablum commented on issue #19957:
URL: https://github.com/apache/airflow/issues/19957#issuecomment-995617655


   It seems that after dropping Airflow's database entirely and recreating it from scratch, the bug is no longer occurring. So it might have been something in Airflow's DB data.
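
   For anyone who wants to try the same thing, a sketch of what dropping and recreating can look like (destructive: it wipes all Airflow metadata; the database and role names are assumptions based on my setup):
   ```
   # Option 1: let Airflow recreate its own tables.
   airflow db reset --yes
   
   # Option 2: drop and recreate the whole database, then re-initialise it.
   dropdb -U postgres airflow_db
   createdb -U postgres --owner airflow_user airflow_db
   airflow db init
   ```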





[GitHub] [airflow] potiuk commented on issue #19957: Airflow crashes with a psycopg2.errors.DeadlockDetected exception

Posted by GitBox <gi...@apache.org>.
potiuk commented on issue #19957:
URL: https://github.com/apache/airflow/issues/19957#issuecomment-1003544650


   cc: @ashb  -> do you think this is plausible hypothesis ^^ ? 





[GitHub] [airflow] stablum commented on issue #19957: Airflow crashes with a psycopg2.errors.DeadlockDetected exception

Posted by GitBox <gi...@apache.org>.
stablum commented on issue #19957:
URL: https://github.com/apache/airflow/issues/19957#issuecomment-1007969558


   > 
   
   I managed to run a large number of tasks without Airflow crashing, so changing that setting as you suggested did indeed help! Thank you! :)





[GitHub] [airflow] potiuk commented on issue #19957: Airflow crashes with a psycopg2.errors.DeadlockDetected exception

Posted by GitBox <gi...@apache.org>.
potiuk commented on issue #19957:
URL: https://github.com/apache/airflow/issues/19957#issuecomment-1019539469


   I am afraid we need to reopen this one. IMHO #20894 has no chance of fixing the problem because it does not really change Airflow's behaviour (see the discussion in #20894). @dwiajik @stablum, if you are still experiencing this problem, I think we really need some server-side logs that will tell us what other query is deadlocking with this one.
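
   A sketch of how to pull the relevant entries out of the server log (the path is a guess for a default Ubuntu PostgreSQL 13 install, adjust it for your system):
   ```
   # Print each deadlock report with a few lines of context; the DETAIL lines that
   # follow usually show both statements involved.
   grep -n -A 8 "deadlock detected" /var/log/postgresql/postgresql-13-main.log
   ```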





[GitHub] [airflow] potiuk commented on issue #19957: Airflow crashes with a psycopg2.errors.DeadlockDetected exception

Posted by GitBox <gi...@apache.org>.
potiuk commented on issue #19957:
URL: https://github.com/apache/airflow/issues/19957#issuecomment-1019550207


   @dwiajik @stablum - is there any chance you have some customisations (plugins?) or users running DB operations (backfills? API calls? UI modifications) that might have caused the deadlock? Looking at the code, my intuition tells me that this must have been something external. Having the server logs could help to pinpoint it.





[GitHub] [airflow] stablum edited a comment on issue #19957: Airflow crashes with a psycopg2.errors.DeadlockDetected exception

Posted by GitBox <gi...@apache.org>.
stablum edited a comment on issue #19957:
URL: https://github.com/apache/airflow/issues/19957#issuecomment-1079965417


   Unfortunately I'm still experiencing this bug with Airflow 2.2.4:
   
   ```
   [2022-03-27 16:14:12,804] {scheduler_job.py:433} INFO - Setting the following tasks to queued state:
   	
   [2022-03-27 16:14:12,805] {scheduler_job.py:527} INFO - Executor reports execution of download_and_preprocess_sets.download_1544 run_id=manual__2022-03-24T13:43:29.617461+00:00 exited with status success for try_number 2
   [2022-03-27 16:14:12,809] {scheduler_job.py:570} INFO - TaskInstance Finished: dag_id=download_and_preprocess_sets, task_id=download_1544, run_id=manual__2022-03-24T13:43:29.617461+00:00, run_start_date=2022-03-27 16:13:00.591762+00:00, run_end_date=2022-03-27 16:13:37.437128+00:00, run_duration=36.845366, state=success, executor_state=success, try_number=2, max_tries=2, job_id=10383, pool=default_pool, queue=default, priority_weight=59, operator=PythonOperator
   [2022-03-27 16:14:12,829] {scheduler_job.py:1137} INFO - Resetting orphaned tasks for active dag runs
   [2022-03-27 16:14:12,830] {scheduler_job.py:1160} INFO - Marked 1 SchedulerJob instances as failed
   [2022-03-27 16:14:12,835] {scheduler_job.py:1201} INFO - Reset the following 1 orphaned TaskInstances:
   	<TaskInstance: download_and_preprocess_sets.download_1546 manual__2022-03-24T13:43:29.617461+00:00 [running]>
   [2022-03-27 16:14:45,853] {scheduler_job.py:667} ERROR - Exception when executing SchedulerJob._run_scheduler_loop
   Traceback (most recent call last):
     File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/engine/base.py", line 1276, in _execute_context
       self.dialect.do_execute(
     File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/engine/default.py", line 608, in do_execute
       cursor.execute(statement, parameters)
   psycopg2.errors.DeadlockDetected: deadlock detected
   DETAIL:  Process 309686 waits for ShareLock on transaction 12660839; blocked by process 314699.
   Process 314699 waits for ShareLock on transaction 12660838; blocked by process 309686.
   HINT:  See server log for query details.
   CONTEXT:  while updating tuple (4928,21) in relation "task_instance"
   
   
   The above exception was the direct cause of the following exception:
   
   Traceback (most recent call last):
     File "/usr/local/lib/python3.9/dist-packages/airflow/jobs/scheduler_job.py", line 651, in _execute
       self._run_scheduler_loop()
     File "/usr/local/lib/python3.9/dist-packages/airflow/jobs/scheduler_job.py", line 732, in _run_scheduler_loop
       num_queued_tis = self._do_scheduling(session)
     File "/usr/local/lib/python3.9/dist-packages/airflow/jobs/scheduler_job.py", line 815, in _do_scheduling
       callback_to_run = self._schedule_dag_run(dag_run, session)
     File "/usr/local/lib/python3.9/dist-packages/airflow/jobs/scheduler_job.py", line 1072, in _schedule_dag_run
       dag_run.schedule_tis(schedulable_tis, session)
     File "/usr/local/lib/python3.9/dist-packages/airflow/utils/session.py", line 67, in wrapper
       return func(*args, **kwargs)
     File "/usr/local/lib/python3.9/dist-packages/airflow/models/dagrun.py", line 901, in schedule_tis
       session.query(TI)
     File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/orm/query.py", line 4063, in update
       update_op.exec_()
     File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/orm/persistence.py", line 1697, in exec_
       self._do_exec()
     File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/orm/persistence.py", line 1895, in _do_exec
       self._execute_stmt(update_stmt)
     File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/orm/persistence.py", line 1702, in _execute_stmt
       self.result = self.query._execute_crud(stmt, self.mapper)
     File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/orm/query.py", line 3568, in _execute_crud
       return conn.execute(stmt, self._params)
     File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/engine/base.py", line 1011, in execute
       return meth(self, multiparams, params)
     File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/sql/elements.py", line 298, in _execute_on_connection
       return connection._execute_clauseelement(self, multiparams, params)
     File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/engine/base.py", line 1124, in _execute_clauseelement
       ret = self._execute_context(
     File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/engine/base.py", line 1316, in _execute_context
       self._handle_dbapi_exception(
     File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/engine/base.py", line 1510, in _handle_dbapi_exception
       util.raise_(
     File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/util/compat.py", line 182, in raise_
       raise exception
     File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/engine/base.py", line 1276, in _execute_context
       self.dialect.do_execute(
     File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/engine/default.py", line 608, in do_execute
       cursor.execute(statement, parameters)
   sqlalchemy.exc.OperationalError: (psycopg2.errors.DeadlockDetected) deadlock detected
   DETAIL:  Process 309686 waits for ShareLock on transaction 12660839; blocked by process 314699.
   Process 314699 waits for ShareLock on transaction 12660838; blocked by process 309686.
   HINT:  See server log for query details.
   CONTEXT:  while updating tuple (4928,21) in relation "task_instance"
   
   [SQL: UPDATE task_instance SET state=%(state)s WHERE task_instance.dag_id = %(dag_id_1)s AND task_instance.run_id = %(run_id_1)s AND task_instance.task_id IN (%(task_id_1)s)]
   [parameters: {'state': <TaskInstanceState.SCHEDULED: 'scheduled'>, 'dag_id_1': 'download_and_preprocess_sets', 'run_id_1': 'manual__2022-03-24T13:43:29.617461+00:00', 'task_id_1': 'download_1546'}]
   (Background on this error at: http://sqlalche.me/e/13/e3q8)
   [2022-03-27 16:14:45,858] {local_executor.py:388} INFO - Shutting down LocalExecutor; waiting for running tasks to finish.  Signal again if you don't want to wait.
   [2022-03-27 16:14:46,970] {process_utils.py:120} INFO - Sending Signals.SIGTERM to group 309683. PIDs of all processes in the group: [309683]
   [2022-03-27 16:14:46,970] {process_utils.py:75} INFO - Sending the signal Signals.SIGTERM to group 309683
   [2022-03-27 16:14:47,022] {process_utils.py:70} INFO - Process psutil.Process(pid=309683, status='terminated', exitcode=0, started='16:08:59') (309683) terminated with exit code 0
   [2022-03-27 16:14:47,022] {scheduler_job.py:678} INFO - Exited execute loop
   Traceback (most recent call last):
     File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/engine/base.py", line 1276, in _execute_context
       self.dialect.do_execute(
     File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/engine/default.py", line 608, in do_execute
       cursor.execute(statement, parameters)
   psycopg2.errors.DeadlockDetected: deadlock detected
   DETAIL:  Process 309686 waits for ShareLock on transaction 12660839; blocked by process 314699.
   Process 314699 waits for ShareLock on transaction 12660838; blocked by process 309686.
   HINT:  See server log for query details.
   CONTEXT:  while updating tuple (4928,21) in relation "task_instance"
   
   
   The above exception was the direct cause of the following exception:
   
   Traceback (most recent call last):
     File "/usr/local/bin/airflow", line 8, in <module>
       sys.exit(main())
     File "/usr/local/lib/python3.9/dist-packages/airflow/__main__.py", line 48, in main
       args.func(args)
     File "/usr/local/lib/python3.9/dist-packages/airflow/cli/cli_parser.py", line 48, in command
       return func(*args, **kwargs)
     File "/usr/local/lib/python3.9/dist-packages/airflow/utils/cli.py", line 92, in wrapper
       return f(*args, **kwargs)
     File "/usr/local/lib/python3.9/dist-packages/airflow/cli/commands/scheduler_command.py", line 75, in scheduler
       _run_scheduler_job(args=args)
     File "/usr/local/lib/python3.9/dist-packages/airflow/cli/commands/scheduler_command.py", line 46, in _run_scheduler_job
       job.run()
     File "/usr/local/lib/python3.9/dist-packages/airflow/jobs/base_job.py", line 246, in run
       self._execute()
     File "/usr/local/lib/python3.9/dist-packages/airflow/jobs/scheduler_job.py", line 651, in _execute
       self._run_scheduler_loop()
     File "/usr/local/lib/python3.9/dist-packages/airflow/jobs/scheduler_job.py", line 732, in _run_scheduler_loop
       num_queued_tis = self._do_scheduling(session)
     File "/usr/local/lib/python3.9/dist-packages/airflow/jobs/scheduler_job.py", line 815, in _do_scheduling
       callback_to_run = self._schedule_dag_run(dag_run, session)
     File "/usr/local/lib/python3.9/dist-packages/airflow/jobs/scheduler_job.py", line 1072, in _schedule_dag_run
       dag_run.schedule_tis(schedulable_tis, session)
     File "/usr/local/lib/python3.9/dist-packages/airflow/utils/session.py", line 67, in wrapper
       return func(*args, **kwargs)
     File "/usr/local/lib/python3.9/dist-packages/airflow/models/dagrun.py", line 901, in schedule_tis
       session.query(TI)
     File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/orm/query.py", line 4063, in update
       update_op.exec_()
     File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/orm/persistence.py", line 1697, in exec_
       self._do_exec()
     File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/orm/persistence.py", line 1895, in _do_exec
       self._execute_stmt(update_stmt)
     File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/orm/persistence.py", line 1702, in _execute_stmt
       self.result = self.query._execute_crud(stmt, self.mapper)
     File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/orm/query.py", line 3568, in _execute_crud
       return conn.execute(stmt, self._params)
     File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/engine/base.py", line 1011, in execute
       return meth(self, multiparams, params)
     File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/sql/elements.py", line 298, in _execute_on_connection
       return connection._execute_clauseelement(self, multiparams, params)
     File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/engine/base.py", line 1124, in _execute_clauseelement
       ret = self._execute_context(
     File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/engine/base.py", line 1316, in _execute_context
       self._handle_dbapi_exception(
     File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/engine/base.py", line 1510, in _handle_dbapi_exception
       util.raise_(
     File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/util/compat.py", line 182, in raise_
       raise exception
     File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/engine/base.py", line 1276, in _execute_context
       self.dialect.do_execute(
     File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/engine/default.py", line 608, in do_execute
       cursor.execute(statement, parameters)
   sqlalchemy.exc.OperationalError: (psycopg2.errors.DeadlockDetected) deadlock detected
   DETAIL:  Process 309686 waits for ShareLock on transaction 12660839; blocked by process 314699.
   Process 314699 waits for ShareLock on transaction 12660838; blocked by process 309686.
   HINT:  See server log for query details.
   CONTEXT:  while updating tuple (4928,21) in relation "task_instance"
   
   [SQL: UPDATE task_instance SET state=%(state)s WHERE task_instance.dag_id = %(dag_id_1)s AND task_instance.run_id = %(run_id_1)s AND task_instance.task_id IN (%(task_id_1)s)]
   [parameters: {'state': <TaskInstanceState.SCHEDULED: 'scheduled'>, 'dag_id_1': 'download_and_preprocess_sets', 'run_id_1': 'manual__2022-03-24T13:43:29.617461+00:00', 'task_id_1': 'download_1546'}]
   (Background on this error at: http://sqlalche.me/e/13/e3q8)
   ```
   I also set `concurrency=1` and `max_active_tasks=1`, but the bug keeps reappearing.
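   
   For reference, a minimal sketch (not my actual DAG) of where these limits can be set per DAG in Airflow 2.2.x; the dag_id, schedule and task below are hypothetical placeholders:
   
   ```
   # Hedged sketch: per-DAG concurrency limits. In Airflow 2.2+ the DAG argument
   # `max_active_tasks` replaces the older `concurrency` argument. All names here
   # are placeholders, not taken from the original DAG.
   from datetime import datetime

   from airflow import DAG
   from airflow.operators.bash import BashOperator

   with DAG(
       dag_id="download_and_preprocess_sets_limited",  # hypothetical id
       start_date=datetime(2021, 12, 1),
       schedule_interval=None,
       max_active_tasks=1,  # at most one task instance of this DAG runs at a time
       max_active_runs=1,   # at most one active DAG run at a time
   ) as dag:
       BashOperator(task_id="download_example", bash_command="echo download")
   ```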





[GitHub] [airflow] stablum commented on issue #19957: Airflow crashes with a psycopg2.errors.DeadlockDetected exception

Posted by GitBox <gi...@apache.org>.
stablum commented on issue #19957:
URL: https://github.com/apache/airflow/issues/19957#issuecomment-1002095615


   Just upgraded to Airflow 2.2.3, and unfortunately it keeps crashing as well





[GitHub] [airflow] potiuk commented on issue #19957: Airflow crashes with a psycopg2.errors.DeadlockDetected exception

Posted by GitBox <gi...@apache.org>.
potiuk commented on issue #19957:
URL: https://github.com/apache/airflow/issues/19957#issuecomment-1008022416


   > I managed to run a large number of tasks without Airflow crashing, so changing that setting as you suggested did indeed help! Thank you! :)
   
   Cool! Thanks for confirming! Now we need to find the root cause and fix it!





[GitHub] [airflow] ashb commented on issue #19957: Airflow crashes with a psycopg2.errors.DeadlockDetected exception

Posted by GitBox <gi...@apache.org>.
ashb commented on issue #19957:
URL: https://github.com/apache/airflow/issues/19957#issuecomment-1004742862


   Yes, that seems plausible.





[GitHub] [airflow] potiuk commented on issue #19957: Airflow crashes with a psycopg2.errors.DeadlockDetected exception

Posted by GitBox <gi...@apache.org>.
potiuk commented on issue #19957:
URL: https://github.com/apache/airflow/issues/19957#issuecomment-1005969121


   @stablum @AmarEL - is it possible for you to disable this configuration and see if it helps?





[GitHub] [airflow] potiuk commented on issue #19957: Airflow crashes with a psycopg2.errors.DeadlockDetected exception

Posted by GitBox <gi...@apache.org>.
potiuk commented on issue #19957:
URL: https://github.com/apache/airflow/issues/19957#issuecomment-1011857056


   I actually spent some time a few days ago looking at the mini-scheduler code, but I could not really find a flaw there. The fact that it did not help you indicates that my hypothesis was unfounded, unfortunately, and maybe the reason was different (and the fact that it worked for @stablum was mainly a coincidence or some side effect of that change).
   
   @dwiajik - it might also be that your case is a bit different. Could you please report (maybe in a gist with a few examples) some of the logs of your deadlocks? Ideally, send us the logs of the failing scheduler and the corresponding logs of the Postgres server from the same time. It will be much easier to investigate if we see a few examples, and the server logs should tell us exactly which two queries deadlocked, which should help us a lot.
   
   What we really need is something in /var/lib/pgsql/data/pg_log/*.log; there should be entries at the time the deadlock happens that look like this:
   
   ```
   ERROR:  deadlock detected
   DETAIL:  Process 21535 waits for AccessExclusiveLock on relation 342640 of database 41454; blocked by process 21506.
   Process 21506 waits for AccessExclusiveLock on relation 342637 of database 41454; blocked by process 21535.
   HINT:  See server log for query details.
   CONTEXT:  SQL statement "UPDATE ..."
   ```
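   
   If grabbing the server log turns out to be difficult, here is a rough sketch (purely illustrative, not something from this thread; the connection DSN is a placeholder) of how the blocked/blocking query pairs can be watched live from the Airflow metadata DB while the problem reproduces:
   
   ```
   # Hedged sketch: list blocked/blocking query pairs via pg_stat_activity.
   # pg_blocking_pids() requires PostgreSQL 9.6+; the DSN is a placeholder.
   import psycopg2

   SQL = """
   SELECT blocked.pid    AS blocked_pid,
          blocked.query  AS blocked_query,
          blocking.pid   AS blocking_pid,
          blocking.query AS blocking_query
   FROM pg_stat_activity AS blocked
   JOIN LATERAL unnest(pg_blocking_pids(blocked.pid)) AS b(pid) ON TRUE
   JOIN pg_stat_activity AS blocking ON blocking.pid = b.pid;
   """

   with psycopg2.connect("dbname=airflow user=airflow host=localhost") as conn:
       with conn.cursor() as cur:
           cur.execute(SQL)
           for blocked_pid, blocked_query, blocking_pid, blocking_query in cur.fetchall():
               print(f"{blocked_pid} blocked by {blocking_pid}")
               print(f"  blocked : {blocked_query}")
               print(f"  blocking: {blocking_query}")
   ```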





[GitHub] [airflow] potiuk commented on issue #19957: Airflow crashes with a psycopg2.errors.DeadlockDetected exception

Posted by GitBox <gi...@apache.org>.
potiuk commented on issue #19957:
URL: https://github.com/apache/airflow/issues/19957#issuecomment-986335228


   (The reason why I am asking: we receive a very low number of Postgres deadlock reports, because (unlike MySQL) Postgres usually shows "real" deadlocks, and we'd expect this kind of problem to appear more often amongst our users, so I am looking for any special circumstances you might have.)





[GitHub] [airflow] stablum edited a comment on issue #19957: Airflow crashes with a psycopg2.errors.DeadlockDetected exception

Posted by GitBox <gi...@apache.org>.
stablum edited a comment on issue #19957:
URL: https://github.com/apache/airflow/issues/19957#issuecomment-986608097


   Hmm, it's difficult to retrieve this bit of information. I don't think there were multiple instances of the scheduler running, but I will keep that possibility in mind, so thank you for the hint. Anyway, I'm not experiencing the problem anymore right now, maybe because I upgraded the system packages and rebooted. If this bug happens again, I will update this thread with more information about the SQL query and whether multiple scheduler instances were running.
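   
   In case it helps, a quick sketch of how extra scheduler processes can be double-checked next time (psutil is already used by the scheduler, as the logs above show; the matching here is just on the process command line and is only illustrative):
   
   ```
   # Hedged sketch: list processes whose command line looks like an Airflow
   # scheduler, to rule out a second scheduler competing for the metadata DB.
   import psutil

   for proc in psutil.process_iter(["pid", "cmdline"]):
       cmdline = " ".join(proc.info["cmdline"] or [])
       if "airflow scheduler" in cmdline:
           print(proc.info["pid"], cmdline)
   ```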





[GitHub] [airflow] potiuk commented on issue #19957: Airflow crashes with a psycopg2.errors.DeadlockDetected exception

Posted by GitBox <gi...@apache.org>.
potiuk commented on issue #19957:
URL: https://github.com/apache/airflow/issues/19957#issuecomment-994816151


   Thanks @stablum! Great to see all the detailed information (and the pastebin).
   
   @ashb - I think that one might need a closer look from both of us :) and some hypotheses on where it could come from.
   I plan to take a look at it later this week and maybe we can figure out where it comes from.





[GitHub] [airflow] potiuk edited a comment on issue #19957: Airflow crashes with a psycopg2.errors.DeadlockDetected exception

Posted by GitBox <gi...@apache.org>.
potiuk edited a comment on issue #19957:
URL: https://github.com/apache/airflow/issues/19957#issuecomment-1002655342


   I looked at the code and the places where it can occur, and I have a hypothesis on what could cause it.
   Maybe some of the people here can verify my theory.
   
   Could you please disable https://airflow.apache.org/docs/apache-airflow/stable/configurations-ref.html#schedule-after-task-execution and see if the problem still occurs?
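   
   For reference, a quick sketch of how to check what the running configuration resolves to, plus the standard environment-variable override (adjust to your deployment):
   
   ```
   # Hedged sketch: check whether the mini-scheduler ("schedule after task
   # execution") is currently enabled in the Airflow configuration.
   from airflow.configuration import conf

   print(conf.getboolean("scheduler", "schedule_after_task_execution"))

   # Standard env-var form to disable it for a test run:
   #   AIRFLOW__SCHEDULER__SCHEDULE_AFTER_TASK_EXECUTION=False
   ```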





[GitHub] [airflow] potiuk commented on issue #19957: Airflow crashes with a psycopg2.errors.DeadlockDetected exception

Posted by GitBox <gi...@apache.org>.
potiuk commented on issue #19957:
URL: https://github.com/apache/airflow/issues/19957#issuecomment-1003544607


   To bump it up - just let us know @stablum @AmarEL if disabling the mini-scheduler works.
   
   I **think** this might be caused by the mini-scheduler in the task deadlocking with the "actual" scheduler when both try to lock the same rows in a different order. But before we attempt to analyse it in detail and fix it - since you have an easily reproducible and repeatable case - disable the mini-scheduler and see if it helps. If it does, it will help us to narrow down the reason and possibly fix it.
   
   The only drawback of disabling the mini-scheduler is potentially slightly longer latency in scheduling subsequent tasks in the same dag.
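   
   To illustrate the suspected mechanism - this is not Airflow code, just a toy reproduction of two sessions updating the same rows in opposite order; the DSN and task_id values are placeholders, and whether a deadlock actually triggers depends on the rows existing and on timing:
   
   ```
   # Hedged illustration: two transactions updating the same two rows in opposite
   # order deadlock, and PostgreSQL aborts one of them - the same error class as
   # in the scheduler traces above.
   import threading
   import time

   import psycopg2
   from psycopg2 import errors

   DSN = "dbname=airflow user=airflow host=localhost"  # placeholder

   def update_in_order(task_ids):
       conn = psycopg2.connect(DSN)
       try:
           with conn.cursor() as cur:
               for task_id in task_ids:
                   cur.execute(
                       "UPDATE task_instance SET state = 'scheduled' WHERE task_id = %s",
                       (task_id,),
                   )
                   time.sleep(1)  # widen the window so the two sessions interleave
           conn.commit()
       except errors.DeadlockDetected:
           conn.rollback()
           print("deadlock detected; this transaction was rolled back")
       finally:
           conn.close()

   a = threading.Thread(target=update_in_order, args=(["download_1", "download_2"],))
   b = threading.Thread(target=update_in_order, args=(["download_2", "download_1"],))
   a.start(); b.start(); a.join(); b.join()
   ```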





[GitHub] [airflow] AmarEL commented on issue #19957: Airflow crashes with a psycopg2.errors.DeadlockDetected exception

Posted by GitBox <gi...@apache.org>.
AmarEL commented on issue #19957:
URL: https://github.com/apache/airflow/issues/19957#issuecomment-999902054


   We are facing the very same problem with Postgres.
   
   Even though it is a different database setup, the stack trace shows the same line/method being called in the Airflow "layer" before moving to the database-specific class.
   ```
   File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/engine/default.py", line 608, in do_execute
       cursor.execute(statement, parameters)
   ```
   
   I didn't have time to test, but there is a chance that #20030 "fixes" this one together with #19832.





[GitHub] [airflow] potiuk commented on issue #19957: Airflow crashes with a psycopg2.errors.DeadlockDetected exception

Posted by GitBox <gi...@apache.org>.
potiuk commented on issue #19957:
URL: https://github.com/apache/airflow/issues/19957#issuecomment-986334766


   Also - is there a chance to find out from the logs what the "other" query was that the deadlock happened on? Is it also possible that you have another person or process (either manually or via some direct DB access) accessing the DB (for example some scripts that perform retention or similar)?





[GitHub] [airflow] stablum commented on issue #19957: Airflow crashes with a psycopg2.errors.DeadlockDetected exception

Posted by GitBox <gi...@apache.org>.
stablum commented on issue #19957:
URL: https://github.com/apache/airflow/issues/19957#issuecomment-994694956


   Limiting concurrency does not solve the issue. Even after reducing the number of concurrent tasks to 1, the exception is triggered after some time.





[GitHub] [airflow] stablum edited a comment on issue #19957: Airflow crashes with a psycopg2.errors.DeadlockDetected exception

Posted by GitBox <gi...@apache.org>.
stablum edited a comment on issue #19957:
URL: https://github.com/apache/airflow/issues/19957#issuecomment-994343254


   Unfortunately, this keeps happening (after a couple of weeks during which it was running smoothly):
   
   ```
   [2021-12-15 01:54:30,915] {dagbag.py:500} INFO - Filling up the DagBag from /root/learning_sets/models/
   dag_bag <airflow.models.dagbag.DagBag object at 0x7f56aa88cf70>
   Running <TaskInstance: download_and_preprocess_sets.download_1466 manual__2021-12-14T06:28:19.872227+00:00 [queued]> on host AI-Research
   Running <TaskInstance: download_and_preprocess_sets.download_952 manual__2021-12-14T06:28:19.872227+00:00 [queued]> on host AI-Research
   [2021-12-15 01:54:43,539] {scheduler_job.py:644} ERROR - Exception when executing SchedulerJob._run_scheduler_loop
   Traceback (most recent call last):
     File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/engine/base.py", line 1276, in _execute_context
       self.dialect.do_execute(
     File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/engine/default.py", line 608, in do_execute
       cursor.execute(statement, parameters)
   psycopg2.errors.DeadlockDetected: deadlock detected
   DETAIL:  Process 1117623 waits for ShareLock on transaction 4526903; blocked by process 1206850.
   Process 1206850 waits for AccessExclusiveLock on tuple (1,17) of relation 19255 of database 19096; blocked by process 1206469.
   Process 1206469 waits for ShareLock on transaction 4526895; blocked by process 1117623.
   HINT:  See server log for query details.
   CONTEXT:  while updating tuple (1899,3) in relation "task_instance"
   
   
   The above exception was the direct cause of the following exception:
   
   Traceback (most recent call last):
     File "/usr/local/lib/python3.9/dist-packages/airflow/jobs/scheduler_job.py", line 628, in _execute
       self._run_scheduler_loop()
     File "/usr/local/lib/python3.9/dist-packages/airflow/jobs/scheduler_job.py", line 709, in _run_scheduler_loop
       num_queued_tis = self._do_scheduling(session)
     File "/usr/local/lib/python3.9/dist-packages/airflow/jobs/scheduler_job.py", line 792, in _do_scheduling
       callback_to_run = self._schedule_dag_run(dag_run, session)
     File "/usr/local/lib/python3.9/dist-packages/airflow/jobs/scheduler_job.py", line 1049, in _schedule_dag_run
       dag_run.schedule_tis(schedulable_tis, session)
     File "/usr/local/lib/python3.9/dist-packages/airflow/utils/session.py", line 67, in wrapper
       return func(*args, **kwargs)
     File "/usr/local/lib/python3.9/dist-packages/airflow/models/dagrun.py", line 898, in schedule_tis
       session.query(TI)
     File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/orm/query.py", line 4063, in update
       update_op.exec_()
     File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/orm/persistence.py", line 1697, in exec_
       self._do_exec()
     File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/orm/persistence.py", line 1895, in _do_exec
       self._execute_stmt(update_stmt)
     File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/orm/persistence.py", line 1702, in _execute_stmt
       self.result = self.query._execute_crud(stmt, self.mapper)
     File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/orm/query.py", line 3568, in _execute_crud
       return conn.execute(stmt, self._params)
     File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/engine/base.py", line 1011, in execute
       return meth(self, multiparams, params)
     File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/sql/elements.py", line 298, in _execute_on_connection
       return connection._execute_clauseelement(self, multiparams, params)
     File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/engine/base.py", line 1124, in _execute_clauseelement
       ret = self._execute_context(
     File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/engine/base.py", line 1316, in _execute_context
       self._handle_dbapi_exception(
     File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/engine/base.py", line 1510, in _handle_dbapi_exception
       util.raise_(
     File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/util/compat.py", line 182, in raise_
       raise exception
     File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/engine/base.py", line 1276, in _execute_context
       self.dialect.do_execute(
     File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/engine/default.py", line 608, in do_execute
       cursor.execute(statement, parameters)
   sqlalchemy.exc.OperationalError: (psycopg2.errors.DeadlockDetected) deadlock detected
   DETAIL:  Process 1117623 waits for ShareLock on transaction 4526903; blocked by process 1206850.
   Process 1206850 waits for AccessExclusiveLock on tuple (1,17) of relation 19255 of database 19096; blocked by process 1206469.
   Process 1206469 waits for ShareLock on transaction 4526895; blocked by process 1117623.
   HINT:  See server log for query details.
   CONTEXT:  while updating tuple (1899,3) in relation "task_instance"
   
   [SQL: UPDATE task_instance SET state=%(state)s WHERE task_instance.dag_id = %(dag_id_1)s AND task_instance.run_id = %(run_id_1)s AND task_instance.task_id IN (%(task_id_1)s, %(task_id_2)s, %(task_id_3)s, %(task_id_4)s, %(task_id_5)s, %(task_id_6)s, %(task_id_7)s)]
   [parameters: {'state': <TaskInstanceState.SCHEDULED: 'scheduled'>, 'dag_id_1': 'download_and_preprocess_sets', 'run_id_1': 'manual__2021-12-14T06:28:19.872227+00:00', 'task_id_1': 'download_936', 'task_id_2': 'download_937', 'task_id_3': 'download_938', 'task_id_4': 'download_939', 'task_id_5': 'download_944', 'task_id_6': 'download_946', 'task_id_7': 'download_950'}]
   (Background on this error at: http://sqlalche.me/e/13/e3q8)
   [2021-12-15 01:54:43,544] {local_executor.py:388} INFO - Shutting down LocalExecutor; waiting for running tasks to finish.  Signal again if you don't want to wait.
   [2021-12-15 01:54:53,885] {process_utils.py:100} INFO - Sending Signals.SIGTERM to GPID 1117612
   [2021-12-15 01:54:54,098] {process_utils.py:66} INFO - Process psutil.Process(pid=1117612, status='terminated', exitcode=0, started='01:38:54') (1117612) terminated with exit code 0
   [2021-12-15 01:54:54,098] {scheduler_job.py:655} INFO - Exited execute loop
   [2021-12-15 01:54:54 +0100] [1117543] [INFO] Handling signal: term
   [2021-12-15 01:54:54 +0100] [1117568] [INFO] Worker exiting (pid: 1117568)
   [2021-12-15 01:54:54 +0100] [1117549] [INFO] Worker exiting (pid: 1117549)
   Traceback (most recent call last):
     File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/engine/base.py", line 1276, in _execute_context
       self.dialect.do_execute(
     File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/engine/default.py", line 608, in do_execute
       cursor.execute(statement, parameters)
   psycopg2.errors.DeadlockDetected: deadlock detected
   DETAIL:  Process 1117623 waits for ShareLock on transaction 4526903; blocked by process 1206850.
   Process 1206850 waits for AccessExclusiveLock on tuple (1,17) of relation 19255 of database 19096; blocked by process 1206469.
   Process 1206469 waits for ShareLock on transaction 4526895; blocked by process 1117623.
   HINT:  See server log for query details.
   CONTEXT:  while updating tuple (1899,3) in relation "task_instance"
   
   
   The above exception was the direct cause of the following exception:
   
   Traceback (most recent call last):
     File "/usr/local/bin/airflow", line 8, in <module>
       sys.exit(main())
     File "/usr/local/lib/python3.9/dist-packages/airflow/__main__.py", line 48, in main
       args.func(args)
     File "/usr/local/lib/python3.9/dist-packages/airflow/cli/cli_parser.py", line 48, in command
       return func(*args, **kwargs)
     File "/usr/local/lib/python3.9/dist-packages/airflow/utils/cli.py", line 92, in wrapper
       return f(*args, **kwargs)
     File "/usr/local/lib/python3.9/dist-packages/airflow/cli/commands/scheduler_command.py", line 75, in scheduler
       _run_scheduler_job(args=args)
     File "/usr/local/lib/python3.9/dist-packages/airflow/cli/commands/scheduler_command.py", line 46, in _run_scheduler_job
       job.run()
     File "/usr/local/lib/python3.9/dist-packages/airflow/jobs/base_job.py", line 245, in run
       self._execute()
     File "/usr/local/lib/python3.9/dist-packages/airflow/jobs/scheduler_job.py", line 628, in _execute
       self._run_scheduler_loop()
     File "/usr/local/lib/python3.9/dist-packages/airflow/jobs/scheduler_job.py", line 709, in _run_scheduler_loop
       num_queued_tis = self._do_scheduling(session)
     File "/usr/local/lib/python3.9/dist-packages/airflow/jobs/scheduler_job.py", line 792, in _do_scheduling
       callback_to_run = self._schedule_dag_run(dag_run, session)
     File "/usr/local/lib/python3.9/dist-packages/airflow/jobs/scheduler_job.py", line 1049, in _schedule_dag_run
       dag_run.schedule_tis(schedulable_tis, session)
     File "/usr/local/lib/python3.9/dist-packages/airflow/utils/session.py", line 67, in wrapper
       return func(*args, **kwargs)
     File "/usr/local/lib/python3.9/dist-packages/airflow/models/dagrun.py", line 898, in schedule_tis
       session.query(TI)
     File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/orm/query.py", line 4063, in update
       update_op.exec_()
     File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/orm/persistence.py", line 1697, in exec_
       self._do_exec()
     File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/orm/persistence.py", line 1895, in _do_exec
       self._execute_stmt(update_stmt)
     File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/orm/persistence.py", line 1702, in _execute_stmt
       self.result = self.query._execute_crud(stmt, self.mapper)
     File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/orm/query.py", line 3568, in _execute_crud
       return conn.execute(stmt, self._params)
     File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/engine/base.py", line 1011, in execute
       return meth(self, multiparams, params)
     File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/sql/elements.py", line 298, in _execute_on_connection
       return connection._execute_clauseelement(self, multiparams, params)
     File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/engine/base.py", line 1124, in _execute_clauseelement
       ret = self._execute_context(
     File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/engine/base.py", line 1316, in _execute_context
       self._handle_dbapi_exception(
     File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/engine/base.py", line 1510, in _handle_dbapi_exception
       util.raise_(
     File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/util/compat.py", line 182, in raise_
       raise exception
     File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/engine/base.py", line 1276, in _execute_context
       self.dialect.do_execute(
     File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/engine/default.py", line 608, in do_execute
       cursor.execute(statement, parameters)
   sqlalchemy.exc.OperationalError: (psycopg2.errors.DeadlockDetected) deadlock detected
   DETAIL:  Process 1117623 waits for ShareLock on transaction 4526903; blocked by process 1206850.
   Process 1206850 waits for AccessExclusiveLock on tuple (1,17) of relation 19255 of database 19096; blocked by process 1206469.
   Process 1206469 waits for ShareLock on transaction 4526895; blocked by process 1117623.
   HINT:  See server log for query details.
   CONTEXT:  while updating tuple (1899,3) in relation "task_instance"
   
   [SQL: UPDATE task_instance SET state=%(state)s WHERE task_instance.dag_id = %(dag_id_1)s AND task_instance.run_id = %(run_id_1)s AND task_instance.task_id IN (%(task_id_1)s, %(task_id_2)s, %(task_id_3)s, %(task_id_4)s, %(task_id_5)s, %(task_id_6)s, %(task_id_7)s)]
   [parameters: {'state': <TaskInstanceState.SCHEDULED: 'scheduled'>, 'dag_id_1': 'download_and_preprocess_sets', 'run_id_1': 'manual__2021-12-14T06:28:19.872227+00:00', 'task_id_1': 'download_936', 'task_id_2': 'download_937', 'task_id_3': 'download_938', 'task_id_4': 'download_939', 'task_id_5': 'download_944', 'task_id_6': 'download_946', 'task_id_7': 'download_950'}]
   (Background on this error at: http://sqlalche.me/e/13/e3q8)
   [2021-12-15 01:54:54 +0100] [1117543] [INFO] Shutting down: Master
   ```
   
   I just restarted it, and these are the scheduler processes running; I launched it with `screen`:
   ```
   root@AI-Research:~# ps aux | grep airflow | grep sched | grep -v grep
   root     3166838  0.0  0.0   9024  2460 ?        Ss   07:17   0:00 SCREEN -L -Logfile logs/airflow_scheduler_20211215_071716.log -S airflow_scheduler -d -m airflow scheduler
   root     3166845 56.1  1.0 212516 179704 pts/66  Rs+  07:17   0:16 /usr/bin/python3 /usr/local/bin/airflow scheduler
   root     3167171  0.5  0.4 111728 74756 pts/66   S    07:17   0:00 airflow scheduler -- DagFileProcessorManager
   ```
   
   And this is the launching script:
   ```
   root@AI-Research:~/learning_sets/airflow# cat launch_airflow.sh 
   #!/bin/bash
   
   TS=$(date +%Y%m%d_%H%M%S)
   
   screen -X -S airflow_scheduler quit
   screen -X -S airflow_webserver quit
   
   sleep 1
   
   ps aux | grep airflow | grep -v launch | awk '{print $2}' | xargs kill
   
   sleep 1
   
   screen -L -Logfile logs/airflow_scheduler_${TS}.log -S airflow_scheduler -d -m airflow scheduler
   screen -L -Logfile logs/airflow_webserver_${TS}.log -S airflow_webserver -d -m airflow webserver
   ```





[GitHub] [airflow] potiuk commented on issue #19957: Airflow crashes with a psycopg2.errors.DeadlockDetected exception

Posted by GitBox <gi...@apache.org>.
potiuk commented on issue #19957:
URL: https://github.com/apache/airflow/issues/19957#issuecomment-994817635


   Marked it provisionally for 2.2.4 in case we release it (it might go straight to 2.3.0, depending on how serious the issues there are, whether we have a fix, and how close we are to 2.3.0).





[GitHub] [airflow] stablum commented on issue #19957: Airflow crashes with a psycopg2.errors.DeadlockDetected exception

Posted by GitBox <gi...@apache.org>.
stablum commented on issue #19957:
URL: https://github.com/apache/airflow/issues/19957#issuecomment-995835917


   > it seems that by dropping airflow's database entirely and recreating it from scratch, the bug is not re-occurring. So it might have been something in the airflow's db data.
   
   I take this back: it is actually still crashing, unfortunately.

