You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@airflow.apache.org by GitBox <gi...@apache.org> on 2022/01/23 18:19:57 UTC
[GitHub] [airflow] stablum opened a new issue #19957: Airflow crashes with a psycopg2.errors.DeadlockDetected exception
stablum opened a new issue #19957:
URL: https://github.com/apache/airflow/issues/19957
### Apache Airflow version
2.2.2 (latest released)
### Operating System
Ubuntu 21.04 on a VM
### Versions of Apache Airflow Providers
root@AI-Research:~/learning_sets/airflow# pip freeze | grep apache-airflow-providers
apache-airflow-providers-ftp==2.0.1
apache-airflow-providers-http==2.0.1
apache-airflow-providers-imap==2.0.1
apache-airflow-providers-sqlite==2.0.1
### Deployment
Other
### Deployment details
Airflow is at version 2.2.2
psql (PostgreSQL) 13.5 (Ubuntu 13.5-0ubuntu0.21.04.1)
The dag contains thousands of tasks for data download and preprocessing and preparation which is destined to a mongodb database (so, I'm not using the PostgreSQL inside my tasks).
### What happened
[2021-12-01 19:41:57,556] {scheduler_job.py:644} ERROR - Exception when executing SchedulerJob._run_scheduler_loop
Traceback (most recent call last):
File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/engine/base.py", line 1276, in _execute_context
self.dialect.do_execute(
File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/engine/default.py", line 608, in do_execute
cursor.execute(statement, parameters)
psycopg2.errors.DeadlockDetected: deadlock detected
DETAIL: Process 322086 waits for ShareLock on transaction 2391367; blocked by process 340345.
Process 340345 waits for AccessExclusiveLock on tuple (0,26) of relation 19255 of database 19096; blocked by process 340300.
Process 340300 waits for ShareLock on transaction 2391361; blocked by process 322086.
HINT: See server log for query details.
CONTEXT: while updating tuple (1335,10) in relation "task_instance"
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/usr/local/lib/python3.9/dist-packages/airflow/jobs/scheduler_job.py", line 628, in _execute
self._run_scheduler_loop()
File "/usr/local/lib/python3.9/dist-packages/airflow/jobs/scheduler_job.py", line 709, in _run_scheduler_loop
num_queued_tis = self._do_scheduling(session)
File "/usr/local/lib/python3.9/dist-packages/airflow/jobs/scheduler_job.py", line 792, in _do_scheduling
callback_to_run = self._schedule_dag_run(dag_run, session)
File "/usr/local/lib/python3.9/dist-packages/airflow/jobs/scheduler_job.py", line 1049, in _schedule_dag_run
dag_run.schedule_tis(schedulable_tis, session)
File "/usr/local/lib/python3.9/dist-packages/airflow/utils/session.py", line 67, in wrapper
return func(*args, **kwargs)
File "/usr/local/lib/python3.9/dist-packages/airflow/models/dagrun.py", line 898, in schedule_tis
session.query(TI)
File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/orm/query.py", line 4063, in update
update_op.exec_()
File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/orm/persistence.py", line 1697, in exec_
self._do_exec()
File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/orm/persistence.py", line 1895, in _do_exec
self._execute_stmt(update_stmt)
File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/orm/persistence.py", line 1702, in _execute_stmt
self.result = self.query._execute_crud(stmt, self.mapper)
File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/orm/query.py", line 3568, in _execute_crud
return conn.execute(stmt, self._params)
File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/engine/base.py", line 1011, in execute
return meth(self, multiparams, params)
File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/sql/elements.py", line 298, in _execute_on_connection
return connection._execute_clauseelement(self, multiparams, params)
File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/engine/base.py", line 1124, in _execute_clauseelement
ret = self._execute_context(
File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/engine/base.py", line 1316, in _execute_context
self._handle_dbapi_exception(
File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/engine/base.py", line 1510, in _handle_dbapi_exception
util.raise_(
File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/util/compat.py", line 182, in raise_
raise exception
File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/engine/base.py", line 1276, in _execute_context
self.dialect.do_execute(
File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/engine/default.py", line 608, in do_execute
cursor.execute(statement, parameters)
sqlalchemy.exc.OperationalError: (psycopg2.errors.DeadlockDetected) deadlock detected
DETAIL: Process 322086 waits for ShareLock on transaction 2391367; blocked by process 340345.
Process 340345 waits for AccessExclusiveLock on tuple (0,26) of relation 19255 of database 19096; blocked by process 340300.
Process 340300 waits for ShareLock on transaction 2391361; blocked by process 322086.
HINT: See server log for query details.
CONTEXT: while updating tuple (1335,10) in relation "task_instance"
[SQL: UPDATE task_instance SET state=%(state)s WHERE task_instance.dag_id = %(dag_id_1)s AND task_instance.run_id = %(run_id_1)s AND task_instance.task_id IN (%(task_id_1)s, %(task_id_2)s, %(task_id_3)s, %(task_id_4)s, %(task_id_5)s, %(task_id_6)s, %(task_id_7)s, %(task_id_8)s, %(task_id_9)s, %(task_id_10)s, %(task_id_11)s, %(task_id_12)s, %(task_id_13)s, %(task_id_14)s, %(task_id_15)s, %(task_id_16)s, %(task_id_17)s, %(task_id_18)s, %(task_id_19)s, %(task_id_20)s)]
[parameters: {'state': <TaskInstanceState.SCHEDULED: 'scheduled'>, 'dag_id_1': 'download_and_preprocess_sets', 'run_id_1': 'manual__2021-12-01T17:31:23.684597+00:00', 'task_id_1': 'download_1379', 'task_id_2': 'download_1438', 'task_id_3': 'download_1363', 'task_id_4': 'download_1368', 'task_id_5': 'download_138', 'task_id_6': 'download_1432', 'task_id_7': 'download_1435', 'task_id_8': 'download_1437', 'task_id_9': 'download_1439', 'task_id_10': 'download_1457', 'task_id_11': 'download_168', 'task_id_12': 'download_203', 'task_id_13': 'download_782', 'task_id_14': 'download_1430', 'task_id_15': 'download_1431', 'task_id_16': 'download_1436', 'task_id_17': 'download_167', 'task_id_18': 'download_174', 'task_id_19': 'download_205', 'task_id_20': 'download_1434'}]
(Background on this error at: http://sqlalche.me/e/13/e3q8)
[2021-12-01 19:41:57,566] {local_executor.py:388} INFO - Shutting down LocalExecutor; waiting for running tasks to finish. Signal again if you don't want to wait.
[2021-12-01 19:42:18,013] {process_utils.py:100} INFO - Sending Signals.SIGTERM to GPID 285470
[2021-12-01 19:42:18,105] {process_utils.py:66} INFO - Process psutil.Process(pid=285470, status='terminated', exitcode=0, started='18:56:21') (285470) terminated with exit code 0
[2021-12-01 19:42:18,106] {scheduler_job.py:655} INFO - Exited execute loop
### What you expected to happen
Maybe 24 concurrent processes/tasks are too many?
### How to reproduce
reproducibility is challenging, but maybe the exception provides enough info for a fix
### Anything else
all the time, after some time the dag is being run
### Are you willing to submit PR?
- [ ] Yes I am willing to submit a PR!
### Code of Conduct
- [X] I agree to follow this project's [Code of Conduct](https://github.com/apache/airflow/blob/main/CODE_OF_CONDUCT.md)
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [airflow] stablum edited a comment on issue #19957: AIrflow crashes with a psycopg2.errors.DeadlockDetected exception
Posted by GitBox <gi...@apache.org>.
stablum edited a comment on issue #19957:
URL: https://github.com/apache/airflow/issues/19957#issuecomment-994343254
unfortunately, this keeps happening (after couple of weeks where it was running smoothly)
```
[2021-12-15 01:54:30,915] {dagbag.py:500} INFO - Filling up the DagBag from /root/learning_sets/models/
dag_bag <airflow.models.dagbag.DagBag object at 0x7f56aa88cf70>
Running <TaskInstance: download_and_preprocess_sets.download_1466 manual__2021-12-14T06:28:19.872227+00:00 [queued]> on host AI-Research
Running <TaskInstance: download_and_preprocess_sets.download_952 manual__2021-12-14T06:28:19.872227+00:00 [queued]> on host AI-Research
[2021-12-15 01:54:43,539] {scheduler_job.py:644} ERROR - Exception when executing SchedulerJob._run_scheduler_loop
Traceback (most recent call last):
File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/engine/base.py", line 1276, in _execute_context
self.dialect.do_execute(
File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/engine/default.py", line 608, in do_execute
cursor.execute(statement, parameters)
psycopg2.errors.DeadlockDetected: deadlock detected
DETAIL: Process 1117623 waits for ShareLock on transaction 4526903; blocked by process 1206850.
Process 1206850 waits for AccessExclusiveLock on tuple (1,17) of relation 19255 of database 19096; blocked by process 1206469.
Process 1206469 waits for ShareLock on transaction 4526895; blocked by process 1117623.
HINT: See server log for query details.
CONTEXT: while updating tuple (1899,3) in relation "task_instance"
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/usr/local/lib/python3.9/dist-packages/airflow/jobs/scheduler_job.py", line 628, in _execute
self._run_scheduler_loop()
File "/usr/local/lib/python3.9/dist-packages/airflow/jobs/scheduler_job.py", line 709, in _run_scheduler_loop
num_queued_tis = self._do_scheduling(session)
File "/usr/local/lib/python3.9/dist-packages/airflow/jobs/scheduler_job.py", line 792, in _do_scheduling
callback_to_run = self._schedule_dag_run(dag_run, session)
File "/usr/local/lib/python3.9/dist-packages/airflow/jobs/scheduler_job.py", line 1049, in _schedule_dag_run
dag_run.schedule_tis(schedulable_tis, session)
File "/usr/local/lib/python3.9/dist-packages/airflow/utils/session.py", line 67, in wrapper
return func(*args, **kwargs)
File "/usr/local/lib/python3.9/dist-packages/airflow/models/dagrun.py", line 898, in schedule_tis
session.query(TI)
File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/orm/query.py", line 4063, in update
update_op.exec_()
File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/orm/persistence.py", line 1697, in exec_
self._do_exec()
File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/orm/persistence.py", line 1895, in _do_exec
self._execute_stmt(update_stmt)
File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/orm/persistence.py", line 1702, in _execute_stmt
self.result = self.query._execute_crud(stmt, self.mapper)
File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/orm/query.py", line 3568, in _execute_crud
return conn.execute(stmt, self._params)
File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/engine/base.py", line 1011, in execute
return meth(self, multiparams, params)
File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/sql/elements.py", line 298, in _execute_on_connection
return connection._execute_clauseelement(self, multiparams, params)
File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/engine/base.py", line 1124, in _execute_clauseelement
ret = self._execute_context(
File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/engine/base.py", line 1316, in _execute_context
self._handle_dbapi_exception(
File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/engine/base.py", line 1510, in _handle_dbapi_exception
util.raise_(
File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/util/compat.py", line 182, in raise_
raise exception
File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/engine/base.py", line 1276, in _execute_context
self.dialect.do_execute(
File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/engine/default.py", line 608, in do_execute
cursor.execute(statement, parameters)
sqlalchemy.exc.OperationalError: (psycopg2.errors.DeadlockDetected) deadlock detected
DETAIL: Process 1117623 waits for ShareLock on transaction 4526903; blocked by process 1206850.
Process 1206850 waits for AccessExclusiveLock on tuple (1,17) of relation 19255 of database 19096; blocked by process 1206469.
Process 1206469 waits for ShareLock on transaction 4526895; blocked by process 1117623.
HINT: See server log for query details.
CONTEXT: while updating tuple (1899,3) in relation "task_instance"
[SQL: UPDATE task_instance SET state=%(state)s WHERE task_instance.dag_id = %(dag_id_1)s AND task_instance.run_id = %(run_id_1)s AND task_instance.task_id IN (%(task_id_1)s, %(task_id_2)s, %(task_id_3)s, %(task_id_4)s, %(task_id_5)s, %(task_id_6)s, %(task_id_7)s)]
[parameters: {'state': <TaskInstanceState.SCHEDULED: 'scheduled'>, 'dag_id_1': 'download_and_preprocess_sets', 'run_id_1': 'manual__2021-12-14T06:28:19.872227+00:00', 'task_id_1': 'download_936', 'task_id_2': 'download_937', 'task_id_3': 'download_938', 'task_id_4': 'download_939', 'task_id_5': 'download_944', 'task_id_6': 'download_946', 'task_id_7': 'download_950'}]
(Background on this error at: http://sqlalche.me/e/13/e3q8)
[2021-12-15 01:54:43,544] {local_executor.py:388} INFO - Shutting down LocalExecutor; waiting for running tasks to finish. Signal again if you don't want to wait.
[2021-12-15 01:54:53,885] {process_utils.py:100} INFO - Sending Signals.SIGTERM to GPID 1117612
[2021-12-15 01:54:54,098] {process_utils.py:66} INFO - Process psutil.Process(pid=1117612, status='terminated', exitcode=0, started='01:38:54') (1117612) terminated with exit code 0
[2021-12-15 01:54:54,098] {scheduler_job.py:655} INFO - Exited execute loop
[2021-12-15 01:54:54 +0100] [1117543] [INFO] Handling signal: term
[2021-12-15 01:54:54 +0100] [1117568] [INFO] Worker exiting (pid: 1117568)
[2021-12-15 01:54:54 +0100] [1117549] [INFO] Worker exiting (pid: 1117549)
Traceback (most recent call last):
File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/engine/base.py", line 1276, in _execute_context
self.dialect.do_execute(
File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/engine/default.py", line 608, in do_execute
cursor.execute(statement, parameters)
psycopg2.errors.DeadlockDetected: deadlock detected
DETAIL: Process 1117623 waits for ShareLock on transaction 4526903; blocked by process 1206850.
Process 1206850 waits for AccessExclusiveLock on tuple (1,17) of relation 19255 of database 19096; blocked by process 1206469.
Process 1206469 waits for ShareLock on transaction 4526895; blocked by process 1117623.
HINT: See server log for query details.
CONTEXT: while updating tuple (1899,3) in relation "task_instance"
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/usr/local/bin/airflow", line 8, in <module>
sys.exit(main())
File "/usr/local/lib/python3.9/dist-packages/airflow/__main__.py", line 48, in main
args.func(args)
File "/usr/local/lib/python3.9/dist-packages/airflow/cli/cli_parser.py", line 48, in command
return func(*args, **kwargs)
File "/usr/local/lib/python3.9/dist-packages/airflow/utils/cli.py", line 92, in wrapper
return f(*args, **kwargs)
File "/usr/local/lib/python3.9/dist-packages/airflow/cli/commands/scheduler_command.py", line 75, in scheduler
_run_scheduler_job(args=args)
File "/usr/local/lib/python3.9/dist-packages/airflow/cli/commands/scheduler_command.py", line 46, in _run_scheduler_job
job.run()
File "/usr/local/lib/python3.9/dist-packages/airflow/jobs/base_job.py", line 245, in run
self._execute()
File "/usr/local/lib/python3.9/dist-packages/airflow/jobs/scheduler_job.py", line 628, in _execute
self._run_scheduler_loop()
File "/usr/local/lib/python3.9/dist-packages/airflow/jobs/scheduler_job.py", line 709, in _run_scheduler_loop
num_queued_tis = self._do_scheduling(session)
File "/usr/local/lib/python3.9/dist-packages/airflow/jobs/scheduler_job.py", line 792, in _do_scheduling
callback_to_run = self._schedule_dag_run(dag_run, session)
File "/usr/local/lib/python3.9/dist-packages/airflow/jobs/scheduler_job.py", line 1049, in _schedule_dag_run
dag_run.schedule_tis(schedulable_tis, session)
File "/usr/local/lib/python3.9/dist-packages/airflow/utils/session.py", line 67, in wrapper
return func(*args, **kwargs)
File "/usr/local/lib/python3.9/dist-packages/airflow/models/dagrun.py", line 898, in schedule_tis
session.query(TI)
File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/orm/query.py", line 4063, in update
update_op.exec_()
File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/orm/persistence.py", line 1697, in exec_
self._do_exec()
File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/orm/persistence.py", line 1895, in _do_exec
self._execute_stmt(update_stmt)
File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/orm/persistence.py", line 1702, in _execute_stmt
self.result = self.query._execute_crud(stmt, self.mapper)
File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/orm/query.py", line 3568, in _execute_crud
return conn.execute(stmt, self._params)
File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/engine/base.py", line 1011, in execute
return meth(self, multiparams, params)
File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/sql/elements.py", line 298, in _execute_on_connection
return connection._execute_clauseelement(self, multiparams, params)
File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/engine/base.py", line 1124, in _execute_clauseelement
ret = self._execute_context(
File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/engine/base.py", line 1316, in _execute_context
self._handle_dbapi_exception(
File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/engine/base.py", line 1510, in _handle_dbapi_exception
util.raise_(
File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/util/compat.py", line 182, in raise_
raise exception
File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/engine/base.py", line 1276, in _execute_context
self.dialect.do_execute(
File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/engine/default.py", line 608, in do_execute
cursor.execute(statement, parameters)
sqlalchemy.exc.OperationalError: (psycopg2.errors.DeadlockDetected) deadlock detected
DETAIL: Process 1117623 waits for ShareLock on transaction 4526903; blocked by process 1206850.
Process 1206850 waits for AccessExclusiveLock on tuple (1,17) of relation 19255 of database 19096; blocked by process 1206469.
Process 1206469 waits for ShareLock on transaction 4526895; blocked by process 1117623.
HINT: See server log for query details.
CONTEXT: while updating tuple (1899,3) in relation "task_instance"
[SQL: UPDATE task_instance SET state=%(state)s WHERE task_instance.dag_id = %(dag_id_1)s AND task_instance.run_id = %(run_id_1)s AND task_instance.task_id IN (%(task_id_1)s, %(task_id_2)s, %(task_id_3)s, %(task_id_4)s, %(task_id_5)s, %(task_id_6)s, %(task_id_7)s)]
[parameters: {'state': <TaskInstanceState.SCHEDULED: 'scheduled'>, 'dag_id_1': 'download_and_preprocess_sets', 'run_id_1': 'manual__2021-12-14T06:28:19.872227+00:00', 'task_id_1': 'download_936', 'task_id_2': 'download_937', 'task_id_3': 'download_938', 'task_id_4': 'download_939', 'task_id_5': 'download_944', 'task_id_6': 'download_946', 'task_id_7': 'download_950'}]
(Background on this error at: http://sqlalche.me/e/13/e3q8)
[2021-12-15 01:54:54 +0100] [1117543] [INFO] Shutting down: Master
```
I just restarted it and this are the scheduler processes running: I launched it with `screen`:
```
root@AI-Research:~# ps aux | grep airflow | grep sched | grep -v grep
root 3166838 0.0 0.0 9024 2460 ? Ss 07:17 0:00 SCREEN -L -Logfile logs/airflow_scheduler_20211215_071716.log -S airflow_scheduler -d -m airflow scheduler
root 3166845 56.1 1.0 212516 179704 pts/66 Rs+ 07:17 0:16 /usr/bin/python3 /usr/local/bin/airflow scheduler
root 3167171 0.5 0.4 111728 74756 pts/66 S 07:17 0:00 airflow scheduler -- DagFileProcessorManager
```
And this is the launching script:
```
root@AI-Research:~/learning_sets/airflow# cat launch_airflow.sh
#!/bin/bash
SCRIPT_DIR="$( cd -- "$( dirname -- "${BASH_SOURCE[0]}" )" &> /dev/null && pwd )"
echo SCRIPT_DIR: $SCRIPT_DIR
LEARNING_SETS_DIR="$(readlink -f $SCRIPT_DIR/..)"
echo LEARNING_SETS_DIR: $LEARNING_SETS_DIR
cd $LEARNING_SETS_DIR
TS=$(date +%Y%m%d_%H%M%S)
screen -X -S airflow_scheduler quit
screen -X -S airflow_webserver quit
sleep 1
ps aux | grep airflow | grep -v launch | awk '{print $2}' | xargs kill
sleep 1
screen -L -Logfile logs/airflow_scheduler_${TS}.log -S airflow_scheduler -d -m airflow scheduler
screen -L -Logfile logs/airflow_webserver_${TS}.log -S airflow_webserver -d -m airflow webserver
```
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [airflow] stablum edited a comment on issue #19957: Airflow crashes with a psycopg2.errors.DeadlockDetected exception
Posted by GitBox <gi...@apache.org>.
stablum edited a comment on issue #19957:
URL: https://github.com/apache/airflow/issues/19957#issuecomment-1079965417
Unfortunately I'm still experiencing this bug with Airflow 2.2.4:
```
[2022-03-27 16:14:12,804] {scheduler_job.py:433} INFO - Setting the following tasks to queued state:
[2022-03-27 16:14:12,805] {scheduler_job.py:527} INFO - Executor reports execution of download_and_preprocess_sets.download_1544 run_id=manual__2022-03-24T13:43:29.617461+00:00 exited with status success for try_number 2
[2022-03-27 16:14:12,809] {scheduler_job.py:570} INFO - TaskInstance Finished: dag_id=download_and_preprocess_sets, task_id=download_1544, run_id=manual__2022-03-24T13:43:29.617461+00:00, run_start_date=2022-03-27 16:13:00.591762+00:00, run_end_date=2022-03-27 16:13:37.437128+00:00, run_duration=36.845366, state=success, executor_state=success, try_number=2, max_tries=2, job_id=10383, pool=default_pool, queue=default, priority_weight=59, operator=PythonOperator
[2022-03-27 16:14:12,829] {scheduler_job.py:1137} INFO - Resetting orphaned tasks for active dag runs
[2022-03-27 16:14:12,830] {scheduler_job.py:1160} INFO - Marked 1 SchedulerJob instances as failed
[2022-03-27 16:14:12,835] {scheduler_job.py:1201} INFO - Reset the following 1 orphaned TaskInstances:
<TaskInstance: download_and_preprocess_sets.download_1546 manual__2022-03-24T13:43:29.617461+00:00 [running]>
[2022-03-27 16:14:45,853] {scheduler_job.py:667} ERROR - Exception when executing SchedulerJob._run_scheduler_loop
Traceback (most recent call last):
File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/engine/base.py", line 1276, in _execute_context
self.dialect.do_execute(
File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/engine/default.py", line 608, in do_execute
cursor.execute(statement, parameters)
psycopg2.errors.DeadlockDetected: deadlock detected
DETAIL: Process 309686 waits for ShareLock on transaction 12660839; blocked by process 314699.
Process 314699 waits for ShareLock on transaction 12660838; blocked by process 309686.
HINT: See server log for query details.
CONTEXT: while updating tuple (4928,21) in relation "task_instance"
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/usr/local/lib/python3.9/dist-packages/airflow/jobs/scheduler_job.py", line 651, in _execute
self._run_scheduler_loop()
File "/usr/local/lib/python3.9/dist-packages/airflow/jobs/scheduler_job.py", line 732, in _run_scheduler_loop
num_queued_tis = self._do_scheduling(session)
File "/usr/local/lib/python3.9/dist-packages/airflow/jobs/scheduler_job.py", line 815, in _do_scheduling
callback_to_run = self._schedule_dag_run(dag_run, session)
File "/usr/local/lib/python3.9/dist-packages/airflow/jobs/scheduler_job.py", line 1072, in _schedule_dag_run
dag_run.schedule_tis(schedulable_tis, session)
File "/usr/local/lib/python3.9/dist-packages/airflow/utils/session.py", line 67, in wrapper
return func(*args, **kwargs)
File "/usr/local/lib/python3.9/dist-packages/airflow/models/dagrun.py", line 901, in schedule_tis
session.query(TI)
File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/orm/query.py", line 4063, in update
update_op.exec_()
File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/orm/persistence.py", line 1697, in exec_
self._do_exec()
File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/orm/persistence.py", line 1895, in _do_exec
self._execute_stmt(update_stmt)
File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/orm/persistence.py", line 1702, in _execute_stmt
self.result = self.query._execute_crud(stmt, self.mapper)
File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/orm/query.py", line 3568, in _execute_crud
return conn.execute(stmt, self._params)
File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/engine/base.py", line 1011, in execute
return meth(self, multiparams, params)
File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/sql/elements.py", line 298, in _execute_on_connection
return connection._execute_clauseelement(self, multiparams, params)
File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/engine/base.py", line 1124, in _execute_clauseelement
ret = self._execute_context(
File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/engine/base.py", line 1316, in _execute_context
self._handle_dbapi_exception(
File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/engine/base.py", line 1510, in _handle_dbapi_exception
util.raise_(
File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/util/compat.py", line 182, in raise_
raise exception
File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/engine/base.py", line 1276, in _execute_context
self.dialect.do_execute(
File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/engine/default.py", line 608, in do_execute
cursor.execute(statement, parameters)
sqlalchemy.exc.OperationalError: (psycopg2.errors.DeadlockDetected) deadlock detected
DETAIL: Process 309686 waits for ShareLock on transaction 12660839; blocked by process 314699.
Process 314699 waits for ShareLock on transaction 12660838; blocked by process 309686.
HINT: See server log for query details.
CONTEXT: while updating tuple (4928,21) in relation "task_instance"
[SQL: UPDATE task_instance SET state=%(state)s WHERE task_instance.dag_id = %(dag_id_1)s AND task_instance.run_id = %(run_id_1)s AND task_instance.task_id IN (%(task_id_1)s)]
[parameters: {'state': <TaskInstanceState.SCHEDULED: 'scheduled'>, 'dag_id_1': 'download_and_preprocess_sets', 'run_id_1': 'manual__2022-03-24T13:43:29.617461+00:00', 'task_id_1': 'download_1546'}]
(Background on this error at: http://sqlalche.me/e/13/e3q8)
[2022-03-27 16:14:45,858] {local_executor.py:388} INFO - Shutting down LocalExecutor; waiting for running tasks to finish. Signal again if you don't want to wait.
[2022-03-27 16:14:46,970] {process_utils.py:120} INFO - Sending Signals.SIGTERM to group 309683. PIDs of all processes in the group: [309683]
[2022-03-27 16:14:46,970] {process_utils.py:75} INFO - Sending the signal Signals.SIGTERM to group 309683
[2022-03-27 16:14:47,022] {process_utils.py:70} INFO - Process psutil.Process(pid=309683, status='terminated', exitcode=0, started='16:08:59') (309683) terminated with exit code 0
[2022-03-27 16:14:47,022] {scheduler_job.py:678} INFO - Exited execute loop
Traceback (most recent call last):
File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/engine/base.py", line 1276, in _execute_context
self.dialect.do_execute(
File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/engine/default.py", line 608, in do_execute
cursor.execute(statement, parameters)
psycopg2.errors.DeadlockDetected: deadlock detected
DETAIL: Process 309686 waits for ShareLock on transaction 12660839; blocked by process 314699.
Process 314699 waits for ShareLock on transaction 12660838; blocked by process 309686.
HINT: See server log for query details.
CONTEXT: while updating tuple (4928,21) in relation "task_instance"
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/usr/local/bin/airflow", line 8, in <module>
sys.exit(main())
File "/usr/local/lib/python3.9/dist-packages/airflow/__main__.py", line 48, in main
args.func(args)
File "/usr/local/lib/python3.9/dist-packages/airflow/cli/cli_parser.py", line 48, in command
return func(*args, **kwargs)
File "/usr/local/lib/python3.9/dist-packages/airflow/utils/cli.py", line 92, in wrapper
return f(*args, **kwargs)
File "/usr/local/lib/python3.9/dist-packages/airflow/cli/commands/scheduler_command.py", line 75, in scheduler
_run_scheduler_job(args=args)
File "/usr/local/lib/python3.9/dist-packages/airflow/cli/commands/scheduler_command.py", line 46, in _run_scheduler_job
job.run()
File "/usr/local/lib/python3.9/dist-packages/airflow/jobs/base_job.py", line 246, in run
self._execute()
File "/usr/local/lib/python3.9/dist-packages/airflow/jobs/scheduler_job.py", line 651, in _execute
self._run_scheduler_loop()
File "/usr/local/lib/python3.9/dist-packages/airflow/jobs/scheduler_job.py", line 732, in _run_scheduler_loop
num_queued_tis = self._do_scheduling(session)
File "/usr/local/lib/python3.9/dist-packages/airflow/jobs/scheduler_job.py", line 815, in _do_scheduling
callback_to_run = self._schedule_dag_run(dag_run, session)
File "/usr/local/lib/python3.9/dist-packages/airflow/jobs/scheduler_job.py", line 1072, in _schedule_dag_run
dag_run.schedule_tis(schedulable_tis, session)
File "/usr/local/lib/python3.9/dist-packages/airflow/utils/session.py", line 67, in wrapper
return func(*args, **kwargs)
File "/usr/local/lib/python3.9/dist-packages/airflow/models/dagrun.py", line 901, in schedule_tis
session.query(TI)
File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/orm/query.py", line 4063, in update
update_op.exec_()
File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/orm/persistence.py", line 1697, in exec_
self._do_exec()
File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/orm/persistence.py", line 1895, in _do_exec
self._execute_stmt(update_stmt)
File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/orm/persistence.py", line 1702, in _execute_stmt
self.result = self.query._execute_crud(stmt, self.mapper)
File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/orm/query.py", line 3568, in _execute_crud
return conn.execute(stmt, self._params)
File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/engine/base.py", line 1011, in execute
return meth(self, multiparams, params)
File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/sql/elements.py", line 298, in _execute_on_connection
return connection._execute_clauseelement(self, multiparams, params)
File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/engine/base.py", line 1124, in _execute_clauseelement
ret = self._execute_context(
File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/engine/base.py", line 1316, in _execute_context
self._handle_dbapi_exception(
File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/engine/base.py", line 1510, in _handle_dbapi_exception
util.raise_(
File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/util/compat.py", line 182, in raise_
raise exception
File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/engine/base.py", line 1276, in _execute_context
self.dialect.do_execute(
File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/engine/default.py", line 608, in do_execute
cursor.execute(statement, parameters)
sqlalchemy.exc.OperationalError: (psycopg2.errors.DeadlockDetected) deadlock detected
DETAIL: Process 309686 waits for ShareLock on transaction 12660839; blocked by process 314699.
Process 314699 waits for ShareLock on transaction 12660838; blocked by process 309686.
HINT: See server log for query details.
CONTEXT: while updating tuple (4928,21) in relation "task_instance"
[SQL: UPDATE task_instance SET state=%(state)s WHERE task_instance.dag_id = %(dag_id_1)s AND task_instance.run_id = %(run_id_1)s AND task_instance.task_id IN (%(task_id_1)s)]
[parameters: {'state': <TaskInstanceState.SCHEDULED: 'scheduled'>, 'dag_id_1': 'download_and_preprocess_sets', 'run_id_1': 'manual__2022-03-24T13:43:29.617461+00:00', 'task_id_1': 'download_1546'}]
(Background on this error at: http://sqlalche.me/e/13/e3q8)
```
I also set `concurrency=1` and `max_active_tasks=1`, but the bug keep reappearing
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [airflow] dwiajik commented on issue #19957: Airflow crashes with a psycopg2.errors.DeadlockDetected exception
Posted by GitBox <gi...@apache.org>.
dwiajik commented on issue #19957:
URL: https://github.com/apache/airflow/issues/19957#issuecomment-1011683020
I am experiencing the same problem and have set `schedule_after_task_execution` to `False`. The issue still persist. Do you have any suggestion? Thanks
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [airflow] potiuk edited a comment on issue #19957: Airflow crashes with a psycopg2.errors.DeadlockDetected exception
Posted by GitBox <gi...@apache.org>.
potiuk edited a comment on issue #19957:
URL: https://github.com/apache/airflow/issues/19957#issuecomment-994816151
Thanks @stablum ! Great to see all the detailed information (and the pastebin)
@ashb - I think that one might need some closer look from both of us :) and some hypotheses on where it could come from.
I plan to take a look at that later this week and maybe we can figure out where it comes from.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [airflow] stablum edited a comment on issue #19957: AIrflow crashes with a psycopg2.errors.DeadlockDetected exception
Posted by GitBox <gi...@apache.org>.
stablum edited a comment on issue #19957:
URL: https://github.com/apache/airflow/issues/19957#issuecomment-994343254
unfortunately, this keeps happening (after couple of weeks where it was running smoothly)
```
[2021-12-15 01:54:30,915] {dagbag.py:500} INFO - Filling up the DagBag from /root/learning_sets/models/
dag_bag <airflow.models.dagbag.DagBag object at 0x7f56aa88cf70>
Running <TaskInstance: download_and_preprocess_sets.download_1466 manual__2021-12-14T06:28:19.872227+00:00 [queued]> on host AI-Research
Running <TaskInstance: download_and_preprocess_sets.download_952 manual__2021-12-14T06:28:19.872227+00:00 [queued]> on host AI-Research
[2021-12-15 01:54:43,539] {scheduler_job.py:644} ERROR - Exception when executing SchedulerJob._run_scheduler_loop
Traceback (most recent call last):
File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/engine/base.py", line 1276, in _execute_context
self.dialect.do_execute(
File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/engine/default.py", line 608, in do_execute
cursor.execute(statement, parameters)
psycopg2.errors.DeadlockDetected: deadlock detected
DETAIL: Process 1117623 waits for ShareLock on transaction 4526903; blocked by process 1206850.
Process 1206850 waits for AccessExclusiveLock on tuple (1,17) of relation 19255 of database 19096; blocked by process 1206469.
Process 1206469 waits for ShareLock on transaction 4526895; blocked by process 1117623.
HINT: See server log for query details.
CONTEXT: while updating tuple (1899,3) in relation "task_instance"
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/usr/local/lib/python3.9/dist-packages/airflow/jobs/scheduler_job.py", line 628, in _execute
self._run_scheduler_loop()
File "/usr/local/lib/python3.9/dist-packages/airflow/jobs/scheduler_job.py", line 709, in _run_scheduler_loop
num_queued_tis = self._do_scheduling(session)
File "/usr/local/lib/python3.9/dist-packages/airflow/jobs/scheduler_job.py", line 792, in _do_scheduling
callback_to_run = self._schedule_dag_run(dag_run, session)
File "/usr/local/lib/python3.9/dist-packages/airflow/jobs/scheduler_job.py", line 1049, in _schedule_dag_run
dag_run.schedule_tis(schedulable_tis, session)
File "/usr/local/lib/python3.9/dist-packages/airflow/utils/session.py", line 67, in wrapper
return func(*args, **kwargs)
File "/usr/local/lib/python3.9/dist-packages/airflow/models/dagrun.py", line 898, in schedule_tis
session.query(TI)
File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/orm/query.py", line 4063, in update
update_op.exec_()
File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/orm/persistence.py", line 1697, in exec_
self._do_exec()
File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/orm/persistence.py", line 1895, in _do_exec
self._execute_stmt(update_stmt)
File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/orm/persistence.py", line 1702, in _execute_stmt
self.result = self.query._execute_crud(stmt, self.mapper)
File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/orm/query.py", line 3568, in _execute_crud
return conn.execute(stmt, self._params)
File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/engine/base.py", line 1011, in execute
return meth(self, multiparams, params)
File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/sql/elements.py", line 298, in _execute_on_connection
return connection._execute_clauseelement(self, multiparams, params)
File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/engine/base.py", line 1124, in _execute_clauseelement
ret = self._execute_context(
File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/engine/base.py", line 1316, in _execute_context
self._handle_dbapi_exception(
File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/engine/base.py", line 1510, in _handle_dbapi_exception
util.raise_(
File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/util/compat.py", line 182, in raise_
raise exception
File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/engine/base.py", line 1276, in _execute_context
self.dialect.do_execute(
File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/engine/default.py", line 608, in do_execute
cursor.execute(statement, parameters)
sqlalchemy.exc.OperationalError: (psycopg2.errors.DeadlockDetected) deadlock detected
DETAIL: Process 1117623 waits for ShareLock on transaction 4526903; blocked by process 1206850.
Process 1206850 waits for AccessExclusiveLock on tuple (1,17) of relation 19255 of database 19096; blocked by process 1206469.
Process 1206469 waits for ShareLock on transaction 4526895; blocked by process 1117623.
HINT: See server log for query details.
CONTEXT: while updating tuple (1899,3) in relation "task_instance"
[SQL: UPDATE task_instance SET state=%(state)s WHERE task_instance.dag_id = %(dag_id_1)s AND task_instance.run_id = %(run_id_1)s AND task_instance.task_id IN (%(task_id_1)s, %(task_id_2)s, %(task_id_3)s, %(task_id_4)s, %(task_id_5)s, %(task_id_6)s, %(task_id_7)s)]
[parameters: {'state': <TaskInstanceState.SCHEDULED: 'scheduled'>, 'dag_id_1': 'download_and_preprocess_sets', 'run_id_1': 'manual__2021-12-14T06:28:19.872227+00:00', 'task_id_1': 'download_936', 'task_id_2': 'download_937', 'task_id_3': 'download_938', 'task_id_4': 'download_939', 'task_id_5': 'download_944', 'task_id_6': 'download_946', 'task_id_7': 'download_950'}]
(Background on this error at: http://sqlalche.me/e/13/e3q8)
[2021-12-15 01:54:43,544] {local_executor.py:388} INFO - Shutting down LocalExecutor; waiting for running tasks to finish. Signal again if you don't want to wait.
[2021-12-15 01:54:53,885] {process_utils.py:100} INFO - Sending Signals.SIGTERM to GPID 1117612
[2021-12-15 01:54:54,098] {process_utils.py:66} INFO - Process psutil.Process(pid=1117612, status='terminated', exitcode=0, started='01:38:54') (1117612) terminated with exit code 0
[2021-12-15 01:54:54,098] {scheduler_job.py:655} INFO - Exited execute loop
[2021-12-15 01:54:54 +0100] [1117543] [INFO] Handling signal: term
[2021-12-15 01:54:54 +0100] [1117568] [INFO] Worker exiting (pid: 1117568)
[2021-12-15 01:54:54 +0100] [1117549] [INFO] Worker exiting (pid: 1117549)
Traceback (most recent call last):
File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/engine/base.py", line 1276, in _execute_context
self.dialect.do_execute(
File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/engine/default.py", line 608, in do_execute
cursor.execute(statement, parameters)
psycopg2.errors.DeadlockDetected: deadlock detected
DETAIL: Process 1117623 waits for ShareLock on transaction 4526903; blocked by process 1206850.
Process 1206850 waits for AccessExclusiveLock on tuple (1,17) of relation 19255 of database 19096; blocked by process 1206469.
Process 1206469 waits for ShareLock on transaction 4526895; blocked by process 1117623.
HINT: See server log for query details.
CONTEXT: while updating tuple (1899,3) in relation "task_instance"
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/usr/local/bin/airflow", line 8, in <module>
sys.exit(main())
File "/usr/local/lib/python3.9/dist-packages/airflow/__main__.py", line 48, in main
args.func(args)
File "/usr/local/lib/python3.9/dist-packages/airflow/cli/cli_parser.py", line 48, in command
return func(*args, **kwargs)
File "/usr/local/lib/python3.9/dist-packages/airflow/utils/cli.py", line 92, in wrapper
return f(*args, **kwargs)
File "/usr/local/lib/python3.9/dist-packages/airflow/cli/commands/scheduler_command.py", line 75, in scheduler
_run_scheduler_job(args=args)
File "/usr/local/lib/python3.9/dist-packages/airflow/cli/commands/scheduler_command.py", line 46, in _run_scheduler_job
job.run()
File "/usr/local/lib/python3.9/dist-packages/airflow/jobs/base_job.py", line 245, in run
self._execute()
File "/usr/local/lib/python3.9/dist-packages/airflow/jobs/scheduler_job.py", line 628, in _execute
self._run_scheduler_loop()
File "/usr/local/lib/python3.9/dist-packages/airflow/jobs/scheduler_job.py", line 709, in _run_scheduler_loop
num_queued_tis = self._do_scheduling(session)
File "/usr/local/lib/python3.9/dist-packages/airflow/jobs/scheduler_job.py", line 792, in _do_scheduling
callback_to_run = self._schedule_dag_run(dag_run, session)
File "/usr/local/lib/python3.9/dist-packages/airflow/jobs/scheduler_job.py", line 1049, in _schedule_dag_run
dag_run.schedule_tis(schedulable_tis, session)
File "/usr/local/lib/python3.9/dist-packages/airflow/utils/session.py", line 67, in wrapper
return func(*args, **kwargs)
File "/usr/local/lib/python3.9/dist-packages/airflow/models/dagrun.py", line 898, in schedule_tis
session.query(TI)
File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/orm/query.py", line 4063, in update
update_op.exec_()
File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/orm/persistence.py", line 1697, in exec_
self._do_exec()
File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/orm/persistence.py", line 1895, in _do_exec
self._execute_stmt(update_stmt)
File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/orm/persistence.py", line 1702, in _execute_stmt
self.result = self.query._execute_crud(stmt, self.mapper)
File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/orm/query.py", line 3568, in _execute_crud
return conn.execute(stmt, self._params)
File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/engine/base.py", line 1011, in execute
return meth(self, multiparams, params)
File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/sql/elements.py", line 298, in _execute_on_connection
return connection._execute_clauseelement(self, multiparams, params)
File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/engine/base.py", line 1124, in _execute_clauseelement
ret = self._execute_context(
File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/engine/base.py", line 1316, in _execute_context
self._handle_dbapi_exception(
File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/engine/base.py", line 1510, in _handle_dbapi_exception
util.raise_(
File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/util/compat.py", line 182, in raise_
raise exception
File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/engine/base.py", line 1276, in _execute_context
self.dialect.do_execute(
File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/engine/default.py", line 608, in do_execute
cursor.execute(statement, parameters)
sqlalchemy.exc.OperationalError: (psycopg2.errors.DeadlockDetected) deadlock detected
DETAIL: Process 1117623 waits for ShareLock on transaction 4526903; blocked by process 1206850.
Process 1206850 waits for AccessExclusiveLock on tuple (1,17) of relation 19255 of database 19096; blocked by process 1206469.
Process 1206469 waits for ShareLock on transaction 4526895; blocked by process 1117623.
HINT: See server log for query details.
CONTEXT: while updating tuple (1899,3) in relation "task_instance"
[SQL: UPDATE task_instance SET state=%(state)s WHERE task_instance.dag_id = %(dag_id_1)s AND task_instance.run_id = %(run_id_1)s AND task_instance.task_id IN (%(task_id_1)s, %(task_id_2)s, %(task_id_3)s, %(task_id_4)s, %(task_id_5)s, %(task_id_6)s, %(task_id_7)s)]
[parameters: {'state': <TaskInstanceState.SCHEDULED: 'scheduled'>, 'dag_id_1': 'download_and_preprocess_sets', 'run_id_1': 'manual__2021-12-14T06:28:19.872227+00:00', 'task_id_1': 'download_936', 'task_id_2': 'download_937', 'task_id_3': 'download_938', 'task_id_4': 'download_939', 'task_id_5': 'download_944', 'task_id_6': 'download_946', 'task_id_7': 'download_950'}]
(Background on this error at: http://sqlalche.me/e/13/e3q8)
[2021-12-15 01:54:54 +0100] [1117543] [INFO] Shutting down: Master
```
I just restarted it and this are the scheduler processes running: I launched it with `screen`:
```
root@AI-Research:~# ps aux | grep airflow | grep sched | grep -v grep
root 3166838 0.0 0.0 9024 2460 ? Ss 07:17 0:00 SCREEN -L -Logfile logs/airflow_scheduler_20211215_071716.log -S airflow_scheduler -d -m airflow scheduler
root 3166845 56.1 1.0 212516 179704 pts/66 Rs+ 07:17 0:16 /usr/bin/python3 /usr/local/bin/airflow scheduler
root 3167171 0.5 0.4 111728 74756 pts/66 S 07:17 0:00 airflow scheduler -- DagFileProcessorManager
```
And this is the launching script:
```
root@AI-Research:~/learning_sets/airflow# cat launch_airflow.sh
#!/bin/bash
TS=$(date +%Y%m%d_%H%M%S)
screen -X -S airflow_scheduler quit
screen -X -S airflow_webserver quit
sleep 1
ps aux | grep airflow | grep -v launch | awk '{print $2}' | xargs kill
sleep 1
screen -L -Logfile logs/airflow_scheduler_${TS}.log -S airflow_scheduler -d -m airflow scheduler
screen -L -Logfile logs/airflow_webserver_${TS}.log -S airflow_webserver -d -m airflow webserver
```
Responding to your questions:
* no other process or script is using the PostgreSQL db at the moment
* The last two (very long) queries can be read here: [https://nopaste.net/nMCqRrrUR1](https://nopaste.net/nMCqRrrUR1)
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [airflow] AmarEL edited a comment on issue #19957: Airflow crashes with a psycopg2.errors.DeadlockDetected exception
Posted by GitBox <gi...@apache.org>.
AmarEL edited a comment on issue #19957:
URL: https://github.com/apache/airflow/issues/19957#issuecomment-999902054
We are facing the very same problem with Postgres.
Even being a different database, the stack trace shows the same line/method being called on the airflow "layer" before moving to the database concrete class for the #19832
```
File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/engine/default.py", line 608, in do_execute
cursor.execute(statement, parameters)
```
I didn't have time to test, but there is a chance the #20030 "fixes" this one together with the #19832
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [airflow] stablum edited a comment on issue #19957: AIrflow crashes with a psycopg2.errors.DeadlockDetected exception
Posted by GitBox <gi...@apache.org>.
stablum edited a comment on issue #19957:
URL: https://github.com/apache/airflow/issues/19957#issuecomment-994343254
unfortunately, this keeps happening (after couple of weeks where it was running smoothly)
```
[2021-12-15 01:54:30,915] {dagbag.py:500} INFO - Filling up the DagBag from /root/learning_sets/models/
dag_bag <airflow.models.dagbag.DagBag object at 0x7f56aa88cf70>
Running <TaskInstance: download_and_preprocess_sets.download_1466 manual__2021-12-14T06:28:19.872227+00:00 [queued]> on host AI-Research
Running <TaskInstance: download_and_preprocess_sets.download_952 manual__2021-12-14T06:28:19.872227+00:00 [queued]> on host AI-Research
[2021-12-15 01:54:43,539] {scheduler_job.py:644} ERROR - Exception when executing SchedulerJob._run_scheduler_loop
Traceback (most recent call last):
File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/engine/base.py", line 1276, in _execute_context
self.dialect.do_execute(
File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/engine/default.py", line 608, in do_execute
cursor.execute(statement, parameters)
psycopg2.errors.DeadlockDetected: deadlock detected
DETAIL: Process 1117623 waits for ShareLock on transaction 4526903; blocked by process 1206850.
Process 1206850 waits for AccessExclusiveLock on tuple (1,17) of relation 19255 of database 19096; blocked by process 1206469.
Process 1206469 waits for ShareLock on transaction 4526895; blocked by process 1117623.
HINT: See server log for query details.
CONTEXT: while updating tuple (1899,3) in relation "task_instance"
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/usr/local/lib/python3.9/dist-packages/airflow/jobs/scheduler_job.py", line 628, in _execute
self._run_scheduler_loop()
File "/usr/local/lib/python3.9/dist-packages/airflow/jobs/scheduler_job.py", line 709, in _run_scheduler_loop
num_queued_tis = self._do_scheduling(session)
File "/usr/local/lib/python3.9/dist-packages/airflow/jobs/scheduler_job.py", line 792, in _do_scheduling
callback_to_run = self._schedule_dag_run(dag_run, session)
File "/usr/local/lib/python3.9/dist-packages/airflow/jobs/scheduler_job.py", line 1049, in _schedule_dag_run
dag_run.schedule_tis(schedulable_tis, session)
File "/usr/local/lib/python3.9/dist-packages/airflow/utils/session.py", line 67, in wrapper
return func(*args, **kwargs)
File "/usr/local/lib/python3.9/dist-packages/airflow/models/dagrun.py", line 898, in schedule_tis
session.query(TI)
File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/orm/query.py", line 4063, in update
update_op.exec_()
File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/orm/persistence.py", line 1697, in exec_
self._do_exec()
File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/orm/persistence.py", line 1895, in _do_exec
self._execute_stmt(update_stmt)
File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/orm/persistence.py", line 1702, in _execute_stmt
self.result = self.query._execute_crud(stmt, self.mapper)
File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/orm/query.py", line 3568, in _execute_crud
return conn.execute(stmt, self._params)
File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/engine/base.py", line 1011, in execute
return meth(self, multiparams, params)
File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/sql/elements.py", line 298, in _execute_on_connection
return connection._execute_clauseelement(self, multiparams, params)
File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/engine/base.py", line 1124, in _execute_clauseelement
ret = self._execute_context(
File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/engine/base.py", line 1316, in _execute_context
self._handle_dbapi_exception(
File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/engine/base.py", line 1510, in _handle_dbapi_exception
util.raise_(
File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/util/compat.py", line 182, in raise_
raise exception
File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/engine/base.py", line 1276, in _execute_context
self.dialect.do_execute(
File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/engine/default.py", line 608, in do_execute
cursor.execute(statement, parameters)
sqlalchemy.exc.OperationalError: (psycopg2.errors.DeadlockDetected) deadlock detected
DETAIL: Process 1117623 waits for ShareLock on transaction 4526903; blocked by process 1206850.
Process 1206850 waits for AccessExclusiveLock on tuple (1,17) of relation 19255 of database 19096; blocked by process 1206469.
Process 1206469 waits for ShareLock on transaction 4526895; blocked by process 1117623.
HINT: See server log for query details.
CONTEXT: while updating tuple (1899,3) in relation "task_instance"
[SQL: UPDATE task_instance SET state=%(state)s WHERE task_instance.dag_id = %(dag_id_1)s AND task_instance.run_id = %(run_id_1)s AND task_instance.task_id IN (%(task_id_1)s, %(task_id_2)s, %(task_id_3)s, %(task_id_4)s, %(task_id_5)s, %(task_id_6)s, %(task_id_7)s)]
[parameters: {'state': <TaskInstanceState.SCHEDULED: 'scheduled'>, 'dag_id_1': 'download_and_preprocess_sets', 'run_id_1': 'manual__2021-12-14T06:28:19.872227+00:00', 'task_id_1': 'download_936', 'task_id_2': 'download_937', 'task_id_3': 'download_938', 'task_id_4': 'download_939', 'task_id_5': 'download_944', 'task_id_6': 'download_946', 'task_id_7': 'download_950'}]
(Background on this error at: http://sqlalche.me/e/13/e3q8)
[2021-12-15 01:54:43,544] {local_executor.py:388} INFO - Shutting down LocalExecutor; waiting for running tasks to finish. Signal again if you don't want to wait.
[2021-12-15 01:54:53,885] {process_utils.py:100} INFO - Sending Signals.SIGTERM to GPID 1117612
[2021-12-15 01:54:54,098] {process_utils.py:66} INFO - Process psutil.Process(pid=1117612, status='terminated', exitcode=0, started='01:38:54') (1117612) terminated with exit code 0
[2021-12-15 01:54:54,098] {scheduler_job.py:655} INFO - Exited execute loop
[2021-12-15 01:54:54 +0100] [1117543] [INFO] Handling signal: term
[2021-12-15 01:54:54 +0100] [1117568] [INFO] Worker exiting (pid: 1117568)
[2021-12-15 01:54:54 +0100] [1117549] [INFO] Worker exiting (pid: 1117549)
Traceback (most recent call last):
File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/engine/base.py", line 1276, in _execute_context
self.dialect.do_execute(
File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/engine/default.py", line 608, in do_execute
cursor.execute(statement, parameters)
psycopg2.errors.DeadlockDetected: deadlock detected
DETAIL: Process 1117623 waits for ShareLock on transaction 4526903; blocked by process 1206850.
Process 1206850 waits for AccessExclusiveLock on tuple (1,17) of relation 19255 of database 19096; blocked by process 1206469.
Process 1206469 waits for ShareLock on transaction 4526895; blocked by process 1117623.
HINT: See server log for query details.
CONTEXT: while updating tuple (1899,3) in relation "task_instance"
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/usr/local/bin/airflow", line 8, in <module>
sys.exit(main())
File "/usr/local/lib/python3.9/dist-packages/airflow/__main__.py", line 48, in main
args.func(args)
File "/usr/local/lib/python3.9/dist-packages/airflow/cli/cli_parser.py", line 48, in command
return func(*args, **kwargs)
File "/usr/local/lib/python3.9/dist-packages/airflow/utils/cli.py", line 92, in wrapper
return f(*args, **kwargs)
File "/usr/local/lib/python3.9/dist-packages/airflow/cli/commands/scheduler_command.py", line 75, in scheduler
_run_scheduler_job(args=args)
File "/usr/local/lib/python3.9/dist-packages/airflow/cli/commands/scheduler_command.py", line 46, in _run_scheduler_job
job.run()
File "/usr/local/lib/python3.9/dist-packages/airflow/jobs/base_job.py", line 245, in run
self._execute()
File "/usr/local/lib/python3.9/dist-packages/airflow/jobs/scheduler_job.py", line 628, in _execute
self._run_scheduler_loop()
File "/usr/local/lib/python3.9/dist-packages/airflow/jobs/scheduler_job.py", line 709, in _run_scheduler_loop
num_queued_tis = self._do_scheduling(session)
File "/usr/local/lib/python3.9/dist-packages/airflow/jobs/scheduler_job.py", line 792, in _do_scheduling
callback_to_run = self._schedule_dag_run(dag_run, session)
File "/usr/local/lib/python3.9/dist-packages/airflow/jobs/scheduler_job.py", line 1049, in _schedule_dag_run
dag_run.schedule_tis(schedulable_tis, session)
File "/usr/local/lib/python3.9/dist-packages/airflow/utils/session.py", line 67, in wrapper
return func(*args, **kwargs)
File "/usr/local/lib/python3.9/dist-packages/airflow/models/dagrun.py", line 898, in schedule_tis
session.query(TI)
File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/orm/query.py", line 4063, in update
update_op.exec_()
File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/orm/persistence.py", line 1697, in exec_
self._do_exec()
File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/orm/persistence.py", line 1895, in _do_exec
self._execute_stmt(update_stmt)
File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/orm/persistence.py", line 1702, in _execute_stmt
self.result = self.query._execute_crud(stmt, self.mapper)
File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/orm/query.py", line 3568, in _execute_crud
return conn.execute(stmt, self._params)
File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/engine/base.py", line 1011, in execute
return meth(self, multiparams, params)
File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/sql/elements.py", line 298, in _execute_on_connection
return connection._execute_clauseelement(self, multiparams, params)
File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/engine/base.py", line 1124, in _execute_clauseelement
ret = self._execute_context(
File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/engine/base.py", line 1316, in _execute_context
self._handle_dbapi_exception(
File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/engine/base.py", line 1510, in _handle_dbapi_exception
util.raise_(
File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/util/compat.py", line 182, in raise_
raise exception
File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/engine/base.py", line 1276, in _execute_context
self.dialect.do_execute(
File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/engine/default.py", line 608, in do_execute
cursor.execute(statement, parameters)
sqlalchemy.exc.OperationalError: (psycopg2.errors.DeadlockDetected) deadlock detected
DETAIL: Process 1117623 waits for ShareLock on transaction 4526903; blocked by process 1206850.
Process 1206850 waits for AccessExclusiveLock on tuple (1,17) of relation 19255 of database 19096; blocked by process 1206469.
Process 1206469 waits for ShareLock on transaction 4526895; blocked by process 1117623.
HINT: See server log for query details.
CONTEXT: while updating tuple (1899,3) in relation "task_instance"
[SQL: UPDATE task_instance SET state=%(state)s WHERE task_instance.dag_id = %(dag_id_1)s AND task_instance.run_id = %(run_id_1)s AND task_instance.task_id IN (%(task_id_1)s, %(task_id_2)s, %(task_id_3)s, %(task_id_4)s, %(task_id_5)s, %(task_id_6)s, %(task_id_7)s)]
[parameters: {'state': <TaskInstanceState.SCHEDULED: 'scheduled'>, 'dag_id_1': 'download_and_preprocess_sets', 'run_id_1': 'manual__2021-12-14T06:28:19.872227+00:00', 'task_id_1': 'download_936', 'task_id_2': 'download_937', 'task_id_3': 'download_938', 'task_id_4': 'download_939', 'task_id_5': 'download_944', 'task_id_6': 'download_946', 'task_id_7': 'download_950'}]
(Background on this error at: http://sqlalche.me/e/13/e3q8)
[2021-12-15 01:54:54 +0100] [1117543] [INFO] Shutting down: Master
```
I just restarted it and this are the scheduler processes running: I launched it with `screen`:
```
root@AI-Research:~# ps aux | grep airflow | grep sched | grep -v grep
root 3166838 0.0 0.0 9024 2460 ? Ss 07:17 0:00 SCREEN -L -Logfile logs/airflow_scheduler_20211215_071716.log -S airflow_scheduler -d -m airflow scheduler
root 3166845 56.1 1.0 212516 179704 pts/66 Rs+ 07:17 0:16 /usr/bin/python3 /usr/local/bin/airflow scheduler
root 3167171 0.5 0.4 111728 74756 pts/66 S 07:17 0:00 airflow scheduler -- DagFileProcessorManager
```
And this is the launching script:
```
root@AI-Research:~/learning_sets/airflow# cat launch_airflow.sh
#!/bin/bash
TS=$(date +%Y%m%d_%H%M%S)
screen -X -S airflow_scheduler quit
screen -X -S airflow_webserver quit
sleep 1
ps aux | grep airflow | grep -v launch | awk '{print $2}' | xargs kill
sleep 1
screen -L -Logfile logs/airflow_scheduler_${TS}.log -S airflow_scheduler -d -m airflow scheduler
screen -L -Logfile logs/airflow_webserver_${TS}.log -S airflow_webserver -d -m airflow webserver
```
Responding to your questions:
* no other process or script is using the PostgreSQL db at the moment
* The last two queries will be added in my next comment
```
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [airflow] ephraimbuddy commented on issue #19957: Airflow crashes with a psycopg2.errors.DeadlockDetected exception
Posted by GitBox <gi...@apache.org>.
ephraimbuddy commented on issue #19957:
URL: https://github.com/apache/airflow/issues/19957#issuecomment-1019554113
> @dwiajik @stablum - is there any chance you have some customisations (plugins ? ) or users running DB operations (backfill? API calls? UI modifications) that might have caused the deadlock? Looking at the code, my intuition tells me that this must have been something external. Having the server logs could help to pin-point it.
I once experienced this while running a dag with many tasks. Parallelism was 64 and I have 4 dags. No concurrency limit on the dags, the tasks finishes in 20s and at one point the scheduler crashed. I will try and see if I can get it again
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [airflow] ephraimbuddy closed issue #19957: Airflow crashes with a psycopg2.errors.DeadlockDetected exception
Posted by GitBox <gi...@apache.org>.
ephraimbuddy closed issue #19957:
URL: https://github.com/apache/airflow/issues/19957
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [airflow] potiuk edited a comment on issue #19957: Airflow crashes with a psycopg2.errors.DeadlockDetected exception
Posted by GitBox <gi...@apache.org>.
potiuk edited a comment on issue #19957:
URL: https://github.com/apache/airflow/issues/19957#issuecomment-1011857056
I actually spent some time few days ago looking at the mini-scheduler code but I could not really find a flaw there. The fact that it did not help you indicates that my hypothesis was unfounded, unfortunately. and maybe the reason was different (and the fact that it worked for @stablum was mainly a coincidence or some side effect of that change).
@dwiajik - it might also be that your case is a bit different - could you please report (maybe create a gist with a few examples of) some of the logs of your deadlocks - Ideally if you could send us the logs of failing scheduler and corresponding logs of the Postgres server from the same time - I believe it will be much easier to investigate if we see few examples - and the server logs shoud tell us exactly which two queries deadlocked and this should help us a lot.
What we really need is somethiing in hte /var/lib/pgsql/data/pg_log/*.log, there should be entries at the time when then deadlock happens that looks like this:
```
ERROR: deadlock detected
DETAIL: Process 21535 waits for AccessExclusiveLock on relation 342640 of database 41454; blocked by process 21506.
Process 21506 waits for AccessExclusiveLock on relation 342637 of database 41454; blocked by process 21535.
HINT: See server log for query details.
CONTEXT: SQL statement "UPDATE ..."
````
We need ideally those and some logs around it if possible.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [airflow] stablum commented on issue #19957: Airflow crashes with a psycopg2.errors.DeadlockDetected exception
Posted by GitBox <gi...@apache.org>.
stablum commented on issue #19957:
URL: https://github.com/apache/airflow/issues/19957#issuecomment-1079965417
Unfortunately I'm still experiencing this bug with Airflow 2.2.4:
```
[2022-03-27 16:14:12,804] {scheduler_job.py:433} INFO - Setting the following tasks to queued state:
[2022-03-27 16:14:12,805] {scheduler_job.py:527} INFO - Executor reports execution of download_and_preprocess_sets.download_1544 run_id=manual__2022-03-24T13:43:29.617461+00:00 exited with status success for try_number 2
[2022-03-27 16:14:12,809] {scheduler_job.py:570} INFO - TaskInstance Finished: dag_id=download_and_preprocess_sets, task_id=download_1544, run_id=manual__2022-03-24T13:43:29.617461+00:00, run_start_date=2022-03-27 16:13:00.591762+00:00, run_end_date=2022-03-27 16:13:37.437128+00:00, run_duration=36.845366, state=success, executor_state=success, try_number=2, max_tries=2, job_id=10383, pool=default_pool, queue=default, priority_weight=59, operator=PythonOperator
[2022-03-27 16:14:12,829] {scheduler_job.py:1137} INFO - Resetting orphaned tasks for active dag runs
[2022-03-27 16:14:12,830] {scheduler_job.py:1160} INFO - Marked 1 SchedulerJob instances as failed
[2022-03-27 16:14:12,835] {scheduler_job.py:1201} INFO - Reset the following 1 orphaned TaskInstances:
<TaskInstance: download_and_preprocess_sets.download_1546 manual__2022-03-24T13:43:29.617461+00:00 [running]>
[2022-03-27 16:14:45,853] {scheduler_job.py:667} ERROR - Exception when executing SchedulerJob._run_scheduler_loop
Traceback (most recent call last):
File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/engine/base.py", line 1276, in _execute_context
self.dialect.do_execute(
File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/engine/default.py", line 608, in do_execute
cursor.execute(statement, parameters)
psycopg2.errors.DeadlockDetected: deadlock detected
DETAIL: Process 309686 waits for ShareLock on transaction 12660839; blocked by process 314699.
Process 314699 waits for ShareLock on transaction 12660838; blocked by process 309686.
HINT: See server log for query details.
CONTEXT: while updating tuple (4928,21) in relation "task_instance"
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/usr/local/lib/python3.9/dist-packages/airflow/jobs/scheduler_job.py", line 651, in _execute
self._run_scheduler_loop()
File "/usr/local/lib/python3.9/dist-packages/airflow/jobs/scheduler_job.py", line 732, in _run_scheduler_loop
num_queued_tis = self._do_scheduling(session)
File "/usr/local/lib/python3.9/dist-packages/airflow/jobs/scheduler_job.py", line 815, in _do_scheduling
callback_to_run = self._schedule_dag_run(dag_run, session)
File "/usr/local/lib/python3.9/dist-packages/airflow/jobs/scheduler_job.py", line 1072, in _schedule_dag_run
dag_run.schedule_tis(schedulable_tis, session)
File "/usr/local/lib/python3.9/dist-packages/airflow/utils/session.py", line 67, in wrapper
return func(*args, **kwargs)
File "/usr/local/lib/python3.9/dist-packages/airflow/models/dagrun.py", line 901, in schedule_tis
session.query(TI)
File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/orm/query.py", line 4063, in update
update_op.exec_()
File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/orm/persistence.py", line 1697, in exec_
self._do_exec()
File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/orm/persistence.py", line 1895, in _do_exec
self._execute_stmt(update_stmt)
File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/orm/persistence.py", line 1702, in _execute_stmt
self.result = self.query._execute_crud(stmt, self.mapper)
File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/orm/query.py", line 3568, in _execute_crud
return conn.execute(stmt, self._params)
File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/engine/base.py", line 1011, in execute
return meth(self, multiparams, params)
File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/sql/elements.py", line 298, in _execute_on_connection
return connection._execute_clauseelement(self, multiparams, params)
File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/engine/base.py", line 1124, in _execute_clauseelement
ret = self._execute_context(
File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/engine/base.py", line 1316, in _execute_context
self._handle_dbapi_exception(
File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/engine/base.py", line 1510, in _handle_dbapi_exception
util.raise_(
File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/util/compat.py", line 182, in raise_
raise exception
File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/engine/base.py", line 1276, in _execute_context
self.dialect.do_execute(
File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/engine/default.py", line 608, in do_execute
cursor.execute(statement, parameters)
sqlalchemy.exc.OperationalError: (psycopg2.errors.DeadlockDetected) deadlock detected
DETAIL: Process 309686 waits for ShareLock on transaction 12660839; blocked by process 314699.
Process 314699 waits for ShareLock on transaction 12660838; blocked by process 309686.
HINT: See server log for query details.
CONTEXT: while updating tuple (4928,21) in relation "task_instance"
[SQL: UPDATE task_instance SET state=%(state)s WHERE task_instance.dag_id = %(dag_id_1)s AND task_instance.run_id = %(run_id_1)s AND task_instance.task_id IN (%(task_id_1)s)]
[parameters: {'state': <TaskInstanceState.SCHEDULED: 'scheduled'>, 'dag_id_1': 'download_and_preprocess_sets', 'run_id_1': 'manual__2022-03-24T13:43:29.617461+00:00', 'task_id_1': 'download_1546'}]
(Background on this error at: http://sqlalche.me/e/13/e3q8)
[2022-03-27 16:14:45,858] {local_executor.py:388} INFO - Shutting down LocalExecutor; waiting for running tasks to finish. Signal again if you don't want to wait.
[2022-03-27 16:14:46,970] {process_utils.py:120} INFO - Sending Signals.SIGTERM to group 309683. PIDs of all processes in the group: [309683]
[2022-03-27 16:14:46,970] {process_utils.py:75} INFO - Sending the signal Signals.SIGTERM to group 309683
[2022-03-27 16:14:47,022] {process_utils.py:70} INFO - Process psutil.Process(pid=309683, status='terminated', exitcode=0, started='16:08:59') (309683) terminated with exit code 0
[2022-03-27 16:14:47,022] {scheduler_job.py:678} INFO - Exited execute loop
Traceback (most recent call last):
File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/engine/base.py", line 1276, in _execute_context
self.dialect.do_execute(
File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/engine/default.py", line 608, in do_execute
cursor.execute(statement, parameters)
psycopg2.errors.DeadlockDetected: deadlock detected
DETAIL: Process 309686 waits for ShareLock on transaction 12660839; blocked by process 314699.
Process 314699 waits for ShareLock on transaction 12660838; blocked by process 309686.
HINT: See server log for query details.
CONTEXT: while updating tuple (4928,21) in relation "task_instance"
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/usr/local/bin/airflow", line 8, in <module>
sys.exit(main())
File "/usr/local/lib/python3.9/dist-packages/airflow/__main__.py", line 48, in main
args.func(args)
File "/usr/local/lib/python3.9/dist-packages/airflow/cli/cli_parser.py", line 48, in command
return func(*args, **kwargs)
File "/usr/local/lib/python3.9/dist-packages/airflow/utils/cli.py", line 92, in wrapper
return f(*args, **kwargs)
File "/usr/local/lib/python3.9/dist-packages/airflow/cli/commands/scheduler_command.py", line 75, in scheduler
_run_scheduler_job(args=args)
File "/usr/local/lib/python3.9/dist-packages/airflow/cli/commands/scheduler_command.py", line 46, in _run_scheduler_job
job.run()
File "/usr/local/lib/python3.9/dist-packages/airflow/jobs/base_job.py", line 246, in run
self._execute()
File "/usr/local/lib/python3.9/dist-packages/airflow/jobs/scheduler_job.py", line 651, in _execute
self._run_scheduler_loop()
File "/usr/local/lib/python3.9/dist-packages/airflow/jobs/scheduler_job.py", line 732, in _run_scheduler_loop
num_queued_tis = self._do_scheduling(session)
File "/usr/local/lib/python3.9/dist-packages/airflow/jobs/scheduler_job.py", line 815, in _do_scheduling
callback_to_run = self._schedule_dag_run(dag_run, session)
File "/usr/local/lib/python3.9/dist-packages/airflow/jobs/scheduler_job.py", line 1072, in _schedule_dag_run
dag_run.schedule_tis(schedulable_tis, session)
File "/usr/local/lib/python3.9/dist-packages/airflow/utils/session.py", line 67, in wrapper
return func(*args, **kwargs)
File "/usr/local/lib/python3.9/dist-packages/airflow/models/dagrun.py", line 901, in schedule_tis
session.query(TI)
File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/orm/query.py", line 4063, in update
update_op.exec_()
File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/orm/persistence.py", line 1697, in exec_
self._do_exec()
File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/orm/persistence.py", line 1895, in _do_exec
self._execute_stmt(update_stmt)
File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/orm/persistence.py", line 1702, in _execute_stmt
self.result = self.query._execute_crud(stmt, self.mapper)
File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/orm/query.py", line 3568, in _execute_crud
return conn.execute(stmt, self._params)
File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/engine/base.py", line 1011, in execute
return meth(self, multiparams, params)
File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/sql/elements.py", line 298, in _execute_on_connection
return connection._execute_clauseelement(self, multiparams, params)
File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/engine/base.py", line 1124, in _execute_clauseelement
ret = self._execute_context(
File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/engine/base.py", line 1316, in _execute_context
self._handle_dbapi_exception(
File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/engine/base.py", line 1510, in _handle_dbapi_exception
util.raise_(
File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/util/compat.py", line 182, in raise_
raise exception
File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/engine/base.py", line 1276, in _execute_context
self.dialect.do_execute(
File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/engine/default.py", line 608, in do_execute
cursor.execute(statement, parameters)
sqlalchemy.exc.OperationalError: (psycopg2.errors.DeadlockDetected) deadlock detected
DETAIL: Process 309686 waits for ShareLock on transaction 12660839; blocked by process 314699.
Process 314699 waits for ShareLock on transaction 12660838; blocked by process 309686.
HINT: See server log for query details.
CONTEXT: while updating tuple (4928,21) in relation "task_instance"
[SQL: UPDATE task_instance SET state=%(state)s WHERE task_instance.dag_id = %(dag_id_1)s AND task_instance.run_id = %(run_id_1)s AND task_instance.task_id IN (%(task_id_1)s)]
[parameters: {'state': <TaskInstanceState.SCHEDULED: 'scheduled'>, 'dag_id_1': 'download_and_preprocess_sets', 'run_id_1': 'manual__2022-03-24T13:43:29.617461+00:00', 'task_id_1': 'download_1546'}]
(Background on this error at: http://sqlalche.me/e/13/e3q8)
```
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [airflow] stablum edited a comment on issue #19957: Airflow crashes with a psycopg2.errors.DeadlockDetected exception
Posted by GitBox <gi...@apache.org>.
stablum edited a comment on issue #19957:
URL: https://github.com/apache/airflow/issues/19957#issuecomment-1079965417
Unfortunately I'm still experiencing this bug with Airflow 2.2.4:
```
[2022-03-27 16:14:12,804] {scheduler_job.py:433} INFO - Setting the following tasks to queued state:
[2022-03-27 16:14:12,805] {scheduler_job.py:527} INFO - Executor reports execution of download_and_preprocess_sets.download_1544 run_id=manual__2022-03-24T13:43:29.617461+00:00 exited with status success for try_number 2
[2022-03-27 16:14:12,809] {scheduler_job.py:570} INFO - TaskInstance Finished: dag_id=download_and_preprocess_sets, task_id=download_1544, run_id=manual__2022-03-24T13:43:29.617461+00:00, run_start_date=2022-03-27 16:13:00.591762+00:00, run_end_date=2022-03-27 16:13:37.437128+00:00, run_duration=36.845366, state=success, executor_state=success, try_number=2, max_tries=2, job_id=10383, pool=default_pool, queue=default, priority_weight=59, operator=PythonOperator
[2022-03-27 16:14:12,829] {scheduler_job.py:1137} INFO - Resetting orphaned tasks for active dag runs
[2022-03-27 16:14:12,830] {scheduler_job.py:1160} INFO - Marked 1 SchedulerJob instances as failed
[2022-03-27 16:14:12,835] {scheduler_job.py:1201} INFO - Reset the following 1 orphaned TaskInstances:
<TaskInstance: download_and_preprocess_sets.download_1546 manual__2022-03-24T13:43:29.617461+00:00 [running]>
[2022-03-27 16:14:45,853] {scheduler_job.py:667} ERROR - Exception when executing SchedulerJob._run_scheduler_loop
Traceback (most recent call last):
File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/engine/base.py", line 1276, in _execute_context
self.dialect.do_execute(
File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/engine/default.py", line 608, in do_execute
cursor.execute(statement, parameters)
psycopg2.errors.DeadlockDetected: deadlock detected
DETAIL: Process 309686 waits for ShareLock on transaction 12660839; blocked by process 314699.
Process 314699 waits for ShareLock on transaction 12660838; blocked by process 309686.
HINT: See server log for query details.
CONTEXT: while updating tuple (4928,21) in relation "task_instance"
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/usr/local/lib/python3.9/dist-packages/airflow/jobs/scheduler_job.py", line 651, in _execute
self._run_scheduler_loop()
File "/usr/local/lib/python3.9/dist-packages/airflow/jobs/scheduler_job.py", line 732, in _run_scheduler_loop
num_queued_tis = self._do_scheduling(session)
File "/usr/local/lib/python3.9/dist-packages/airflow/jobs/scheduler_job.py", line 815, in _do_scheduling
callback_to_run = self._schedule_dag_run(dag_run, session)
File "/usr/local/lib/python3.9/dist-packages/airflow/jobs/scheduler_job.py", line 1072, in _schedule_dag_run
dag_run.schedule_tis(schedulable_tis, session)
File "/usr/local/lib/python3.9/dist-packages/airflow/utils/session.py", line 67, in wrapper
return func(*args, **kwargs)
File "/usr/local/lib/python3.9/dist-packages/airflow/models/dagrun.py", line 901, in schedule_tis
session.query(TI)
File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/orm/query.py", line 4063, in update
update_op.exec_()
File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/orm/persistence.py", line 1697, in exec_
self._do_exec()
File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/orm/persistence.py", line 1895, in _do_exec
self._execute_stmt(update_stmt)
File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/orm/persistence.py", line 1702, in _execute_stmt
self.result = self.query._execute_crud(stmt, self.mapper)
File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/orm/query.py", line 3568, in _execute_crud
return conn.execute(stmt, self._params)
File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/engine/base.py", line 1011, in execute
return meth(self, multiparams, params)
File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/sql/elements.py", line 298, in _execute_on_connection
return connection._execute_clauseelement(self, multiparams, params)
File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/engine/base.py", line 1124, in _execute_clauseelement
ret = self._execute_context(
File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/engine/base.py", line 1316, in _execute_context
self._handle_dbapi_exception(
File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/engine/base.py", line 1510, in _handle_dbapi_exception
util.raise_(
File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/util/compat.py", line 182, in raise_
raise exception
File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/engine/base.py", line 1276, in _execute_context
self.dialect.do_execute(
File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/engine/default.py", line 608, in do_execute
cursor.execute(statement, parameters)
sqlalchemy.exc.OperationalError: (psycopg2.errors.DeadlockDetected) deadlock detected
DETAIL: Process 309686 waits for ShareLock on transaction 12660839; blocked by process 314699.
Process 314699 waits for ShareLock on transaction 12660838; blocked by process 309686.
HINT: See server log for query details.
CONTEXT: while updating tuple (4928,21) in relation "task_instance"
[SQL: UPDATE task_instance SET state=%(state)s WHERE task_instance.dag_id = %(dag_id_1)s AND task_instance.run_id = %(run_id_1)s AND task_instance.task_id IN (%(task_id_1)s)]
[parameters: {'state': <TaskInstanceState.SCHEDULED: 'scheduled'>, 'dag_id_1': 'download_and_preprocess_sets', 'run_id_1': 'manual__2022-03-24T13:43:29.617461+00:00', 'task_id_1': 'download_1546'}]
(Background on this error at: http://sqlalche.me/e/13/e3q8)
[2022-03-27 16:14:45,858] {local_executor.py:388} INFO - Shutting down LocalExecutor; waiting for running tasks to finish. Signal again if you don't want to wait.
[2022-03-27 16:14:46,970] {process_utils.py:120} INFO - Sending Signals.SIGTERM to group 309683. PIDs of all processes in the group: [309683]
[2022-03-27 16:14:46,970] {process_utils.py:75} INFO - Sending the signal Signals.SIGTERM to group 309683
[2022-03-27 16:14:47,022] {process_utils.py:70} INFO - Process psutil.Process(pid=309683, status='terminated', exitcode=0, started='16:08:59') (309683) terminated with exit code 0
[2022-03-27 16:14:47,022] {scheduler_job.py:678} INFO - Exited execute loop
Traceback (most recent call last):
File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/engine/base.py", line 1276, in _execute_context
self.dialect.do_execute(
File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/engine/default.py", line 608, in do_execute
cursor.execute(statement, parameters)
psycopg2.errors.DeadlockDetected: deadlock detected
DETAIL: Process 309686 waits for ShareLock on transaction 12660839; blocked by process 314699.
Process 314699 waits for ShareLock on transaction 12660838; blocked by process 309686.
HINT: See server log for query details.
CONTEXT: while updating tuple (4928,21) in relation "task_instance"
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/usr/local/bin/airflow", line 8, in <module>
sys.exit(main())
File "/usr/local/lib/python3.9/dist-packages/airflow/__main__.py", line 48, in main
args.func(args)
File "/usr/local/lib/python3.9/dist-packages/airflow/cli/cli_parser.py", line 48, in command
return func(*args, **kwargs)
File "/usr/local/lib/python3.9/dist-packages/airflow/utils/cli.py", line 92, in wrapper
return f(*args, **kwargs)
File "/usr/local/lib/python3.9/dist-packages/airflow/cli/commands/scheduler_command.py", line 75, in scheduler
_run_scheduler_job(args=args)
File "/usr/local/lib/python3.9/dist-packages/airflow/cli/commands/scheduler_command.py", line 46, in _run_scheduler_job
job.run()
File "/usr/local/lib/python3.9/dist-packages/airflow/jobs/base_job.py", line 246, in run
self._execute()
File "/usr/local/lib/python3.9/dist-packages/airflow/jobs/scheduler_job.py", line 651, in _execute
self._run_scheduler_loop()
File "/usr/local/lib/python3.9/dist-packages/airflow/jobs/scheduler_job.py", line 732, in _run_scheduler_loop
num_queued_tis = self._do_scheduling(session)
File "/usr/local/lib/python3.9/dist-packages/airflow/jobs/scheduler_job.py", line 815, in _do_scheduling
callback_to_run = self._schedule_dag_run(dag_run, session)
File "/usr/local/lib/python3.9/dist-packages/airflow/jobs/scheduler_job.py", line 1072, in _schedule_dag_run
dag_run.schedule_tis(schedulable_tis, session)
File "/usr/local/lib/python3.9/dist-packages/airflow/utils/session.py", line 67, in wrapper
return func(*args, **kwargs)
File "/usr/local/lib/python3.9/dist-packages/airflow/models/dagrun.py", line 901, in schedule_tis
session.query(TI)
File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/orm/query.py", line 4063, in update
update_op.exec_()
File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/orm/persistence.py", line 1697, in exec_
self._do_exec()
File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/orm/persistence.py", line 1895, in _do_exec
self._execute_stmt(update_stmt)
File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/orm/persistence.py", line 1702, in _execute_stmt
self.result = self.query._execute_crud(stmt, self.mapper)
File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/orm/query.py", line 3568, in _execute_crud
return conn.execute(stmt, self._params)
File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/engine/base.py", line 1011, in execute
return meth(self, multiparams, params)
File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/sql/elements.py", line 298, in _execute_on_connection
return connection._execute_clauseelement(self, multiparams, params)
File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/engine/base.py", line 1124, in _execute_clauseelement
ret = self._execute_context(
File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/engine/base.py", line 1316, in _execute_context
self._handle_dbapi_exception(
File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/engine/base.py", line 1510, in _handle_dbapi_exception
util.raise_(
File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/util/compat.py", line 182, in raise_
raise exception
File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/engine/base.py", line 1276, in _execute_context
self.dialect.do_execute(
File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/engine/default.py", line 608, in do_execute
cursor.execute(statement, parameters)
sqlalchemy.exc.OperationalError: (psycopg2.errors.DeadlockDetected) deadlock detected
DETAIL: Process 309686 waits for ShareLock on transaction 12660839; blocked by process 314699.
Process 314699 waits for ShareLock on transaction 12660838; blocked by process 309686.
HINT: See server log for query details.
CONTEXT: while updating tuple (4928,21) in relation "task_instance"
[SQL: UPDATE task_instance SET state=%(state)s WHERE task_instance.dag_id = %(dag_id_1)s AND task_instance.run_id = %(run_id_1)s AND task_instance.task_id IN (%(task_id_1)s)]
[parameters: {'state': <TaskInstanceState.SCHEDULED: 'scheduled'>, 'dag_id_1': 'download_and_preprocess_sets', 'run_id_1': 'manual__2022-03-24T13:43:29.617461+00:00', 'task_id_1': 'download_1546'}]
(Background on this error at: http://sqlalche.me/e/13/e3q8)
```
I also set `concurrency=1` and `max_active_tasks=1`, but the bug keeps reappearing
This is my PostgreSQL log:
```
2022-03-27 15:37:25.192 UTC [273303] airflow_user@airflow_db LOG: could not receive data from client: Connection reset by peer
2022-03-27 15:37:25.201 UTC [300223] airflow_user@airflow_db LOG: could not receive data from client: Connection reset by peer
2022-03-27 15:37:25.203 UTC [298660] airflow_user@airflow_db LOG: could not receive data from client: Connection reset by peer
2022-03-27 15:37:25.205 UTC [297039] airflow_user@airflow_db LOG: could not receive data from client: Connection reset by peer
2022-03-27 15:37:25.209 UTC [301829] airflow_user@airflow_db LOG: could not receive data from client: Connection reset by peer
2022-03-27 15:37:26.193 UTC [276786] airflow_user@airflow_db FATAL: terminating connection due to administrator command
2022-03-27 15:39:22.382 UTC [304375] airflow_user@airflow_db LOG: could not receive data from client: Connection reset by peer
2022-03-27 15:40:31.194 UTC [305340] airflow_user@airflow_db LOG: could not receive data from client: Connection reset by peer
2022-03-27 15:41:38.554 UTC [306673] airflow_user@airflow_db LOG: could not receive data from client: Connection reset by peer
2022-03-27 15:42:51.378 UTC [307629] airflow_user@airflow_db LOG: could not receive data from client: Connection reset by peer
2022-03-27 15:43:23.546 UTC [303336] airflow_user@airflow_db ERROR: deadlock detected
2022-03-27 15:43:23.546 UTC [303336] airflow_user@airflow_db DETAIL: Process 303336 waits for ShareLock on transaction 12660394; blocked by process 309021.
Process 309021 waits for ShareLock on transaction 12660383; blocked by process 303336.
Process 303336: UPDATE task_instance SET state='scheduled' WHERE task_instance.dag_id = 'download_and_preprocess_sets' AND task_instance.run_id = 'manual__2022-03-24T13:43:29.617461+00:00' AND task_instance.task_id IN ('parse_1381', 'download_1385')
Process 309021: SELECT task_instance.try_number AS task_instance_try_number, task_instance.task_id AS task_instance_task_id, task_instance.dag_id AS task_instance_dag_id, task_instance.run_id AS task_instance_run_id, task_instance.start_date AS task_instance_start_date, task_instance.end_date AS task_instance_end_date, task_instance.duration AS task_instance_duration, task_instance.state AS task_instance_state, task_instance.max_tries AS task_instance_max_tries, task_instance.hostname AS task_instance_hostname, task_instance.unixname AS task_instance_unixname, task_instance.job_id AS task_instance_job_id, task_instance.pool AS task_instance_pool, task_instance.pool_slots AS task_instance_pool_slots, task_instance.queue AS task_instance_queue, task_instance.priority_weight AS task_instance_priority_weight, task_instance.operator AS task_instance_operator, task_instance.queued_dttm AS task_instance_queued_dttm, task_instance.queued_by_job_id AS task_instance_queued_by_job_id, task_
instance.pid AS task_instance_pid, task_insta
2022-03-27 15:43:23.546 UTC [303336] airflow_user@airflow_db HINT: See server log for query details.
2022-03-27 15:43:23.546 UTC [303336] airflow_user@airflow_db CONTEXT: while updating tuple (4903,44) in relation "task_instance"
2022-03-27 15:43:23.546 UTC [303336] airflow_user@airflow_db STATEMENT: UPDATE task_instance SET state='scheduled' WHERE task_instance.dag_id = 'download_and_preprocess_sets' AND task_instance.run_id = 'manual__2022-03-24T13:43:29.617461+00:00' AND task_instance.task_id IN ('parse_1381', 'download_1385')
2022-03-27 15:43:32.520 UTC [308612] airflow_user@airflow_db LOG: could not receive data from client: Connection reset by peer
2022-03-27 15:43:33.596 UTC [303337] airflow_user@airflow_db LOG: could not receive data from client: Connection reset by peer
2022-03-27 16:10:41.435 UTC [310498] airflow_user@airflow_db LOG: could not receive data from client: Connection reset by peer
2022-03-27 16:11:54.228 UTC [311405] airflow_user@airflow_db LOG: could not receive data from client: Connection reset by peer
2022-03-27 16:12:26.266 UTC [312643] airflow_user@airflow_db LOG: could not receive data from client: Connection reset by peer
2022-03-27 16:13:37.542 UTC [312957] airflow_user@airflow_db LOG: could not receive data from client: Connection reset by peer
2022-03-27 16:14:45.851 UTC [309686] airflow_user@airflow_db ERROR: deadlock detected
2022-03-27 16:14:45.851 UTC [309686] airflow_user@airflow_db DETAIL: Process 309686 waits for ShareLock on transaction 12660839; blocked by process 314699.
Process 314699 waits for ShareLock on transaction 12660838; blocked by process 309686.
Process 309686: UPDATE task_instance SET state='scheduled' WHERE task_instance.dag_id = 'download_and_preprocess_sets' AND task_instance.run_id = 'manual__2022-03-24T13:43:29.617461+00:00' AND task_instance.task_id IN ('download_1546')
Process 314699: SELECT task_instance.try_number AS task_instance_try_number, task_instance.task_id AS task_instance_task_id, task_instance.dag_id AS task_instance_dag_id, task_instance.run_id AS task_instance_run_id, task_instance.start_date AS task_instance_start_date, task_instance.end_date AS task_instance_end_date, task_instance.duration AS task_instance_duration, task_instance.state AS task_instance_state, task_instance.max_tries AS task_instance_max_tries, task_instance.hostname AS task_instance_hostname, task_instance.unixname AS task_instance_unixname, task_instance.job_id AS task_instance_job_id, task_instance.pool AS task_instance_pool, task_instance.pool_slots AS task_instance_pool_slots, task_instance.queue AS task_instance_queue, task_instance.priority_weight AS task_instance_priority_weight, task_instance.operator AS task_instance_operator, task_instance.queued_dttm AS task_instance_queued_dttm, task_instance.queued_by_job_id AS task_instance_queued_by_job_id, task_
instance.pid AS task_instance_pid, task_insta
2022-03-27 16:14:45.851 UTC [309686] airflow_user@airflow_db HINT: See server log for query details.
2022-03-27 16:14:45.851 UTC [309686] airflow_user@airflow_db CONTEXT: while updating tuple (4928,21) in relation "task_instance"
2022-03-27 16:14:45.851 UTC [309686] airflow_user@airflow_db STATEMENT: UPDATE task_instance SET state='scheduled' WHERE task_instance.dag_id = 'download_and_preprocess_sets' AND task_instance.run_id = 'manual__2022-03-24T13:43:29.617461+00:00' AND task_instance.task_id IN ('download_1546')
2022-03-27 16:14:45.958 UTC [314289] airflow_user@airflow_db LOG: could not receive data from client: Connection reset by peer
2022-03-27 16:14:47.017 UTC [309687] airflow_user@airflow_db LOG: could not receive data from client: Connection reset by peer
```
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [airflow] potiuk closed issue #19957: Airflow crashes with a psycopg2.errors.DeadlockDetected exception
Posted by GitBox <gi...@apache.org>.
potiuk closed issue #19957:
URL: https://github.com/apache/airflow/issues/19957
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [airflow] potiuk commented on issue #19957: AIrflow crashes with a psycopg2.errors.DeadlockDetected exception
Posted by GitBox <gi...@apache.org>.
potiuk commented on issue #19957:
URL: https://github.com/apache/airflow/issues/19957#issuecomment-986332738
Do you happen to have one or multiple schedulers?
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [airflow] stablum commented on issue #19957: Airflow crashes with a psycopg2.errors.DeadlockDetected exception
Posted by GitBox <gi...@apache.org>.
stablum commented on issue #19957:
URL: https://github.com/apache/airflow/issues/19957#issuecomment-995617655
it seems that by dropping airflow's database entirely and recreating it from scratch, the bug is not re-occurring. So it might have been something in the airflow's db data.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [airflow] potiuk commented on issue #19957: Airflow crashes with a psycopg2.errors.DeadlockDetected exception
Posted by GitBox <gi...@apache.org>.
potiuk commented on issue #19957:
URL: https://github.com/apache/airflow/issues/19957#issuecomment-1003544650
cc: @ashb -> do you think this is plausible hypothesis ^^ ?
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [airflow] stablum commented on issue #19957: Airflow crashes with a psycopg2.errors.DeadlockDetected exception
Posted by GitBox <gi...@apache.org>.
stablum commented on issue #19957:
URL: https://github.com/apache/airflow/issues/19957#issuecomment-1007969558
>
I managed to run a large number tasks without airflow crashing, so changing that setting as you suggested did indeed help! Thank you! :)
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [airflow] potiuk commented on issue #19957: Airflow crashes with a psycopg2.errors.DeadlockDetected exception
Posted by GitBox <gi...@apache.org>.
potiuk commented on issue #19957:
URL: https://github.com/apache/airflow/issues/19957#issuecomment-1019539469
I am afraid we need to reopen this one. IMHO #20894 has no chance of fixing the problem because it does not change Airflow behaviour really (see discussion in #20894). @dwiajik @stablum if you experience this problem still - I think we really need some server-side logs that will telll us what other query is deadlocking with this one.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [airflow] potiuk commented on issue #19957: Airflow crashes with a psycopg2.errors.DeadlockDetected exception
Posted by GitBox <gi...@apache.org>.
potiuk commented on issue #19957:
URL: https://github.com/apache/airflow/issues/19957#issuecomment-1019550207
@dwiajik @stablum - is there any chance you have some customisations (plugins ? ) or users running DB operations (backfill? API calls? UI modifications) that might have caused the deadlock? Looking at the code, my intuition tells me that this must have been something external. Having the server logs could help to pin-point it.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [airflow] stablum edited a comment on issue #19957: Airflow crashes with a psycopg2.errors.DeadlockDetected exception
Posted by GitBox <gi...@apache.org>.
stablum edited a comment on issue #19957:
URL: https://github.com/apache/airflow/issues/19957#issuecomment-1079965417
Unfortunately I'm still experiencing this bug with Airflow 2.2.4:
```
[2022-03-27 16:14:12,804] {scheduler_job.py:433} INFO - Setting the following tasks to queued state:
[2022-03-27 16:14:12,805] {scheduler_job.py:527} INFO - Executor reports execution of download_and_preprocess_sets.download_1544 run_id=manual__2022-03-24T13:43:29.617461+00:00 exited with status success for try_number 2
[2022-03-27 16:14:12,809] {scheduler_job.py:570} INFO - TaskInstance Finished: dag_id=download_and_preprocess_sets, task_id=download_1544, run_id=manual__2022-03-24T13:43:29.617461+00:00, run_start_date=2022-03-27 16:13:00.591762+00:00, run_end_date=2022-03-27 16:13:37.437128+00:00, run_duration=36.845366, state=success, executor_state=success, try_number=2, max_tries=2, job_id=10383, pool=default_pool, queue=default, priority_weight=59, operator=PythonOperator
[2022-03-27 16:14:12,829] {scheduler_job.py:1137} INFO - Resetting orphaned tasks for active dag runs
[2022-03-27 16:14:12,830] {scheduler_job.py:1160} INFO - Marked 1 SchedulerJob instances as failed
[2022-03-27 16:14:12,835] {scheduler_job.py:1201} INFO - Reset the following 1 orphaned TaskInstances:
<TaskInstance: download_and_preprocess_sets.download_1546 manual__2022-03-24T13:43:29.617461+00:00 [running]>
[2022-03-27 16:14:45,853] {scheduler_job.py:667} ERROR - Exception when executing SchedulerJob._run_scheduler_loop
Traceback (most recent call last):
File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/engine/base.py", line 1276, in _execute_context
self.dialect.do_execute(
File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/engine/default.py", line 608, in do_execute
cursor.execute(statement, parameters)
psycopg2.errors.DeadlockDetected: deadlock detected
DETAIL: Process 309686 waits for ShareLock on transaction 12660839; blocked by process 314699.
Process 314699 waits for ShareLock on transaction 12660838; blocked by process 309686.
HINT: See server log for query details.
CONTEXT: while updating tuple (4928,21) in relation "task_instance"
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/usr/local/lib/python3.9/dist-packages/airflow/jobs/scheduler_job.py", line 651, in _execute
self._run_scheduler_loop()
File "/usr/local/lib/python3.9/dist-packages/airflow/jobs/scheduler_job.py", line 732, in _run_scheduler_loop
num_queued_tis = self._do_scheduling(session)
File "/usr/local/lib/python3.9/dist-packages/airflow/jobs/scheduler_job.py", line 815, in _do_scheduling
callback_to_run = self._schedule_dag_run(dag_run, session)
File "/usr/local/lib/python3.9/dist-packages/airflow/jobs/scheduler_job.py", line 1072, in _schedule_dag_run
dag_run.schedule_tis(schedulable_tis, session)
File "/usr/local/lib/python3.9/dist-packages/airflow/utils/session.py", line 67, in wrapper
return func(*args, **kwargs)
File "/usr/local/lib/python3.9/dist-packages/airflow/models/dagrun.py", line 901, in schedule_tis
session.query(TI)
File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/orm/query.py", line 4063, in update
update_op.exec_()
File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/orm/persistence.py", line 1697, in exec_
self._do_exec()
File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/orm/persistence.py", line 1895, in _do_exec
self._execute_stmt(update_stmt)
File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/orm/persistence.py", line 1702, in _execute_stmt
self.result = self.query._execute_crud(stmt, self.mapper)
File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/orm/query.py", line 3568, in _execute_crud
return conn.execute(stmt, self._params)
File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/engine/base.py", line 1011, in execute
return meth(self, multiparams, params)
File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/sql/elements.py", line 298, in _execute_on_connection
return connection._execute_clauseelement(self, multiparams, params)
File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/engine/base.py", line 1124, in _execute_clauseelement
ret = self._execute_context(
File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/engine/base.py", line 1316, in _execute_context
self._handle_dbapi_exception(
File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/engine/base.py", line 1510, in _handle_dbapi_exception
util.raise_(
File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/util/compat.py", line 182, in raise_
raise exception
File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/engine/base.py", line 1276, in _execute_context
self.dialect.do_execute(
File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/engine/default.py", line 608, in do_execute
cursor.execute(statement, parameters)
sqlalchemy.exc.OperationalError: (psycopg2.errors.DeadlockDetected) deadlock detected
DETAIL: Process 309686 waits for ShareLock on transaction 12660839; blocked by process 314699.
Process 314699 waits for ShareLock on transaction 12660838; blocked by process 309686.
HINT: See server log for query details.
CONTEXT: while updating tuple (4928,21) in relation "task_instance"
[SQL: UPDATE task_instance SET state=%(state)s WHERE task_instance.dag_id = %(dag_id_1)s AND task_instance.run_id = %(run_id_1)s AND task_instance.task_id IN (%(task_id_1)s)]
[parameters: {'state': <TaskInstanceState.SCHEDULED: 'scheduled'>, 'dag_id_1': 'download_and_preprocess_sets', 'run_id_1': 'manual__2022-03-24T13:43:29.617461+00:00', 'task_id_1': 'download_1546'}]
(Background on this error at: http://sqlalche.me/e/13/e3q8)
[2022-03-27 16:14:45,858] {local_executor.py:388} INFO - Shutting down LocalExecutor; waiting for running tasks to finish. Signal again if you don't want to wait.
[2022-03-27 16:14:46,970] {process_utils.py:120} INFO - Sending Signals.SIGTERM to group 309683. PIDs of all processes in the group: [309683]
[2022-03-27 16:14:46,970] {process_utils.py:75} INFO - Sending the signal Signals.SIGTERM to group 309683
[2022-03-27 16:14:47,022] {process_utils.py:70} INFO - Process psutil.Process(pid=309683, status='terminated', exitcode=0, started='16:08:59') (309683) terminated with exit code 0
[2022-03-27 16:14:47,022] {scheduler_job.py:678} INFO - Exited execute loop
Traceback (most recent call last):
File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/engine/base.py", line 1276, in _execute_context
self.dialect.do_execute(
File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/engine/default.py", line 608, in do_execute
cursor.execute(statement, parameters)
psycopg2.errors.DeadlockDetected: deadlock detected
DETAIL: Process 309686 waits for ShareLock on transaction 12660839; blocked by process 314699.
Process 314699 waits for ShareLock on transaction 12660838; blocked by process 309686.
HINT: See server log for query details.
CONTEXT: while updating tuple (4928,21) in relation "task_instance"
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/usr/local/bin/airflow", line 8, in <module>
sys.exit(main())
File "/usr/local/lib/python3.9/dist-packages/airflow/__main__.py", line 48, in main
args.func(args)
File "/usr/local/lib/python3.9/dist-packages/airflow/cli/cli_parser.py", line 48, in command
return func(*args, **kwargs)
File "/usr/local/lib/python3.9/dist-packages/airflow/utils/cli.py", line 92, in wrapper
return f(*args, **kwargs)
File "/usr/local/lib/python3.9/dist-packages/airflow/cli/commands/scheduler_command.py", line 75, in scheduler
_run_scheduler_job(args=args)
File "/usr/local/lib/python3.9/dist-packages/airflow/cli/commands/scheduler_command.py", line 46, in _run_scheduler_job
job.run()
File "/usr/local/lib/python3.9/dist-packages/airflow/jobs/base_job.py", line 246, in run
self._execute()
File "/usr/local/lib/python3.9/dist-packages/airflow/jobs/scheduler_job.py", line 651, in _execute
self._run_scheduler_loop()
File "/usr/local/lib/python3.9/dist-packages/airflow/jobs/scheduler_job.py", line 732, in _run_scheduler_loop
num_queued_tis = self._do_scheduling(session)
File "/usr/local/lib/python3.9/dist-packages/airflow/jobs/scheduler_job.py", line 815, in _do_scheduling
callback_to_run = self._schedule_dag_run(dag_run, session)
File "/usr/local/lib/python3.9/dist-packages/airflow/jobs/scheduler_job.py", line 1072, in _schedule_dag_run
dag_run.schedule_tis(schedulable_tis, session)
File "/usr/local/lib/python3.9/dist-packages/airflow/utils/session.py", line 67, in wrapper
return func(*args, **kwargs)
File "/usr/local/lib/python3.9/dist-packages/airflow/models/dagrun.py", line 901, in schedule_tis
session.query(TI)
File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/orm/query.py", line 4063, in update
update_op.exec_()
File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/orm/persistence.py", line 1697, in exec_
self._do_exec()
File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/orm/persistence.py", line 1895, in _do_exec
self._execute_stmt(update_stmt)
File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/orm/persistence.py", line 1702, in _execute_stmt
self.result = self.query._execute_crud(stmt, self.mapper)
File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/orm/query.py", line 3568, in _execute_crud
return conn.execute(stmt, self._params)
File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/engine/base.py", line 1011, in execute
return meth(self, multiparams, params)
File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/sql/elements.py", line 298, in _execute_on_connection
return connection._execute_clauseelement(self, multiparams, params)
File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/engine/base.py", line 1124, in _execute_clauseelement
ret = self._execute_context(
File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/engine/base.py", line 1316, in _execute_context
self._handle_dbapi_exception(
File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/engine/base.py", line 1510, in _handle_dbapi_exception
util.raise_(
File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/util/compat.py", line 182, in raise_
raise exception
File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/engine/base.py", line 1276, in _execute_context
self.dialect.do_execute(
File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/engine/default.py", line 608, in do_execute
cursor.execute(statement, parameters)
sqlalchemy.exc.OperationalError: (psycopg2.errors.DeadlockDetected) deadlock detected
DETAIL: Process 309686 waits for ShareLock on transaction 12660839; blocked by process 314699.
Process 314699 waits for ShareLock on transaction 12660838; blocked by process 309686.
HINT: See server log for query details.
CONTEXT: while updating tuple (4928,21) in relation "task_instance"
[SQL: UPDATE task_instance SET state=%(state)s WHERE task_instance.dag_id = %(dag_id_1)s AND task_instance.run_id = %(run_id_1)s AND task_instance.task_id IN (%(task_id_1)s)]
[parameters: {'state': <TaskInstanceState.SCHEDULED: 'scheduled'>, 'dag_id_1': 'download_and_preprocess_sets', 'run_id_1': 'manual__2022-03-24T13:43:29.617461+00:00', 'task_id_1': 'download_1546'}]
(Background on this error at: http://sqlalche.me/e/13/e3q8)
```
I also set `concurrency=1` and `max_active_tasks=1`, but the bug keeps reappearing
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [airflow] stablum commented on issue #19957: Airflow crashes with a psycopg2.errors.DeadlockDetected exception
Posted by GitBox <gi...@apache.org>.
stablum commented on issue #19957:
URL: https://github.com/apache/airflow/issues/19957#issuecomment-1002095615
Just upgraded to Airflow 2.2.3, and unfortunately it keeps crashing as well
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [airflow] potiuk commented on issue #19957: Airflow crashes with a psycopg2.errors.DeadlockDetected exception
Posted by GitBox <gi...@apache.org>.
potiuk commented on issue #19957:
URL: https://github.com/apache/airflow/issues/19957#issuecomment-1008022416
> I managed to run a large number tasks without airflow crashing, so changing that setting as you suggested did indeed help! Thank you! :)
Cool! Thanks for confirming! Now we need to find the root cause and fix it !
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [airflow] ashb commented on issue #19957: Airflow crashes with a psycopg2.errors.DeadlockDetected exception
Posted by GitBox <gi...@apache.org>.
ashb commented on issue #19957:
URL: https://github.com/apache/airflow/issues/19957#issuecomment-1004742862
Yes, that seems plausible.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [airflow] potiuk commented on issue #19957: Airflow crashes with a psycopg2.errors.DeadlockDetected exception
Posted by GitBox <gi...@apache.org>.
potiuk commented on issue #19957:
URL: https://github.com/apache/airflow/issues/19957#issuecomment-1005969121
@stablum @AmarEL - is it possible that you disable this configuraiton and see if it helps?
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [airflow] potiuk commented on issue #19957: Airflow crashes with a psycopg2.errors.DeadlockDetected exception
Posted by GitBox <gi...@apache.org>.
potiuk commented on issue #19957:
URL: https://github.com/apache/airflow/issues/19957#issuecomment-1011857056
I actually spent some time few days ago looking at the mini-scheduler code but I could not really find a flaw there. The fact that it did not help you indicates that my hypothesis was unfounded, unfortunately. and maybe the reason was different (and the fact that it worked for @stablum was mainly a coincidence or some side effect of that change).
@dwiajik - it might also be that your case is a bit different - could you please report (maybe create a gist with a few examples of) some of the logs of your deadlocks - Ideally if you could send us the logs of failing scheduler and corresponding logs of the Postgres server from the same time - I believe it will be much easier to investigate if we see few examples - and the server logs shoud tell us exactly which two queries deadlocked and this should help us a lot.
What we really need is somethiing in hte /var/lib/pgsql/data/pg_log/*.log, there should be entries at the time when then deadlock happens that looks like this:
```
ERROR: deadlock detected
DETAIL: Process 21535 waits for AccessExclusiveLock on relation 342640 of database 41454; blocked by process 21506.
Process 21506 waits for AccessExclusiveLock on relation 342637 of database 41454; blocked by process 21535.
HINT: See server log for query details.
CONTEXT: SQL statement "UPDATE ..."
````
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [airflow] potiuk commented on issue #19957: AIrflow crashes with a psycopg2.errors.DeadlockDetected exception
Posted by GitBox <gi...@apache.org>.
potiuk commented on issue #19957:
URL: https://github.com/apache/airflow/issues/19957#issuecomment-986335228
(Reason why I am asking - there are very low number of Postgres deadlock reports we receive, because (unlike MySQL) Postgres usually shows "real" deadlocks and we'd expect this kind of problem to appear more often amongst our users, so I am looking for any special circumstances you might have.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [airflow] stablum edited a comment on issue #19957: AIrflow crashes with a psycopg2.errors.DeadlockDetected exception
Posted by GitBox <gi...@apache.org>.
stablum edited a comment on issue #19957:
URL: https://github.com/apache/airflow/issues/19957#issuecomment-986608097
Hmm, it's difficult to retrieve this bit of information. I don't think there were multiple instances of the scheduler running, but I will keep in mind this eventuality, so thank you for the hint. Anyways, I'm not experiencing the problem anymore right now, maybe related to my upgrading the system packages and rebooting. If this bug happens again I will update this thread with more info about SQL query or eventual multiple scheduler instances being run
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [airflow] stablum commented on issue #19957: AIrflow crashes with a psycopg2.errors.DeadlockDetected exception
Posted by GitBox <gi...@apache.org>.
stablum commented on issue #19957:
URL: https://github.com/apache/airflow/issues/19957#issuecomment-986608097
Hmm, it's difficult to retrieve this bit of information. I don't think there were multiple instances of the scheduler running, but I will keep in mind this eventuality, so thank you for the hint.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [airflow] potiuk commented on issue #19957: Airflow crashes with a psycopg2.errors.DeadlockDetected exception
Posted by GitBox <gi...@apache.org>.
potiuk commented on issue #19957:
URL: https://github.com/apache/airflow/issues/19957#issuecomment-994816151
Thanks @stablum ! Great to see all the detailed information (and the pastebin)
@ashb - I think that one might need some closer look from both of us :) and some hypytheses on where it could come from.
I plan to take a look at that later this week and maybe we can figure out where it comes from.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [airflow] potiuk edited a comment on issue #19957: Airflow crashes with a psycopg2.errors.DeadlockDetected exception
Posted by GitBox <gi...@apache.org>.
potiuk edited a comment on issue #19957:
URL: https://github.com/apache/airflow/issues/19957#issuecomment-1002655342
I looked at the code and places where it can occure and I have a hypothesis on what could cause it.
And maybe some of the people here can verify my theory.
Could you please disable https://airflow.apache.org/docs/apache-airflow/stable/configurations-ref.html#schedule-after-task-execution and see if the problem still occurs?
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [airflow] potiuk edited a comment on issue #19957: Airflow crashes with a psycopg2.errors.DeadlockDetected exception
Posted by GitBox <gi...@apache.org>.
potiuk edited a comment on issue #19957:
URL: https://github.com/apache/airflow/issues/19957#issuecomment-1002655342
I looked at the code and places where it can occur and I have a hypothesis on what could cause it.
And maybe some of the people here can verify my theory.
Could you please disable https://airflow.apache.org/docs/apache-airflow/stable/configurations-ref.html#schedule-after-task-execution and see if the problem still occurs?
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [airflow] potiuk commented on issue #19957: Airflow crashes with a psycopg2.errors.DeadlockDetected exception
Posted by GitBox <gi...@apache.org>.
potiuk commented on issue #19957:
URL: https://github.com/apache/airflow/issues/19957#issuecomment-1002655342
I looked at the code and places where it can occure and I have a theory what could cause it.
And maybe some of people here can verify my theory.
Could you please disable https://airflow.apache.org/docs/apache-airflow/stable/configurations-ref.html#schedule-after-task-execution and see if the problem still occur?
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [airflow] potiuk edited a comment on issue #19957: Airflow crashes with a psycopg2.errors.DeadlockDetected exception
Posted by GitBox <gi...@apache.org>.
potiuk edited a comment on issue #19957:
URL: https://github.com/apache/airflow/issues/19957#issuecomment-1002655342
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [airflow] potiuk commented on issue #19957: Airflow crashes with a psycopg2.errors.DeadlockDetected exception
Posted by GitBox <gi...@apache.org>.
potiuk commented on issue #19957:
URL: https://github.com/apache/airflow/issues/19957#issuecomment-1003544607
To bump it up - Just let us know @stablum @AmarEL if disabling mini-scheduler works.
I **think** this might be causesd by the mini-scheduler in the task deadlocking with the "actual" scheduler trying to lock the same rows in a different order. But I think before we attempt to analyse it in detail and fix it - since you have easily reproducible and repeatable case, disable the mini-scheduler it and see if it helps. If It does, it will help us to narrow down the reason and possibly fix it.
The only drawback of disabling the mini-scheduler is potentially slightly longer latency in scheduling subsequent tasks in the same dag.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [airflow] AmarEL commented on issue #19957: Airflow crashes with a psycopg2.errors.DeadlockDetected exception
Posted by GitBox <gi...@apache.org>.
AmarEL commented on issue #19957:
URL: https://github.com/apache/airflow/issues/19957#issuecomment-999902054
We are facing the very same problem with Postgres.
Even being a different database, the stack trace shows the same line/method being called on the airflow "layer" before moving to the database concrete class.
```
File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/engine/default.py", line 608, in do_execute
cursor.execute(statement, parameters)
```
I didn't have time to test, but there is a chance the #20030 "fixes" this one together with the #19832
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [airflow] dwiajik edited a comment on issue #19957: Airflow crashes with a psycopg2.errors.DeadlockDetected exception
Posted by GitBox <gi...@apache.org>.
dwiajik edited a comment on issue #19957:
URL: https://github.com/apache/airflow/issues/19957#issuecomment-1011683020
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [airflow] potiuk commented on issue #19957: AIrflow crashes with a psycopg2.errors.DeadlockDetected exception
Posted by GitBox <gi...@apache.org>.
potiuk commented on issue #19957:
URL: https://github.com/apache/airflow/issues/19957#issuecomment-986334766
Also - is there a chance to find out from the logs what was the "other" query that the deadlock happened on? Is this also possible that you have another person or process (either manuallly or via some direct DB access) acessing the DB (for example soem scripts that perform retention or similar)?
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [airflow] stablum commented on issue #19957: AIrflow crashes with a psycopg2.errors.DeadlockDetected exception
Posted by GitBox <gi...@apache.org>.
stablum commented on issue #19957:
URL: https://github.com/apache/airflow/issues/19957#issuecomment-994694956
Limiting concurrency does not solve the issue. Even by reducing the amount of concurrent tasks to 1 the exception is triggered after some time
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [airflow] stablum edited a comment on issue #19957: AIrflow crashes with a psycopg2.errors.DeadlockDetected exception
Posted by GitBox <gi...@apache.org>.
stablum edited a comment on issue #19957:
URL: https://github.com/apache/airflow/issues/19957#issuecomment-994343254
unfortunately, this keeps happening (after couple of weeks where it was running smoothly)
```
[2021-12-15 01:54:30,915] {dagbag.py:500} INFO - Filling up the DagBag from /root/learning_sets/models/
dag_bag <airflow.models.dagbag.DagBag object at 0x7f56aa88cf70>
Running <TaskInstance: download_and_preprocess_sets.download_1466 manual__2021-12-14T06:28:19.872227+00:00 [queued]> on host AI-Research
Running <TaskInstance: download_and_preprocess_sets.download_952 manual__2021-12-14T06:28:19.872227+00:00 [queued]> on host AI-Research
[2021-12-15 01:54:43,539] {scheduler_job.py:644} ERROR - Exception when executing SchedulerJob._run_scheduler_loop
Traceback (most recent call last):
File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/engine/base.py", line 1276, in _execute_context
self.dialect.do_execute(
File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/engine/default.py", line 608, in do_execute
cursor.execute(statement, parameters)
psycopg2.errors.DeadlockDetected: deadlock detected
DETAIL: Process 1117623 waits for ShareLock on transaction 4526903; blocked by process 1206850.
Process 1206850 waits for AccessExclusiveLock on tuple (1,17) of relation 19255 of database 19096; blocked by process 1206469.
Process 1206469 waits for ShareLock on transaction 4526895; blocked by process 1117623.
HINT: See server log for query details.
CONTEXT: while updating tuple (1899,3) in relation "task_instance"
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/usr/local/lib/python3.9/dist-packages/airflow/jobs/scheduler_job.py", line 628, in _execute
self._run_scheduler_loop()
File "/usr/local/lib/python3.9/dist-packages/airflow/jobs/scheduler_job.py", line 709, in _run_scheduler_loop
num_queued_tis = self._do_scheduling(session)
File "/usr/local/lib/python3.9/dist-packages/airflow/jobs/scheduler_job.py", line 792, in _do_scheduling
callback_to_run = self._schedule_dag_run(dag_run, session)
File "/usr/local/lib/python3.9/dist-packages/airflow/jobs/scheduler_job.py", line 1049, in _schedule_dag_run
dag_run.schedule_tis(schedulable_tis, session)
File "/usr/local/lib/python3.9/dist-packages/airflow/utils/session.py", line 67, in wrapper
return func(*args, **kwargs)
File "/usr/local/lib/python3.9/dist-packages/airflow/models/dagrun.py", line 898, in schedule_tis
session.query(TI)
File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/orm/query.py", line 4063, in update
update_op.exec_()
File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/orm/persistence.py", line 1697, in exec_
self._do_exec()
File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/orm/persistence.py", line 1895, in _do_exec
self._execute_stmt(update_stmt)
File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/orm/persistence.py", line 1702, in _execute_stmt
self.result = self.query._execute_crud(stmt, self.mapper)
File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/orm/query.py", line 3568, in _execute_crud
return conn.execute(stmt, self._params)
File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/engine/base.py", line 1011, in execute
return meth(self, multiparams, params)
File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/sql/elements.py", line 298, in _execute_on_connection
return connection._execute_clauseelement(self, multiparams, params)
File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/engine/base.py", line 1124, in _execute_clauseelement
ret = self._execute_context(
File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/engine/base.py", line 1316, in _execute_context
self._handle_dbapi_exception(
File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/engine/base.py", line 1510, in _handle_dbapi_exception
util.raise_(
File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/util/compat.py", line 182, in raise_
raise exception
File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/engine/base.py", line 1276, in _execute_context
self.dialect.do_execute(
File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/engine/default.py", line 608, in do_execute
cursor.execute(statement, parameters)
sqlalchemy.exc.OperationalError: (psycopg2.errors.DeadlockDetected) deadlock detected
DETAIL: Process 1117623 waits for ShareLock on transaction 4526903; blocked by process 1206850.
Process 1206850 waits for AccessExclusiveLock on tuple (1,17) of relation 19255 of database 19096; blocked by process 1206469.
Process 1206469 waits for ShareLock on transaction 4526895; blocked by process 1117623.
HINT: See server log for query details.
CONTEXT: while updating tuple (1899,3) in relation "task_instance"
[SQL: UPDATE task_instance SET state=%(state)s WHERE task_instance.dag_id = %(dag_id_1)s AND task_instance.run_id = %(run_id_1)s AND task_instance.task_id IN (%(task_id_1)s, %(task_id_2)s, %(task_id_3)s, %(task_id_4)s, %(task_id_5)s, %(task_id_6)s, %(task_id_7)s)]
[parameters: {'state': <TaskInstanceState.SCHEDULED: 'scheduled'>, 'dag_id_1': 'download_and_preprocess_sets', 'run_id_1': 'manual__2021-12-14T06:28:19.872227+00:00', 'task_id_1': 'download_936', 'task_id_2': 'download_937', 'task_id_3': 'download_938', 'task_id_4': 'download_939', 'task_id_5': 'download_944', 'task_id_6': 'download_946', 'task_id_7': 'download_950'}]
(Background on this error at: http://sqlalche.me/e/13/e3q8)
[2021-12-15 01:54:43,544] {local_executor.py:388} INFO - Shutting down LocalExecutor; waiting for running tasks to finish. Signal again if you don't want to wait.
[2021-12-15 01:54:53,885] {process_utils.py:100} INFO - Sending Signals.SIGTERM to GPID 1117612
[2021-12-15 01:54:54,098] {process_utils.py:66} INFO - Process psutil.Process(pid=1117612, status='terminated', exitcode=0, started='01:38:54') (1117612) terminated with exit code 0
[2021-12-15 01:54:54,098] {scheduler_job.py:655} INFO - Exited execute loop
[2021-12-15 01:54:54 +0100] [1117543] [INFO] Handling signal: term
[2021-12-15 01:54:54 +0100] [1117568] [INFO] Worker exiting (pid: 1117568)
[2021-12-15 01:54:54 +0100] [1117549] [INFO] Worker exiting (pid: 1117549)
Traceback (most recent call last):
File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/engine/base.py", line 1276, in _execute_context
self.dialect.do_execute(
File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/engine/default.py", line 608, in do_execute
cursor.execute(statement, parameters)
psycopg2.errors.DeadlockDetected: deadlock detected
DETAIL: Process 1117623 waits for ShareLock on transaction 4526903; blocked by process 1206850.
Process 1206850 waits for AccessExclusiveLock on tuple (1,17) of relation 19255 of database 19096; blocked by process 1206469.
Process 1206469 waits for ShareLock on transaction 4526895; blocked by process 1117623.
HINT: See server log for query details.
CONTEXT: while updating tuple (1899,3) in relation "task_instance"
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/usr/local/bin/airflow", line 8, in <module>
sys.exit(main())
File "/usr/local/lib/python3.9/dist-packages/airflow/__main__.py", line 48, in main
args.func(args)
File "/usr/local/lib/python3.9/dist-packages/airflow/cli/cli_parser.py", line 48, in command
return func(*args, **kwargs)
File "/usr/local/lib/python3.9/dist-packages/airflow/utils/cli.py", line 92, in wrapper
return f(*args, **kwargs)
File "/usr/local/lib/python3.9/dist-packages/airflow/cli/commands/scheduler_command.py", line 75, in scheduler
_run_scheduler_job(args=args)
File "/usr/local/lib/python3.9/dist-packages/airflow/cli/commands/scheduler_command.py", line 46, in _run_scheduler_job
job.run()
File "/usr/local/lib/python3.9/dist-packages/airflow/jobs/base_job.py", line 245, in run
self._execute()
File "/usr/local/lib/python3.9/dist-packages/airflow/jobs/scheduler_job.py", line 628, in _execute
self._run_scheduler_loop()
File "/usr/local/lib/python3.9/dist-packages/airflow/jobs/scheduler_job.py", line 709, in _run_scheduler_loop
num_queued_tis = self._do_scheduling(session)
File "/usr/local/lib/python3.9/dist-packages/airflow/jobs/scheduler_job.py", line 792, in _do_scheduling
callback_to_run = self._schedule_dag_run(dag_run, session)
File "/usr/local/lib/python3.9/dist-packages/airflow/jobs/scheduler_job.py", line 1049, in _schedule_dag_run
dag_run.schedule_tis(schedulable_tis, session)
File "/usr/local/lib/python3.9/dist-packages/airflow/utils/session.py", line 67, in wrapper
return func(*args, **kwargs)
File "/usr/local/lib/python3.9/dist-packages/airflow/models/dagrun.py", line 898, in schedule_tis
session.query(TI)
File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/orm/query.py", line 4063, in update
update_op.exec_()
File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/orm/persistence.py", line 1697, in exec_
self._do_exec()
File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/orm/persistence.py", line 1895, in _do_exec
self._execute_stmt(update_stmt)
File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/orm/persistence.py", line 1702, in _execute_stmt
self.result = self.query._execute_crud(stmt, self.mapper)
File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/orm/query.py", line 3568, in _execute_crud
return conn.execute(stmt, self._params)
File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/engine/base.py", line 1011, in execute
return meth(self, multiparams, params)
File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/sql/elements.py", line 298, in _execute_on_connection
return connection._execute_clauseelement(self, multiparams, params)
File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/engine/base.py", line 1124, in _execute_clauseelement
ret = self._execute_context(
File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/engine/base.py", line 1316, in _execute_context
self._handle_dbapi_exception(
File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/engine/base.py", line 1510, in _handle_dbapi_exception
util.raise_(
File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/util/compat.py", line 182, in raise_
raise exception
File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/engine/base.py", line 1276, in _execute_context
self.dialect.do_execute(
File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/engine/default.py", line 608, in do_execute
cursor.execute(statement, parameters)
sqlalchemy.exc.OperationalError: (psycopg2.errors.DeadlockDetected) deadlock detected
DETAIL: Process 1117623 waits for ShareLock on transaction 4526903; blocked by process 1206850.
Process 1206850 waits for AccessExclusiveLock on tuple (1,17) of relation 19255 of database 19096; blocked by process 1206469.
Process 1206469 waits for ShareLock on transaction 4526895; blocked by process 1117623.
HINT: See server log for query details.
CONTEXT: while updating tuple (1899,3) in relation "task_instance"
[SQL: UPDATE task_instance SET state=%(state)s WHERE task_instance.dag_id = %(dag_id_1)s AND task_instance.run_id = %(run_id_1)s AND task_instance.task_id IN (%(task_id_1)s, %(task_id_2)s, %(task_id_3)s, %(task_id_4)s, %(task_id_5)s, %(task_id_6)s, %(task_id_7)s)]
[parameters: {'state': <TaskInstanceState.SCHEDULED: 'scheduled'>, 'dag_id_1': 'download_and_preprocess_sets', 'run_id_1': 'manual__2021-12-14T06:28:19.872227+00:00', 'task_id_1': 'download_936', 'task_id_2': 'download_937', 'task_id_3': 'download_938', 'task_id_4': 'download_939', 'task_id_5': 'download_944', 'task_id_6': 'download_946', 'task_id_7': 'download_950'}]
(Background on this error at: http://sqlalche.me/e/13/e3q8)
[2021-12-15 01:54:54 +0100] [1117543] [INFO] Shutting down: Master
```
I just restarted it and this are the scheduler processes running: I launched it with `screen`:
```
root@AI-Research:~# ps aux | grep airflow | grep sched | grep -v grep
root 3166838 0.0 0.0 9024 2460 ? Ss 07:17 0:00 SCREEN -L -Logfile logs/airflow_scheduler_20211215_071716.log -S airflow_scheduler -d -m airflow scheduler
root 3166845 56.1 1.0 212516 179704 pts/66 Rs+ 07:17 0:16 /usr/bin/python3 /usr/local/bin/airflow scheduler
root 3167171 0.5 0.4 111728 74756 pts/66 S 07:17 0:00 airflow scheduler -- DagFileProcessorManager
```
And this is the launching script:
```
root@AI-Research:~/learning_sets/airflow# cat launch_airflow.sh
#!/bin/bash
TS=$(date +%Y%m%d_%H%M%S)
screen -X -S airflow_scheduler quit
screen -X -S airflow_webserver quit
sleep 1
ps aux | grep airflow | grep -v launch | awk '{print $2}' | xargs kill
sleep 1
screen -L -Logfile logs/airflow_scheduler_${TS}.log -S airflow_scheduler -d -m airflow scheduler
screen -L -Logfile logs/airflow_webserver_${TS}.log -S airflow_webserver -d -m airflow webserver
```
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [airflow] stablum commented on issue #19957: AIrflow crashes with a psycopg2.errors.DeadlockDetected exception
Posted by GitBox <gi...@apache.org>.
stablum commented on issue #19957:
URL: https://github.com/apache/airflow/issues/19957#issuecomment-994343254
unfortunately, this keeps happening (after couple of weeks where it was running smoothly)
```
[2021-12-15 01:54:30,915] {dagbag.py:500} INFO - Filling up the DagBag from /root/learning_sets/models/
dag_bag <airflow.models.dagbag.DagBag object at 0x7f56aa88cf70>
Running <TaskInstance: download_and_preprocess_sets.download_1466 manual__2021-12-14T06:28:19.872227+00:00 [queued]> on host AI-Research
Running <TaskInstance: download_and_preprocess_sets.download_952 manual__2021-12-14T06:28:19.872227+00:00 [queued]> on host AI-Research
[2021-12-15 01:54:43,539] {scheduler_job.py:644} ERROR - Exception when executing SchedulerJob._run_scheduler_loop
Traceback (most recent call last):
File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/engine/base.py", line 1276, in _execute_context
self.dialect.do_execute(
File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/engine/default.py", line 608, in do_execute
cursor.execute(statement, parameters)
psycopg2.errors.DeadlockDetected: deadlock detected
DETAIL: Process 1117623 waits for ShareLock on transaction 4526903; blocked by process 1206850.
Process 1206850 waits for AccessExclusiveLock on tuple (1,17) of relation 19255 of database 19096; blocked by process 1206469.
Process 1206469 waits for ShareLock on transaction 4526895; blocked by process 1117623.
HINT: See server log for query details.
CONTEXT: while updating tuple (1899,3) in relation "task_instance"
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/usr/local/lib/python3.9/dist-packages/airflow/jobs/scheduler_job.py", line 628, in _execute
self._run_scheduler_loop()
File "/usr/local/lib/python3.9/dist-packages/airflow/jobs/scheduler_job.py", line 709, in _run_scheduler_loop
num_queued_tis = self._do_scheduling(session)
File "/usr/local/lib/python3.9/dist-packages/airflow/jobs/scheduler_job.py", line 792, in _do_scheduling
callback_to_run = self._schedule_dag_run(dag_run, session)
File "/usr/local/lib/python3.9/dist-packages/airflow/jobs/scheduler_job.py", line 1049, in _schedule_dag_run
dag_run.schedule_tis(schedulable_tis, session)
File "/usr/local/lib/python3.9/dist-packages/airflow/utils/session.py", line 67, in wrapper
return func(*args, **kwargs)
File "/usr/local/lib/python3.9/dist-packages/airflow/models/dagrun.py", line 898, in schedule_tis
session.query(TI)
File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/orm/query.py", line 4063, in update
update_op.exec_()
File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/orm/persistence.py", line 1697, in exec_
self._do_exec()
File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/orm/persistence.py", line 1895, in _do_exec
self._execute_stmt(update_stmt)
File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/orm/persistence.py", line 1702, in _execute_stmt
self.result = self.query._execute_crud(stmt, self.mapper)
File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/orm/query.py", line 3568, in _execute_crud
return conn.execute(stmt, self._params)
File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/engine/base.py", line 1011, in execute
return meth(self, multiparams, params)
File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/sql/elements.py", line 298, in _execute_on_connection
return connection._execute_clauseelement(self, multiparams, params)
File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/engine/base.py", line 1124, in _execute_clauseelement
ret = self._execute_context(
File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/engine/base.py", line 1316, in _execute_context
self._handle_dbapi_exception(
File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/engine/base.py", line 1510, in _handle_dbapi_exception
util.raise_(
File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/util/compat.py", line 182, in raise_
raise exception
File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/engine/base.py", line 1276, in _execute_context
self.dialect.do_execute(
File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/engine/default.py", line 608, in do_execute
cursor.execute(statement, parameters)
sqlalchemy.exc.OperationalError: (psycopg2.errors.DeadlockDetected) deadlock detected
DETAIL: Process 1117623 waits for ShareLock on transaction 4526903; blocked by process 1206850.
Process 1206850 waits for AccessExclusiveLock on tuple (1,17) of relation 19255 of database 19096; blocked by process 1206469.
Process 1206469 waits for ShareLock on transaction 4526895; blocked by process 1117623.
HINT: See server log for query details.
CONTEXT: while updating tuple (1899,3) in relation "task_instance"
[SQL: UPDATE task_instance SET state=%(state)s WHERE task_instance.dag_id = %(dag_id_1)s AND task_instance.run_id = %(run_id_1)s AND task_instance.task_id IN (%(task_id_1)s, %(task_id_2)s, %(task_id_3)s, %(task_id_4)s, %(task_id_5)s, %(task_id_6)s, %(task_id_7)s)]
[parameters: {'state': <TaskInstanceState.SCHEDULED: 'scheduled'>, 'dag_id_1': 'download_and_preprocess_sets', 'run_id_1': 'manual__2021-12-14T06:28:19.872227+00:00', 'task_id_1': 'download_936', 'task_id_2': 'download_937', 'task_id_3': 'download_938', 'task_id_4': 'download_939', 'task_id_5': 'download_944', 'task_id_6': 'download_946', 'task_id_7': 'download_950'}]
(Background on this error at: http://sqlalche.me/e/13/e3q8)
[2021-12-15 01:54:43,544] {local_executor.py:388} INFO - Shutting down LocalExecutor; waiting for running tasks to finish. Signal again if you don't want to wait.
[2021-12-15 01:54:53,885] {process_utils.py:100} INFO - Sending Signals.SIGTERM to GPID 1117612
[2021-12-15 01:54:54,098] {process_utils.py:66} INFO - Process psutil.Process(pid=1117612, status='terminated', exitcode=0, started='01:38:54') (1117612) terminated with exit code 0
[2021-12-15 01:54:54,098] {scheduler_job.py:655} INFO - Exited execute loop
[2021-12-15 01:54:54 +0100] [1117543] [INFO] Handling signal: term
[2021-12-15 01:54:54 +0100] [1117568] [INFO] Worker exiting (pid: 1117568)
[2021-12-15 01:54:54 +0100] [1117549] [INFO] Worker exiting (pid: 1117549)
Traceback (most recent call last):
File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/engine/base.py", line 1276, in _execute_context
self.dialect.do_execute(
File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/engine/default.py", line 608, in do_execute
cursor.execute(statement, parameters)
psycopg2.errors.DeadlockDetected: deadlock detected
DETAIL: Process 1117623 waits for ShareLock on transaction 4526903; blocked by process 1206850.
Process 1206850 waits for AccessExclusiveLock on tuple (1,17) of relation 19255 of database 19096; blocked by process 1206469.
Process 1206469 waits for ShareLock on transaction 4526895; blocked by process 1117623.
HINT: See server log for query details.
CONTEXT: while updating tuple (1899,3) in relation "task_instance"
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/usr/local/bin/airflow", line 8, in <module>
sys.exit(main())
File "/usr/local/lib/python3.9/dist-packages/airflow/__main__.py", line 48, in main
args.func(args)
File "/usr/local/lib/python3.9/dist-packages/airflow/cli/cli_parser.py", line 48, in command
return func(*args, **kwargs)
File "/usr/local/lib/python3.9/dist-packages/airflow/utils/cli.py", line 92, in wrapper
return f(*args, **kwargs)
File "/usr/local/lib/python3.9/dist-packages/airflow/cli/commands/scheduler_command.py", line 75, in scheduler
_run_scheduler_job(args=args)
File "/usr/local/lib/python3.9/dist-packages/airflow/cli/commands/scheduler_command.py", line 46, in _run_scheduler_job
job.run()
File "/usr/local/lib/python3.9/dist-packages/airflow/jobs/base_job.py", line 245, in run
self._execute()
File "/usr/local/lib/python3.9/dist-packages/airflow/jobs/scheduler_job.py", line 628, in _execute
self._run_scheduler_loop()
File "/usr/local/lib/python3.9/dist-packages/airflow/jobs/scheduler_job.py", line 709, in _run_scheduler_loop
num_queued_tis = self._do_scheduling(session)
File "/usr/local/lib/python3.9/dist-packages/airflow/jobs/scheduler_job.py", line 792, in _do_scheduling
callback_to_run = self._schedule_dag_run(dag_run, session)
File "/usr/local/lib/python3.9/dist-packages/airflow/jobs/scheduler_job.py", line 1049, in _schedule_dag_run
dag_run.schedule_tis(schedulable_tis, session)
File "/usr/local/lib/python3.9/dist-packages/airflow/utils/session.py", line 67, in wrapper
return func(*args, **kwargs)
File "/usr/local/lib/python3.9/dist-packages/airflow/models/dagrun.py", line 898, in schedule_tis
session.query(TI)
File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/orm/query.py", line 4063, in update
update_op.exec_()
File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/orm/persistence.py", line 1697, in exec_
self._do_exec()
File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/orm/persistence.py", line 1895, in _do_exec
self._execute_stmt(update_stmt)
File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/orm/persistence.py", line 1702, in _execute_stmt
self.result = self.query._execute_crud(stmt, self.mapper)
File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/orm/query.py", line 3568, in _execute_crud
return conn.execute(stmt, self._params)
File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/engine/base.py", line 1011, in execute
return meth(self, multiparams, params)
File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/sql/elements.py", line 298, in _execute_on_connection
return connection._execute_clauseelement(self, multiparams, params)
File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/engine/base.py", line 1124, in _execute_clauseelement
ret = self._execute_context(
File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/engine/base.py", line 1316, in _execute_context
self._handle_dbapi_exception(
File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/engine/base.py", line 1510, in _handle_dbapi_exception
util.raise_(
File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/util/compat.py", line 182, in raise_
raise exception
File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/engine/base.py", line 1276, in _execute_context
self.dialect.do_execute(
File "/usr/local/lib/python3.9/dist-packages/sqlalchemy/engine/default.py", line 608, in do_execute
cursor.execute(statement, parameters)
sqlalchemy.exc.OperationalError: (psycopg2.errors.DeadlockDetected) deadlock detected
DETAIL: Process 1117623 waits for ShareLock on transaction 4526903; blocked by process 1206850.
Process 1206850 waits for AccessExclusiveLock on tuple (1,17) of relation 19255 of database 19096; blocked by process 1206469.
Process 1206469 waits for ShareLock on transaction 4526895; blocked by process 1117623.
HINT: See server log for query details.
CONTEXT: while updating tuple (1899,3) in relation "task_instance"
[SQL: UPDATE task_instance SET state=%(state)s WHERE task_instance.dag_id = %(dag_id_1)s AND task_instance.run_id = %(run_id_1)s AND task_instance.task_id IN (%(task_id_1)s, %(task_id_2)s, %(task_id_3)s, %(task_id_4)s, %(task_id_5)s, %(task_id_6)s, %(task_id_7)s)]
[parameters: {'state': <TaskInstanceState.SCHEDULED: 'scheduled'>, 'dag_id_1': 'download_and_preprocess_sets', 'run_id_1': 'manual__2021-12-14T06:28:19.872227+00:00', 'task_id_1': 'download_936', 'task_id_2': 'download_937', 'task_id_3': 'download_938', 'task_id_4': 'download_939', 'task_id_5': 'download_944', 'task_id_6': 'download_946', 'task_id_7': 'download_950'}]
(Background on this error at: http://sqlalche.me/e/13/e3q8)
[2021-12-15 01:54:54 +0100] [1117543] [INFO] Shutting down: Master
```
This are the scheduler processes running: I launched it with `screen`:
```
root@AI-Research:~# ps aux | grep airflow | grep sched | grep -v grep
root 3166838 0.0 0.0 9024 2460 ? Ss 07:17 0:00 SCREEN -L -Logfile logs/airflow_scheduler_20211215_071716.log -S airflow_scheduler -d -m airflow scheduler
root 3166845 56.1 1.0 212516 179704 pts/66 Rs+ 07:17 0:16 /usr/bin/python3 /usr/local/bin/airflow scheduler
root 3167171 0.5 0.4 111728 74756 pts/66 S 07:17 0:00 airflow scheduler -- DagFileProcessorManager
```
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [airflow] potiuk commented on issue #19957: Airflow crashes with a psycopg2.errors.DeadlockDetected exception
Posted by GitBox <gi...@apache.org>.
potiuk commented on issue #19957:
URL: https://github.com/apache/airflow/issues/19957#issuecomment-994817635
Marked it provisionally to 2.2.4 in case we release it (it might go straight to 2.3.0 depending on how serious things there are / whether we haave a fix and how close we are to 2.3.0)
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [airflow] stablum commented on issue #19957: Airflow crashes with a psycopg2.errors.DeadlockDetected exception
Posted by GitBox <gi...@apache.org>.
stablum commented on issue #19957:
URL: https://github.com/apache/airflow/issues/19957#issuecomment-995835917
> it seems that by dropping airflow's database entirely and recreating it from scratch, the bug is not re-occurring. So it might have been something in the airflow's db data.
i take this back: it's actually keeping crashing unfortunately
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org