Posted to commits@airflow.apache.org by GitBox <gi...@apache.org> on 2021/11/26 02:01:32 UTC

[GitHub] [airflow] smartnewsdingli opened a new issue #19832: Deadlock in worker pod

smartnewsdingli opened a new issue #19832:
URL: https://github.com/apache/airflow/issues/19832


   ### Apache Airflow version
   
   2.2.2 (latest released)
   
   ### Operating System
   
   Ubuntu
   
   ### Versions of Apache Airflow Providers
   
   _No response_
   
   ### Deployment
   
   Other
   
   ### Deployment details
   
   Multiple schedulers, deployed on Kubernetes; the worker pods run with LocalExecutor.
   
   ### What happened
   
   The worker pod fails with the log below.
   There is a similar PR, but it addresses a scheduler deadlock: https://github.com/apache/airflow/pull/18975/files
   
   `[2021-11-25 23:08:18,700] {dagbag.py:500} INFO - Filling up the DagBag from /usr/local/spaas-airflow/dags/algolift/algolift_load.py
   Running <TaskInstance: algo-lift.load.wait_for_data scheduled__2021-11-22T00:00:00+00:00 [running]> on host algoliftloadwaitfordata.2388e61b4fd84b6e815f9a0f4e123ad9
   Traceback (most recent call last):
     File "/usr/local/lib/python3.7/dist-packages/sqlalchemy/engine/base.py", line 1277, in _execute_context
       cursor, statement, parameters, context
     File "/usr/local/lib/python3.7/dist-packages/sqlalchemy/engine/default.py", line 608, in do_execute
       cursor.execute(statement, parameters)
     File "/usr/local/lib/python3.7/dist-packages/MySQLdb/cursors.py", line 206, in execute
       res = self._query(query)
     File "/usr/local/lib/python3.7/dist-packages/MySQLdb/cursors.py", line 319, in _query
       db.query(q)
     File "/usr/local/lib/python3.7/dist-packages/MySQLdb/connections.py", line 259, in query
       _mysql.connection.query(self, query)
   MySQLdb._exceptions.OperationalError: (1213, 'Deadlock found when trying to get lock; try restarting transaction')
   
   The above exception was the direct cause of the following exception:
   
   Traceback (most recent call last):
     File "/usr/local/bin/airflow", line 8, in <module>
       sys.exit(main())
     File "/usr/local/lib/python3.7/dist-packages/airflow/__main__.py", line 48, in main
       args.func(args)
     File "/usr/local/lib/python3.7/dist-packages/airflow/cli/cli_parser.py", line 48, in command
       return func(*args, **kwargs)
     File "/usr/local/lib/python3.7/dist-packages/airflow/utils/cli.py", line 92, in wrapper
       return f(*args, **kwargs)
     File "/usr/local/lib/python3.7/dist-packages/airflow/cli/commands/task_command.py", line 292, in task_run
       _run_task_by_selected_method(args, dag, ti)
     File "/usr/local/lib/python3.7/dist-packages/airflow/cli/commands/task_command.py", line 105, in _run_task_by_selected_method
       _run_task_by_local_task_job(args, ti)
     File "/usr/local/lib/python3.7/dist-packages/airflow/cli/commands/task_command.py", line 163, in _run_task_by_local_task_job
       run_job.run()
     File "/usr/local/lib/python3.7/dist-packages/airflow/jobs/base_job.py", line 245, in run
       self._execute()
     File "/usr/local/lib/python3.7/dist-packages/airflow/jobs/local_task_job.py", line 97, in _execute
       external_executor_id=self.external_executor_id,
     File "/usr/local/lib/python3.7/dist-packages/airflow/utils/session.py", line 70, in wrapper
       return func(*args, session=session, **kwargs)
     File "/usr/local/lib/python3.7/dist-packages/airflow/models/taskinstance.py", line 1176, in check_and_change_state_before_execution
       self.refresh_from_db(session=session, lock_for_update=True)
     File "/usr/local/lib/python3.7/dist-packages/airflow/utils/session.py", line 67, in wrapper
       return func(*args, **kwargs)
     File "/usr/local/lib/python3.7/dist-packages/airflow/models/taskinstance.py", line 729, in refresh_from_db
       ti: Optional[TaskInstance] = qry.with_for_update().first()
     File "/usr/local/lib/python3.7/dist-packages/sqlalchemy/orm/query.py", line 3429, in first
       ret = list(self[0:1])
     File "/usr/local/lib/python3.7/dist-packages/sqlalchemy/orm/query.py", line 3203, in __getitem__
       return list(res)
     File "/usr/local/lib/python3.7/dist-packages/sqlalchemy/orm/query.py", line 3535, in __iter__
       return self._execute_and_instances(context)
     File "/usr/local/lib/python3.7/dist-packages/sqlalchemy/orm/query.py", line 3560, in _execute_and_instances
       result = conn.execute(querycontext.statement, self._params)
     File "/usr/local/lib/python3.7/dist-packages/sqlalchemy/engine/base.py", line 1011, in execute
       return meth(self, multiparams, params)
     File "/usr/local/lib/python3.7/dist-packages/sqlalchemy/sql/elements.py", line 298, in _execute_on_connection
       return connection._execute_clauseelement(self, multiparams, params)
     File "/usr/local/lib/python3.7/dist-packages/sqlalchemy/engine/base.py", line 1130, in _execute_clauseelement
       distilled_params,
     File "/usr/local/lib/python3.7/dist-packages/sqlalchemy/engine/base.py", line 1317, in _execute_context
       e, statement, parameters, cursor, context
     File "/usr/local/lib/python3.7/dist-packages/sqlalchemy/engine/base.py", line 1511, in _handle_dbapi_exception
       sqlalchemy_exception, with_traceback=exc_info[2], from_=e
     File "/usr/local/lib/python3.7/dist-packages/sqlalchemy/util/compat.py", line 182, in raise_
       raise exception
     File "/usr/local/lib/python3.7/dist-packages/sqlalchemy/engine/base.py", line 1277, in _execute_context
       cursor, statement, parameters, context
     File "/usr/local/lib/python3.7/dist-packages/sqlalchemy/engine/default.py", line 608, in do_execute
       cursor.execute(statement, parameters)
     File "/usr/local/lib/python3.7/dist-packages/MySQLdb/cursors.py", line 206, in execute
       res = self._query(query)
     File "/usr/local/lib/python3.7/dist-packages/MySQLdb/cursors.py", line 319, in _query
       db.query(q)
     File "/usr/local/lib/python3.7/dist-packages/MySQLdb/connections.py", line 259, in query
       _mysql.connection.query(self, query)
   sqlalchemy.exc.OperationalError: (MySQLdb._exceptions.OperationalError) (1213, 'Deadlock found when trying to get lock; try restarting transaction')
   [SQL: SELECT task_instance.try_number AS task_instance_try_number, task_instance.task_id AS task_instance_task_id, task_instance.dag_id AS task_instance_dag_id, task_instance.run_id AS task_instance_run_id, task_instance.start_date AS task_instance_start_date, task_instance.end_date AS task_instance_end_date, task_instance.duration AS task_instance_duration, task_instance.state AS task_instance_state, task_instance.max_tries AS task_instance_max_tries, task_instance.hostname AS task_instance_hostname, task_instance.unixname AS task_instance_unixname, task_instance.job_id AS task_instance_job_id, task_instance.pool AS task_instance_pool, task_instance.pool_slots AS task_instance_pool_slots, task_instance.queue AS task_instance_queue, task_instance.priority_weight AS task_instance_priority_weight, task_instance.operator AS task_instance_operator, task_instance.queued_dttm AS task_instance_queued_dttm, task_instance.queued_by_job_id AS task_instance_queued_by_job_id, task_instance.pid AS task_instance_pid, task_instance.executor_config AS task_instance_executor_config, task_instance.external_executor_id AS task_instance_external_executor_id, task_instance.trigger_id AS task_instance_trigger_id, task_instance.trigger_timeout AS task_instance_trigger_timeout, task_instance.next_method AS task_instance_next_method, task_instance.next_kwargs AS task_instance_next_kwargs, dag_run_1.state AS dag_run_1_state, dag_run_1.id AS dag_run_1_id, dag_run_1.dag_id AS dag_run_1_dag_id, dag_run_1.queued_at AS dag_run_1_queued_at, dag_run_1.execution_date AS dag_run_1_execution_date, dag_run_1.start_date AS dag_run_1_start_date, dag_run_1.end_date AS dag_run_1_end_date, dag_run_1.run_id AS dag_run_1_run_id, dag_run_1.creating_job_id AS dag_run_1_creating_job_id, dag_run_1.external_trigger AS dag_run_1_external_trigger, dag_run_1.run_type AS dag_run_1_run_type, dag_run_1.conf AS dag_run_1_conf, dag_run_1.data_interval_start AS dag_run_1_data_interval_start, dag_run_1.data_interval_end AS dag_run_1_data_interval_end, dag_run_1.last_scheduling_decision AS dag_run_1_last_scheduling_decision, dag_run_1.dag_hash AS dag_run_1_dag_hash
   FROM task_instance INNER JOIN dag_run AS dag_run_1 ON dag_run_1.dag_id = task_instance.dag_id AND dag_run_1.run_id = task_instance.run_id
   WHERE task_instance.dag_id = %s AND task_instance.task_id = %s AND task_instance.run_id = %s
    LIMIT %s FOR UPDATE]
   [parameters: ('algo-lift.load', 'wait_for_data', 'scheduled__2021-11-22T00:00:00+00:00', 1)]
   (Background on this error at: http://sqlalche.me/e/13/e3q8)`
   
   ### What you expected to happen
   
   The worker pod should start the task without hitting this deadlock error.
   
   ### How to reproduce
   
   _No response_
   
   ### Anything else
   
   This issue happens intermittently, not on every run.
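
The MySQL message in the log above ends with "try restarting transaction". As a rough illustration of that advice, here is a minimal Python sketch of a retry wrapper. It is not Airflow's code and not necessarily how the linked PR handles the scheduler case; `is_deadlock`, `run_with_deadlock_retry`, and their parameters are hypothetical names made up for this example. The only assumptions about real libraries are that the MySQLdb error carries the code `1213` as its first argument (as seen in the traceback) and that SQLAlchemy exposes the driver exception as `.orig`.

```python
import time

MYSQL_DEADLOCK_CODE = 1213  # error code shown in the traceback above


def is_deadlock(exc):
    """Return True if exc, or the driver error it wraps, carries code 1213."""
    # SQLAlchemy's DBAPIError keeps the driver exception in `.orig`;
    # a bare driver exception has no such attribute, so fall back to exc.
    orig = getattr(exc, "orig", None) or exc
    args = getattr(orig, "args", ())
    return bool(args) and args[0] == MYSQL_DEADLOCK_CODE


def run_with_deadlock_retry(txn, retries=3, delay=0.1, sleep=time.sleep):
    """Call txn(); on a deadlock, back off and call it again.

    txn is a hypothetical callable that runs one complete transaction,
    including its own rollback on failure, so that calling it again
    restarts the transaction from the beginning.
    """
    for attempt in range(retries + 1):
        try:
            return txn()
        except Exception as exc:
            # Re-raise anything that is not a deadlock, or the last failure.
            if not is_deadlock(exc) or attempt == retries:
                raise
            sleep(delay * (2 ** attempt))  # exponential backoff
```

A caller would wrap the whole unit of work, for example the `SELECT ... FOR UPDATE` plus the state update that follows it, in `txn`, since MySQL rolls back the deadlocked transaction and only a full restart is safe.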
   
   ### Are you willing to submit PR?
   
   - [X] Yes I am willing to submit a PR!
   
   ### Code of Conduct
   
   - [X] I agree to follow this project's [Code of Conduct](https://github.com/apache/airflow/blob/main/CODE_OF_CONDUCT.md)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] potiuk closed issue #19832: Deadlock in worker pod

Posted by GitBox <gi...@apache.org>.
potiuk closed issue #19832:
URL: https://github.com/apache/airflow/issues/19832


   





[GitHub] [airflow] boring-cyborg[bot] commented on issue #19832: Deadlock in worker pod

Posted by GitBox <gi...@apache.org>.
boring-cyborg[bot] commented on issue #19832:
URL: https://github.com/apache/airflow/issues/19832#issuecomment-979594271


   Thanks for opening your first issue here! Be sure to follow the issue template!
   

