Posted to commits@airflow.apache.org by GitBox <gi...@apache.org> on 2022/10/26 23:48:09 UTC
[GitHub] [airflow] vchiapaikeo commented on issue #27296: Task completes work but ends up failing due to a lock wait timeout exceeded error and does not honor retries
vchiapaikeo commented on issue #27296:
URL: https://github.com/apache/airflow/issues/27296#issuecomment-1292782722
Adding a bit more analysis here. The only place I can find where a query like this (SQL: UPDATE dag_run SET last_scheduling_decision=%s WHERE dag_run.id = %s) would be emitted is in `DagRun.update_state`:
https://github.com/apache/airflow/blob/2.4.2/airflow/models/dagrun.py#L516-L518
Specifically, `last_scheduling_decision` gets set here:
https://github.com/apache/airflow/blob/2.4.2/airflow/models/dagrun.py#L552
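To convince myself this is where the statement comes from, here's a minimal standalone sketch (not Airflow's actual model, just a stand-in with the same table and column names) showing that assigning to the ORM attribute and committing makes SQLAlchemy emit exactly this kind of UPDATE. I'm using SQLite here for illustration, so the paramstyle is `?` rather than MySQL's `%s`:

```python
# Hedged sketch: a stand-in for airflow.models.dagrun.DagRun, showing how
# flushing an ORM attribute change produces the UPDATE seen in the logs.
import datetime

from sqlalchemy import Column, DateTime, Integer, create_engine, event
from sqlalchemy.orm import Session, declarative_base

Base = declarative_base()


class DagRun(Base):  # hypothetical stand-in, not the real Airflow model
    __tablename__ = "dag_run"
    id = Column(Integer, primary_key=True)
    last_scheduling_decision = Column(DateTime)


engine = create_engine("sqlite://")  # the report uses MySQL; SQLite for illustration
Base.metadata.create_all(engine)

statements = []


@event.listens_for(engine, "before_cursor_execute")
def capture(conn, cursor, statement, parameters, context, executemany):
    # record every SQL statement the session emits
    statements.append(statement)


with Session(engine) as session:
    session.add(DagRun(id=1))
    session.commit()
    run = session.get(DagRun, 1)
    run.last_scheduling_decision = datetime.datetime.utcnow()
    session.commit()  # flush emits the UPDATE on dag_run.last_scheduling_decision

print([s for s in statements if s.startswith("UPDATE")])
```

Against MySQL the same flush renders with `%s` placeholders, matching the statement in the error message.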
And I think the most likely place that `DagRun.update_state` is being called from is here in `SchedulerJob._schedule_dag_run`:
https://github.com/apache/airflow/blob/2.4.2/airflow/jobs/scheduler_job.py#L1242-L1246
https://github.com/apache/airflow/blob/2.4.2/airflow/jobs/scheduler_job.py#L1301
What I don't quite understand is: if this is a call from SchedulerJob updating the dag run, why are we seeing these logs on the worker pod? Is that because we're using KubernetesExecutor and the Airflow worker pod itself actually runs with LocalExecutor? I also wonder what could be holding a lock on this same record for more than 50 seconds...
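One way to answer the ">50s" question (that figure matches MySQL's default innodb_lock_wait_timeout of 50 seconds) would be to inspect the blocking transaction while the UPDATE is stuck. A hedged sketch, assuming MySQL 5.7+/8.0 with the sys schema available; the query string, the `airflow` schema name, and the helper function are my own, not anything from Airflow:

```python
# Hedged sketch: query MySQL's sys schema to see which connection holds the
# row lock that the dag_run UPDATE is waiting on. Run this from a second
# connection while the UPDATE is blocked. The schema name 'airflow' is an
# assumption about the deployment.
DIAGNOSE_LOCK_WAITS = """\
SELECT wait_age,
       locked_table,
       waiting_query,
       blocking_pid,
       blocking_query
FROM sys.innodb_lock_waits
WHERE locked_table = '`airflow`.`dag_run`'
"""


def show_blockers(cursor):
    """Print the blocking transactions (hypothetical helper; pass any
    DB-API cursor connected to the Airflow metadata database)."""
    cursor.execute(DIAGNOSE_LOCK_WAITS)
    for row in cursor.fetchall():
        print(row)
```

If `blocking_query` comes back NULL, the blocker is an idle-in-transaction connection, which would be consistent with something grabbing the row and then stalling rather than a slow query.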
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org