Posted to commits@airflow.apache.org by GitBox <gi...@apache.org> on 2022/10/26 23:48:09 UTC

[GitHub] [airflow] vchiapaikeo commented on issue #27296: Task completes work but ends up failing due to a lock wait timeout exceeded error and does not honor retries

vchiapaikeo commented on issue #27296:
URL: https://github.com/apache/airflow/issues/27296#issuecomment-1292782722

   Adding a bit more analysis here. I'm noticing that the only place where a query like this (`UPDATE dag_run SET last_scheduling_decision=%s WHERE dag_run.id = %s`) would be run is here, in `DagRun.update_state`:
   
   https://github.com/apache/airflow/blob/2.4.2/airflow/models/dagrun.py#L516-L518
   
   Specifically, `last_scheduling_decision` gets set here:
   
   https://github.com/apache/airflow/blob/2.4.2/airflow/models/dagrun.py#L552
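
   For context, a minimal sketch (not the actual Airflow code path) of why that assignment alone is enough to produce the exact statement from the error: `DagRun` is a SQLAlchemy ORM model, so setting `last_scheduling_decision` on a session-attached instance marks the row dirty, and the next flush/commit emits `UPDATE dag_run SET last_scheduling_decision=%s WHERE dag_run.id = %s`. Roughly:
   
   ```python
   # Hypothetical sketch, not the real Airflow code; dag_id/run_id/DSN are made up.
   from datetime import datetime, timezone
   
   from sqlalchemy import create_engine
   from sqlalchemy.orm import Session
   
   from airflow.models.dagrun import DagRun
   
   engine = create_engine("mysql+mysqldb://user:pass@host/airflow")  # placeholder DSN
   
   with Session(engine) as session:
       dag_run = (
           session.query(DagRun)
           .filter(DagRun.dag_id == "example_dag", DagRun.run_id == "manual__2022-10-26")
           .one()
       )
       # Plain attribute assignment marks the ORM row dirty...
       dag_run.last_scheduling_decision = datetime.now(timezone.utc)
       # ...and the flush at commit time emits
       #   UPDATE dag_run SET last_scheduling_decision=%s WHERE dag_run.id = %s
       # which is where an InnoDB "lock wait timeout exceeded" would surface.
       session.commit()
   ```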
   
   And I think the most likely place that `DagRun.update_state` is being called from is here in `SchedulerJob._schedule_dag_run`:
   
   https://github.com/apache/airflow/blob/2.4.2/airflow/jobs/scheduler_job.py#L1242-L1246
   https://github.com/apache/airflow/blob/2.4.2/airflow/jobs/scheduler_job.py#L1301
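
   If I'm reading the scheduler loop right, the shape of that call is roughly the following (a paraphrase, not the actual source; the filtering and arguments are simplified): one session/transaction is opened by the scheduler, `update_state` mutates the ORM object, and the UPDATE only reaches MySQL when that session flushes, so the row lock on `dag_run.id` is held until the transaction commits:
   
   ```python
   # Rough paraphrase of the scheduler-side flow, not the real _do_scheduling /
   # _schedule_dag_run bodies; it only illustrates where the UPDATE gets emitted.
   from airflow.models.dagrun import DagRun
   from airflow.utils.session import create_session
   from airflow.utils.state import DagRunState
   
   def scheduling_sketch():
       with create_session() as session:  # one transaction for the whole batch
           dag_runs = session.query(DagRun).filter(DagRun.state == DagRunState.RUNNING).all()
           for dag_run in dag_runs:
               # update_state() sets last_scheduling_decision on the ORM object;
               # the UPDATE itself is emitted when this session flushes, and the
               # row lock is then held until create_session() commits on exit.
               dag_run.update_state(session=session, execute_callbacks=False)
   ```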
   
   What I don't quite understand is this: if the call to update the DagRun comes from `SchedulerJob`, why are we seeing these logs on the worker pod? Is that because we're using KubernetesExecutor and the Airflow worker pod itself is actually run with LocalExecutor? I also wonder what could be holding a lock on this same record for more than 50 seconds...
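
   For anyone else digging into the lock-holder question: assuming the metadata DB is MySQL 5.7/8.0 with the `sys` schema available (I haven't verified that against this setup), something like the following, run while an UPDATE is stuck waiting, should show which connection and statement is blocking it:
   
   ```python
   # Hypothetical diagnostic; the DSN is a placeholder and the views queried are
   # standard MySQL (sys / information_schema), nothing Airflow-specific.
   from sqlalchemy import create_engine, text
   
   engine = create_engine("mysql+mysqldb://user:pass@host/airflow")  # placeholder DSN
   
   with engine.connect() as conn:
       # Who is waiting on whom, and what are both sides running?
       for row in conn.execute(text(
           "SELECT waiting_pid, waiting_query, blocking_pid, blocking_query, wait_age_secs "
           "FROM sys.innodb_lock_waits"
       )):
           print(dict(row._mapping))
   
       # Long-running transactions that could be sitting on the dag_run row for >50s.
       for row in conn.execute(text(
           "SELECT trx_id, trx_started, trx_mysql_thread_id, trx_query "
           "FROM information_schema.innodb_trx ORDER BY trx_started"
       )):
           print(dict(row._mapping))
   ```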

