Posted to commits@airflow.apache.org by GitBox <gi...@apache.org> on 2019/06/17 17:34:26 UTC

[GitHub] [airflow] seelmann commented on issue #5420: [AIRFLOW-4797] Fix zombie detection

seelmann commented on issue #5420: [AIRFLOW-4797] Fix zombie detection
URL: https://github.com/apache/airflow/pull/5420#issuecomment-502779460
 
 
   Yes, correct, the `_find_zombies()` function always returns zombies for all the DAGs. But the caller of this function https://github.com/apache/airflow/blob/93de2ce7c337e6aad240a7a043a6bd3fc86f2bc2/airflow/utils/dag_processing.py#L1217 then only starts processors for `n` DAG files, and only those receive the list of zombies; subsequent processors for other DAG files just get an empty list.
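A simplified model of the hand-off described above may help. This is a sketch under stated assumptions, not Airflow's code: `Zombie`, `hand_out_zombies()` and `reachable_zombies()` are hypothetical names, and "the first `n` files" stands for the processors started in the same loop iteration as the zombie query.

```python
from typing import Dict, List, NamedTuple


class Zombie(NamedTuple):
    """Hypothetical stand-in for a zombie task instance (not Airflow's class)."""
    dag_id: str
    task_id: str


def hand_out_zombies(file_paths: List[str], zombies: List[Zombie],
                     n: int) -> Dict[str, List[Zombie]]:
    """Simplified model: the first n processors receive the full zombie
    list, every processor started after them receives an empty list."""
    handed_out: Dict[str, List[Zombie]] = {}
    for i, path in enumerate(file_paths):
        # Only the first n files see the zombies; the rest get [].
        handed_out[path] = list(zombies) if i < n else []
    return handed_out


def reachable_zombies(handed_out: Dict[str, List[Zombie]],
                      dag_file: Dict[str, str]) -> List[Zombie]:
    """Zombies that reach the processor of the file defining their DAG,
    i.e. the only ones that can actually be killed in this model."""
    return [z for path, zs in handed_out.items()
            for z in zs if dag_file[z.dag_id] == path]


if __name__ == "__main__":
    files = ["a.py", "b.py", "c.py"]
    dag_file = {"dag_a": "a.py", "dag_b": "b.py", "dag_c": "c.py"}
    zombies = [Zombie("dag_a", "t1"), Zombie("dag_c", "t2")]
    handed_out = hand_out_zombies(files, zombies, n=2)
    # dag_c's zombie only goes to a.py and b.py, so it is never killed.
    print(reachable_zombies(handed_out, dag_file))  # [Zombie(dag_id='dag_a', task_id='t1')]
```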
   
   The list of (all or none) zombies is passed down via `DagFileProcessor` and `SchedulerJob.process_file()` to `DagBag.kill_zombies()` https://github.com/apache/airflow/blob/93de2ce7c337e6aad240a7a043a6bd3fc86f2bc2/airflow/models/dagbag.py#L271, which then checks, for each zombie, whether it belongs to one of the DagBag's DAGs and, if so, kills it.
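The filtering step in `DagBag.kill_zombies()` can be pictured roughly as follows. `MiniDagBag` and `Zombie` are hypothetical toy classes, and appending to `killed` stands in for whatever the real code does to fail the task instance.

```python
from dataclasses import dataclass, field
from typing import List, NamedTuple, Set


class Zombie(NamedTuple):
    """Hypothetical stand-in for a zombie task instance (not Airflow's class)."""
    dag_id: str
    task_id: str


@dataclass
class MiniDagBag:
    """Toy DagBag that knows only the DAG ids parsed from its own file."""
    dag_ids: Set[str]
    killed: List[Zombie] = field(default_factory=list)

    def kill_zombies(self, zombies: List[Zombie]) -> None:
        # A processor may receive the zombies of *all* DAGs, so each one
        # must be checked against the DAGs this bag actually owns.
        for zombie in zombies:
            if zombie.dag_id in self.dag_ids:
                # Stand-in for failing the task instance in the real code.
                self.killed.append(zombie)


if __name__ == "__main__":
    bag = MiniDagBag(dag_ids={"dag_a"})
    bag.kill_zombies([Zombie("dag_a", "t1"), Zombie("dag_c", "t2")])
    print(bag.killed)  # [Zombie(dag_id='dag_a', task_id='t1')]
```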
   
   This is far too complex for something as simple as detecting zombie task instances and killing them. Last Friday I spent 5 hours debugging to find the cause.
   
   I wondered whether it would be better to remove the zombie detection from `DagFileProcessorManager`, along with all the passing of the list around, and instead just implement the query within `DagBag.kill_zombies()`, which can then search only for its own DAGs; there a 10-second delay makes sense. WDYT?
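A minimal sketch of the proposal, assuming a hypothetical, simplified `task_instance` table that stores the heartbeat on the task row (Airflow's own query, as far as I know, looks at the heartbeat of the task's job instead) and reading the "10 seconds delay" as a minimum interval between zombie queries per DagBag. `find_own_zombies()`, `ZombieChecker` and the 300-second staleness threshold are illustrative choices, not existing Airflow APIs or settings.

```python
import sqlite3
from typing import Iterable, List, Optional, Set, Tuple

# Hypothetical, simplified schema: the heartbeat is stored on the task row.
SCHEMA = """
CREATE TABLE task_instance (
    dag_id TEXT, task_id TEXT, state TEXT, latest_heartbeat REAL
)
"""


def find_own_zombies(conn: sqlite3.Connection, dag_ids: Iterable[str],
                     threshold: float, now: float) -> List[Tuple[str, str]]:
    """Running task instances of the given DAGs whose last heartbeat is
    more than `threshold` seconds old."""
    ids = sorted(dag_ids)
    if not ids:
        return []  # no DAGs in this bag, nothing to look for
    placeholders = ",".join("?" for _ in ids)
    sql = ("SELECT dag_id, task_id FROM task_instance "
           "WHERE state = 'running' AND latest_heartbeat < ? "
           f"AND dag_id IN ({placeholders}) ORDER BY dag_id, task_id")
    return conn.execute(sql, [now - threshold, *ids]).fetchall()


class ZombieChecker:
    """Per-DagBag checker that queries at most once every `min_interval`
    seconds (assumed reading of the "10 seconds delay")."""

    def __init__(self, conn: sqlite3.Connection, dag_ids: Set[str],
                 threshold: float, min_interval: float = 10.0) -> None:
        self.conn = conn
        self.dag_ids = dag_ids
        self.threshold = threshold
        self.min_interval = min_interval
        self._last_run: Optional[float] = None

    def check(self, now: float) -> List[Tuple[str, str]]:
        if self._last_run is not None and now - self._last_run < self.min_interval:
            return []  # too soon since the last query: skip it
        self._last_run = now
        return find_own_zombies(self.conn, self.dag_ids, self.threshold, now)


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    now = 1000.0
    conn.executemany("INSERT INTO task_instance VALUES (?, ?, ?, ?)", [
        ("dag_a", "t1", "running", now - 600),  # stale heartbeat: zombie
        ("dag_a", "t2", "running", now - 5),    # recent heartbeat: alive
        ("dag_c", "t3", "running", now - 600),  # stale, but not this bag's DAG
    ])
    checker = ZombieChecker(conn, {"dag_a"}, threshold=300.0)
    print(checker.check(now))      # [('dag_a', 't1')]
    print(checker.check(now + 3))  # [] (throttled)
```

Because the `IN (...)` clause lists only the bag's own DAG ids, no zombie list has to be passed from the manager through the processors.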

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services