Posted to commits@airflow.apache.org by GitBox <gi...@apache.org> on 2019/06/17 17:41:08 UTC

[GitHub] [airflow] seelmann edited a comment on issue #5420: [AIRFLOW-4797] Fix zombie detection

URL: https://github.com/apache/airflow/pull/5420#issuecomment-502779460
 
 
   Yes, correct: the `_find_zombies()` function returns the zombies for all DAGs (or an empty list if it is called again within the 10-second window). But its caller, https://github.com/apache/airflow/blob/93de2ce7c337e6aad240a7a043a6bd3fc86f2bc2/airflow/utils/dag_processing.py#L1217, then starts processors for only `n` DAG files, and only those processors receive the list of zombies; processors started later for the other DAG files just get an empty list.
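
To make the effect concrete, here is a minimal toy model of that behaviour. It is a sketch under assumptions, not Airflow's code: `ToyManager`, `find_zombies`, `start_processors`, and the injected `clock` are hypothetical names, and the batching is simplified to one zombie query per batch of `parallelism` files.

```python
import time

ZOMBIE_QUERY_INTERVAL = 10  # seconds; models the "10 second window"


class ToyManager:
    """Hypothetical stand-in for the manager; NOT Airflow's actual class."""

    def __init__(self, zombies, clock=time.monotonic):
        self._zombies = zombies      # (dag_id, task_id) pairs considered zombies
        self._clock = clock          # injectable so the model is deterministic
        self._last_query = None

    def find_zombies(self):
        """Return all zombies, or [] when called again inside the window."""
        now = self._clock()
        if self._last_query is not None and now - self._last_query < ZOMBIE_QUERY_INTERVAL:
            return []
        self._last_query = now
        return list(self._zombies)

    def start_processors(self, dag_files, parallelism):
        """Start processors in batches; each batch shares one zombie query."""
        received = {}
        for i in range(0, len(dag_files), parallelism):
            zombies = self.find_zombies()
            for path in dag_files[i:i + parallelism]:
                received[path] = zombies
        return received


now = [0.0]  # fake clock: both batches start inside the same window
manager = ToyManager([("dag_c", "task_1")], clock=lambda: now[0])
received = manager.start_processors(["a.py", "b.py", "c.py"], parallelism=2)
```

In this model `a.py` and `b.py` receive the zombie of `dag_c`, which they cannot act on, while `c.py`, the only file that could, receives an empty list.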
   
   The list of (all or no) zombies is passed down via `DagFileProcessor` and `SchedulerJob.process_file()` to `DagBag.kill_zombies()` https://github.com/apache/airflow/blob/93de2ce7c337e6aad240a7a043a6bd3fc86f2bc2/airflow/models/dagbag.py#L271, which then checks whether each zombie belongs to one of its DAGs and, if so, kills it.
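
The filtering step at the end of that chain can be sketched roughly as follows. This is an illustration only: the `Zombie` tuple, the `own_dags` mapping, and the function signature are assumptions, and the real method acts on task instances rather than returning a list.

```python
from collections import namedtuple

# Hypothetical record type; the real code works with task instances.
Zombie = namedtuple("Zombie", ["dag_id", "task_id"])


def kill_zombies(own_dags, zombies):
    """Return the zombies that belong to this bag's DAGs.

    own_dags: dict mapping dag_id -> set of task_ids known to this bag.
    zombies:  the list handed down from the manager (all zombies, or empty).
    """
    killed = []
    for zombie in zombies:
        tasks = own_dags.get(zombie.dag_id)
        if tasks is not None and zombie.task_id in tasks:
            # The real method would act on the task instance at this point.
            killed.append(zombie)
    return killed


own_dags = {"dag_c": {"task_1", "task_2"}}
all_zombies = [Zombie("dag_a", "task_9"), Zombie("dag_c", "task_1")]
killed = kill_zombies(own_dags, all_zombies)
```

With an empty `zombies` list the function has nothing to match, which is why a processor that starts inside the 10-second window never kills anything.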
   
   This is far too complex for something as simple as detecting zombie task instances and killing them. Last Friday I spent 5 hours debugging to find the cause.
   
   I wonder whether it would be better to remove the zombie detection from `DagFileProcessorManager`, along with all the passing around of the list, and just implement the query within `DagBag.kill_zombies()`, which would search only for zombies in its own DAGs; there a 10-second delay makes sense. WDYT?
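
A rough sketch of what that alternative could look like, assuming the query is injected as a callable. `ToyDagBag`, `query_zombies`, `min_interval`, and `fake_query` are hypothetical names for illustration; the real query would run against the metadata database.

```python
import time


class ToyDagBag:
    """Hypothetical stand-in for a DagBag; NOT Airflow's actual class."""

    def __init__(self, dag_ids, query_zombies, min_interval=10.0, clock=time.monotonic):
        self._dag_ids = set(dag_ids)
        self._query_zombies = query_zombies  # callable: set of dag_ids -> zombies
        self._min_interval = min_interval    # per-bag throttle, in seconds
        self._clock = clock
        self._last_check = None

    def kill_zombies(self):
        """Query zombies for this bag's own DAGs, at most once per interval."""
        now = self._clock()
        if self._last_check is not None and now - self._last_check < self._min_interval:
            return []
        self._last_check = now
        zombies = self._query_zombies(self._dag_ids)
        # The real method would act on each zombie task instance here.
        return zombies


def fake_query(dag_ids):
    """Stand-in for the database query, restricted to the given DAG ids."""
    all_zombies = [("dag_a", "t1"), ("dag_c", "t9")]
    return [z for z in all_zombies if z[0] in dag_ids]


now = [0.0]  # fake clock for a deterministic example
bag = ToyDagBag({"dag_c"}, fake_query, clock=lambda: now[0])
first = bag.kill_zombies()   # queries and finds the dag_c zombie
second = bag.kill_zombies()  # throttled: still inside the interval
```

Because the throttle lives in the bag that owns the DAGs, a skipped check only delays detection for those DAGs until the next interval instead of dropping it.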
