Posted to commits@airflow.apache.org by "ASF GitHub Bot (Jira)" <ji...@apache.org> on 2019/10/14 22:17:00 UTC

[jira] [Commented] (AIRFLOW-4797) Zombie detection and killing is not deterministic

    [ https://issues.apache.org/jira/browse/AIRFLOW-4797?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16951389#comment-16951389 ] 

ASF GitHub Bot commented on AIRFLOW-4797:
-----------------------------------------

KevinYang21 commented on pull request #5908: Revert "[AIRFLOW-4797] Improve performance and behaviour of zombie de…
URL: https://github.com/apache/airflow/pull/5908
 
 
   
 
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


> Zombie detection and killing is not deterministic
> -------------------------------------------------
>
>                 Key: AIRFLOW-4797
>                 URL: https://issues.apache.org/jira/browse/AIRFLOW-4797
>             Project: Apache Airflow
>          Issue Type: Bug
>          Components: scheduler
>    Affects Versions: 1.10.3
>            Reporter: Stefan Seelmann
>            Assignee: Stefan Seelmann
>            Priority: Major
>             Fix For: 1.10.4
>
>
> Zombie detection and killing is done within the DAG file processing loop. Within one iteration only a subset of the DAG files is processed (config scheduler.max_threads). The loop sleeps for the rest of the second until the next iteration runs, which processes the next subset of DAG files. The function to get zombie task instances only returns zombies once within 10 seconds; otherwise an empty list is returned.
> That means zombies are detected only in every 10th iteration of the DAG file processing loop, and they are killed only if the zombie task belongs to one of the DAG files of the current iteration.
> We ran into the worst-case scenario with max_threads=2 and 20 DAGs. In such a scenario only zombies of the same 2 DAGs are killed. (As loop iterations are not exactly 1s, it shifts slowly and eventually the zombies are killed, but in one example it took 33 minutes.)
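
The arithmetic behind the worst case described above can be illustrated with a small simulation. This is only a sketch and not Airflow code: the function name and its parameters are hypothetical, and it assumes idealized iterations of exactly one second, a fixed round-robin order over the DAG files, and zombie detection firing on exactly every 10th iteration.

```python
def dags_checked_for_zombies(num_dags, max_threads, detect_every, iterations):
    """Return the set of DAG indices whose zombies would ever be handled.

    Hypothetical model: each iteration processes the next `max_threads`
    DAG files in round-robin order, and zombies are only returned on
    every `detect_every`-th iteration.
    """
    checked = set()
    position = 0
    for i in range(iterations):
        # DAG files processed in this iteration.
        batch = [(position + k) % num_dags for k in range(max_threads)]
        position = (position + max_threads) % num_dags
        # Assumption: zombie detection only yields results on these iterations.
        if i % detect_every == 0:
            checked.update(batch)
    return checked


# 20 DAGs, 2 per iteration: a full pass takes 10 iterations, which lines up
# exactly with the 10-iteration detection interval.
print(sorted(dags_checked_for_zombies(20, 2, 10, 600)))  # [0, 1]

# With 21 DAGs the two cycles no longer line up, so coverage rotates.
print(len(dags_checked_for_zombies(21, 2, 10, 600)))  # 21
```

Under these assumptions, 20 DAGs with max_threads=2 means the detection interval and the file-processing cycle stay in lockstep, so only the same 2 DAGs are ever covered. In the real scheduler the iterations are not exactly 1s, which is why the window drifts slowly, as the report notes.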



--
This message was sent by Atlassian Jira
(v8.3.4#803005)