Posted to commits@airflow.apache.org by "afusr (Jira)" <ji...@apache.org> on 2019/11/19 10:52:00 UTC
[jira] [Updated] (AIRFLOW-6014) Kubernetes executor - handle
preempted deleted pods - queued tasks
[ https://issues.apache.org/jira/browse/AIRFLOW-6014?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
afusr updated AIRFLOW-6014:
---------------------------
Description:
We have encountered an issue whereby, when using the Kubernetes executor with autoscaling, Airflow pods are preempted and Airflow never attempts to rerun them.
This is partly a result of the following being set on the pod spec:
restartPolicy: Never
This makes sense: if a pod fails while running a task, we don't want Kubernetes to retry it, as retries should be controlled by Airflow.
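For reference, the relevant part of the worker pod spec looks roughly like this (a minimal sketch; every field value other than restartPolicy is illustrative, not taken from our deployment):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: example-airflow-task-pod   # illustrative name
spec:
  # Never restarted by Kubernetes itself; task retries are Airflow's job
  restartPolicy: Never
  containers:
    - name: base
      image: apache/airflow:1.10.6  # illustrative image/tag
```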
What we believe happens is that when a new node is added by autoscaling, Kubernetes schedules a number of Airflow pods onto the new node, as well as any pods required by kube-system DaemonSets. As these are higher priority, the Airflow pods are preempted and deleted. You see messages such as:
Preempted by kube-system/ip-masq-agent-xz77q on node gke-some--airflow-00000000-node-1ltl
Within the Kubernetes executor, these pods end up in a Pending status, and the DELETED event is received but not handled.
The end result is that the tasks remain in a queued state forever.
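A minimal sketch of the kind of handling we would expect (the function name and return values are hypothetical, not the executor's actual API): a DELETED event for a pod that never reached a terminal phase would be mapped to a failure, so the scheduler can requeue the task instead of leaving it queued forever.

```python
def process_pod_event(event_type, pod_phase):
    """Map a Kubernetes watch event for a task pod to a task outcome.

    Hypothetical sketch: a DELETED event for a pod that never reached a
    terminal phase (e.g. preempted while Pending) is treated as a failure
    so the scheduler can requeue the task, rather than being ignored.
    """
    if event_type == "DELETED" and pod_phase not in ("Succeeded", "Failed"):
        return "failed"   # preempted/evicted before finishing: let Airflow retry
    if pod_phase == "Succeeded":
        return "success"
    if pod_phase == "Failed":
        return "failed"
    return None           # still Pending/Running: nothing to do yet
```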
was:
We have encountered an issue whereby, when using the Kubernetes executor with autoscaling, Airflow pods are preempted and Airflow never attempts to rerun them.
This is partly a result of the following being set on the pod spec:
restartPolicy: Never
This makes sense: if a pod fails while running a task, we don't want Kubernetes to retry it, as retries should be controlled by Airflow.
What we believe happens is that when a new node is added by autoscaling, Kubernetes schedules a number of Airflow pods onto the new node, as well as any pods required by kube-system DaemonSets. As these are higher priority, the Airflow pods are preempted and deleted. You see messages such as:
Preempted by kube-system/ip-masq-agent-xz77q on node gke-some--airflow-00000000-node-1ltl
Within the Kubernetes executor, these pods end up in a Pending status, and the DELETED event is received but not handled.
The end result is that the tasks remain in a queued state forever.
> Kubernetes executor - handle preempted deleted pods - queued tasks
> ------------------------------------------------------------------
>
> Key: AIRFLOW-6014
> URL: https://issues.apache.org/jira/browse/AIRFLOW-6014
> Project: Apache Airflow
> Issue Type: Improvement
> Components: executor-kubernetes
> Affects Versions: 1.10.6
> Reporter: afusr
> Assignee: Daniel Imberman
> Priority: Minor
>
> We have encountered an issue whereby, when using the Kubernetes executor with autoscaling, Airflow pods are preempted and Airflow never attempts to rerun them.
> This is partly a result of the following being set on the pod spec:
> restartPolicy: Never
> This makes sense: if a pod fails while running a task, we don't want Kubernetes to retry it, as retries should be controlled by Airflow.
> What we believe happens is that when a new node is added by autoscaling, Kubernetes schedules a number of Airflow pods onto the new node, as well as any pods required by kube-system DaemonSets. As these are higher priority, the Airflow pods are preempted and deleted. You see messages such as:
>
> Preempted by kube-system/ip-masq-agent-xz77q on node gke-some--airflow-00000000-node-1ltl
>
> Within the Kubernetes executor, these pods end up in a Pending status, and the DELETED event is received but not handled.
> The end result is that the tasks remain in a queued state forever.
>
--
This message was sent by Atlassian Jira
(v8.3.4#803005)