Posted to commits@airflow.apache.org by "Anish Patel (Jira)" <ji...@apache.org> on 2021/03/29 08:37:01 UTC

[jira] [Comment Edited] (AIRFLOW-6810) KubernetesPodOperator pod is completed but xcom side car is stuck

    [ https://issues.apache.org/jira/browse/AIRFLOW-6810?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17310491#comment-17310491 ] 

Anish Patel edited comment on AIRFLOW-6810 at 3/29/21, 8:36 AM:
----------------------------------------------------------------

Was this issue ever resolved? We are facing a similar (or maybe even the same) issue: the worker pod created by the KubernetesPodOperator remains in the 'RUNNING' state for a long period of time (apparently an infinite loop) after the task script finishes execution. We had to kill the worker pod in production (using kubectl), which unblocked the task and allowed the DAG to proceed to the next step. We are on version 1.10.6 and would like to understand whether a fix is available in a later version.


was (Author: anishpatel14):
Was this issue ever resolved? We are facing a similar (or may be even the same) issue where the worker pod created by the Kube operator remains in 'RUNNING' state for a long period of time (infinite loop) after the task script has finishes execution. We had to kill the worker pod in production (using kubectl) which caused the task to unblock and the DAG to proceed to the next step. we are on version 1.10.6 and would like to understand if there is a later version of airflow that fixes the issue?

> KubernetesPodOperator pod is completed but xcom side car is stuck
> -----------------------------------------------------------------
>
>                 Key: AIRFLOW-6810
>                 URL: https://issues.apache.org/jira/browse/AIRFLOW-6810
>             Project: Apache Airflow
>          Issue Type: Bug
>          Components: executor-kubernetes
>    Affects Versions: 1.10.6
>            Reporter: Maxence Cramet
>            Assignee: Daniel Imberman
>            Priority: Major
>
> We're using KubernetesPodOperator with the parameter xcom_push=true in order to push information from our task.
> From time to time, the main container completes but the sidecar container is stuck.
> Here's the output of the pod details:
> {noformat}
> kubectl describe pod my_pod
> Name:               my_pod
> Namespace:          default
> Priority:           0
> PriorityClassName:  <none>
> Node:               xxx
> Start Time:         Wed, 05 Feb 2020 11:12:33 +0000
> Labels:             xxx
> Annotations:        xxx
> Status:             Running
> IP:                 xxx
> Containers:
>   base:
>     Container ID:  xxx
>     Image:         xxx
>     Image ID:      xxx
>     Port:          <none>
>     Host Port:     <none>
>     Args:
>       xxx
>     State:          Terminated
>       Reason:       Completed
>       Exit Code:    0
>       Started:      Wed, 05 Feb 2020 11:12:38 +0000
>       Finished:     Wed, 05 Feb 2020 11:12:47 +0000
>     Ready:          False
>     Restart Count:  0
>     Limits:
>       memory:  512Mi
>     Requests:
>       memory:  512Mi
>     Environment:
>       xxx
>     Mounts:
>       /airflow/xcom from xcom (rw)
>   airflow-xcom-sidecar:
>     Container ID:  docker://83053d7d292cda9156454ac13064d72ace1e4f72738ba9b62b04ff57cb7966cc
>     Image:         alpine
>     Image ID:      docker-pullable://alpine@sha256:ab00606a42621fb68f2ed6ad3c88be54397f981a7b70a79db3d1172b11c4367d
>     Port:          <none>
>     Host Port:     <none>
>     Command:
>       sh
>       -c
>       trap "exit 0" INT; while true; do sleep 30; done;
>     State:          Running
>       Started:      Wed, 05 Feb 2020 11:12:40 +0000
>     Ready:          True
>     Restart Count:  0
>     Limits:
>       memory:  4Gi
>     Requests:
>       cpu:        1m
>       memory:     2Gi
>     Environment:  <none>
>     Mounts:
>       /airflow/xcom from xcom (rw)
>       xxx
> Conditions:
>   Type              Status
>   Initialized       True 
>   Ready             False 
>   ContainersReady   False 
>   PodScheduled      True 
> Volumes:
>   xcom:
>     Type:       EmptyDir (a temporary directory that shares a pod's lifetime)
>     Medium:     
>     SizeLimit:  <unset>
>   xxx
> QoS Class:       Burstable
> Node-Selectors:  <none>
> Tolerations:     node.kubernetes.io/not-ready:NoExecute for 300s
>                  node.kubernetes.io/unreachable:NoExecute for 300s
> Events:          <none>{noformat}
> I don't have any more information about the possible causes.
>  
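
The state shown in the pod details above (the "base" container Terminated while "airflow-xcom-sidecar" is still Running) can be detected mechanically. The following is only a minimal sketch, not a fix for the underlying bug: it assumes the pod list has been obtained as JSON (for example from `kubectl get pods -o json`) and parsed into a dict, and the function name `find_stuck_xcom_pods` and the sample data are hypothetical. The container names are the ones that appear in the report.

```python
# Container names as they appear in the `kubectl describe pod` output above.
BASE_CONTAINER = "base"
SIDECAR_CONTAINER = "airflow-xcom-sidecar"


def find_stuck_xcom_pods(pod_list):
    """Return names of pods whose base container has terminated while the
    xcom sidecar is still running.

    `pod_list` is assumed to be the parsed JSON of a Kubernetes PodList,
    i.e. a dict with an "items" list (hypothetical helper, not Airflow API).
    """
    stuck = []
    for pod in pod_list.get("items", []):
        statuses = pod.get("status", {}).get("containerStatuses", [])
        # Map container name -> state dict; the state dict has exactly one
        # of the keys "waiting", "running" or "terminated".
        states = {cs["name"]: cs.get("state", {}) for cs in statuses}
        base_done = "terminated" in states.get(BASE_CONTAINER, {})
        sidecar_running = "running" in states.get(SIDECAR_CONTAINER, {})
        if base_done and sidecar_running:
            stuck.append(pod["metadata"]["name"])
    return stuck


# Made-up sample mirroring the reported state, plus one pod still working.
sample = {
    "items": [
        {
            "metadata": {"name": "my_pod"},
            "status": {
                "containerStatuses": [
                    {"name": "base",
                     "state": {"terminated": {"exitCode": 0, "reason": "Completed"}}},
                    {"name": "airflow-xcom-sidecar",
                     "state": {"running": {"startedAt": "2020-02-05T11:12:40Z"}}},
                ]
            },
        },
        {
            "metadata": {"name": "other_pod"},
            "status": {
                "containerStatuses": [
                    {"name": "base",
                     "state": {"running": {"startedAt": "2020-02-05T11:12:38Z"}}},
                    {"name": "airflow-xcom-sidecar",
                     "state": {"running": {"startedAt": "2020-02-05T11:12:40Z"}}},
                ]
            },
        },
    ]
}

stuck_pods = find_stuck_xcom_pods(sample)
print(stuck_pods)
```

A pod is expected to be in this state briefly while the xcom result is read from the sidecar, so a match is only a candidate; it probably makes sense to act only on pods that have stayed this way for some time before deleting them with kubectl, as described in the comment above.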



--
This message was sent by Atlassian Jira
(v8.3.4#803005)