Posted to commits@airflow.apache.org by GitBox <gi...@apache.org> on 2021/04/12 20:31:33 UTC

[GitHub] [airflow] ephraimbuddy opened a new pull request #15336: Fail task when containers inside a pod fails

ephraimbuddy opened a new pull request #15336:
URL: https://github.com/apache/airflow/pull/15336


   Currently, when a container inside a pod terminates, Airflow doesn't know about it and
   tasks remain queued. The Kubernetes job watcher does not watch the status of containers
   inside pods; it only watches the pod and reports the pod's status to Airflow.
   
   From the Kubernetes docs, the Pending phase of a pod is defined to include the
   time a Pod spends waiting to be scheduled as well as
   the time spent downloading container images over the network.
   
   A network failure can crash a container while the pod remains Pending for some
   time before it is deleted.
   
   This PR fixes this by extending the Kubernetes job watcher to also watch the containers inside each pod.
   
   This should hopefully close https://github.com/apache/airflow/issues/13542 and https://github.com/apache/airflow/issues/15218.
   I also prefer this approach to timing out.
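
Roughly, the kind of check being added can be sketched as follows. The dataclasses below are illustrative stand-ins for the kubernetes client models (`k8s.V1ContainerStatus`, `k8s.V1ContainerState`, `k8s.V1ContainerStateWaiting`), not the actual executor code:

```python
from dataclasses import dataclass
from typing import List, Optional

# Illustrative stand-ins for the kubernetes client models.
@dataclass
class Waiting:
    reason: str

@dataclass
class ContainerState:
    waiting: Optional[Waiting] = None

@dataclass
class ContainerStatus:
    state: ContainerState

def has_image_pull_error(statuses: List[ContainerStatus]) -> bool:
    """True if any container is stuck waiting because its image could not be pulled."""
    return any(
        cs.state.waiting is not None
        and cs.state.waiting.reason in ('ErrImagePull', 'ImagePullBackOff')
        for cs in statuses
    )

stuck = [ContainerStatus(ContainerState(waiting=Waiting('ErrImagePull')))]
ok = [ContainerStatus(ContainerState())]
print(has_image_pull_error(stuck))  # True
print(has_image_pull_error(ok))     # False
```

With a check like this in the watcher, a pod whose containers can never start can be surfaced as a failed task instead of staying queued.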
   
   cc: @jedcunningham 
   
   
   ---
   **^ Add meaningful description above**
   
   Read the **[Pull Request Guidelines](https://github.com/apache/airflow/blob/master/CONTRIBUTING.rst#pull-request-guidelines)** for more information.
   In case of fundamental code change, Airflow Improvement Proposal ([AIP](https://cwiki.apache.org/confluence/display/AIRFLOW/Airflow+Improvements+Proposals)) is needed.
   In case of a new dependency, check compliance with the [ASF 3rd Party License Policy](https://www.apache.org/legal/resolved.html#category-x).
   In case of backwards incompatible changes please leave a note in [UPDATING.md](https://github.com/apache/airflow/blob/master/UPDATING.md).
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] ephraimbuddy commented on a change in pull request #15336: Fail task when containers inside a pod fails

Posted by GitBox <gi...@apache.org>.
ephraimbuddy commented on a change in pull request #15336:
URL: https://github.com/apache/airflow/pull/15336#discussion_r612316998



##########
File path: airflow/executors/kubernetes_executor.py
##########
@@ -187,25 +190,35 @@ def process_status(
         self,
         pod_id: str,
         namespace: str,
-        status: str,
+        status: V1PodStatus,
         annotations: Dict[str, str],
         resource_version: str,
         event: Any,
     ) -> None:
         """Process status response"""
-        if status == 'Pending':
-            if event['type'] == 'DELETED':
+        pod_status = status.phase
+        if pod_status == 'Pending':
+            # Check container statuses
+            container_statuses = status.container_statuses
+            init_container_statuses = status.init_container_statuses
+            if container_statuses and self._container_image_pull_err(container_statuses):
+                self.log.info('Event: Failed to start pod %s, a container has an ErrImagePull', pod_id)
+                self.watcher_queue.put((pod_id, namespace, State.FAILED, annotations, resource_version))

Review comment:
       While this marks the task as failed, the pod remains in the 'Pending' state and thus is not deleted. Looking for a way to change the pod status.







[GitHub] [airflow] ephraimbuddy commented on a change in pull request #15336: Fail task when containers inside a pod fails

ephraimbuddy commented on a change in pull request #15336:
URL: https://github.com/apache/airflow/pull/15336#discussion_r612417791



##########
File path: airflow/executors/kubernetes_executor.py
##########
@@ -218,6 +239,34 @@ def process_status(
                 resource_version,
             )
 
+    def process_container_statuses(
+        self,
+        pod_id: str,
+        statuses: List[Any],
+        namespace: str,
+        annotations: Dict[str, str],
+        resource_version: str,
+    ):
+        """Monitor pod container statuses"""
+        for container_status in statuses:
+            terminated = container_status.state.terminated
+            waiting = container_status.state.waiting
+            if terminated:
+                self.log.debug(
+                    "A container in the pod %s has terminated, reason: %s, message: %s",
+                    pod_id,
+                    terminated.reason,
+                    terminated.message,
+                )
+                self.watcher_queue.put((pod_id, namespace, State.FAILED, annotations, resource_version))

Review comment:
       > should we short-circuit and return here, since we want to mark a task as Fail when any container in the POD fails right?
   
   Absolutely!







[GitHub] [airflow] ephraimbuddy commented on a change in pull request #15336: Fail task when containers inside a pod fails

ephraimbuddy commented on a change in pull request #15336:
URL: https://github.com/apache/airflow/pull/15336#discussion_r612022098



##########
File path: tests/executors/test_kubernetes_executor.py
##########
@@ -507,3 +506,113 @@ def test_process_status_catchall(self):
 
         self._run()
         self.watcher.watcher_queue.put.assert_not_called()
+
+    def test_container_status_of_terminating_fails_pod(self):
+        self.pod.status.phase = "Pending"
+        self.pod.status.container_statuses = [
+            k8s.V1ContainerStatus(
+                container_id=None,
+                image="apache/airflow:2.0.1-python3.8",
+                image_id="",
+                name="base",
+                ready="false",
+                restart_count=0,
+                state=k8s.V1ContainerState(
+                    terminated=k8s.V1ContainerStateTerminated(
+                        reason="Terminating", exit_code=1

Review comment:
       > Have you seen, or can you recreate a `phase=Pending` and `state.terminated` pod? I don't see how it is possible to have both.
   > 
   > I've tried a few scenarios with both init containers and sidecars and every case has resulted in the watcher marking it as failed (though maybe not immediately, because `phase=Running`) - however the TI still gets marked as success.
   > 
   > Said another way, I think there are bugs around here, but I don't think looking at stuff in `phase=Pending` will help?
   
       The state.terminated is of the container inside the pod. It happens. This is how you can reproduce it:
   1. Check out this PR.
   2. Go to values.yaml and set `worker_container_repository: apache/airflow`, `worker_container_tag: 2.0.1-python3.8`.
   3. Use Breeze to start the cluster: `./breeze kind-cluster start`, `./breeze kind-cluster deploy`.
   4. Monitor the pods in another terminal with k9s: `./breeze kind-cluster k9s`.
   5. Check the scheduler logs; the 'event' object is printed at each watcher run. Inspect the object and you'll see the pod phase and container state.







[GitHub] [airflow] jedcunningham commented on a change in pull request #15336: Fail task when containers inside a pod fails

jedcunningham commented on a change in pull request #15336:
URL: https://github.com/apache/airflow/pull/15336#discussion_r611981896



##########
File path: airflow/executors/kubernetes_executor.py
##########
@@ -218,6 +239,34 @@ def process_status(
                 resource_version,
             )
 
+    def process_container_statuses(
+        self,
+        pod_id: str,
+        statuses: List[Any],
+        namespace: str,
+        annotations: Dict[str, str],
+        resource_version: str,
+    ):
+        """Monitor pod container statuses"""
+        for container_status in statuses:
+            terminated = container_status.state.terminated
+            waiting = container_status.state.waiting
+            if terminated:
+                self.log.debug(
+                    "A container in the pod %s has terminated, reason: %s, message: %s",
+                    pod_id,
+                    terminated.reason,
+                    terminated.message,
+                )
+                self.watcher_queue.put((pod_id, namespace, State.FAILED, annotations, resource_version))

Review comment:
       I don't think we should be adding more than once to `watcher_queue`, right? It might be better to leave the queue handling to `process_status` and just return a bool; there's less to cart around then, too. Maybe something like this:
   
   ```
   def _has_terminated_containers(self, status: V1PodStatus) -> bool:
   ```
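
A runnable sketch of what that suggested helper could look like. The dataclasses below are illustrative stand-ins for the k8s client models (the real helper would take a `k8s.V1PodStatus` from the watch event):

```python
from dataclasses import dataclass
from typing import List, Optional

# Illustrative stand-ins for the k8s client models.
@dataclass
class Terminated:
    exit_code: int

@dataclass
class ContainerState:
    terminated: Optional[Terminated] = None

@dataclass
class ContainerStatus:
    state: ContainerState

@dataclass
class PodStatus:  # stand-in for k8s.V1PodStatus
    container_statuses: Optional[List[ContainerStatus]] = None

def has_terminated_containers(status: PodStatus) -> bool:
    """Return True as soon as any container has terminated with a nonzero
    exit code, leaving the watcher_queue handling to process_status."""
    for cs in status.container_statuses or []:
        terminated = cs.state.terminated
        if terminated and terminated.exit_code != 0:
            return True  # short-circuit: one failed container fails the task
    return False

failed = PodStatus([ContainerStatus(ContainerState(terminated=Terminated(exit_code=1)))])
print(has_terminated_containers(failed))      # True
print(has_terminated_containers(PodStatus())) # False
```

Returning a bool keeps a single `watcher_queue.put` call site in `process_status`, so the pod cannot be enqueued more than once per event.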

##########
File path: airflow/executors/kubernetes_executor.py
##########
@@ -187,25 +188,45 @@ def process_status(
         self,
         pod_id: str,
         namespace: str,
-        status: str,
+        status: Any,

Review comment:
       ```suggestion
           status: k8s.V1PodStatus,
   ```

##########
File path: tests/executors/test_kubernetes_executor.py
##########
@@ -507,3 +506,113 @@ def test_process_status_catchall(self):
 
         self._run()
         self.watcher.watcher_queue.put.assert_not_called()
+
+    def test_container_status_of_terminating_fails_pod(self):
+        self.pod.status.phase = "Pending"
+        self.pod.status.container_statuses = [
+            k8s.V1ContainerStatus(
+                container_id=None,
+                image="apache/airflow:2.0.1-python3.8",
+                image_id="",
+                name="base",
+                ready="false",
+                restart_count=0,
+                state=k8s.V1ContainerState(
+                    terminated=k8s.V1ContainerStateTerminated(
+                        reason="Terminating", exit_code=1

Review comment:
       Have you seen, or can you recreate a `phase=Pending` and `state.terminated` pod? I don't see how it is possible to have both.
   
   I've tried a few scenarios with both init containers and sidecars and every case has resulted in the watcher marking it as failed (though maybe not immediately, because `phase=Running`) - however the TI still gets marked as success.
   
   Said another way, I think there are bugs around here, but I don't think looking at stuff in `phase=Pending` will help?







[GitHub] [airflow] ephraimbuddy commented on a change in pull request #15336: Fail task when containers inside a pod fails

ephraimbuddy commented on a change in pull request #15336:
URL: https://github.com/apache/airflow/pull/15336#discussion_r612520358



##########
File path: airflow/executors/kubernetes_executor.py
##########
@@ -187,25 +190,35 @@ def process_status(
         self,
         pod_id: str,
         namespace: str,
-        status: str,
+        status: V1PodStatus,
         annotations: Dict[str, str],
         resource_version: str,
         event: Any,
     ) -> None:
         """Process status response"""
-        if status == 'Pending':
-            if event['type'] == 'DELETED':
+        pod_status = status.phase
+        if pod_status == 'Pending':
+            # Check container statuses
+            container_statuses = status.container_statuses
+            init_container_statuses = status.init_container_statuses
+            if container_statuses and self._container_image_pull_err(container_statuses):
+                self.log.info('Event: Failed to start pod %s, a container has an ErrImagePull', pod_id)
+                self.watcher_queue.put((pod_id, namespace, State.FAILED, annotations, resource_version))

Review comment:
       Interesting!!







[GitHub] [airflow] kaxil commented on a change in pull request #15336: Fail task when containers inside a pod fails

kaxil commented on a change in pull request #15336:
URL: https://github.com/apache/airflow/pull/15336#discussion_r612014040



##########
File path: airflow/executors/kubernetes_executor.py
##########
@@ -218,6 +239,34 @@ def process_status(
                 resource_version,
             )
 
+    def process_container_statuses(
+        self,
+        pod_id: str,
+        statuses: List[Any],
+        namespace: str,
+        annotations: Dict[str, str],
+        resource_version: str,
+    ):
+        """Monitor pod container statuses"""
+        for container_status in statuses:
+            terminated = container_status.state.terminated
+            waiting = container_status.state.waiting
+            if terminated:
+                self.log.debug(
+                    "A container in the pod %s has terminated, reason: %s, message: %s",
+                    pod_id,
+                    terminated.reason,
+                    terminated.message,
+                )
+                self.watcher_queue.put((pod_id, namespace, State.FAILED, annotations, resource_version))

Review comment:
       So probably:
   
   ```python
   any((container_status.state.terminated and container_status.state.terminated.exit_code == 1) for container_status in statuses)
   ```







[GitHub] [airflow] ephraimbuddy commented on a change in pull request #15336: Fail task when containers inside a pod fails

ephraimbuddy commented on a change in pull request #15336:
URL: https://github.com/apache/airflow/pull/15336#discussion_r612165116



##########
File path: tests/executors/test_kubernetes_executor.py
##########
@@ -507,3 +506,113 @@ def test_process_status_catchall(self):
 
         self._run()
         self.watcher.watcher_queue.put.assert_not_called()
+
+    def test_container_status_of_terminating_fails_pod(self):
+        self.pod.status.phase = "Pending"
+        self.pod.status.container_statuses = [
+            k8s.V1ContainerStatus(
+                container_id=None,
+                image="apache/airflow:2.0.1-python3.8",
+                image_id="",
+                name="base",
+                ready="false",
+                restart_count=0,
+                state=k8s.V1ContainerState(
+                    terminated=k8s.V1ContainerStateTerminated(
+                        reason="Terminating", exit_code=1

Review comment:
       Hi Jed, I've changed the code to only fail the pod when there's an image pull error, as I was not able to reproduce the Terminating status while the pod is Pending.
   
   To reproduce this change, just change the `worker_container_tag` value to `201-python` so you can reproduce the image pull error and see the task stuck in queued forever.







[GitHub] [airflow] jedcunningham commented on a change in pull request #15336: Fail task when containers inside a pod fails

jedcunningham commented on a change in pull request #15336:
URL: https://github.com/apache/airflow/pull/15336#discussion_r612504746



##########
File path: airflow/executors/kubernetes_executor.py
##########
@@ -187,25 +190,35 @@ def process_status(
         self,
         pod_id: str,
         namespace: str,
-        status: str,
+        status: V1PodStatus,

Review comment:
       Sorry, I led you slightly astray earlier... This will lead to fewer imports:
   
   ```suggestion
           status: k8s.V1PodStatus,
   ```

##########
File path: tests/executors/test_kubernetes_executor.py
##########
@@ -507,3 +506,37 @@ def test_process_status_catchall(self):
 
         self._run()
         self.watcher.watcher_queue.put.assert_not_called()
+
+    def test_container_status_of_waiting_with_errimagepull_fails_pod(self):
+        self.pod.status.phase = "Pending"
+        self.pod.status.container_statuses = [
+            k8s.V1ContainerStatus(
+                container_id=None,
+                image="apache/airflow:2.0.1-python3.8",
+                image_id="",
+                name="base",
+                ready="false",
+                restart_count=0,
+                state=k8s.V1ContainerState(waiting=k8s.V1ContainerStateWaiting(reason='ErrImagePull')),
+            )
+        ]
+        self.events.append({"type": 'MODIFIED', "object": self.pod})
+        self._run()
+        self.watcher.watcher_queue.put.assert_called()

Review comment:
       ```suggestion
           self.assert_watcher_queue_called_once_with_state(State.FAILED)
   ```

##########
File path: airflow/executors/kubernetes_executor.py
##########
@@ -187,25 +190,35 @@ def process_status(
         self,
         pod_id: str,
         namespace: str,
-        status: str,
+        status: V1PodStatus,
         annotations: Dict[str, str],
         resource_version: str,
         event: Any,
     ) -> None:
         """Process status response"""
-        if status == 'Pending':
-            if event['type'] == 'DELETED':
+        pod_status = status.phase
+        if pod_status == 'Pending':
+            # Check container statuses
+            container_statuses = status.container_statuses
+            init_container_statuses = status.init_container_statuses
+            if container_statuses and self._container_image_pull_err(container_statuses):
+                self.log.info('Event: Failed to start pod %s, a container has an ErrImagePull', pod_id)
+                self.watcher_queue.put((pod_id, namespace, State.FAILED, annotations, resource_version))

Review comment:
       Yeah, which is why #15263 deletes the pod. I think that is all we can do, unfortunately.
   
   That said, should we be failing these after the first pull failure? It is certainly something that could self-resolve and we haven't actually started the task yet 🤷‍♂️. I'm thinking a timeout is a more general catch-all for pending issues.

##########
File path: tests/executors/test_kubernetes_executor.py
##########
@@ -507,3 +506,37 @@ def test_process_status_catchall(self):
 
         self._run()
         self.watcher.watcher_queue.put.assert_not_called()
+
+    def test_container_status_of_waiting_with_errimagepull_fails_pod(self):
+        self.pod.status.phase = "Pending"
+        self.pod.status.container_statuses = [
+            k8s.V1ContainerStatus(
+                container_id=None,
+                image="apache/airflow:2.0.1-python3.8",
+                image_id="",
+                name="base",
+                ready="false",
+                restart_count=0,
+                state=k8s.V1ContainerState(waiting=k8s.V1ContainerStateWaiting(reason='ErrImagePull')),
+            )
+        ]
+        self.events.append({"type": 'MODIFIED', "object": self.pod})
+        self._run()
+        self.watcher.watcher_queue.put.assert_called()
+
+    def test_init_container_status_of_waiting_with_errimagepull_fails_pod(self):
+        self.pod.status.phase = "Pending"
+        self.pod.status.init_container_statuses = [
+            k8s.V1ContainerStatus(
+                container_id=None,
+                image="apache/airflow:2.0.1-python3.8",
+                image_id="",
+                name="base",
+                ready="false",
+                restart_count=0,
+                state=k8s.V1ContainerState(waiting=k8s.V1ContainerStateWaiting(reason='ErrImagePull')),
+            )
+        ]
+        self.events.append({"type": 'MODIFIED', "object": self.pod})
+        self._run()
+        self.watcher.watcher_queue.put.assert_called()

Review comment:
       ```suggestion
           self.assert_watcher_queue_called_once_with_state(State.FAILED)
   ```







[GitHub] [airflow] dimberman commented on a change in pull request #15336: Fail task when containers inside a pod fails

dimberman commented on a change in pull request #15336:
URL: https://github.com/apache/airflow/pull/15336#discussion_r612492878



##########
File path: airflow/executors/kubernetes_executor.py
##########
@@ -187,25 +190,35 @@ def process_status(
         self,
         pod_id: str,
         namespace: str,
-        status: str,
+        status: V1PodStatus,
         annotations: Dict[str, str],
         resource_version: str,
         event: Any,
     ) -> None:
         """Process status response"""
-        if status == 'Pending':
-            if event['type'] == 'DELETED':
+        pod_status = status.phase
+        if pod_status == 'Pending':
+            # Check container statuses
+            container_statuses = status.container_statuses
+            init_container_statuses = status.init_container_statuses
+            if container_statuses and self._container_image_pull_err(container_statuses):
+                self.log.info('Event: Failed to start pod %s, a container has an ErrImagePull', pod_id)
+                self.watcher_queue.put((pod_id, namespace, State.FAILED, annotations, resource_version))

Review comment:
       @ephraimbuddy I don't think you can change the pod status from inside of Airflow. 







[GitHub] [airflow] ephraimbuddy closed pull request #15336: Fail task when containers inside a pod fails

ephraimbuddy closed pull request #15336:
URL: https://github.com/apache/airflow/pull/15336


   





[GitHub] [airflow] ephraimbuddy commented on a change in pull request #15336: Fail task when containers inside a pod fails

ephraimbuddy commented on a change in pull request #15336:
URL: https://github.com/apache/airflow/pull/15336#discussion_r612316998



##########
File path: airflow/executors/kubernetes_executor.py
##########
@@ -187,25 +190,35 @@ def process_status(
         self,
         pod_id: str,
         namespace: str,
-        status: str,
+        status: V1PodStatus,
         annotations: Dict[str, str],
         resource_version: str,
         event: Any,
     ) -> None:
         """Process status response"""
-        if status == 'Pending':
-            if event['type'] == 'DELETED':
+        pod_status = status.phase
+        if pod_status == 'Pending':
+            # Check container statuses
+            container_statuses = status.container_statuses
+            init_container_statuses = status.init_container_statuses
+            if container_statuses and self._container_image_pull_err(container_statuses):
+                self.log.info('Event: Failed to start pod %s, a container has an ErrImagePull', pod_id)
+                self.watcher_queue.put((pod_id, namespace, State.FAILED, annotations, resource_version))

Review comment:
       While this marks the task as failed, the pod remains in the 'Pending' state and thus is not deleted, even with `delete_worker_pods` set to True. Looking for a way to change the pod status.







[GitHub] [airflow] kaxil commented on a change in pull request #15336: Fail task when containers inside a pod fails

kaxil commented on a change in pull request #15336:
URL: https://github.com/apache/airflow/pull/15336#discussion_r612010640



##########
File path: airflow/executors/kubernetes_executor.py
##########
@@ -218,6 +239,34 @@ def process_status(
                 resource_version,
             )
 
+    def process_container_statuses(
+        self,
+        pod_id: str,
+        statuses: List[Any],
+        namespace: str,
+        annotations: Dict[str, str],
+        resource_version: str,
+    ):
+        """Monitor pod container statuses"""
+        for container_status in statuses:
+            terminated = container_status.state.terminated
+            waiting = container_status.state.waiting
+            if terminated:
+                self.log.debug(
+                    "A container in the pod %s has terminated, reason: %s, message: %s",
+                    pod_id,
+                    terminated.reason,
+                    terminated.message,
+                )
+                self.watcher_queue.put((pod_id, namespace, State.FAILED, annotations, resource_version))

Review comment:
       You should probably check `exit_code` too.
   
   Code: https://github.com/kubernetes-client/python/blob/v11.0.0/kubernetes/client/models/v1_container_state_terminated.py#L67







[GitHub] [airflow] kaxil commented on a change in pull request #15336: Fail task when containers inside a pod fails

kaxil commented on a change in pull request #15336:
URL: https://github.com/apache/airflow/pull/15336#discussion_r612007522



##########
File path: airflow/executors/kubernetes_executor.py
##########
@@ -218,6 +239,34 @@ def process_status(
                 resource_version,
             )
 
+    def process_container_statuses(
+        self,
+        pod_id: str,
+        statuses: List[Any],
+        namespace: str,
+        annotations: Dict[str, str],
+        resource_version: str,
+    ):
+        """Monitor pod container statuses"""
+        for container_status in statuses:
+            terminated = container_status.state.terminated
+            waiting = container_status.state.waiting
+            if terminated:
+                self.log.debug(
+                    "A container in the pod %s has terminated, reason: %s, message: %s",
+                    pod_id,
+                    terminated.reason,
+                    terminated.message,
+                )
+                self.watcher_queue.put((pod_id, namespace, State.FAILED, annotations, resource_version))

Review comment:
       should we short-circuit and return here, since we want to mark a task as Fail when any container in the POD fails, right?
   
   We could also probably do
   
   ```python
   any(container_status.state.terminated for container_status in statuses)
   ```
   
   However, "a terminated container" != "failed container"
   
   >A container in the Terminated state began execution and then either ran to completion or failed for some reason. When you use kubectl to query a Pod with a container that is Terminated, you see a reason, an exit code, and the start and finish time for that container's period of execution.
   
   From: https://kubernetes.io/docs/concepts/workloads/pods/pod-lifecycle/#container-state-terminated
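
That terminated-vs-failed distinction could be sketched like this. The objects below are hypothetical stand-ins mimicking the shape of `V1ContainerStatus`, for illustration only:

```python
from types import SimpleNamespace as NS

# Hypothetical container statuses: one ran to completion, one failed, one still running.
completed = NS(state=NS(terminated=NS(exit_code=0, reason='Completed')))
crashed = NS(state=NS(terminated=NS(exit_code=1, reason='Error')))
running = NS(state=NS(terminated=None))

def any_container_failed(statuses):
    """Only terminated-with-nonzero-exit-code counts as failure, not merely terminated."""
    return any(
        cs.state.terminated and cs.state.terminated.exit_code != 0
        for cs in statuses
    )

print(any_container_failed([completed, running]))  # False: completion is not failure
print(any_container_failed([completed, crashed]))  # True
```

Checking `exit_code` this way avoids failing tasks whose sidecar or init containers simply ran to completion.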
   



