Posted to commits@airflow.apache.org by GitBox <gi...@apache.org> on 2022/09/12 18:12:20 UTC
[GitHub] [airflow] karoldob opened a new issue, #26354: KubernetesPodOperator task status on multiple containers with one failed
karoldob opened a new issue, #26354:
URL: https://github.com/apache/airflow/issues/26354
### Apache Airflow version
2.3.4
### What happened
I am using KubernetesPodOperator to launch a pod with multiple containers.
When one container fails, the pod goes into the "Error" state, but the task stays "running".
### What you think should happen instead
When the pod is in the Error state, the task should fail (or restart?).
### How to reproduce
A simple DAG to reproduce the issue - one container fails after 25s.
```python
from datetime import datetime

from kubernetes.client import models as k8s

from airflow import DAG
from airflow.providers.cncf.kubernetes.operators.kubernetes_pod import KubernetesPodOperator

containers = [
    k8s.V1Container(
        name="container-1",
        image="ubuntu:16.04",
        command=["/bin/bash", "-c", "--"],
        args=["while true; do sleep 30; done;"],
    ),
    k8s.V1Container(
        name="container-2",
        image="ubuntu:16.04",
        command=["/bin/bash", "-c", "--"],
        args=["while true; do sleep 30; done;"],
    ),
    k8s.V1Container(
        name="container-3-failing",
        image="ubuntu:16.04",
        command=["/bin/bash", "-c", "--"],
        args=['for i in {1..5}; do sleep 5; echo "$i"; done; exit 1'],
    ),
]

with DAG(
    dag_id="multi_containers_one_fail",
    schedule_interval=None,
    start_date=datetime(2021, 1, 1),
) as dag:
    k = KubernetesPodOperator(
        namespace="airflow-test",
        name="test-pod",
        task_id="task",
        is_delete_operator_pod=True,
        full_pod_spec=k8s.V1Pod(spec=k8s.V1PodSpec(containers=containers)),
    )
```
### Operating System
Debian GNU/Linux 11 (bullseye)
### Versions of Apache Airflow Providers
apache-airflow-providers-cncf-kubernetes==4.3.0
apache-airflow-providers-celery==3.0.0
### Deployment
Official Apache Airflow Helm Chart
### Deployment details
I am using CeleryExecutor
### Anything else
_No response_
### Are you willing to submit PR?
- [ ] Yes I am willing to submit a PR!
### Code of Conduct
- [X] I agree to follow this project's [Code of Conduct](https://github.com/apache/airflow/blob/main/CODE_OF_CONDUCT.md)
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [airflow] potiuk closed issue #26354: KubernetesPodOperator task status on multiple containers with one failed
Posted by GitBox <gi...@apache.org>.
potiuk closed issue #26354: KubernetesPodOperator task status on multiple containers with one failed
URL: https://github.com/apache/airflow/issues/26354
[GitHub] [airflow] potiuk commented on issue #26354: KubernetesPodOperator task status on multiple containers with one failed
Posted by GitBox <gi...@apache.org>.
potiuk commented on issue #26354:
URL: https://github.com/apache/airflow/issues/26354#issuecomment-1250393587
I believe you need to handle it in your task pod - check whether the other containers are failing as expected, and if they are not, the task should fail. This is a very similar story to "init" container monitoring - we do not currently handle "automated" closing of such init containers, and the task should close them as needed (so no daemon-style tasks).
I think there are far too many cases to handle them automatically. For example, there are many cases where some containers in your pod will fail initially (for example, when a database is not started/initialized yet), get automatically restarted, and eventually succeed. While this is not a "perfect" pattern, it happens more often than not in the K8S world. This opens up all kinds of problems like "should we wait for the containers?" How long? How many retries? etc. etc.
However, maybe others have a different opinion (@dstandish @jedcunningham @ephraimbuddy ?). I will convert this into a discussion in case you @karoldob also want to discuss it. Maybe a new feature will be born out of it, but for sure, this is not a bug.
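A minimal sketch of the kind of in-task check suggested above. `any_container_failed` is a hypothetical helper, and the status dicts below only mimic the shape of the `containerStatuses` structure the Kubernetes API returns; a real check would feed it from `CoreV1Api.read_namespaced_pod(...).status` and fail the task when a sibling container has died:

```python
def any_container_failed(container_statuses):
    """Return True if any container terminated with a nonzero exit code.

    `container_statuses` mirrors `status.containerStatuses` from the
    Kubernetes API, represented here as plain dicts for illustration.
    """
    for status in container_statuses or []:
        # A container that has exited shows up under state.terminated.
        terminated = (status.get("state") or {}).get("terminated")
        if terminated and terminated.get("exitCode", 0) != 0:
            return True
    return False


statuses = [
    {"name": "container-1", "state": {"running": {}}},
    {"name": "container-3-failing", "state": {"terminated": {"exitCode": 1}}},
]
print(any_container_failed(statuses))  # a failed sibling is detected
```

In a task, a True result could be turned into a raised `AirflowException` so the task fails instead of waiting on the still-running containers.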
[GitHub] [airflow] boring-cyborg[bot] commented on issue #26354: KubernetesPodOperator task status on multiple containers with one failed
Posted by GitBox <gi...@apache.org>.
boring-cyborg[bot] commented on issue #26354:
URL: https://github.com/apache/airflow/issues/26354#issuecomment-1244116175
Thanks for opening your first issue here! Be sure to follow the issue template!