Posted to commits@airflow.apache.org by GitBox <gi...@apache.org> on 2022/09/12 18:12:20 UTC
[GitHub] [airflow] karoldob opened a new issue, #26354: KubernetesPodOperator task status on multiple containers with one failed
karoldob opened a new issue, #26354:
URL: https://github.com/apache/airflow/issues/26354
### Apache Airflow version
2.3.4
### What happened
I am using KubernetesPodOperator to launch a pod with multiple containers.
When one container fails, the pod goes into the "Error" state, but the task stays "running".
### What you think should happen instead
When the pod is in the Error state, the task should fail (or restart?).
### How to reproduce
A simple DAG to reproduce the issue - one container fails after 25s.
```python
from datetime import datetime

from kubernetes.client import models as k8s

from airflow import DAG
from airflow.providers.cncf.kubernetes.operators.kubernetes_pod import KubernetesPodOperator

containers = [
    k8s.V1Container(
        name="container-1",
        image="ubuntu:16.04",
        command=["/bin/bash", "-c", "--"],
        args=["while true; do sleep 30; done;"],
    ),
    k8s.V1Container(
        name="container-2",
        image="ubuntu:16.04",
        command=["/bin/bash", "-c", "--"],
        args=["while true; do sleep 30; done;"],
    ),
    k8s.V1Container(
        name="container-3-failing",
        image="ubuntu:16.04",
        command=["/bin/bash", "-c", "--"],
        args=['for i in {1..5}; do sleep 5; echo "$i"; done; exit 1'],
    ),
]

with DAG(
    dag_id="multi_containers_one_fail",
    schedule_interval=None,
    start_date=datetime(2021, 1, 1),
) as dag:
    k = KubernetesPodOperator(
        namespace="airflow-test",
        name="test-pod",
        task_id="task",
        is_delete_operator_pod=True,
        full_pod_spec=k8s.V1Pod(spec=k8s.V1PodSpec(containers=containers)),
    )
```
### Operating System
Debian GNU/Linux 11 (bullseye)
### Versions of Apache Airflow Providers
apache-airflow-providers-cncf-kubernetes==4.3.0
apache-airflow-providers-celery==3.0.0
### Deployment
Official Apache Airflow Helm Chart
### Deployment details
I am using CeleryExecutor
### Anything else
_No response_
### Are you willing to submit PR?
- [ ] Yes I am willing to submit a PR!
### Code of Conduct
- [X] I agree to follow this project's [Code of Conduct](https://github.com/apache/airflow/blob/main/CODE_OF_CONDUCT.md)
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [airflow] potiuk closed issue #26354: KubernetesPodOperator task status on multiple containers with one failed
Posted by GitBox <gi...@apache.org>.
potiuk closed issue #26354: KubernetesPodOperator task status on multiple containers with one failed
URL: https://github.com/apache/airflow/issues/26354
[GitHub] [airflow] potiuk commented on issue #26354: KubernetesPodOperator task status on multiple containers with one failed
Posted by GitBox <gi...@apache.org>.
potiuk commented on issue #26354:
URL: https://github.com/apache/airflow/issues/26354#issuecomment-1250393587
I believe you need to handle it in your task pod - check whether the other containers are failing as expected, and if they are not, the task should fail. This is a very similar story to "init" container monitoring - we do not currently handle "automated" closing of such init containers, and the task should close them as needed (so no daemon-style tasks).
I think there are far too many cases to handle them automatically. For example, there are many cases where some containers in your pod will fail initially (for example, when a database is not started/initialized yet), get automatically restarted, and eventually succeed. While this is not a "perfect" pattern, it happens more often than not in the K8S world. This opens up all kinds of problems like "should we wait for the containers?" How long? How many retries? etc. etc.
However, maybe others have a different opinion (@dstandish @jedcunningham @ephraimbuddy ?). I will convert this into a discussion in case you @karoldob also want to discuss it. Maybe a new feature will be born out of it, but for sure, this is not a bug.
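A minimal sketch of the kind of in-task check suggested above. `any_container_failed` is a hypothetical helper, and the status dicts below only mimic the shape of the `containerStatuses` structure the Kubernetes API returns; a real check would feed it from `CoreV1Api.read_namespaced_pod(...).status` and fail the task when a sibling container has died:

```python
def any_container_failed(container_statuses):
    """Return True if any container terminated with a nonzero exit code.

    `container_statuses` mirrors `status.containerStatuses` from the
    Kubernetes API, represented here as plain dicts for illustration.
    """
    for status in container_statuses or []:
        # A container that has exited shows up under state.terminated.
        terminated = (status.get("state") or {}).get("terminated")
        if terminated and terminated.get("exitCode", 0) != 0:
            return True
    return False


statuses = [
    {"name": "container-1", "state": {"running": {}}},
    {"name": "container-3-failing", "state": {"terminated": {"exitCode": 1}}},
]
print(any_container_failed(statuses))  # a failed sibling is detected
```

In a task, a True result could be turned into a raised `AirflowException` so the task fails instead of waiting on the still-running containers.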
[GitHub] [airflow] boring-cyborg[bot] commented on issue #26354: KubernetesPodOperator task status on multiple containers with one failed
Posted by GitBox <gi...@apache.org>.
boring-cyborg[bot] commented on issue #26354:
URL: https://github.com/apache/airflow/issues/26354#issuecomment-1244116175
Thanks for opening your first issue here! Be sure to follow the issue template!