Posted to commits@airflow.apache.org by GitBox <gi...@apache.org> on 2020/08/14 08:14:27 UTC

[GitHub] [airflow] art-i-svsg opened a new issue #10325: KubernetesPodOperator is still running running pod but task is marked as failed

art-i-svsg opened a new issue #10325:
URL: https://github.com/apache/airflow/issues/10325


   
   
   **Apache Airflow version**: 1.10.9
   
   **Kubernetes version (if you are using kubernetes)** (use `kubectl version`): 1.14
   
   **Environment**: 
   
   - **Cloud provider or hardware configuration**: AWS EKS
   - **Install tools**: Helm version 3
   - **Others**: Helm chart - https://hub.helm.sh/charts/stable/airflow version 6.10.4
   
   **What happened**:
   
   We run Airflow with the Celery executor, but our tasks are implemented using KubernetesPodOperator. We create DAG runs with a set of tasks and run them as pods. Some tasks run for 40 minutes or more. Quite often a task's pod is still running and actively doing its work, but Airflow marks the task as failed and retries it, or, if there are no retries left, leaves it marked as failed. Sometimes the opposite happens: a pod is stuck in the Running state although the task shows a success status. We currently have one worker pod, which starts the task executions, and we have noticed that this worker is frequently OOMKilled because it runs low on memory. At other times the tasks run just fine.
   
   This might be related to this bug: https://issues.apache.org/jira/browse/AIRFLOW-6580
   
   **What you expected to happen**:
   
   We expect the pod to run for as long as needed, and the task to reflect the real status of the underlying pod.
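   
   As a minimal sketch of that expectation (this is not Airflow's actual code; `TaskState` and `expected_task_state` are hypothetical names used only for illustration), the task state would simply follow the standard Kubernetes pod phases:

```python
from enum import Enum


class TaskState(Enum):
    """Hypothetical, simplified task states for illustration only."""
    RUNNING = "running"
    SUCCESS = "success"
    FAILED = "failed"


# Standard Kubernetes pod phases are Pending, Running, Succeeded,
# Failed and Unknown. Only the terminal phases end the task.
_PHASE_TO_STATE = {
    "Pending": TaskState.RUNNING,
    "Running": TaskState.RUNNING,
    "Succeeded": TaskState.SUCCESS,
    "Failed": TaskState.FAILED,
}


def expected_task_state(pod_phase: str) -> TaskState:
    """Return the task state we would expect for a given pod phase.

    Assumption: an "Unknown" (or unrecognised) phase is treated as still
    running, so a live pod is never reported as failed just because its
    status could not be read.
    """
    return _PHASE_TO_STATE.get(pod_phase, TaskState.RUNNING)
```

   Under this sketch, a pod that is still in the Running phase after 40 minutes would keep its task in the running state instead of being marked as failed.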
   
   **Anything else we need to know**:
   
   Our tasks run every night, and the problem affects 2-3 tasks either every day or every other day. On some nights everything runs just fine.
   
   This really impacts our production services and any help is highly appreciated!
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] eladkal edited a comment on issue #10325: KubernetesPodOperator is still running running pod but task is marked as failed

Posted by GitBox <gi...@apache.org>.
eladkal edited a comment on issue #10325:
URL: https://github.com/apache/airflow/issues/10325#issuecomment-931567746


   This may have been solved by https://github.com/apache/airflow/pull/10230
   If the issue still happens on the latest Airflow version and Kubernetes provider, let us know.
   Closing for now.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] boring-cyborg[bot] commented on issue #10325: KubernetesPodOperator is still running running pod but task is marked as failed

Posted by GitBox <gi...@apache.org>.
boring-cyborg[bot] commented on issue #10325:
URL: https://github.com/apache/airflow/issues/10325#issuecomment-673951256


   Thanks for opening your first issue here! Be sure to follow the issue template!
   





[GitHub] [airflow] eladkal commented on issue #10325: KubernetesPodOperator is still running running pod but task is marked as failed

Posted by GitBox <gi...@apache.org>.
eladkal commented on issue #10325:
URL: https://github.com/apache/airflow/issues/10325#issuecomment-931567746


   This might have been solved by https://github.com/apache/airflow/pull/10230
   If the issue still happens on the latest Airflow version and Kubernetes provider, let us know.





[GitHub] [airflow] eladkal closed issue #10325: KubernetesPodOperator is still running running pod but task is marked as failed

Posted by GitBox <gi...@apache.org>.
eladkal closed issue #10325:
URL: https://github.com/apache/airflow/issues/10325


   

