Posted to commits@airflow.apache.org by "Daniel Imberman (Jira)" <ji...@apache.org> on 2019/09/12 17:27:00 UTC

[jira] [Updated] (AIRFLOW-4730) Startup-timeout for launching pods on k8s executor/operator should be configurable

     [ https://issues.apache.org/jira/browse/AIRFLOW-4730?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Daniel Imberman updated AIRFLOW-4730:
-------------------------------------
    Description: 
!Screen Shot 2019-06-04 at 9.04.31 AM.png|width=984,height=203!

Currently, users who set affinities on their DAGs get failures when k8s cannot schedule a pod because no nodes are available.

This might be because the k8s executor uses run_pod_async, which means the launch is attempted once and then fails on any failure from the API. We could probably add logic to read the API response for affinity failures.

[https://github.com/apache/airflow/blob/05c06b0f6669f677495328c68c2bd05f6d0e69db/airflow/kubernetes/pod_launcher.py#L59]

  was:
!Screen Shot 2019-06-04 at 9.04.31 AM.png!

Currently, users who set affinities on their DAGs get failures when k8s cannot schedule a pod because no nodes are available.



This might be because the k8s executor uses run_pod_async, which means the launch is attempted once and then fails on any failure from the API. We could probably add logic to read the API response for affinity failures.

https://github.com/apache/airflow/blob/05c06b0f6669f677495328c68c2bd05f6d0e69db/airflow/kubernetes/pod_launcher.py#L59


> Startup-timeout for launching pods on k8s executor/operator should be configurable
> ----------------------------------------------------------------------------------
>
>                 Key: AIRFLOW-4730
>                 URL: https://issues.apache.org/jira/browse/AIRFLOW-4730
>             Project: Apache Airflow
>          Issue Type: Task
>          Components: executors
>    Affects Versions: 1.10.3
>            Reporter: Daniel Imberman
>            Priority: Minor
>              Labels: beginner, kubernetes, starter
>         Attachments: Screen Shot 2019-06-04 at 9.04.31 AM.png
>
>
> !Screen Shot 2019-06-04 at 9.04.31 AM.png|width=984,height=203!
> Currently, users who set affinities on their DAGs get failures when k8s cannot schedule a pod because no nodes are available.
> This might be because the k8s executor uses run_pod_async, which means the launch is attempted once and then fails on any failure from the API. We could probably add logic to read the API response for affinity failures.
> [https://github.com/apache/airflow/blob/05c06b0f6669f677495328c68c2bd05f6d0e69db/airflow/kubernetes/pod_launcher.py#L59]
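
For illustration only, here is a minimal sketch of what a configurable startup timeout could look like. It is not the Airflow pod launcher code. The names wait_for_pod_start, get_phase, startup_timeout, poll_interval and PodStartupTimeout are all hypothetical, and get_phase stands in for whatever call returns the pod's current phase. The only thing taken from Kubernetes itself is the "Pending" phase, which is where a pod sits while it cannot be scheduled.

```python
import time
from typing import Callable


class PodStartupTimeout(Exception):
    """Hypothetical error: the pod stayed in Pending past the timeout."""


def wait_for_pod_start(
    get_phase: Callable[[], str],
    startup_timeout: float = 120.0,
    poll_interval: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll until the pod leaves the Pending phase or the timeout expires.

    get_phase is an assumed callable that returns the pod phase as a
    string (for example "Pending", "Running", "Succeeded" or "Failed").
    startup_timeout is the value the issue asks to make configurable.
    """
    deadline = time.monotonic() + startup_timeout
    while True:
        phase = get_phase()
        if phase != "Pending":
            # The pod was scheduled (or reached a terminal phase).
            return phase
        if time.monotonic() >= deadline:
            raise PodStartupTimeout(
                f"pod still Pending after {startup_timeout} seconds"
            )
        # Keep waiting instead of failing on the first attempt.
        sleep(poll_interval)
```

With something along these lines, a pod that is unschedulable for a short time because of an affinity rule would be retried until startup_timeout elapses, rather than failing on the first attempt.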


