Posted to commits@airflow.apache.org by "Hamid Mahmood (Jira)" <ji...@apache.org> on 2020/01/22 13:40:00 UTC

[jira] [Assigned] (AIRFLOW-6551) KubernetesExecutor does not create dynamic pods for tasks inside subdag

     [ https://issues.apache.org/jira/browse/AIRFLOW-6551?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hamid Mahmood reassigned AIRFLOW-6551:
--------------------------------------

    Assignee:     (was: Daniel Imberman)

> KubernetesExecutor does not create dynamic pods for tasks inside subdag
> -----------------------------------------------------------------------
>
>                 Key: AIRFLOW-6551
>                 URL: https://issues.apache.org/jira/browse/AIRFLOW-6551
>             Project: Apache Airflow
>          Issue Type: Bug
>          Components: executor-kubernetes
>    Affects Versions: 1.10.7
>            Reporter: Hamid Mahmood
>            Priority: Major
>
> KubernetesExecutor does not create dynamic pods for tasks inside subdag.
> I am running Airflow 1.10.7 on EKS 1.14. I have multiple SubDagOperators inside a main_dag. The hierarchy is as follows:
>  * main_dag:
>  ** subdagA
>  *** taskA1
>  *** taskA2
>  *** taskA3
>  ** subdagB
>  *** subdagB_1
>  **** taskB1
>  **** taskB2
>  ** task_main1
>  ** task_main2
>  ** task_main3
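The hierarchy above can be sketched as plain Python data (a hypothetical representation for illustration only, not Airflow code; the names mirror the ids listed above). Counting the leaf tasks shows how many worker pods the KubernetesExecutor would be expected to launch if it created one pod per task:

```python
# Hypothetical nested-dict sketch of the DAG layout described above:
# dict keys are (sub)dag group names, lists hold the leaf task ids.
main_dag = {
    "subdagA": ["taskA1", "taskA2", "taskA3"],
    "subdagB": {"subdagB_1": ["taskB1", "taskB2"]},
    "top_level": ["task_main1", "task_main2", "task_main3"],
}

def count_leaf_tasks(node):
    """Count leaf task ids, recursing into nested subdags."""
    if isinstance(node, list):
        return len(node)
    return sum(count_leaf_tasks(child) for child in node.values())

# One pod per task would mean 8 pods in total: 3 in subdagA,
# 2 in subdagB_1, and 3 at the top level of main_dag.
print(count_leaf_tasks(main_dag))  # → 8
```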
> I have tested the following three scenarios.
> *Scenario: 1*
> I have set the following parameter
> {code:java}
> AIRFLOW__CORE__EXECUTOR = KubernetesExecutor
> {code}
> When I run the workflow main_dag, only a single pod is created, with the name subdagA-
> 309c4c564b9841529236a31dfaf135c5, and all the tasks (taskA1, taskA2, taskA3) run inside that single pod.
> In theory there should be 3 separate pods for these 3 tasks inside subdagA.
> The same is the case for subdagB: only a single pod is created to run subdagB, and subdagB_1 and its tasks run inside that pod.
> But the tasks (task_main1, task_main2, task_main3) that are not inside a subdag do run in dynamic pods.
>  
> *Scenario: 2*
> I have set the following
> {code:java}
> AIRFLOW__CORE__EXECUTOR = KubernetesExecutor  
> {code}
> and passed executor=KubernetesExecutor() when creating the subdag. 
> {code:java}
> SubDagOperator(
>     task_id=task_id,
>     subdag=subdag,
>     dag=parent_dag,
>     executor=KubernetesExecutor(),
>     **kwargs,
> )
> {code}
> Still only one pod is created for subdagA, but now taskA1 inside subdagA gets stuck in the queued state. When I checked the logs of this pod, I saw the following:
> {code:java}
> Running %s on host %s <TaskInstance: main_dag.subdagA 2020-01-12T02:15:00+00:00 [queued]> maindagsubdagA-1a31ef3e2aef489daf1b329160646{code}
> *Scenario: 3*
> I have set the following
> {code:java}
> AIRFLOW__CORE__EXECUTOR = LocalExecutor  
> {code}
> and passed executor=KubernetesExecutor() when creating the subdag.
> {code:java}
> SubDagOperator(
>     task_id=task_id,
>     subdag=subdag,
>     dag=parent_dag,
>     executor=KubernetesExecutor(),
>     **kwargs,
> )
> {code}
> This time a dynamic pod is created for each single task inside subdagA and subdagB, and I get the parallelism. In this case the KubernetesExecutor shows the required behavior. But the downside is that the three tasks task_main1, task_main2, and task_main3 use the LocalExecutor and therefore run in the scheduler pod.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)