Posted to dev@airflow.apache.org by Chetan Khatri <ch...@gmail.com> on 2019/05/03 19:25:28 UTC

Deadlock at DAG SubDAG Execution at LocalExecutor

Hello Airflow Dev,

I am using Airflow to schedule, orchestrate and monitor data pipelines. My
airflow.cfg is the default; I haven't changed any attribute values yet.

*Main Dag:*

dag = DAG(
    dag_id=DAG_NAME,
    default_args=args,
    schedule_interval=None,
    concurrency=8
)

I have 8 other sub-DAGs with similar settings:

group_one_parquet = SubDagOperator(
    executor=LocalExecutor(),
    task_id='group_one_parquet',
    subdag=group_one_parquet_subdag(
        DAG_NAME, 'group_one_parquet', args, network_id, schema_name, env_name
    ),
    default_args=args,
    dag=dag,
)

When I run this same DAG, say, 5 times in parallel, with different data
points passed explicitly via **kwargs, I get deadlock errors:

1) The maximum number of running tasks (8) for this task's DAG 'xya' has
been reached
2) The log reports that the BackfillJob is deadlocked.
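
A rough sketch of the arithmetic behind the first message, assuming that `concurrency` caps running task instances across all active runs of the DAG and that every run wants to start all 8 SubDagOperator tasks at once. The function name is hypothetical and this is plain Python, not Airflow code:

```python
def waiting_tasks(concurrency: int, tasks_per_run: int, parallel_runs: int) -> int:
    """Return how many task instances cannot start because the DAG-level cap is full."""
    wanted = tasks_per_run * parallel_runs
    return max(wanted - concurrency, 0)


# Numbers from this thread: concurrency=8, 8 sub-DAG tasks per run, 5 parallel runs.
blocked = waiting_tasks(concurrency=8, tasks_per_run=8, parallel_runs=5)
print(blocked)  # 40 wanted - 8 allowed = 32 left waiting
```

Under these assumptions, most task instances would be reported as blocked by the cap of 8 at any given moment.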

[image: image (3).png]

I would like to scale the LocalExecutor vertically only, because of certain
limitations.

Can someone please shed some light on this from experience?

Thanks

Re: Deadlock at DAG SubDAG Execution at LocalExecutor

Posted by Chetan Khatri <ch...@gmail.com>.
Thanks, airflowuser, for the response.

Hello Bolke,

Do you have any observations?

On Sat, May 4, 2019 at 1:01 AM Chetan Khatri <ch...@gmail.com>
wrote:

> My guess is that I should:
>
> 1) Increase the concurrency to 56 in airflow.cfg and not override the
> attribute in the DAG constructor.
> 2) Change the subdag's LocalExecutor to SequentialExecutor.

Re: Deadlock at DAG SubDAG Execution at LocalExecutor

Posted by airflowuser <ai...@protonmail.com.INVALID>.
SubDAGs can deadlock when run with the LocalExecutor.
This is why the default is the SequentialExecutor, and there is a warning about this in the code, which is discussed in the stale PR about the issue:
https://github.com/apache/airflow/pull/2367
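
A toy model of how this kind of deadlock can arise, assuming (the thread does not confirm this) that the SubDagOperator tasks and the tasks inside the sub-DAGs compete for the same fixed pool of slots. It is plain Python with hypothetical names, not Airflow code:

```python
def simulate(total_slots: int, parents: int, children_per_parent: int) -> str:
    """Parents grab slots first and hold them until all their children finish."""
    running_parents = min(parents, total_slots)
    free = total_slots - running_parents
    pending_children = running_parents * children_per_parent
    if pending_children > 0 and free == 0:
        # Every slot is held by a parent waiting on children that can never start.
        return "deadlock"
    return "progress"


print(simulate(total_slots=8, parents=8, children_per_parent=3))  # deadlock
print(simulate(total_slots=8, parents=4, children_per_parent=3))  # progress
```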


‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐
On Friday, May 3, 2019 10:31 PM, Chetan Khatri <ch...@gmail.com> wrote:

> My guess is that I should:
>
> 1) Increase the concurrency to 56 in airflow.cfg and not override the attribute in the DAG constructor.
> 2) Change the subdag's LocalExecutor to SequentialExecutor.

Re: Deadlock at DAG SubDAG Execution at LocalExecutor

Posted by Chetan Khatri <ch...@gmail.com>.
My guess is that I should:

1) Increase the concurrency to 56 in airflow.cfg and not override the
attribute in the DAG constructor.
2) Change the subdag's LocalExecutor to SequentialExecutor.
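
For reference, a sketch of what guess 1 might look like, assuming the setting meant is `dag_concurrency` in the `[core]` section of airflow.cfg (the thread does not name the key). The fragment is a hypothetical excerpt, parsed here with the standard library only to show where the value would live:

```python
import configparser

# Hypothetical airflow.cfg excerpt; only the keys relevant to guess 1 are shown.
CFG_FRAGMENT = """
[core]
executor = LocalExecutor
dag_concurrency = 56
"""

parser = configparser.ConfigParser()
parser.read_string(CFG_FRAGMENT)

# With concurrency no longer passed to the DAG constructor, the DAG would be
# expected to fall back to this config default (an assumption, see above).
dag_concurrency = parser.getint("core", "dag_concurrency")
print(dag_concurrency)  # 56
```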
