Posted to dev@airflow.apache.org by Chetan Khatri <ch...@gmail.com> on 2019/05/03 19:25:28 UTC
Deadlock at DAG SubDAG Execution at LocalExecutor
Hello Airflow Dev,
I am using Airflow to schedule, orchestrate and monitor data pipelines. My
airflow.cfg is the default; I haven't changed any attribute values yet.
*Main Dag:*
dag = DAG(
    dag_id=DAG_NAME,
    default_args=args,
    schedule_interval=None,
    concurrency=8,
)
I have 8 other sub-DAGs with similar settings:
group_one_parquet = SubDagOperator(
    executor=LocalExecutor(),
    task_id='group_one_parquet',
    subdag=group_one_parquet_subdag(DAG_NAME, 'group_one_parquet', args,
                                    network_id, schema_name, env_name),
    default_args=args,
    dag=dag,
)
When I run this same DAG, say, 5 times in parallel with different data points
passed explicitly via **kwargs, I get a deadlock error:
1) The maximum number of running tasks (8) for this task's DAG 'xya' has been
reached
2) BackfillJob is deadlocked (reported in the log).
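As a rough sanity check on the numbers above, here is a back-of-the-envelope
slot count. It is plain Python, not Airflow code, and it rests on two
assumptions that are not stated in this thread: that the eight SubDagOperator
tasks have no dependencies between them, and that each one occupies one of the
parent DAG's running-task slots while its sub-DAG runs.

```python
# Back-of-the-envelope slot count; plain Python, no Airflow imports.
# Assumption: every SubDagOperator task holds one running-task slot of the
# parent DAG for as long as its sub-DAG is running, and all eight could
# run at the same time.
DAG_CONCURRENCY = 8   # concurrency=8 from the DAG constructor
SUBDAGS_PER_RUN = 8   # eight SubDagOperator tasks per DAG run
PARALLEL_RUNS = 5     # the same DAG triggered five times

slots_wanted = SUBDAGS_PER_RUN * PARALLEL_RUNS
slots_short = max(0, slots_wanted - DAG_CONCURRENCY)

print("slots wanted:", slots_wanted)
print("slots short:", slots_short)
```

Under those assumptions the five runs ask for far more slots than the limit of
8 allows, which would be consistent with the first message.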
[image: image (3).png]
I would like to scale the LocalExecutor vertically only, because of certain
limitations.
Can someone please shed some light on this from experience?
Thanks
Re: Deadlock at DAG SubDAG Execution at LocalExecutor
Posted by Chetan Khatri <ch...@gmail.com>.
Thanks airflowuser for response.
Hello Bolke,
Do you have any observations...
On Sat, May 4, 2019 at 1:01 AM Chetan Khatri <ch...@gmail.com>
wrote:
> What I am guessing is,
>
> 1) Increase the concurrency to 56 in airflow.cfg and don't override the
> attribute in the DAG constructor.
> 2) Change the subdag's LocalExecutor to SequentialExecutor.
Re: Deadlock at DAG SubDAG Execution at LocalExecutor
Posted by airflowuser <ai...@protonmail.com.INVALID>.
Subdags can deadlock when using the LocalExecutor.
This is why the default is the SequentialExecutor, and there is a warning about this in the code, which is discussed in the stale PR about the issue:
https://github.com/apache/airflow/pull/2367
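To illustrate the general kind of deadlock being described, here is a toy
model in plain Python. It is not Airflow's scheduler or executor logic, and
the function name and parameters are made up for this sketch. It only assumes
that a parent task (standing in for a SubDagOperator) keeps holding a slot
while its child tasks wait for slots from the same limited pool.

```python
from collections import deque


def finishes(slots, parents, children_per_parent):
    """Toy slot model: return True if all work completes, False on deadlock.

    Hypothetical helper, not an Airflow API. Each parent takes one slot and
    holds it until all of its children have run; each child needs one slot.
    """
    if children_per_parent < 1:
        raise ValueError("children_per_parent must be at least 1")
    queue = deque(("parent", p) for p in range(parents))
    free = slots
    pending = {}   # parent id -> number of children not yet finished
    running = []   # parent ids of children currently holding a slot
    while queue or running or pending:
        progressed = False
        # Children that ran during the previous step finish and free slots.
        for p in running:
            free += 1
            pending[p] -= 1
            if pending[p] == 0:
                del pending[p]
                free += 1  # the parent releases its own slot as well
            progressed = True
        running = []
        # Start queued tasks for as long as a slot is free.
        while queue and free > 0:
            kind, p = queue.popleft()
            free -= 1
            if kind == "parent":
                pending[p] = children_per_parent
                queue.extend(("child", p) for _ in range(children_per_parent))
            else:
                running.append(p)
            progressed = True
        if not progressed:
            return False  # every slot is held by a parent waiting on children
    return True


print(finishes(slots=8, parents=8, children_per_parent=2))
print(finishes(slots=9, parents=8, children_per_parent=2))
```

In this model, eight parents on eight slots leave nothing for any child, so
nothing can ever finish; a single spare slot is enough to let the children run
one at a time.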
‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐
On Friday, May 3, 2019 10:31 PM, Chetan Khatri <ch...@gmail.com> wrote:
> What I am guessing is,
>
> 1) Increase the concurrency to 56 in airflow.cfg and don't override the attribute in the DAG constructor.
> 2) Change the subdag's LocalExecutor to SequentialExecutor.
Re: Deadlock at DAG SubDAG Execution at LocalExecutor
Posted by Chetan Khatri <ch...@gmail.com>.
What I am guessing is:
1) Increase the concurrency to 56 in airflow.cfg and don't override the
attribute in the DAG constructor.
2) Change the subdag's LocalExecutor to SequentialExecutor.