Posted to commits@airflow.apache.org by "Máté Szabó (JIRA)" <ji...@apache.org> on 2018/03/05 14:54:00 UTC

[jira] [Updated] (AIRFLOW-2128) 'Tall' DAGs scale worse than 'wide' DAGs

     [ https://issues.apache.org/jira/browse/AIRFLOW-2128?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Máté Szabó updated AIRFLOW-2128:
--------------------------------
    Description: 
Tall DAG = a DAG with long chains of dependencies, e.g.: 0 -> 1 -> 2 -> ... -> 998 -> 999
Wide DAG = a DAG with many short, parallel dependencies, e.g.: 0 -> 1; 0 -> 2; ... 0 -> 999
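
To make the two shapes concrete, here is a minimal sketch in plain Python that models both graphs as edge lists and measures their depth. It is not the attached tall_dag.py / wide_dag.py (those are not reproduced here), and the helper names `tall_edges`, `wide_edges` and `depth` are hypothetical. Both shapes have the same number of tasks and edges; only the length of the longest dependency chain differs.

```python
# Hypothetical sketch: the two DAG shapes from this report as plain edge lists.
# Task ids are 0..n-1 and every edge (src, dst) means "src must finish before dst".
# This is NOT the attached tall_dag.py / wide_dag.py, just a model of their shape.


def tall_edges(n):
    """Tall DAG: one long chain 0 -> 1 -> 2 -> ... -> n-1."""
    return [(i, i + 1) for i in range(n - 1)]


def wide_edges(n):
    """Wide DAG: task 0 fans out directly to every other task (0 -> 1; 0 -> 2; ...)."""
    return [(0, i) for i in range(1, n)]


def depth(n, edges):
    """Number of tasks on the longest dependency path.

    Assumes every edge goes from a lower id to a higher id (true for both
    shapes above), so processing edges in sorted order handles each task's
    upstream edges before its downstream ones.
    """
    longest = [1] * n  # longest path ending at each task, counted in tasks
    for src, dst in sorted(edges):
        longest[dst] = max(longest[dst], longest[src] + 1)
    return max(longest)


if __name__ == "__main__":
    N = 1000
    for name, edges in (("tall", tall_edges(N)), ("wide", wide_edges(N))):
        print(name, "edges:", len(edges), "depth:", depth(N, edges))
    # tall edges: 999 depth: 1000
    # wide edges: 999 depth: 2
```

So the two DAGs differ only in depth (1000 vs 2 tasks on the longest path), not in task or edge count.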

Take a super simple case where both graphs have 1000 tasks, and every task is just a "sleep 0.03" bash command (see the attached files).
With the default SequentialExecutor (no parallelism), which runs one task at a time, I would expect my two example DAGs to take approximately the same time to run, but apparently this is not the case.

In 10 minutes, about 80 tasks of the wide DAG executed successfully; for the tall DAG, none did.
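
The expectation stated above can be made concrete with a simplified model: a sequential executor runs one task at a time in some dependency-respecting (topological) order, so its ideal run time is just the number of tasks times the task duration, whatever the DAG's shape. This sketch assumes zero scheduling and process start-up overhead and is not how Airflow's SequentialExecutor or scheduler is implemented; `sequential_order` and `ideal_sequential_runtime` are hypothetical names.

```python
from collections import deque

SLEEP_SECONDS = 0.03  # each task in the report is a "sleep 0.03" bash command


def sequential_order(n, edges):
    """Kahn's algorithm: return one topological order of tasks 0..n-1, i.e. an
    order in which a one-task-at-a-time executor could run them."""
    downstream = {task: [] for task in range(n)}
    indegree = [0] * n  # number of upstream tasks not yet run
    for src, dst in edges:
        downstream[src].append(dst)
        indegree[dst] += 1
    ready = deque(task for task in range(n) if indegree[task] == 0)
    order = []
    while ready:
        task = ready.popleft()  # "run" the task
        order.append(task)
        for child in downstream[task]:
            indegree[child] -= 1
            if indegree[child] == 0:
                ready.append(child)
    if len(order) != n:
        raise ValueError("dependency graph contains a cycle")
    return order


def ideal_sequential_runtime(n, edges, task_seconds=SLEEP_SECONDS):
    """Idealized wall-clock time with no overhead: every task runs exactly
    once, one after another, so the DAG's shape does not matter."""
    return len(sequential_order(n, edges)) * task_seconds


if __name__ == "__main__":
    N = 1000
    tall = [(i, i + 1) for i in range(N - 1)]  # 0 -> 1 -> ... -> 999
    wide = [(0, i) for i in range(1, N)]       # 0 -> 1; 0 -> 2; ... 0 -> 999
    print(f"tall: {ideal_sequential_runtime(N, tall):.1f} s")  # tall: 30.0 s
    print(f"wide: {ideal_sequential_runtime(N, wide):.1f} s")  # wide: 30.0 s
```

Under this model both DAGs need about 30 s in total. By comparison, the wide DAG's reported rate of about 80 tasks in 10 minutes works out to roughly 600 s / 80 = 7.5 s per task, and the tall DAG completed none, so nearly all of the observed time is overhead rather than the tasks' own 0.03 s of work.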

This anomaly also seems to affect the web UI. Opening the graph view or the tree view for the wide DAG takes about 6 seconds on my machine, but for the tall one it takes significantly longer; in fact, it currently does not load at all.


> 'Tall' DAGs scale worse than 'wide' DAGs
> ----------------------------------------
>
>                 Key: AIRFLOW-2128
>                 URL: https://issues.apache.org/jira/browse/AIRFLOW-2128
>             Project: Apache Airflow
>          Issue Type: Bug
>          Components: DAG, DagRun, scheduler
>    Affects Versions: 1.9.0
>            Reporter: Máté Szabó
>            Priority: Major
>              Labels: performance, usability
>         Attachments: tall_dag.py, wide_dag.py
>
>
> Tall DAG = a DAG with long chains of dependencies, e.g.: 0 -> 1 -> 2 -> ... -> 998 -> 999
> Wide DAG = a DAG with many short, parallel dependencies, e.g.: 0 -> 1; 0 -> 2; ... 0 -> 999
> Take a super simple case where both graphs have 1000 tasks, and every task is just a "sleep 0.03" bash command (see the attached files).
> With the default SequentialExecutor (no parallelism), which runs one task at a time, I would expect my two example DAGs to take approximately the same time to run, but apparently this is not the case.
> In 10 minutes, about 80 tasks of the wide DAG executed successfully; for the tall DAG, none did.
> This anomaly also seems to affect the web UI. Opening the graph view or the tree view for the wide DAG takes about 6 seconds on my machine, but for the tall one it takes significantly longer; in fact, it currently does not load at all.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)