Posted to commits@airflow.apache.org by "Shreyas Joshi (JIRA)" <ji...@apache.org> on 2016/10/25 19:47:58 UTC

[jira] [Created] (AIRFLOW-596) Networkx based scheduler

Shreyas Joshi created AIRFLOW-596:
-------------------------------------

             Summary: Networkx based scheduler
                 Key: AIRFLOW-596
                 URL: https://issues.apache.org/jira/browse/AIRFLOW-596
             Project: Apache Airflow
          Issue Type: Improvement
          Components: scheduler
            Reporter: Shreyas Joshi
            Priority: Minor


I'd like to use [networkx|https://networkx.github.io/] and represent each dag/dagbag in memory as a networkx graph.

Benefits:
* The scheduling logic would be simplified a fair bit from what it is now.
* There seem to be gaps between the scheduling of consecutive tasks in a DAG. This is detrimental when each task takes little time and the scheduling delay dominates. We might be able to reduce these gaps. Also, the scheduler currently sleeps for a second by default; I'm not sure why this is necessary.
* Set the stage for smarter scheduling of tasks in the future. (As a simple example, greedily schedule the longest tasks first)

The networkx graph can be created when the dag is being scheduled. As the scheduler runs, it updates the status of each task (failed, success, etc.) before asking for the next task to run. We leverage networkx to figure out which tasks are eligible for execution.
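
As a rough illustration of the idea, here is a minimal sketch of how eligibility could be computed on a networkx graph. It is not Airflow's scheduler code: the function name eligible_tasks, the task names, and the status strings "success" and "failed" are all hypothetical, and the rule that a task is eligible once every upstream task has succeeded is an assumption made for the example.

```python
import networkx as nx


def eligible_tasks(graph, status):
    """Return not-yet-started tasks whose upstream tasks all succeeded.

    `status` maps a task name to "success" or "failed" (hypothetical values);
    a task that is absent from `status` has not been started yet.
    """
    return sorted(
        task
        for task in graph
        if status.get(task) is None
        and all(status.get(up) == "success" for up in graph.predecessors(task))
    )


# A small hypothetical DAG: extract feeds transform and validate, which feed load.
dag = nx.DiGraph()
dag.add_edges_from([
    ("extract", "transform"),
    ("extract", "validate"),
    ("transform", "load"),
    ("validate", "load"),
])
assert nx.is_directed_acyclic_graph(dag)

status = {}
first = eligible_tasks(dag, status)   # only the root task has no upstream tasks

status["extract"] = "success"
second = eligible_tasks(dag, status)  # both children of extract become eligible

status["transform"] = "success"
status["validate"] = "failed"
third = eligible_tasks(dag, status)   # load stays blocked by the failed validate
```

The scheduler loop would record a result in status after each task finishes and call eligible_tasks again. A smarter policy, such as running the longest tasks first, could then be applied by reordering the returned list.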



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)