Posted to commits@airflow.apache.org by "Bolke de Bruin (JIRA)" <ji...@apache.org> on 2018/01/18 15:11:00 UTC

[jira] [Resolved] (AIRFLOW-192) Implement priority_weight aggregation using ancestors (rather than successors)

     [ https://issues.apache.org/jira/browse/AIRFLOW-192?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Bolke de Bruin resolved AIRFLOW-192.
------------------------------------
       Resolution: Fixed
    Fix Version/s: 1.10.0

Issue resolved by pull request #2941
[https://github.com/apache/incubator-airflow/pull/2941]

> Implement priority_weight aggregation using ancestors (rather than successors)
> ------------------------------------------------------------------------------
>
>                 Key: AIRFLOW-192
>                 URL: https://issues.apache.org/jira/browse/AIRFLOW-192
>             Project: Apache Airflow
>          Issue Type: Improvement
>          Components: operators
>    Affects Versions: Airflow 1.7.1.2
>            Reporter: Sergei Iakhnin
>            Priority: Major
>             Fix For: 1.10.0
>
>
> Currently tasks are scheduled based on the priority_weight. The effective priority of a task is its own priority plus the priorities of all tasks that follow it in a dag. This results in undesirable scheduling behaviour in my use case.
> My use case involves running scientific workflows in which a number of operations are carried out on each sample in a set. Each sample is handled by a separate dag run that is manually triggered. It is common for several thousand dag instances to be in flight at a given time. The dag reserves a sample, operates on it, and then releases it. I would like each sample to be reserved for as short a time as possible, so that other programs have an opportunity to operate on it and dag runs can complete as fast as possible. However, because of the current priority logic, if I were to schedule several thousand dags at a given time, they would first all execute their first state, then all execute their second state, etc. Thus, no dag can complete until all dags have completed their second-to-last state. This results in unnecessarily long dag run times and simultaneous completion of all dags.
> Ideally, Airflow would support the reverse of the current logic used for priorities, i.e. a task's priority is the sum of priorities of all its ancestors. This way, the further along a dag is in its processing, the more likely its tasks are to get scheduled (thus leading to a shorter completion time and earlier release of its resources).
> Also, a nominal priority mode would be useful, where a task's priority is exactly the number given to it by the author, in order to allow more scheduling flexibility.
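
The three aggregation modes described in the issue can be compared with a minimal sketch in plain Python. This is not Airflow's implementation: the function names, the mode labels ("downstream", "upstream", "absolute"), and the three-task reserve/process/release dag are assumptions chosen for illustration. For the "upstream" mode the sketch also assumes that a task's own weight is included in the sum, mirroring the current logic.

```python
from typing import Dict, List, Set


def reachable(graph: Dict[str, List[str]], start: str) -> Set[str]:
    """Return every task reachable from `start` by following edges in `graph`."""
    seen: Set[str] = set()
    stack = list(graph.get(start, []))
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(graph.get(node, []))
    return seen


def invert(graph: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Flip every edge, turning a downstream map into an upstream map."""
    inverted: Dict[str, List[str]] = {task: [] for task in graph}
    for parent, children in graph.items():
        for child in children:
            inverted.setdefault(child, []).append(parent)
    return inverted


def effective_priority(
    downstream: Dict[str, List[str]],
    weights: Dict[str, int],
    task: str,
    mode: str,
) -> int:
    """Compute a task's effective priority under a hypothetical mode label."""
    if mode == "absolute":
        # Nominal mode: exactly the weight the author assigned.
        return weights[task]
    if mode == "downstream":
        # Current logic: own weight plus all tasks that follow.
        related = reachable(downstream, task)
    elif mode == "upstream":
        # Requested logic: own weight (an assumption) plus all ancestors.
        related = reachable(invert(downstream), task)
    else:
        raise ValueError(f"unknown mode: {mode}")
    return weights[task] + sum(weights[t] for t in related)


# Hypothetical dag from the use case: reserve -> process -> release.
DOWNSTREAM = {"reserve": ["process"], "process": ["release"], "release": []}
WEIGHTS = {"reserve": 1, "process": 1, "release": 1}

for mode in ("downstream", "upstream", "absolute"):
    result = {t: effective_priority(DOWNSTREAM, WEIGHTS, t, mode) for t in WEIGHTS}
    print(mode, result)
```

With equal weights, the "downstream" mode ranks reserve highest (3, 2, 1), so many concurrent dag runs all start before any finishes, while the "upstream" mode ranks release highest (1, 2, 3), so runs that are further along are favoured.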



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)