You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@airflow.apache.org by "Rolf Schroeder (JIRA)" <ji...@apache.org> on 2018/12/05 11:41:00 UTC

[jira] [Closed] (AIRFLOW-2683) priority_weights honored by LocalExecutor but not by CeleryExecutor

     [ https://issues.apache.org/jira/browse/AIRFLOW-2683?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Rolf Schroeder closed AIRFLOW-2683.
-----------------------------------
    Resolution: Invalid

> priority_weights honored by LocalExecutor but not by CeleryExecutor
> -------------------------------------------------------------------
>
>                 Key: AIRFLOW-2683
>                 URL: https://issues.apache.org/jira/browse/AIRFLOW-2683
>             Project: Apache Airflow
>          Issue Type: Bug
>          Components: celery, scheduler
>    Affects Versions: 1.7.1.3
>         Environment: Linux
>            Reporter: Rolf Schroeder
>            Priority: Major
>         Attachments: airflow_priorities_celery.png, airflow_priorities_dag.png, airflow_priorities_localexecutor.png, airflow_priorities_purge_in_the_middle.png, test_priority.py
>
>
> Hi,
> I am running Airflow 1.7.1.3 with Celery 4.0.2 (and rabbitmq). I have come across the following issue: It seems to me that task priorities are not honored as expected when scheduling tasks via Celery. However, when using the LocalExecutor, the priorities seem to work fine. This issue arises when low prio tasks get scheduled/queued before high prio tasks. Celery will finish all previously scheduled/queued low prio tasks before tackling the high prio ones. In contrast, the LocalExecutor does not further process remaining low prio tasks and starts the high prio tasks. I fee like once Airflow has sent the tasks to Celery, it "looses" control of the execution order. Details below. I searched in Jira and possibly the following issues are linked (although I am not convinced)
> (AIRFLOW-1510)https://issues.apache.org/jira/browse/AIRFLOW-1510
> (AIRFLOW-584)https://issues.apache.org/jira/browse/AIRFLOW-584
> I have the following test setup: Two independent initial tasks, each with 3 downstream tasks. One of the initial tasks takes longer but has high prio downstream tasks. The other initial task has a short duration and low prio downstream tasks. Both initial tasks start at the same time. My expectation is that some of the low prio tasks get executed first (since their initial upstream task is faster) but the moment the slower initial taks is done, its high prio downstream tasks should get executed before the low prio downstream tasks.
> Here is a picture of the setup (forget about init0, this is just to make the DAG look correct):
>  !airflow_priorities_dag.png!
> I've been trying this with a LocalExecutor (AIRFLOW__CORE__PARALLELISM=2) and two celery workers. When running the LocalExecutor, the high prio tasks get indeed executed once their init task is done (as expected, the 'prio100' tasks finish before the remaining 'prio10' tasks):
> !airflow_priorities_localexecutor.png!
> However, when using Celery, it seems that that the priorities are not honored anymore. Once can see clearly that the low prio tasks (prio10) are all executed before the high prio ones (prio100)
> !airflow_priorities_celery.png!
> Here is the corresponding code
> [^test_priority.py]
> I do not know whether this expected behavior or a bug? Is there any way to "fix" this?



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)