Posted to commits@airflow.apache.org by GitBox <gi...@apache.org> on 2021/04/03 06:36:02 UTC

[GitHub] [airflow] ginevragaudioso opened a new issue #15171: scheduler does not apply ordering when querying which task instances to schedule

ginevragaudioso opened a new issue #15171:
URL: https://github.com/apache/airflow/issues/15171


   Issue type:
   Bug
   
   Airflow version:
   2.0.1 (the bug may have existed in earlier versions, and master still has it)
   
   Issue:
   The scheduler sometimes schedules tasks in alphabetical order rather than by priority weight and execution date. As a result, priorities are effectively ignored, and tasks whose names sort later in the alphabet may never run as long as new tasks with alphabetically earlier names keep becoming ready.
   
   Where the issue is in code (I think):
   The scheduler will query the DB to get a set of task instances that are ready to run: https://github.com/apache/airflow/blob/2.0.1/airflow/jobs/scheduler_job.py#L915-L924
   It then simply takes the first `max_tis` task instances from the result (via the `limit` call on the last line of the query), where `max_tis` is computed earlier in the code as the cumulative number of available pool slots. The code in master improves the query by filtering out tasks from starved pools, but it still takes only the first `max_tis` tasks, with no ordering to determine which `max_tis` are taken.
   Later, the scheduler does sort the selected task instances by priority and execution date before scheduling them:
   https://github.com/apache/airflow/blob/2.0.1/airflow/jobs/scheduler_job.py#L978-L980
   However, this correct sorting (second code link) is applied only to the subset returned by the query (first code link), and the query itself does not select tasks in that order.
   As a result, tasks with lower priority and/or a later execution date can be scheduled BEFORE tasks with higher priority and/or an earlier execution date, simply because the former come earlier in the alphabet and are therefore the only ones returned by the unsorted, limited SQL query.
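
The effect described above can be reproduced with a minimal sketch in plain Python. It is not Airflow code: `TaskInstance` here is a hypothetical stand-in, the task names and dates are invented, and the sketch assumes (as the issue reports) that the unsorted query happens to return rows in alphabetical order of task name. The sort key mirrors the priority-then-execution-date ordering described for the later sorting step.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class TaskInstance:
    """Hypothetical stand-in for a task instance row (not Airflow's model)."""
    task_id: str
    priority_weight: int
    execution_date: datetime


def limit_then_sort(rows, max_tis):
    """Take the first max_tis rows as returned, then sort only that subset."""
    subset = rows[:max_tis]  # corresponds to the unsorted query with a limit
    # Highest priority first, then earliest execution date.
    return sorted(subset, key=lambda ti: (-ti.priority_weight, ti.execution_date))


# Assumption: the database returns rows in alphabetical order of task_id.
rows = sorted(
    [
        TaskInstance("z_high", 10, datetime(2021, 4, 1)),
        TaskInstance("a_low", 1, datetime(2021, 4, 2)),
        TaskInstance("b_low", 1, datetime(2021, 4, 1)),
    ],
    key=lambda ti: ti.task_id,
)

picked_ids = [ti.task_id for ti in limit_then_sort(rows, max_tis=2)]
print(picked_ids)  # ['b_low', 'a_low']
```

The later sort works correctly within the subset (`b_low` precedes `a_low` because of its earlier execution date), but `z_high`, the highest-priority task, is never among the candidates.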
   
   Proposed fix:
   Add an "order by" clause to the query that fetches the tasks to examine (first code link), so that tasks are ordered by priority weight and execution date (the same logic as the list sorting done later).
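
A rough illustration of the proposed fix, using Python's built-in `sqlite3` module rather than Airflow's actual SQLAlchemy query. The table layout, the `select_candidates` helper, and the sample rows are assumptions made for this sketch; only the idea of ordering by priority weight and execution date before applying the limit comes from the proposal above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Simplified, hypothetical table; the real task_instance table has more columns.
conn.execute(
    "CREATE TABLE task_instance ("
    "task_id TEXT PRIMARY KEY, priority_weight INTEGER, execution_date TEXT)"
)
conn.executemany(
    "INSERT INTO task_instance VALUES (?, ?, ?)",
    [
        ("a_low", 1, "2021-04-02"),
        ("b_low", 1, "2021-04-02"),
        ("y_high", 10, "2021-04-02"),
        ("z_high", 10, "2021-04-01"),
    ],
)


def select_candidates(conn, max_tis):
    """Order by priority (descending) and execution date (ascending), then limit."""
    cursor = conn.execute(
        "SELECT task_id FROM task_instance "
        "ORDER BY priority_weight DESC, execution_date ASC "
        "LIMIT ?",
        (max_tis,),
    )
    return [row[0] for row in cursor.fetchall()]


candidates = select_candidates(conn, max_tis=2)
print(candidates)  # ['z_high', 'y_high']
```

Because the ordering is applied before the limit, the two high-priority tasks are selected even though their names sort last alphabetically.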


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] ashb closed issue #15171: scheduler does not apply ordering when querying which task instances to queue

Posted by GitBox <gi...@apache.org>.
ashb closed issue #15171:
URL: https://github.com/apache/airflow/issues/15171


   

