Posted to commits@airflow.apache.org by "Fokko Driesprong (JIRA)" <ji...@apache.org> on 2018/05/13 18:55:00 UTC
[jira] [Resolved] (AIRFLOW-2430) Bad query patterns at scale prevent scheduler from starting
[ https://issues.apache.org/jira/browse/AIRFLOW-2430?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Fokko Driesprong resolved AIRFLOW-2430.
---------------------------------------
Resolution: Fixed
Fix Version/s: 2.0.0, 1.10.0
Issue resolved by pull request #3324
[https://github.com/apache/incubator-airflow/pull/3324]
> Bad query patterns at scale prevent scheduler from starting
> -----------------------------------------------------------
>
> Key: AIRFLOW-2430
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2430
> Project: Apache Airflow
> Issue Type: Bug
> Components: scheduler
> Reporter: Gabriel Silk
> Priority: Major
> Fix For: 1.10.0, 2.0.0
>
>
> h2. Summary
> Certain queries executed by the scheduler do not scale well with the number of tasks being operated on. Two example functions:
> * reset_state_for_orphaned_tasks
> * _execute_task_instances
>
> Concretely, with a mere 75k tasks being operated on, the first query can take dozens of minutes to run, blocking the scheduler from making progress.
>
> The cause is twofold:
> 1. As the query grows past a certain point, the MySQL planner will choose to do a full table scan as opposed to using an index. I assume the same is true of Postgres.
> 2. The query predicate size grows linearly in the number of tasks being operated on, thus increasing the amount of work that needs to be done per row.
>
> In a sense, you’re left with an operation that scales as O(n^2): a full table scan in which every row is checked against a predicate whose size grows with n.
>
> h2. Proposed Fix
> It appears that one of these bad query patterns was fixed in [3547cbffd|https://github.com/apache/incubator-airflow/commit/3547cbffdbffac2f98a8aa05526e8c9671221025] by introducing a configurable batch size, which can be set via max_tis_per_query.
>
> I propose we extend the suggested fix to include other poorly-performing queries in the scheduler.
>
> I’ve identified two queries that are directly affecting my work and included them in the diff, though the same approach can be extended to more queries as we see fit.
>
> Thanks!
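
The batching idea proposed in the issue can be sketched roughly as follows. This is a minimal illustration, not the code from pull request #3324: the helper names chunks, reset_in_batches and demo are hypothetical, the table is reduced to two columns, and an in-memory SQLite database stands in for MySQL or Postgres. Only the max_tis_per_query name comes from the issue text. The point is that each statement carries a bounded IN-list, so the predicate size no longer grows with the total number of tasks.

```python
import sqlite3
from typing import Iterator, Sequence


def chunks(items: Sequence[str], size: int) -> Iterator[Sequence[str]]:
    """Yield successive slices of `items`, each at most `size` long."""
    if size <= 0:
        raise ValueError("size must be positive")
    for start in range(0, len(items), size):
        yield items[start:start + size]


def reset_in_batches(conn: sqlite3.Connection, task_ids: Sequence[str],
                     max_tis_per_query: int) -> int:
    """Clear the state of queued rows, one bounded IN-list per statement.

    Hypothetical stand-in for a scheduler query; the real predicate in
    Airflow involves more columns than task_id.
    """
    updated = 0
    for batch in chunks(task_ids, max_tis_per_query):
        # One placeholder per id in this batch, so the predicate size is
        # capped at max_tis_per_query regardless of len(task_ids).
        placeholders = ",".join("?" for _ in batch)
        cur = conn.execute(
            "UPDATE task_instance SET state = NULL "
            "WHERE state = 'queued' AND task_id IN (%s)" % placeholders,
            list(batch),
        )
        updated += cur.rowcount
    conn.commit()
    return updated


def demo() -> int:
    """Build a toy table with 1000 queued rows and reset them in batches."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE task_instance (task_id TEXT PRIMARY KEY, state TEXT)"
    )
    ids = ["task_%d" % i for i in range(1000)]
    conn.executemany(
        "INSERT INTO task_instance VALUES (?, 'queued')",
        [(task_id,) for task_id in ids],
    )
    return reset_in_batches(conn, ids, max_tis_per_query=64)
```

Calling demo() issues 16 small UPDATE statements instead of one statement with a 1000-element predicate. A smaller batch size trades more round trips for predicates that the planner is more likely to serve from an index.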
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)