Posted to commits@airflow.apache.org by "Apache Spark (JIRA)" <ji...@apache.org> on 2018/09/02 18:07:04 UTC

[jira] [Commented] (AIRFLOW-2430) Bad query patterns at scale prevent scheduler from starting

    [ https://issues.apache.org/jira/browse/AIRFLOW-2430?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16601589#comment-16601589 ] 

Apache Spark commented on AIRFLOW-2430:
---------------------------------------

User 'gsilk' has created a pull request for this issue:
https://github.com/apache/incubator-airflow/pull/3324

> Bad query patterns at scale prevent scheduler from starting
> -----------------------------------------------------------
>
>                 Key: AIRFLOW-2430
>                 URL: https://issues.apache.org/jira/browse/AIRFLOW-2430
>             Project: Apache Airflow
>          Issue Type: Bug
>          Components: scheduler
>            Reporter: Gabriel Silk
>            Priority: Major
>             Fix For: 1.10.0, 2.0.0
>
>
> h2. Summary
> Certain queries executed by the scheduler do not scale well with the number of tasks being operated on. Two example functions are:
>  * reset_state_for_orphaned_tasks
>  * _execute_task_instances
>  
> Concretely, with a mere 75k tasks being operated on, the first query can take dozens of minutes to run, blocking the scheduler from making progress.
>  
> The cause is twofold:
> 1. As the query grows past a certain point, the MySQL planner will choose to do a full table scan as opposed to using an index. I assume the same is true of Postgres.
> 2. The query predicate size grows linearly in the number of tasks being operated on, thus increasing the amount of work that needs to be done per row.
>  
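> In a sense, you’re left with an operation that scales as O(n^2): a full table scan over on the order of n rows, each of which is checked against a predicate whose size is also proportional to n.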
>  
> h2. Proposed Fix
> It appears that one of these bad query patterns was fixed in [3547cbffd|https://github.com/apache/incubator-airflow/commit/3547cbffdbffac2f98a8aa05526e8c9671221025] by introducing a configurable batch size which can be set via max_tis_per_query.
>  
> I propose we extend the suggested fix to include other poorly-performing queries in the scheduler.
>  
> I’ve identified two queries that are directly affecting my work and included them in the diff, though the same approach can be extended to more queries as we see fit.
>  
> Thanks!
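
For illustration only, the batching idea described in the issue can be sketched as follows. This is not the code from the pull request: the table layout is a simplified stand-in, the helpers `chunked` and `reset_state_in_batches` are hypothetical names, and SQLite from the Python standard library is used only so the sketch is runnable. The one name taken from the issue is `max_tis_per_query`. The point is that each query's predicate is bounded by the batch size rather than by the total number of tasks.

```python
import sqlite3
from itertools import islice


def chunked(items, size):
    """Yield successive lists of at most `size` items (hypothetical helper)."""
    if size <= 0:
        raise ValueError("size must be positive")
    it = iter(items)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch


def reset_state_in_batches(conn, keys, max_tis_per_query):
    """Clear the state of the given (dag_id, task_id) keys, one bounded query per batch.

    Returns the total number of rows updated. The table and column names
    are assumptions made for this sketch.
    """
    updated = 0
    for batch in chunked(keys, max_tis_per_query):
        # The predicate has one term per key in the batch, so its size is
        # capped by max_tis_per_query instead of growing with len(keys).
        predicate = " OR ".join("(dag_id = ? AND task_id = ?)" for _ in batch)
        params = [value for key in batch for value in key]
        cur = conn.execute(
            "UPDATE task_instance SET state = NULL WHERE " + predicate, params
        )
        updated += cur.rowcount
    conn.commit()
    return updated


def demo():
    """Build a small in-memory table and reset every row in batches of 100."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE task_instance (dag_id TEXT, task_id TEXT, state TEXT, "
        "PRIMARY KEY (dag_id, task_id))"
    )
    keys = [("dag", "task_%d" % i) for i in range(1000)]
    conn.executemany("INSERT INTO task_instance VALUES (?, ?, 'queued')", keys)
    updated = reset_state_in_batches(conn, keys, max_tis_per_query=100)
    remaining = conn.execute(
        "SELECT COUNT(*) FROM task_instance WHERE state IS NOT NULL"
    ).fetchone()[0]
    return updated, remaining
```

With a batch size of 100, the 1000 keys above are handled by ten queries of 100 predicate terms each, rather than one query with 1000 terms. Whether a given batch size keeps the planner on an index depends on the database and its configuration, so the value would need to be tuned per deployment.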



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)