Posted to commits@airflow.apache.org by "Gabriel Silk (JIRA)" <ji...@apache.org> on 2018/05/07 20:35:00 UTC

[jira] [Created] (AIRFLOW-2430) Bad query patterns at scale prevent scheduler from starting

Gabriel Silk created AIRFLOW-2430:
-------------------------------------

             Summary: Bad query patterns at scale prevent scheduler from starting
                 Key: AIRFLOW-2430
                 URL: https://issues.apache.org/jira/browse/AIRFLOW-2430
             Project: Apache Airflow
          Issue Type: Bug
          Components: scheduler
            Reporter: Gabriel Silk


h2. Summary

Certain queries executed by the scheduler do not scale well with the number of tasks being operated on. Two examples are the queries in these functions:
 * reset_state_for_orphaned_tasks
 * _execute_task_instances

 

Concretely, with only 75k tasks being operated on, the first query can take dozens of minutes to run, blocking the scheduler from making progress.

 

The cause is twofold:

1. Once the query grows past a certain size, the MySQL planner chooses a full table scan instead of using an index. I assume the same is true of Postgres.

2. The size of the query predicate grows linearly in the number of tasks being operated on, which increases the amount of work that needs to be done per row.

 

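In a sense, you’re left with an operation that scales as O(n^2): every row examined by the full table scan is checked against a predicate whose size grows with n.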
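A rough worst-case cost model makes the O(n^2) claim concrete. It assumes (an assumption for illustration, not taken from the Airflow source) that the predicate is an OR of one (dag_id, task_id, execution_date) conjunction per task instance, and that a full table scan may test every row against every term. `build_or_predicate`, `full_scan_comparisons` and `TIKey` are hypothetical names used only for this sketch.

```python
from typing import List, Sequence, Tuple

# Hypothetical key shape: (dag_id, task_id, execution_date).
TIKey = Tuple[str, str, str]


def build_or_predicate(keys: Sequence[TIKey]) -> Tuple[str, List[str]]:
    """Build a WHERE clause with one ORed conjunction per task instance.

    Both the clause text and its bound parameters grow linearly with len(keys).
    """
    term = "(dag_id = ? AND task_id = ? AND execution_date = ?)"
    clause = " OR ".join([term] * len(keys))
    params = [value for key in keys for value in key]
    return clause, params


def full_scan_comparisons(table_rows: int, n_terms: int) -> int:
    """Worst-case term evaluations when a full scan tests every row against every term."""
    return table_rows * n_terms


if __name__ == "__main__":
    keys = [("example_dag", f"task_{i}", "2018-05-07") for i in range(3)]
    clause, params = build_or_predicate(keys)
    print(clause.count(" OR ") + 1, len(params))  # 3 9
    # The table holds at least the n rows being matched, so the cost is >= n * n.
    n = 75_000
    print(full_scan_comparisons(n, n))  # 5625000000
```

Since the table must contain at least the n task instances being matched, the scan touches at least n rows, each tested against up to n terms; at 75k tasks that is already on the order of 5.6 billion term evaluations in the worst case.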

 
h2. Proposed Fix

It appears that one of these bad query patterns was fixed in [3547cbffd|https://github.com/apache/incubator-airflow/commit/3547cbffdbffac2f98a8aa05526e8c9671221025] by introducing a configurable batch size, which can be set via max_tis_per_query.
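The batching idea can be sketched as follows. This is a minimal illustration, not the code from the commit above: it uses an in-memory SQLite table as a simplified stand-in for task_instance, and `chunks`, `reset_state_batched` and `TIKey` are hypothetical names. Only the parameter name `max_tis_per_query` mirrors the setting mentioned above. Each UPDATE covers at most `max_tis_per_query` task instances, so no single statement's predicate grows with the total count.

```python
import sqlite3
from itertools import islice
from typing import Iterable, Iterator, List, Sequence, Tuple

# Hypothetical key shape: (dag_id, task_id, execution_date).
TIKey = Tuple[str, str, str]

# One conjunction per task instance; these are ORed together within a batch.
TERM = "(dag_id = ? AND task_id = ? AND execution_date = ?)"


def chunks(items: Iterable[TIKey], size: int) -> Iterator[List[TIKey]]:
    """Yield successive lists of at most `size` keys."""
    if size <= 0:
        raise ValueError("size must be positive")
    it = iter(items)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch


def reset_state_batched(conn: sqlite3.Connection, keys: Sequence[TIKey],
                        max_tis_per_query: int) -> int:
    """Set state to NULL for the given task instances, one UPDATE per batch.

    Each statement's predicate holds at most `max_tis_per_query` terms.
    Returns the total number of rows updated.
    """
    updated = 0
    for batch in chunks(keys, max_tis_per_query):
        clause = " OR ".join([TERM] * len(batch))
        # Values are bound as parameters; only the fixed TERM text is joined.
        params = [value for key in batch for value in key]
        cur = conn.execute(
            f"UPDATE task_instance SET state = NULL WHERE {clause}", params
        )
        updated += cur.rowcount
    conn.commit()
    return updated


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE task_instance (dag_id TEXT, task_id TEXT, "
        "execution_date TEXT, state TEXT, "
        "PRIMARY KEY (dag_id, task_id, execution_date))"
    )
    rows = [("example_dag", f"task_{i}", "2018-05-07", "queued") for i in range(1000)]
    conn.executemany("INSERT INTO task_instance VALUES (?, ?, ?, ?)", rows)
    orphans = [row[:3] for row in rows[:250]]
    print(reset_state_batched(conn, orphans, max_tis_per_query=100))  # 250
```

Here 250 orphaned task instances are reset with three statements (100, 100 and 50 terms). The trade-off is more round trips in exchange for statements small enough that, per the reasoning above, the planner should keep using the index.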

 

I propose we extend the same fix to the other poorly performing queries in the scheduler.

 

I’ve identified two queries that are directly affecting my work and included them in the diff, though the same approach can be extended to more queries as we see fit.

 

Thanks!



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)