Posted to commits@airflow.apache.org by "Bas Harenslak (JIRA)" <ji...@apache.org> on 2019/01/31 22:25:00 UTC
[jira] [Created] (AIRFLOW-3797) Improve performance of cc1e65623dc7_add_max_tries_column_to_task_instance migration
Bas Harenslak created AIRFLOW-3797:
--------------------------------------
Summary: Improve performance of cc1e65623dc7_add_max_tries_column_to_task_instance migration
Key: AIRFLOW-3797
URL: https://issues.apache.org/jira/browse/AIRFLOW-3797
Project: Apache Airflow
Issue Type: Improvement
Reporter: Bas Harenslak
The cc1e65623dc7_add_max_tries_column_to_task_instance migration creates a DagBag for the corresponding DAG for every single task instance, so the same DAG files are re-parsed over and over. This repeated parsing is unnecessary.
As a result, discussions like this come up on Slack:
{noformat}
murquizo [Jan 17th at 1:33 AM]
Why does the airflow upgradedb command loop through all of the dags?
....
murquizo [14 days ago]
NICE, @BasPH! that is exactly the migration that I was referring to. We have about 600k task instances and have a several python files that generate multiple DAGs, so looping through all of the task_instances to update max_tries was too slow. It took 3 hours and didnt even complete! i pulled the plug and manually executed the migration. Thanks for your response.
{noformat}
A straightforward improvement is to parse each DAG only once and then set try_number for all of its task instances. I created a branch for this (https://github.com/BasPH/incubator-airflow/tree/bash-optimise-db-upgrade), am currently running tests, and will open a PR when done.
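The idea above can be sketched as follows. This is a minimal illustration, not the actual migration code: parse_dag, the retries field, and the TaskInstance tuple are hypothetical stand-ins. The point is that a cache keyed on dag_id makes the expensive parse happen once per DAG instead of once per task instance.

```python
from collections import namedtuple

# Hypothetical minimal model of a task_instance row.
TaskInstance = namedtuple("TaskInstance", ["dag_id", "task_id", "try_number"])

parse_calls = 0


def parse_dag(dag_id):
    """Stand-in for the expensive per-DAG parse done via DagBag."""
    global parse_calls
    parse_calls += 1
    # Pretend every task in this DAG is configured with 3 retries.
    return {"retries": 3}


def upgrade(task_instances):
    """Set max_tries for each task instance, parsing each DAG at most once."""
    dag_cache = {}  # dag_id -> parsed DAG, filled at most once per DAG
    for ti in task_instances:
        if ti.dag_id not in dag_cache:
            dag_cache[ti.dag_id] = parse_dag(ti.dag_id)
        dag = dag_cache[ti.dag_id]
        max_tries = dag["retries"] + 1
        # ...here the real migration would UPDATE the task_instance row...
    return len(dag_cache)


tis = [
    TaskInstance("dag_a", "t1", 0),
    TaskInstance("dag_a", "t2", 1),
    TaskInstance("dag_b", "t1", 0),
]
unique_dags = upgrade(tis)
print(parse_calls)  # one parse per distinct DAG, not per task instance
```

With 600k task instances spread over a handful of DAG files, this drops the number of parses from hundreds of thousands to a few dozen.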
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)