Posted to commits@airflow.apache.org by "Jarek Potiuk (Jira)" <ji...@apache.org> on 2020/01/19 23:23:00 UTC

[jira] [Updated] (AIRFLOW-3797) Improve performance of cc1e65623dc7_add_max_tries_column_to_task_instance migration

     [ https://issues.apache.org/jira/browse/AIRFLOW-3797?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jarek Potiuk updated AIRFLOW-3797:
----------------------------------
    Labels: gsoc gsoc2020 mentor  (was: )

> Improve performance of cc1e65623dc7_add_max_tries_column_to_task_instance migration
> -----------------------------------------------------------------------------------
>
>                 Key: AIRFLOW-3797
>                 URL: https://issues.apache.org/jira/browse/AIRFLOW-3797
>             Project: Apache Airflow
>          Issue Type: Improvement
>            Reporter: Bas Harenslak
>            Priority: Major
>              Labels: gsoc, gsoc2020, mentor
>
> The cc1e65623dc7_add_max_tries_column_to_task_instance migration creates a DagBag for the corresponding DAG for every single task instance. This is redundant and unnecessary.
> As a result, discussions like this one come up on Slack:
> {noformat}
> murquizo   [Jan 17th at 1:33 AM]
> Why does the airflow upgradedb command loop through all of the dags?
> ....
> murquizo   [14 days ago]
> NICE, @BasPH! That is exactly the migration that I was referring to. We have about 600k task instances and several Python files that each generate multiple DAGs, so looping through all of the task_instances to update max_tries was too slow. It took 3 hours and didn't even complete! I pulled the plug and manually executed the migration. Thanks for your response.
> {noformat}
> An easy improvement is to parse each DAG only once and then set the try_number for all of its task instances. I created a branch for it (https://github.com/BasPH/incubator-airflow/tree/bash-optimise-db-upgrade); tests are currently running, and I will open a PR when they are done.
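
The "parse each DAG only once" idea can be sketched without Airflow itself. This is a minimal illustration, not the code from the branch above: `load_dag` is a hypothetical callable standing in for the expensive DAG parse, task instances are plain dicts, and the fallback of copying `try_number` when a DAG or task no longer exists is an assumption made for the example.

```python
from collections import defaultdict


def group_by_dag(task_instances):
    """Group task instance rows by dag_id so each DAG is handled once."""
    grouped = defaultdict(list)
    for ti in task_instances:
        grouped[ti["dag_id"]].append(ti)
    return grouped


def backfill_max_tries(task_instances, load_dag):
    """Set max_tries on every task instance, parsing each DAG only once.

    load_dag is a hypothetical callable: dag_id -> {task_id: retries},
    or None if the DAG can no longer be found.
    Returns the number of times load_dag was called.
    """
    parse_count = 0
    for dag_id, tis in group_by_dag(task_instances).items():
        task_retries = load_dag(dag_id)  # expensive step, once per DAG
        parse_count += 1
        for ti in tis:
            if task_retries is None or ti["task_id"] not in task_retries:
                # Assumed fallback: DAG or task is gone, reuse try_number.
                ti["max_tries"] = ti["try_number"]
            else:
                ti["max_tries"] = task_retries[ti["task_id"]]
    return parse_count
```

With this shape, the number of parses grows with the number of distinct DAGs rather than with the number of task instances, which is the difference that matters for a table of roughly 600k rows.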



--
This message was sent by Atlassian Jira
(v8.3.4#803005)