Posted to commits@airflow.apache.org by "Ash Berlin-Taylor (JIRA)" <ji...@apache.org> on 2019/07/04 13:59:00 UTC

[jira] [Resolved] (AIRFLOW-4478) Operators instantiate many duplicate objects

     [ https://issues.apache.org/jira/browse/AIRFLOW-4478?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ash Berlin-Taylor resolved AIRFLOW-4478.
----------------------------------------
       Resolution: Fixed
    Fix Version/s: 1.10.4

> Operators instantiate many duplicate objects
> --------------------------------------------
>
>                 Key: AIRFLOW-4478
>                 URL: https://issues.apache.org/jira/browse/AIRFLOW-4478
>             Project: Apache Airflow
>          Issue Type: Improvement
>            Reporter: Josh Carp
>            Assignee: Huihua Zhang
>            Priority: Trivial
>             Fix For: 1.10.4
>
>
> `BaseOperator` creates a `Resources` instance, which in turn creates four `Resource` instances. Object instantiation in Python isn't free: creating the `Resources` object and its four child objects takes ~5μs of the ~20μs needed to instantiate a `BaseOperator` on my system. This cost adds up when creating tens of thousands of operators, especially in environments like GCP Cloud Composer that are very sensitive to DAG parse time.
> Assuming that most users don't actually configure task resources, since they're only respected by the non-default `CgroupTaskRunner`, we can save time by creating a single `Resources` instance and sharing it across tasks that don't set `resources`. We could do even better by allowing users to pass a `Resources` instance to `BaseOperator` rather than passing a `dict` that's used to instantiate `Resources`, but that would be a breaking change.


