Posted to commits@airflow.apache.org by "Victor Jimenez (JIRA)" <ji...@apache.org> on 2018/08/27 09:43:00 UTC

[jira] [Updated] (AIRFLOW-2964) Lazy generation of the job description

     [ https://issues.apache.org/jira/browse/AIRFLOW-2964?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Victor Jimenez updated AIRFLOW-2964:
------------------------------------
    Description: 
When instantiating a DatabricksSubmitRunOperator, users need to pass the description of the job that will later be executed on Databricks.

The job description is only needed at execution time (when the hook is called). However, the `json` parameter must already have the full job description when constructing the operator. This may present a problem if computing the job description needs to execute expensive operations (e.g., querying a database). The expensive operation will be invoked every single time the DAG is reprocessed (which may happen quite frequently).
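
To make the cost concrete, here is a minimal sketch in plain Python that models the behaviour described above. Nothing in it imports Airflow or Databricks: `EagerOperator`, `compute_job_description`, and `parse_dag_file` are hypothetical stand-ins, and the returned dictionary is a placeholder rather than a real job description.

```python
# Stand-in names throughout; this only models the parse-time cost.
CALLS = {"expensive": 0}


def compute_job_description():
    """Pretend this queries a database to build the job description."""
    CALLS["expensive"] += 1
    return {"run_name": "placeholder"}


class EagerOperator:
    """Hypothetical operator that needs the full description up front,
    the way the `json` parameter does."""

    def __init__(self, task_id, json):
        self.task_id = task_id
        self.json = json


def parse_dag_file():
    """Models one pass of the DAG file being reprocessed."""
    return EagerOperator(task_id="submit_run", json=compute_job_description())


# Three reprocessing passes, and no task has been executed yet.
for _ in range(3):
    parse_dag_file()

print(CALLS["expensive"])  # prints 3
```

The expensive function runs once per reprocessing pass, whether or not the task ever executes.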

It would be good to have a mechanism equivalent to the `python_callable` parameter in the `PythonOperator`. In this way, users could pass a function that would generate the job description only when the operator is actually executed.
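
One possible shape for such a mechanism is sketched below. It is an assumption for illustration, not the actual Airflow API: `LazySubmitRunOperator`, the `json_callable` parameter, and `fake_submit` are hypothetical names, and the class is a plain Python stand-in rather than a subclass of the real operator.

```python
# Hypothetical sketch: none of these names come from Airflow or Databricks.
calls = {"expensive": 0}


def compute_job_description():
    """Pretend this queries a database to build the job description."""
    calls["expensive"] += 1
    return {"run_name": "placeholder"}


def fake_submit(payload):
    """Stand-in for the hook call made at execution time."""
    return {"submitted": payload}


class LazySubmitRunOperator:
    """Accepts either a ready-made description or a callable producing one."""

    def __init__(self, task_id, json=None, json_callable=None):
        if (json is None) == (json_callable is None):
            raise ValueError("pass exactly one of json or json_callable")
        self.task_id = task_id
        self.json = json
        self.json_callable = json_callable

    def execute(self, context):
        # The callable is invoked here, not in __init__.
        if self.json_callable is None:
            payload = self.json
        else:
            payload = self.json_callable()
        return fake_submit(payload)


# Constructing the operator repeatedly (as reprocessing would) costs nothing.
operators = [
    LazySubmitRunOperator("submit_run", json_callable=compute_job_description)
    for _ in range(3)
]
calls_after_parsing = calls["expensive"]

# Only the actual execution triggers the expensive computation.
result = operators[-1].execute(context={})
calls_after_execute = calls["expensive"]

print(calls_after_parsing, calls_after_execute)  # prints: 0 1
```

Keeping the existing `json` parameter alongside the callable would leave current DAGs working unchanged, which is why the sketch accepts exactly one of the two.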

  was:
When instantiating a `DatabricksSubmitRunOperator` users need to pass the description of the job that will later be executed on Databricks.

The job description is only needed at execution time (when the hook is called). However, the `json` parameter must already have the full job description when constructing the operator. This may present a problem if computing the job description needs to execute expensive operations (e.g., querying a database). The expensive operation will be invoked every single time the DAG is reprocessed (which may happen quite frequently).

It would be good to have an equivalent mechanism to the `python_callable` parameter in the `PythonOperator`. In this way, users could pass a function that would generate the job description only when the operator is actually executed.


> Lazy generation of the job description
> --------------------------------------
>
>                 Key: AIRFLOW-2964
>                 URL: https://issues.apache.org/jira/browse/AIRFLOW-2964
>             Project: Apache Airflow
>          Issue Type: Improvement
>          Components: contrib, operators
>    Affects Versions: Airflow 1.9.0, 1.10
>            Reporter: Victor Jimenez
>            Priority: Major
>
> When instantiating a DatabricksSubmitRunOperator, users need to pass the description of the job that will later be executed on Databricks.
> The job description is only needed at execution time (when the hook is called). However, the `json` parameter must already have the full job description when constructing the operator. This may present a problem if computing the job description needs to execute expensive operations (e.g., querying a database). The expensive operation will be invoked every single time the DAG is reprocessed (which may happen quite frequently).
> It would be good to have a mechanism equivalent to the `python_callable` parameter in the `PythonOperator`. In this way, users could pass a function that would generate the job description only when the operator is actually executed.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)