Posted to commits@airflow.apache.org by "Andrew Chen (JIRA)" <ji...@apache.org> on 2017/03/22 23:53:41 UTC

[jira] [Created] (AIRFLOW-1028) Databricks Operator for Airflow

Andrew Chen created AIRFLOW-1028:
------------------------------------

             Summary: Databricks Operator for Airflow
                 Key: AIRFLOW-1028
                 URL: https://issues.apache.org/jira/browse/AIRFLOW-1028
             Project: Apache Airflow
          Issue Type: New Feature
            Reporter: Andrew Chen
            Assignee: Andrew Chen


It would be nice to have a Databricks Operator/Hook in Airflow so users of Databricks can more easily integrate with Airflow.

The operator would submit a Spark job to our new /jobs/runs/submit endpoint. This endpoint is similar to https://docs.databricks.com/api/latest/jobs.html#jobscreatejob but does not include the email_notifications, max_retries, min_retry_interval_millis, retry_on_timeout, schedule, or max_concurrent_runs fields. (The submit docs are not yet published because the endpoint is still private.)
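For illustration, a request body might look like the following. This is only a sketch modeled on the public jobs/create documentation; since the submit docs are private, the exact accepted fields are an assumption here.

```python
# Illustrative /jobs/runs/submit payload, modeled on the public
# jobs/create request shape (field names assumed, not confirmed).
payload = {
    "run_name": "airflow-triggered-run",
    "new_cluster": {
        "spark_version": "2.1.0-db3-scala2.11",
        "node_type_id": "r3.xlarge",
        "num_workers": 2,
    },
    "notebook_task": {"notebook_path": "/Users/someone/etl"},
    "timeout_seconds": 3600,
}

# Scheduling/retry fields such as email_notifications, max_retries,
# schedule, and max_concurrent_runs are NOT part of the submit request;
# Airflow itself owns retries and scheduling for these runs.
```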

Our proposed design for the operator is to mirror this REST API endpoint. Each parameter of the operator is named after a field of the REST API request, and the value of each argument matches the type expected by the REST API. We will also merge into the API call any extra kwargs that should not be passed to the BaseOperator, so the operator remains flexible as the endpoint gains new fields.
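The kwargs-merging idea above could be sketched roughly as follows. This is a hypothetical outline, not the actual implementation: the class name DatabricksSubmitRunOperator, the minimal BaseOperator stand-in, and the execute behavior are all placeholders for illustration.

```python
import inspect


class BaseOperator:
    # Minimal stand-in for airflow.models.BaseOperator,
    # just enough to demonstrate the kwargs split.
    def __init__(self, task_id=None, retries=0, **kwargs):
        self.task_id = task_id
        self.retries = retries


class DatabricksSubmitRunOperator(BaseOperator):
    # Hypothetical operator mirroring /jobs/runs/submit.
    def __init__(self, json=None, **kwargs):
        # Split kwargs: names BaseOperator understands stay with it;
        # anything else is merged into the REST payload, so new API
        # fields work without changing the operator.
        base_params = inspect.signature(BaseOperator.__init__).parameters
        base_kwargs = {k: v for k, v in kwargs.items() if k in base_params}
        api_extras = {k: v for k, v in kwargs.items() if k not in base_params}
        super().__init__(**base_kwargs)
        self.json = {**(json or {}), **api_extras}

    def execute(self, context=None):
        # A real operator would POST self.json to the submit endpoint;
        # here we just return the assembled payload.
        return self.json
```

A parameter unknown to BaseOperator, e.g. notebook_task, therefore ends up in the API payload rather than raising a TypeError.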

If this low-level interface turns out not to be user friendly, we can later add higher-level operators that extend it.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)