Posted to commits@airflow.apache.org by "Andrew Chen (JIRA)" <ji...@apache.org> on 2017/04/03 00:41:41 UTC

[jira] [Work started] (AIRFLOW-1028) Databricks Operator for Airflow

     [ https://issues.apache.org/jira/browse/AIRFLOW-1028?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Work on AIRFLOW-1028 started by Andrew Chen.
--------------------------------------------
> Databricks Operator for Airflow
> -------------------------------
>
>                 Key: AIRFLOW-1028
>                 URL: https://issues.apache.org/jira/browse/AIRFLOW-1028
>             Project: Apache Airflow
>          Issue Type: New Feature
>            Reporter: Andrew Chen
>            Assignee: Andrew Chen
>
> It would be nice to have a Databricks Operator/Hook in Airflow so users of Databricks can more easily integrate with Airflow.
> The operator would submit a Spark job to our new /jobs/runs/submit endpoint. This endpoint is similar to https://docs.databricks.com/api/latest/jobs.html#jobscreatejob but does not include the email_notifications, max_retries, min_retry_interval_millis, retry_on_timeout, schedule, or max_concurrent_runs fields. (The docs for submit are not published yet because it is still a private endpoint.)
> Our proposed design for the operator is to match this REST API endpoint. Each parameter of the operator is named after one of the fields of the REST API request, and the value of each argument will match the type expected by the REST API. We will also merge any extra keys from kwargs that should not be passed to the BaseOperator into our API call, so the operator stays flexible as the API gains new fields.
> If this interface turns out not to be user friendly, we can later add more specialized operators that extend this one.
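To illustrate the proposal, here is a minimal sketch of how the payload construction could work. This is not the actual implementation; the class name, parameter names, and endpoint path are assumptions based on the description above (the named parameters mirror fields of the public jobs/create API, and unrecognized kwargs are merged straight into the request body):

```python
# Hypothetical sketch of the proposed payload construction for the
# (still private) /jobs/runs/submit endpoint. The class name, the
# endpoint path, and the exact field names are assumptions, not the
# final operator API.
import json

SUBMIT_ENDPOINT = "/api/2.0/jobs/runs/submit"  # assumed path


class DatabricksSubmitRunSketch:
    """Builds the JSON body for a runs/submit call.

    Named parameters mirror fields of the REST API request; any extra
    keyword arguments are merged into the body unchanged, so new API
    fields can be used without changing the operator.
    """

    def __init__(self, new_cluster=None, spark_jar_task=None,
                 notebook_task=None, run_name=None, **extra_api_fields):
        self.payload = {}
        # Only include fields the caller actually set.
        for key, value in [("new_cluster", new_cluster),
                           ("spark_jar_task", spark_jar_task),
                           ("notebook_task", notebook_task),
                           ("run_name", run_name)]:
            if value is not None:
                self.payload[key] = value
        # Merge unrecognized kwargs directly into the API call, per the
        # flexibility goal described above.
        self.payload.update(extra_api_fields)

    def request_body(self):
        """Serialize the payload as the JSON the endpoint would receive."""
        return json.dumps(self.payload, sort_keys=True)
```

The key design point is the final `update()`: fields the operator does not know about (for example, a `timeout_seconds` added to the API later) pass through to the request untouched instead of being swallowed by the BaseOperator.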



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)