Posted to commits@airflow.apache.org by "kyle-winkelman (via GitHub)" <gi...@apache.org> on 2023/03/04 20:50:20 UTC

[GitHub] [airflow] kyle-winkelman commented on pull request #29790: Provider Databricks add jobs create operator.

kyle-winkelman commented on PR #29790:
URL: https://github.com/apache/airflow/pull/29790#issuecomment-1454871094

   My specific use case is that I want to run multiple tasks in the same job, sharing a single cluster defined under `job_clusters`. This is not supported by the DatabricksSubmitRunOperator because it doesn't accept the `job_clusters` parameter, so the only way to run those tasks with it is to define a `new_cluster` for every task.
   
   The other approach is to use the DatabricksRunNowOperator, which has the limitation that you have to define your Databricks Job somewhere else and in some other manner (e.g. manually in the Databricks UI, a custom CI/CD pipeline, etc.). My team doesn't like having the definition of the job separated from its use in Airflow. In my opinion, a DatabricksRunNowOperator invocation with just a single `job_id` lacks information about what is actually happening.
   
   So to sum up, this operator is useful when paired with the DatabricksRunNowOperator: you can define a job and run it in the same DAG.
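   For context, the kind of job definition I have in mind follows the Databricks Jobs API 2.1 shape: one entry in `job_clusters` that every task references via `job_cluster_key`, so only a single cluster is spun up. The names, node type, and notebook paths below are illustrative placeholders, not anything from the PR itself:

   ```python
   # Sketch of a Jobs API 2.1 payload where multiple tasks share one
   # cluster declared under `job_clusters`. All identifiers here
   # (cluster key, notebook paths, node type) are hypothetical examples.
   job_spec = {
       "name": "example-multi-task-job",
       "job_clusters": [
           {
               "job_cluster_key": "shared_cluster",
               "new_cluster": {
                   "spark_version": "13.3.x-scala2.12",
                   "node_type_id": "i3.xlarge",
                   "num_workers": 2,
               },
           }
       ],
       "tasks": [
           {
               "task_key": "ingest",
               "job_cluster_key": "shared_cluster",
               "notebook_task": {"notebook_path": "/Jobs/ingest"},
           },
           {
               "task_key": "transform",
               "depends_on": [{"task_key": "ingest"}],
               "job_cluster_key": "shared_cluster",
               "notebook_task": {"notebook_path": "/Jobs/transform"},
           },
       ],
   }

   # Both tasks point at the same job cluster, so only one cluster is created
   # for the whole run, instead of one `new_cluster` per task.
   assert all(t["job_cluster_key"] == "shared_cluster" for t in job_spec["tasks"])
   ```

   A jobs-create operator could take a spec like this in the DAG file, and a downstream DatabricksRunNowOperator could then trigger the resulting job, keeping the definition and the use of the job side by side in Airflow.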

