Posted to commits@airflow.apache.org by GitBox <gi...@apache.org> on 2022/06/28 19:39:51 UTC

[GitHub] [airflow] eladkal commented on a diff in pull request #24599: updated documentation for databricks operator

eladkal commented on code in PR #24599:
URL: https://github.com/apache/airflow/pull/24599#discussion_r908874570


##########
airflow/providers/databricks/operators/databricks.py:
##########
@@ -150,7 +150,7 @@ class DatabricksSubmitRunOperator(BaseOperator):
     <https://docs.databricks.com/dev-tools/api/latest/jobs.html#operation/JobsRunsSubmit>`_
     API endpoint.
 
-    There are two ways to instantiate this operator.
+    There are three ways to instantiate this operator.

Review Comment:
   This operator has 186 lines of text in its docstring. That is a lot! I feel we should move all of this information into the operator's how-to guide and keep only a minimal description in the class docstring.
   WDYT @josh-fell @Bowrna ?



##########
airflow/providers/databricks/operators/databricks.py:
##########
@@ -206,6 +206,37 @@ class DatabricksSubmitRunOperator(BaseOperator):
         For more information on how to use this operator, take a look at the guide:
         :ref:`howto/operator:DatabricksSubmitRunOperator`
 
+    The last way is to use the param tasks to pass array of objects to instantiate this operator
+        tasks =[
+            {
+              "new_cluster": {
+                  "spark_version": "2.1.0-db3-scala2.11",
+                  "num_workers": 2
+            },
+              "notebook_task": {
+                  "notebook_path": "/Users/airflow@example.com/PrepareData"}}]
+
+        notebook_run = DatabricksSubmitRunOperator(
+            task_id='notebook_run',
+            tasks = tasks)
+
+    :param tasks: Array of Objects(RunSubmitTaskSettings) <= 100 items.
+        The supported params in the array are
+        - ``task_key``
+        - ``depends_on``
+        - ``existing_cluster_id``
+        - ``new_cluster``
+        - ``notebook_task``
+        - ``spark_jar_task``
+        - ``spark_python_task``
+        - ``spark_submit_task``
+        - ``pipeline_task``
+        - ``python_wheel_task``
+        - ``libraries``
+        - ``timeout_seconds``

Review Comment:
   No need to list the options; the link to the API is enough.
   We don't want to maintain this list every time there is a new option in the API.
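
   For illustration only, here is a minimal sketch of the `tasks` payload from the diff above, with a basic check for the "<= 100 items" limit quoted in the proposed docstring. `validate_tasks` and `MAX_TASKS` are hypothetical names and not part of the provider, and the `task_key` value is made up for the example. The sketch avoids importing Airflow so that it runs on its own.

   ```python
   from typing import Any, Dict, List

   # Limit quoted in the docstring under review ("<= 100 items").
   MAX_TASKS = 100


   def validate_tasks(tasks: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
       """Hypothetical helper: basic shape checks before passing ``tasks`` on."""
       if not isinstance(tasks, list):
           raise TypeError("tasks must be a list of dicts")
       if len(tasks) > MAX_TASKS:
           raise ValueError(f"tasks has {len(tasks)} items; the limit is {MAX_TASKS}")
       for i, task in enumerate(tasks):
           if not isinstance(task, dict):
               raise TypeError(f"tasks[{i}] must be a dict")
       return tasks


   # Same payload as in the diff, plus an illustrative task_key.
   tasks = validate_tasks(
       [
           {
               "task_key": "prepare_data",
               "new_cluster": {
                   "spark_version": "2.1.0-db3-scala2.11",
                   "num_workers": 2,
               },
               "notebook_task": {
                   "notebook_path": "/Users/airflow@example.com/PrepareData",
               },
           }
       ]
   )

   # With the Databricks provider installed, the list would be passed as in the PR:
   # notebook_run = DatabricksSubmitRunOperator(task_id="notebook_run", tasks=tasks)
   ```

   The per-task keys themselves are deliberately not checked here, in line with the point above that the API reference should stay the single source for supported options.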


