Posted to commits@airflow.apache.org by "nivangio (via GitHub)" <gi...@apache.org> on 2023/03/01 13:54:56 UTC

[GitHub] [airflow] nivangio opened a new pull request, #29840: DatabricksSubmitRunOperator to support taskflow

nivangio opened a new pull request, #29840:
URL: https://github.com/apache/airflow/pull/29840

   - Moving `normalise_json_content` from `__init__` to `execute` to avoid errors due to unexpected types
   - Changing the type hint of all task arguments to `Dict[str, object]` to support XComArgs, dicts, strings, etc.
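The first bullet can be illustrated with a small, self-contained sketch (simplified, hypothetical names; this is not Airflow's actual code): at DAG-parse time, TaskFlow passes placeholder objects (XComArgs) whose real values only exist at runtime, so eager validation in `__init__` fails where deferred validation in `execute` succeeds.

```python
class XComArgPlaceholder:
    """Stand-in for an Airflow XComArg: its value is unknown at parse time."""

def normalise_json_content(content):
    """Coerce JSON-able content to strings, mirroring the operator's helper."""
    if isinstance(content, (str, int, float, bool)):
        return str(content)
    if isinstance(content, dict):
        return {k: normalise_json_content(v) for k, v in content.items()}
    if isinstance(content, (list, tuple)):
        return [normalise_json_content(v) for v in content]
    raise TypeError(f"Type {type(content)!r} is not JSON-serializable")

class EagerOperator:
    def __init__(self, json):
        # Validating here breaks TaskFlow: the placeholder is not yet resolved.
        self.json = normalise_json_content(json)

class LazyOperator:
    def __init__(self, json):
        self.json = json  # store as-is; placeholders are fine at parse time

    def execute(self, resolved_value):
        # By execute time, Airflow has swapped placeholders for real values.
        self.json = normalise_json_content({"notebook_path": resolved_value})
        return self.json

placeholder = XComArgPlaceholder()
try:
    EagerOperator({"notebook_path": placeholder})
except TypeError as err:
    print("parse-time failure:", err)

op = LazyOperator({"notebook_path": placeholder})
print(op.execute("/Shared/etl"))  # -> {'notebook_path': '/Shared/etl'}
```

The operator itself stays unchanged from the user's point of view; only the moment of validation moves.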


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] josh-fell commented on pull request #29840: DatabricksSubmitRunOperator to support taskflow

Posted by "josh-fell (via GitHub)" <gi...@apache.org>.
josh-fell commented on PR #29840:
URL: https://github.com/apache/airflow/pull/29840#issuecomment-1458329798

   @nivangio Congrats on your first contribution! 🎉 




[GitHub] [airflow] nivangio commented on a diff in pull request #29840: DatabricksSubmitRunOperator to support taskflow

Posted by "nivangio (via GitHub)" <gi...@apache.org>.
nivangio commented on code in PR #29840:
URL: https://github.com/apache/airflow/pull/29840#discussion_r1123399094


##########
airflow/providers/databricks/operators/databricks.py:
##########
@@ -285,12 +285,12 @@ def __init__(
         *,
         json: Any | None = None,
         tasks: list[object] | None = None,
-        spark_jar_task: dict[str, str] | None = None,
-        notebook_task: dict[str, str] | None = None,
-        spark_python_task: dict[str, str | list[str]] | None = None,
-        spark_submit_task: dict[str, list[str]] | None = None,
-        pipeline_task: dict[str, str] | None = None,
-        dbt_task: dict[str, str | list[str]] | None = None,

Review Comment:
   Ok, this might be unrelated to the specific issue we are discussing here, but passing a `dict[str, dict[str, str]]` is also accepted and produces the expected outcome under the `notebook_task` parameter, for example.
   
   However, it fails the mypy checks. I agree that `object` might be vague and more thought will be needed, but, at the very least, the value type of the `notebook_task` dict param (and probably of the rest too) should accept `dict`, `str`, and `int` to stay in line with what is there so far (i.e., without this PR) and, eventually, `XComArg` and `PlainXComArg` to support the changes in this PR.
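The runtime-vs-mypy mismatch described here can be reproduced with a minimal sketch (hypothetical names, not the operator's actual signature): a parameter annotated `dict[str, str]` accepts a nested dict at runtime, but mypy rejects the call site.

```python
def submit_notebook_task(notebook_task: dict[str, str]) -> dict[str, str]:
    # No runtime enforcement: the annotation is only checked by mypy.
    return notebook_task

# Runs fine, but mypy reports something like:
#   error: Dict entry 1 has incompatible type "dict[str, str]"; expected "str"
task = submit_notebook_task(
    {"notebook_path": "/Shared/etl", "base_parameters": {"env": "prod"}}  # type: ignore[dict-item]
)
print(task["notebook_path"])  # -> /Shared/etl
```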





[GitHub] [airflow] boring-cyborg[bot] commented on pull request #29840: DatabricksSubmitRunOperator to support taskflow

Posted by "boring-cyborg[bot] (via GitHub)" <gi...@apache.org>.
boring-cyborg[bot] commented on PR #29840:
URL: https://github.com/apache/airflow/pull/29840#issuecomment-1450188985

   Congratulations on your first Pull Request and welcome to the Apache Airflow community! If you have any issues or are unsure about anything, please check our Contribution Guide (https://github.com/apache/airflow/blob/main/CONTRIBUTING.rst).
   Here are some useful points:
   - Pay attention to the quality of your code (ruff, mypy and type annotations). Our [pre-commits]( https://github.com/apache/airflow/blob/main/STATIC_CODE_CHECKS.rst#prerequisites-for-pre-commit-hooks) will help you with that.
   - In case of a new feature add useful documentation (in docstrings or in `docs/` directory). Adding a new operator? Check this short [guide](https://github.com/apache/airflow/blob/main/docs/apache-airflow/howto/custom-operator.rst). Consider adding an example DAG that shows how users should use it.
   - Consider using the [Breeze environment](https://github.com/apache/airflow/blob/main/BREEZE.rst) for testing locally; it's a heavy Docker setup, but it ships with a working Airflow and a lot of integrations.
   - Be patient and persistent. It might take some time to get a review or get the final approval from Committers.
   - Please follow [ASF Code of Conduct](https://www.apache.org/foundation/policies/conduct) for all communication including (but not limited to) comments on Pull Requests, Mailing list and Slack.
   - Be sure to read the [Airflow Coding style]( https://github.com/apache/airflow/blob/main/CONTRIBUTING.rst#coding-style-and-best-practices).
   Apache Airflow is a community-driven project and together we are making it better 🚀.
   In case of doubts contact the developers at:
   Mailing List: dev@airflow.apache.org
   Slack: https://s.apache.org/airflow-slack
   




[GitHub] [airflow] josh-fell commented on pull request #29840: DatabricksSubmitRunOperator to support taskflow

Posted by "josh-fell (via GitHub)" <gi...@apache.org>.
josh-fell commented on PR #29840:
URL: https://github.com/apache/airflow/pull/29840#issuecomment-1456479254

   @nivangio Can you take a look at the failing tests please?




[GitHub] [airflow] nivangio commented on pull request #29840: DatabricksSubmitRunOperator to support taskflow

Posted by "nivangio (via GitHub)" <gi...@apache.org>.
nivangio commented on PR #29840:
URL: https://github.com/apache/airflow/pull/29840#issuecomment-1458462628

   @josh-fell Thanks a lot for all the support 😄 . Happy to keep on contributing




[GitHub] [airflow] nivangio commented on a diff in pull request #29840: DatabricksSubmitRunOperator to support taskflow

Posted by "nivangio (via GitHub)" <gi...@apache.org>.
nivangio commented on code in PR #29840:
URL: https://github.com/apache/airflow/pull/29840#discussion_r1124592896


##########
airflow/providers/databricks/operators/databricks.py:
##########
@@ -285,12 +285,12 @@ def __init__(
         *,
         json: Any | None = None,
         tasks: list[object] | None = None,
-        spark_jar_task: dict[str, str] | None = None,
-        notebook_task: dict[str, str] | None = None,
-        spark_python_task: dict[str, str | list[str]] | None = None,
-        spark_submit_task: dict[str, list[str]] | None = None,
-        pipeline_task: dict[str, str] | None = None,
-        dbt_task: dict[str, str | list[str]] | None = None,

Review Comment:
   Awesome! Let's do that then. I am gonna revert those changes and then submit a new PR for the other one once this is merged.





[GitHub] [airflow] josh-fell merged pull request #29840: DatabricksSubmitRunOperator to support taskflow

Posted by "josh-fell (via GitHub)" <gi...@apache.org>.
josh-fell merged PR #29840:
URL: https://github.com/apache/airflow/pull/29840




[GitHub] [airflow] boring-cyborg[bot] commented on pull request #29840: DatabricksSubmitRunOperator to support taskflow

Posted by "boring-cyborg[bot] (via GitHub)" <gi...@apache.org>.
boring-cyborg[bot] commented on PR #29840:
URL: https://github.com/apache/airflow/pull/29840#issuecomment-1458327995

   Awesome work, congrats on your first merged pull request!
   




[GitHub] [airflow] josh-fell commented on a diff in pull request #29840: DatabricksSubmitRunOperator to support taskflow

Posted by "josh-fell (via GitHub)" <gi...@apache.org>.
josh-fell commented on code in PR #29840:
URL: https://github.com/apache/airflow/pull/29840#discussion_r1122539049


##########
airflow/providers/databricks/operators/databricks.py:
##########
@@ -285,12 +285,12 @@ def __init__(
         *,
         json: Any | None = None,
         tasks: list[object] | None = None,
-        spark_jar_task: dict[str, str] | None = None,
-        notebook_task: dict[str, str] | None = None,
-        spark_python_task: dict[str, str | list[str]] | None = None,
-        spark_submit_task: dict[str, list[str]] | None = None,
-        pipeline_task: dict[str, str] | None = None,
-        dbt_task: dict[str, str | list[str]] | None = None,

Review Comment:
   The current typing is sufficient IMO. These args would eventually be of these types anyway, whether from XComs, templates, etc.; `dict[str, object]` is nebulous.
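The point can be sketched with simplified, hypothetical rendering semantics (not Airflow's actual implementation): templated placeholders are resolved before `execute` runs, so by then a field declared `dict[str, str]` really does hold string values.

```python
def render(value, context):
    """Resolve '{{ key }}' placeholders from the context, recursing into dicts."""
    if isinstance(value, str) and value.startswith("{{") and value.endswith("}}"):
        return str(context[value[2:-2].strip()])
    if isinstance(value, dict):
        return {k: render(v, context) for k, v in value.items()}
    return value

notebook_task = {"notebook_path": "{{ path }}"}
rendered = render(notebook_task, {"path": "/Shared/etl"})
print(rendered)  # -> {'notebook_path': '/Shared/etl'}
```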





[GitHub] [airflow] uranusjr commented on a diff in pull request #29840: DatabricksSubmitRunOperator to support taskflow

Posted by "uranusjr (via GitHub)" <gi...@apache.org>.
uranusjr commented on code in PR #29840:
URL: https://github.com/apache/airflow/pull/29840#discussion_r1124046647


##########
airflow/providers/databricks/operators/databricks.py:
##########
@@ -285,12 +285,12 @@ def __init__(
         *,
         json: Any | None = None,
         tasks: list[object] | None = None,
-        spark_jar_task: dict[str, str] | None = None,
-        notebook_task: dict[str, str] | None = None,
-        spark_python_task: dict[str, str | list[str]] | None = None,
-        spark_submit_task: dict[str, list[str]] | None = None,
-        pipeline_task: dict[str, str] | None = None,
-        dbt_task: dict[str, str | list[str]] | None = None,

Review Comment:
   It may also be a good idea to split the changes into two PRs; one does not need to block the other.





[GitHub] [airflow] nivangio commented on a diff in pull request #29840: DatabricksSubmitRunOperator to support taskflow

Posted by "nivangio (via GitHub)" <gi...@apache.org>.
nivangio commented on code in PR #29840:
URL: https://github.com/apache/airflow/pull/29840#discussion_r1123399094


##########
airflow/providers/databricks/operators/databricks.py:
##########
@@ -285,12 +285,12 @@ def __init__(
         *,
         json: Any | None = None,
         tasks: list[object] | None = None,
-        spark_jar_task: dict[str, str] | None = None,
-        notebook_task: dict[str, str] | None = None,
-        spark_python_task: dict[str, str | list[str]] | None = None,
-        spark_submit_task: dict[str, list[str]] | None = None,
-        pipeline_task: dict[str, str] | None = None,
-        dbt_task: dict[str, str | list[str]] | None = None,

Review Comment:
   Ok, this might be unrelated to the specific issue we are discussing here, but passing a `dict[str, dict[str, str]]` is also accepted and produces the expected outcome under the `notebook_task` parameter, for example.
   
   However, it fails the mypy checks. I agree that `object` might be vague and more thought will be needed, but, at the very least, the value type of the `notebook_task` dict param (and probably of the rest too) should accept `dict`, `str`, and `int` to stay in line with what is there so far (i.e., without this PR) and, eventually, `XComArg` and `PlainXComArg` to support the changes in this PR.
   
   I will try to think of a more precise type definition, but I'd like to know whether you agree that the typing as it stands is insufficient.
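One direction such a more precise definition could take is a recursive value-type alias (names here are illustrative, not Airflow's; the unresolved `XComArg` case is omitted for brevity):

```python
from __future__ import annotations
from typing import Union

# Values of the task dicts may be scalars or nested dicts/lists of the same.
TaskValue = Union[str, int, bool, "dict[str, TaskValue]", "list[TaskValue]"]

def conforms(value: TaskValue) -> bool:
    """Runtime check mirroring what the static alias above expresses."""
    if isinstance(value, (str, int, bool)):
        return True
    if isinstance(value, dict):
        return all(isinstance(k, str) and conforms(v) for k, v in value.items())
    if isinstance(value, list):
        return all(conforms(v) for v in value)
    return False

print(conforms({"notebook_path": "/Shared/etl", "base_parameters": {"retries": 3}}))  # -> True
```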





[GitHub] [airflow] uranusjr commented on a diff in pull request #29840: DatabricksSubmitRunOperator to support taskflow

Posted by "uranusjr (via GitHub)" <gi...@apache.org>.
uranusjr commented on code in PR #29840:
URL: https://github.com/apache/airflow/pull/29840#discussion_r1122650972


##########
airflow/providers/databricks/operators/databricks.py:
##########
@@ -285,12 +285,12 @@ def __init__(
         *,
         json: Any | None = None,
         tasks: list[object] | None = None,
-        spark_jar_task: dict[str, str] | None = None,
-        notebook_task: dict[str, str] | None = None,
-        spark_python_task: dict[str, str | list[str]] | None = None,
-        spark_submit_task: dict[str, list[str]] | None = None,
-        pipeline_task: dict[str, str] | None = None,
-        dbt_task: dict[str, str | list[str]] | None = None,

Review Comment:
   Agreed, please revert these lines.





[GitHub] [airflow] nivangio commented on pull request #29840: DatabricksSubmitRunOperator to support taskflow

Posted by "nivangio (via GitHub)" <gi...@apache.org>.
nivangio commented on PR #29840:
URL: https://github.com/apache/airflow/pull/29840#issuecomment-1456497566

   @josh-fell sure!



