Posted to commits@airflow.apache.org by GitBox <gi...@apache.org> on 2021/05/12 09:12:48 UTC

[GitHub] [airflow] uranusjr opened a new pull request #15794: Make SparkSqlHook use Connection

uranusjr opened a new pull request #15794:
URL: https://github.com/apache/airflow/pull/15794


   Fix #8713. This takes a different approach from previous work (#12710); instead of deprecating anything, the configuration is now resolved in a two-step process:
   
   1. If an argument is explicitly passed in by the user, it is used (the current behaviour), and configuration on the Connection object is ignored. A try block is added so the Connection is not needed if the user wants to explicitly pass in everything.
   2. If the user does not pass in configuration arguments explicitly, the configuration is populated by a Connection object (if available).
   
   This is straightforward for most of the arguments, but three require special treatment: `master`, `name`, and `verbose`, since they have a default value if not provided explicitly. So if the user does not pass them, *and* a Connection entry is not available, they are set to the default value instead (`"yarn"`, `"default-name"`, and `True` respectively).
   
   Should I document this behaviour somewhere? At least we should write down what `extra` keys the user can use to configure the hook.
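   The two-step resolution above can be sketched as a minimal standalone function, using `master` as the example argument. This is not the actual hook code; `conn` merely stands in for an Airflow Connection with `host`/`port` attributes, and `None` represents a missing Connection:

   ```python
   # Hedged sketch of the two-step resolution (not the real SparkSqlHook code).
   from typing import Optional


   def resolve_master(explicit: Optional[str], conn) -> str:
       # Step 1: an explicitly passed argument always wins; the Connection
       # (if any) is ignored.
       if explicit is not None:
           return explicit
       # Step 2: fall back to the Connection, then to the hard-coded default.
       if conn is None:
           return "yarn"
       if conn.port:
           return f"{conn.host}:{conn.port}"
       return conn.host
   ```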
   
   
   ---
   **^ Add meaningful description above**
   
   Read the **[Pull Request Guidelines](https://github.com/apache/airflow/blob/master/CONTRIBUTING.rst#pull-request-guidelines)** for more information.
   In case of fundamental code change, Airflow Improvement Proposal ([AIP](https://cwiki.apache.org/confluence/display/AIRFLOW/Airflow+Improvements+Proposals)) is needed.
   In case of a new dependency, check compliance with the [ASF 3rd Party License Policy](https://www.apache.org/legal/resolved.html#category-x).
   In case of backwards incompatible changes please leave a note in [UPDATING.md](https://github.com/apache/airflow/blob/master/UPDATING.md).
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] uranusjr commented on a change in pull request #15794: Make SparkSqlHook use Connection

Posted by GitBox <gi...@apache.org>.
uranusjr commented on a change in pull request #15794:
URL: https://github.com/apache/airflow/pull/15794#discussion_r631015404



##########
File path: docs/apache-airflow-providers-apache-spark/connections/spark.rst
##########
@@ -27,7 +27,7 @@ The Apache Spark connection type enables connection to Apache Spark.
 Default Connection IDs
 ----------------------
 
-Spark Submit and Spark JDBC hooks and operators use ``spark_default`` by default, Spark SQL hooks and operators point to ``spark_sql_default`` by default, but don't use it.
+Spark Submit and Spark JDBC hooks and operators use ``spark_default`` by default, Spark SQL hooks and operators point to ``spark_sql_default`` by default, but its usage is optional.

Review comment:
       Hmm yeah `SparkSubmitHook` seems to also follow this approach. I guess that’s an expected approach then!







[GitHub] [airflow] ashb commented on pull request #15794: Make SparkSqlHook use Connection

Posted by GitBox <gi...@apache.org>.
ashb commented on pull request #15794:
URL: https://github.com/apache/airflow/pull/15794#issuecomment-843545940


   There isn't really any prior art for this -- nothing else in Airflow has a "scheme" that is overloaded like this.
   
   🤔 





[GitHub] [airflow] potiuk merged pull request #15794: Make SparkSqlHook use Connection

Posted by GitBox <gi...@apache.org>.
potiuk merged pull request #15794:
URL: https://github.com/apache/airflow/pull/15794


   





[GitHub] [airflow] boittega commented on pull request #15794: Make SparkSqlHook use Connection

Posted by GitBox <gi...@apache.org>.
boittega commented on pull request #15794:
URL: https://github.com/apache/airflow/pull/15794#issuecomment-839768763


   LGTM
   The approach here fixes the bug that forced the `spark_sql_default` connection to exist: if the connection is not present in the list of connections, the hook now falls back to the explicitly passed parameters, or to the default values for master and queue.
   Thank you @uranusjr 💯 





[GitHub] [airflow] boittega commented on a change in pull request #15794: Make SparkSqlHook use Connection

Posted by GitBox <gi...@apache.org>.
boittega commented on a change in pull request #15794:
URL: https://github.com/apache/airflow/pull/15794#discussion_r631002008



##########
File path: airflow/providers/apache/spark/hooks/spark_sql.py
##########
@@ -72,13 +77,33 @@ def __init__(
         executor_memory: Optional[str] = None,
         keytab: Optional[str] = None,
         principal: Optional[str] = None,
-        master: str = 'yarn',
+        master: Optional[str] = None,
         name: str = 'default-name',
         num_executors: Optional[int] = None,
         verbose: bool = True,
-        yarn_queue: str = 'default',
+        yarn_queue: Optional[str] = None,
     ) -> None:
         super().__init__()
+
+        try:
+            conn: "Optional[Connection]" = self.get_connection(conn_id)
+        except AirflowNotFoundException:
+            conn = None
+            options = {}
+        else:
+            options = conn.extra_dejson
+
+        # Set arguments to values set in Connection if not explicitly provided.
+        if master is None:
+            if conn is None:
+                master = "yarn"
+            elif conn.port:
+                master = f"{conn.host}:{conn.port}"
+            else:
+                master = conn.host
+        if yarn_queue is None:
+            yarn_queue = options.get("queue", "default")
+
         self._sql = sql
         self._conf = conf
         self._conn = self.get_connection(conn_id)

Review comment:
       You created the variable `conn` on line 89 with the same result, but inside the try block.
   The idea is to call `get_connection` only once, but you are calling it on both line 89 and line 109, and line 109 will raise an error because it is not inside a try.
   Does that make sense?
   







[GitHub] [airflow] eladkal commented on pull request #15794: Make SparkSqlHook use Connection

Posted by GitBox <gi...@apache.org>.
eladkal commented on pull request #15794:
URL: https://github.com/apache/airflow/pull/15794#issuecomment-860226882


   @uranusjr can you fix the tests?





[GitHub] [airflow] uranusjr commented on pull request #15794: Make SparkSqlHook use Connection

Posted by GitBox <gi...@apache.org>.
uranusjr commented on pull request #15794:
URL: https://github.com/apache/airflow/pull/15794#issuecomment-843941647


   So the scheme is an issue for both (and only) `SparkSqlHook` and `SparkSubmitHook`. Given that `SparkSubmitHook` already didn't provide a scheme field prior to this PR, and nobody ever complained, I'm inclined to split this into a separate feature request and not deal with it in this PR, because YAGNI.
   
   And *if* this is ever requested, we have two options:
   
   1. Explicitly document you can prepend the scheme in the `host` field in the connection. (This works as-is, but probably unintuitive.)
   2. Add an extra field `scheme` in connection and apply it to both `SparkSqlHook` and `SparkSubmitHook`. (With documentation, of course.)
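   Option 1 works today because the hook simply joins host and port, so a scheme stored in the Connection `host` field survives into the resulting master string. A hypothetical illustration (the host/port values are made up):

   ```python
   # Hypothetical illustration of option 1; mirrors the joining behaviour
   # described in the PR (append ":port" only when a port is set).
   from typing import Optional


   def build_master(host: str, port: Optional[int]) -> str:
       return f"{host}:{port}" if port else host
   ```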





[GitHub] [airflow] boittega commented on a change in pull request #15794: Make SparkSqlHook use Connection

Posted by GitBox <gi...@apache.org>.
boittega commented on a change in pull request #15794:
URL: https://github.com/apache/airflow/pull/15794#discussion_r630920717



##########
File path: docs/apache-airflow-providers-apache-spark/connections/spark.rst
##########
@@ -27,7 +27,7 @@ The Apache Spark connection type enables connection to Apache Spark.
 Default Connection IDs
 ----------------------
 
-Spark Submit and Spark JDBC hooks and operators use ``spark_default`` by default, Spark SQL hooks and operators point to ``spark_sql_default`` by default, but don't use it.
+Spark Submit and Spark JDBC hooks and operators use ``spark_default`` by default, Spark SQL hooks and operators point to ``spark_sql_default`` by default, but its usage is optional.

Review comment:
       Change `point to spark_sql_default` to `use spark_sql_default`.
   I would completely remove `, but its usage is optional`; I think that is the normal behaviour for any connection.
   ```suggestion
   Spark Submit and Spark JDBC hooks and operators use ``spark_default`` by default, Spark SQL hooks and operators use ``spark_sql_default`` by default.
   ```

##########
File path: airflow/providers/apache/spark/hooks/spark_sql.py
##########
@@ -72,13 +77,33 @@ def __init__(
         executor_memory: Optional[str] = None,
         keytab: Optional[str] = None,
         principal: Optional[str] = None,
-        master: str = 'yarn',
+        master: Optional[str] = None,
         name: str = 'default-name',
         num_executors: Optional[int] = None,
         verbose: bool = True,
-        yarn_queue: str = 'default',
+        yarn_queue: Optional[str] = None,
     ) -> None:
         super().__init__()
+
+        try:
+            conn: "Optional[Connection]" = self.get_connection(conn_id)
+        except AirflowNotFoundException:
+            conn = None
+            options = {}
+        else:
+            options = conn.extra_dejson
+
+        # Set arguments to values set in Connection if not explicitly provided.
+        if master is None:
+            if conn is None:
+                master = "yarn"
+            elif conn.port:
+                master = f"{conn.host}:{conn.port}"
+            else:
+                master = conn.host
+        if yarn_queue is None:
+            yarn_queue = options.get("queue", "default")
+
         self._sql = sql
         self._conf = conf
         self._conn = self.get_connection(conn_id)

Review comment:
       receive the `conn` variable here.
   ```suggestion
           self._conn = conn
   ```

##########
File path: airflow/providers/apache/spark/hooks/spark_sql.py
##########
@@ -72,13 +77,33 @@ def __init__(
         executor_memory: Optional[str] = None,
         keytab: Optional[str] = None,
         principal: Optional[str] = None,
-        master: str = 'yarn',
+        master: Optional[str] = None,
         name: str = 'default-name',
         num_executors: Optional[int] = None,
         verbose: bool = True,
-        yarn_queue: str = 'default',
+        yarn_queue: Optional[str] = None,
     ) -> None:
         super().__init__()
+
+        try:
+            conn: "Optional[Connection]" = self.get_connection(conn_id)
+        except AirflowNotFoundException:
+            conn = None
+            options = {}
+        else:
+            options = conn.extra_dejson
+
+        # Set arguments to values set in Connection if not explicitly provided.
+        if master is None:
+            if conn is None:
+                master = "yarn"
+            elif conn.port:
+                master = f"{conn.host}:{conn.port}"
+            else:
+                master = conn.host
+        if yarn_queue is None:
+            yarn_queue = options.get("queue", "default")

Review comment:
       On the `self._yarn_queue = yarn_queue` below you can add the above logic like this:
   ```
   self._yarn_queue = yarn_queue or options.get("queue", "default")
   ```







[GitHub] [airflow] uranusjr commented on a change in pull request #15794: Make SparkSqlHook use Connection

Posted by GitBox <gi...@apache.org>.
uranusjr commented on a change in pull request #15794:
URL: https://github.com/apache/airflow/pull/15794#discussion_r630956448



##########
File path: airflow/providers/apache/spark/hooks/spark_sql.py
##########
@@ -72,13 +77,33 @@ def __init__(
         executor_memory: Optional[str] = None,
         keytab: Optional[str] = None,
         principal: Optional[str] = None,
-        master: str = 'yarn',
+        master: Optional[str] = None,
         name: str = 'default-name',
         num_executors: Optional[int] = None,
         verbose: bool = True,
-        yarn_queue: str = 'default',
+        yarn_queue: Optional[str] = None,
     ) -> None:
         super().__init__()
+
+        try:
+            conn: "Optional[Connection]" = self.get_connection(conn_id)
+        except AirflowNotFoundException:
+            conn = None
+            options = {}
+        else:
+            options = conn.extra_dejson
+
+        # Set arguments to values set in Connection if not explicitly provided.
+        if master is None:
+            if conn is None:
+                master = "yarn"
+            elif conn.port:
+                master = f"{conn.host}:{conn.port}"
+            else:
+                master = conn.host
+        if yarn_queue is None:
+            yarn_queue = options.get("queue", "default")

Review comment:
       Personally I prefer explicit statements instead of `or`. I guess that’s subjective and I don’t have a strong opinion if there’s an established coding style. Is there?







[GitHub] [airflow] uranusjr commented on a change in pull request #15794: Make SparkSqlHook use Connection

Posted by GitBox <gi...@apache.org>.
uranusjr commented on a change in pull request #15794:
URL: https://github.com/apache/airflow/pull/15794#discussion_r631011721



##########
File path: airflow/providers/apache/spark/hooks/spark_sql.py
##########
@@ -72,13 +77,33 @@ def __init__(
         executor_memory: Optional[str] = None,
         keytab: Optional[str] = None,
         principal: Optional[str] = None,
-        master: str = 'yarn',
+        master: Optional[str] = None,
         name: str = 'default-name',
         num_executors: Optional[int] = None,
         verbose: bool = True,
-        yarn_queue: str = 'default',
+        yarn_queue: Optional[str] = None,
     ) -> None:
         super().__init__()
+
+        try:
+            conn: "Optional[Connection]" = self.get_connection(conn_id)
+        except AirflowNotFoundException:
+            conn = None
+            options = {}
+        else:
+            options = conn.extra_dejson
+
+        # Set arguments to values set in Connection if not explicitly provided.
+        if master is None:
+            if conn is None:
+                master = "yarn"
+            elif conn.port:
+                master = f"{conn.host}:{conn.port}"
+            else:
+                master = conn.host
+        if yarn_queue is None:
+            yarn_queue = options.get("queue", "default")
+
         self._sql = sql
         self._conf = conf
         self._conn = self.get_connection(conn_id)

Review comment:
       Oh, I see what you mean now. Yeah I totally missed that, thanks.







[GitHub] [airflow] uranusjr commented on pull request #15794: Make SparkSqlHook use Connection

Posted by GitBox <gi...@apache.org>.
uranusjr commented on pull request #15794:
URL: https://github.com/apache/airflow/pull/15794#issuecomment-844120432


   Yeah could be too complicated, but that can be discussed when someone really intends to do that. At that time we'll have a real user to ask for feedback as well instead of designing in a void.





[GitHub] [airflow] boittega commented on a change in pull request #15794: Make SparkSqlHook use Connection

Posted by GitBox <gi...@apache.org>.
boittega commented on a change in pull request #15794:
URL: https://github.com/apache/airflow/pull/15794#discussion_r630997944



##########
File path: airflow/providers/apache/spark/hooks/spark_sql.py
##########
@@ -72,13 +77,33 @@ def __init__(
         executor_memory: Optional[str] = None,
         keytab: Optional[str] = None,
         principal: Optional[str] = None,
-        master: str = 'yarn',
+        master: Optional[str] = None,
         name: str = 'default-name',
         num_executors: Optional[int] = None,
         verbose: bool = True,
-        yarn_queue: str = 'default',
+        yarn_queue: Optional[str] = None,
     ) -> None:
         super().__init__()
+
+        try:
+            conn: "Optional[Connection]" = self.get_connection(conn_id)
+        except AirflowNotFoundException:
+            conn = None
+            options = {}
+        else:
+            options = conn.extra_dejson
+
+        # Set arguments to values set in Connection if not explicitly provided.
+        if master is None:
+            if conn is None:
+                master = "yarn"
+            elif conn.port:
+                master = f"{conn.host}:{conn.port}"
+            else:
+                master = conn.host
+        if yarn_queue is None:
+            yarn_queue = options.get("queue", "default")

Review comment:
       I also don't know if there is any coding style. My opinion is fewer lines and fewer variable changes.
   I am happy if you want to keep it like that.







[GitHub] [airflow] uranusjr commented on a change in pull request #15794: Make SparkSqlHook use Connection

Posted by GitBox <gi...@apache.org>.
uranusjr commented on a change in pull request #15794:
URL: https://github.com/apache/airflow/pull/15794#discussion_r630953999



##########
File path: airflow/providers/apache/spark/hooks/spark_sql.py
##########
@@ -72,13 +77,33 @@ def __init__(
         executor_memory: Optional[str] = None,
         keytab: Optional[str] = None,
         principal: Optional[str] = None,
-        master: str = 'yarn',
+        master: Optional[str] = None,
         name: str = 'default-name',
         num_executors: Optional[int] = None,
         verbose: bool = True,
-        yarn_queue: str = 'default',
+        yarn_queue: Optional[str] = None,
     ) -> None:
         super().__init__()
+
+        try:
+            conn: "Optional[Connection]" = self.get_connection(conn_id)
+        except AirflowNotFoundException:
+            conn = None
+            options = {}
+        else:
+            options = conn.extra_dejson
+
+        # Set arguments to values set in Connection if not explicitly provided.
+        if master is None:
+            if conn is None:
+                master = "yarn"
+            elif conn.port:
+                master = f"{conn.host}:{conn.port}"
+            else:
+                master = conn.host
+        if yarn_queue is None:
+            yarn_queue = options.get("queue", "default")
+
         self._sql = sql
         self._conf = conf
         self._conn = self.get_connection(conn_id)

Review comment:
       Why? I think most hooks accept an ID instead of the connection itself.







[GitHub] [airflow] uranusjr commented on a change in pull request #15794: Make SparkSqlHook use Connection

Posted by GitBox <gi...@apache.org>.
uranusjr commented on a change in pull request #15794:
URL: https://github.com/apache/airflow/pull/15794#discussion_r631016849



##########
File path: airflow/providers/apache/spark/hooks/spark_sql.py
##########
@@ -72,13 +77,33 @@ def __init__(
         executor_memory: Optional[str] = None,
         keytab: Optional[str] = None,
         principal: Optional[str] = None,
-        master: str = 'yarn',
+        master: Optional[str] = None,
         name: str = 'default-name',
         num_executors: Optional[int] = None,
         verbose: bool = True,
-        yarn_queue: str = 'default',
+        yarn_queue: Optional[str] = None,
     ) -> None:
         super().__init__()
+
+        try:
+            conn: "Optional[Connection]" = self.get_connection(conn_id)
+        except AirflowNotFoundException:
+            conn = None
+            options = {}
+        else:
+            options = conn.extra_dejson
+
+        # Set arguments to values set in Connection if not explicitly provided.
+        if master is None:
+            if conn is None:
+                master = "yarn"
+            elif conn.port:
+                master = f"{conn.host}:{conn.port}"
+            else:
+                master = conn.host
+        if yarn_queue is None:
+            yarn_queue = options.get("queue", "default")
+
         self._sql = sql
         self._conf = conf
         self._conn = self.get_connection(conn_id)

Review comment:
       Actually since `self._conn` is not actually used anywhere, I’ll just remove that line altogether.







[GitHub] [airflow] ashb commented on pull request #15794: Make SparkSqlHook use Connection

Posted by GitBox <gi...@apache.org>.
ashb commented on pull request #15794:
URL: https://github.com/apache/airflow/pull/15794#issuecomment-844102535


   Fair I guess.
   
   What about 3. Support `spark+<actual_scheme>://` -- i.e. ` spark+k8s://https://<k8s-apiserver-host>:<k8s-apiserver-port>`? Might be overkill.
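   A sketch of how that hypothetical `spark+<actual_scheme>` convention could be split apart; nothing like this exists in Airflow today, it only illustrates the idea floated above:

   ```python
   # Hypothetical parser for a "spark+<actual_scheme>" connection type.
   from typing import Optional, Tuple


   def split_conn_type(conn_type: str) -> Tuple[str, Optional[str]]:
       """Split e.g. 'spark+k8s' into ('spark', 'k8s'); plain 'spark' yields ('spark', None)."""
       base, _, actual = conn_type.partition("+")
       return base, (actual or None)
   ```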





[GitHub] [airflow] github-actions[bot] commented on pull request #15794: Make SparkSqlHook use Connection

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on pull request #15794:
URL: https://github.com/apache/airflow/pull/15794#issuecomment-844110689


   The PR is likely OK to be merged with just subset of tests for default Python and Database versions without running the full matrix of tests, because it does not modify the core of Airflow. If the committers decide that the full tests matrix is needed, they will add the label 'full tests needed'. Then you should rebase to the latest master or amend the last commit of the PR, and push it with --force-with-lease.





[GitHub] [airflow] boittega commented on a change in pull request #15794: Make SparkSqlHook use Connection

Posted by GitBox <gi...@apache.org>.
boittega commented on a change in pull request #15794:
URL: https://github.com/apache/airflow/pull/15794#discussion_r631003270



##########
File path: docs/apache-airflow-providers-apache-spark/connections/spark.rst
##########
@@ -27,7 +27,7 @@ The Apache Spark connection type enables connection to Apache Spark.
 Default Connection IDs
 ----------------------
 
-Spark Submit and Spark JDBC hooks and operators use ``spark_default`` by default, Spark SQL hooks and operators point to ``spark_sql_default`` by default, but don't use it.
+Spark Submit and Spark JDBC hooks and operators use ``spark_default`` by default, Spark SQL hooks and operators point to ``spark_sql_default`` by default, but its usage is optional.

Review comment:
       On both Spark connectors, you have the option to send all the parameters and not use the connection, as far as I know. Correct me if I am wrong.

##########
File path: docs/apache-airflow-providers-apache-spark/connections/spark.rst
##########
@@ -27,7 +27,7 @@ The Apache Spark connection type enables connection to Apache Spark.
 Default Connection IDs
 ----------------------
 
-Spark Submit and Spark JDBC hooks and operators use ``spark_default`` by default, Spark SQL hooks and operators point to ``spark_sql_default`` by default, but don't use it.
+Spark Submit and Spark JDBC hooks and operators use ``spark_default`` by default, Spark SQL hooks and operators point to ``spark_sql_default`` by default, but its usage is optional.

Review comment:
       On both Spark hooks, you have the option to send all the parameters and not use the connection, as far as I know. Correct me if I am wrong.







[GitHub] [airflow] uranusjr commented on pull request #15794: Make SparkSqlHook use Connection

Posted by GitBox <gi...@apache.org>.
uranusjr commented on pull request #15794:
URL: https://github.com/apache/airflow/pull/15794#issuecomment-843058221


   You sort of can with host `spark://master`, because the strings are simply joined, but that’s a consequence of the implementation rather than an intentional design. Is there prior art on this kind of thing?





[GitHub] [airflow] uranusjr commented on a change in pull request #15794: Make SparkSqlHook use Connection

Posted by GitBox <gi...@apache.org>.
uranusjr commented on a change in pull request #15794:
URL: https://github.com/apache/airflow/pull/15794#discussion_r630955451



##########
File path: docs/apache-airflow-providers-apache-spark/connections/spark.rst
##########
@@ -27,7 +27,7 @@ The Apache Spark connection type enables connection to Apache Spark.
 Default Connection IDs
 ----------------------
 
-Spark Submit and Spark JDBC hooks and operators use ``spark_default`` by default, Spark SQL hooks and operators point to ``spark_sql_default`` by default, but don't use it.
+Spark Submit and Spark JDBC hooks and operators use ``spark_default`` by default, Spark SQL hooks and operators point to ``spark_sql_default`` by default, but its usage is optional.

Review comment:
       > I would completely remove the `, but its usage is optional`, I think it is the normal behaviour for any connection.
   
   Is it? IIRC the Connection is *not* optional for most hooks; they fail with `AirflowNotFoundException` if a matching Connection is not found.



