Posted to reviews@spark.apache.org by "ueshin (via GitHub)" <gi...@apache.org> on 2023/02/11 00:41:49 UTC

[GitHub] [spark] ueshin opened a new pull request, #39971: [SPARK-42402][CONNECT] Support parameterized SQL by `sql()`

ueshin opened a new pull request, #39971:
URL: https://github.com/apache/spark/pull/39971

   ### What changes were proposed in this pull request?
   
   Adds support for parameterized SQL to `SparkSession.sql()` in Spark Connect.
   
   Note: `SparkSession.sql` in PySpark also supports a string formatter, but that will be handled separately.
   
   ### Why are the changes needed?
   
   Currently `SparkSession.sql` in Spark Connect doesn't support parameterized SQL.
   
   ### Does this PR introduce _any_ user-facing change?
   
   Yes. Parameterized SQL becomes available through `SparkSession.sql` in Spark Connect.
   
   For example:
   
   ```py
   >>> spark.sql("SELECT * FROM range(10) WHERE id > :minId", args={"minId": "7"}).toPandas()
      id
   0   8
   1   9
   ```
   
   ### How was this patch tested?
   
   Added a test.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] HyukjinKwon closed pull request #39971: [SPARK-42402][CONNECT] Support parameterized SQL by `sql()`

Posted by "HyukjinKwon (via GitHub)" <gi...@apache.org>.
HyukjinKwon closed pull request #39971: [SPARK-42402][CONNECT] Support parameterized SQL by `sql()`
URL: https://github.com/apache/spark/pull/39971


[GitHub] [spark] ueshin commented on a diff in pull request #39971: [SPARK-42402][CONNECT] Support parameterized SQL by `sql()`

Posted by "ueshin (via GitHub)" <gi...@apache.org>.
ueshin commented on code in PR #39971:
URL: https://github.com/apache/spark/pull/39971#discussion_r1103446098


##########
python/pyspark/sql/session.py:
##########
@@ -1308,7 +1308,7 @@ def prepare(obj: Any) -> Any:
         df._schema = struct
         return df
 
-    def sql(self, sqlQuery: str, args: Dict[str, str] = {}, **kwargs: Any) -> DataFrame:
+    def sql(self, sqlQuery: str, args: Optional[Dict[str, str]] = None, **kwargs: Any) -> DataFrame:

Review Comment:
   Also changed the default value of `args` in PySpark from `{}` to `None`, because a mutable default argument is shared across calls and can lead to unexpected behavior.
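
For context on why `None` is preferred over `{}` here, the following is a minimal standalone sketch of the mutable-default pitfall. The function names `sql_shared_default` and `sql_none_default` are hypothetical and exist only for illustration; this is not the PySpark implementation, just the general Python behavior that motivates the signature change.

```python
from typing import Dict, Optional


def sql_shared_default(query: str, args: Dict[str, str] = {}) -> Dict[str, str]:
    # Hypothetical helper: the {} default is created once, when the function
    # is defined, and the same dict object is reused by every call.
    args.setdefault("query", query)
    return args


def sql_none_default(query: str, args: Optional[Dict[str, str]] = None) -> Dict[str, str]:
    # Hypothetical helper: a fresh dict is built on each call, so no state
    # leaks from one call to the next.
    args = dict(args) if args is not None else {}
    args.setdefault("query", query)
    return args


first = sql_shared_default("SELECT 1")
second = sql_shared_default("SELECT 2")
# `second` is the same object as `first`, so it still holds "SELECT 1".

third = sql_none_default("SELECT 1")
fourth = sql_none_default("SELECT 2")
# `third` and `fourth` are independent dicts, each with its own query.
```

With the `None` default, each call starts from a clean dictionary, which is the behavior a caller of `sql()` would normally expect.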



[GitHub] [spark] dongjoon-hyun commented on pull request #39971: [SPARK-42402][CONNECT] Support parameterized SQL by `sql()`

Posted by "dongjoon-hyun (via GitHub)" <gi...@apache.org>.
dongjoon-hyun commented on PR #39971:
URL: https://github.com/apache/spark/pull/39971#issuecomment-1426582132

   Could you adjust the following test case?
   
   https://github.com/apache/spark/blob/99431e28f950bb25c421abd51888a3f9f4b46685/python/pyspark/sql/tests/connect/test_connect_plan.py#L647-L655


[GitHub] [spark] HyukjinKwon commented on pull request #39971: [SPARK-42402][CONNECT] Support parameterized SQL by `sql()`

Posted by "HyukjinKwon (via GitHub)" <gi...@apache.org>.
HyukjinKwon commented on PR #39971:
URL: https://github.com/apache/spark/pull/39971#issuecomment-1426903137

   Merged to master and branch-3.4.

