You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by "MaxGekk (via GitHub)" <gi...@apache.org> on 2023/03/21 12:47:56 UTC

[GitHub] [spark] MaxGekk opened a new pull request, #40508: [MINOR][SQL][CONNECT][PYTHON] Clarify the comment of parameterized SQL args

MaxGekk opened a new pull request, #40508:
URL: https://github.com/apache/spark/pull/40508

   ### What changes were proposed in this pull request?
   In the PR, I propose to clarify the comment of `args` in parameterized `sql()`.
   
   ### Why are the changes needed?
   To make the comment more clear and highlight that input strings are parsed (not evaluated), and considered as SQL literal expressions.
   
   ### Does this PR introduce _any_ user-facing change?
   No.
   
   ### How was this patch tested?
   By checking coding style:
   ```
   $ ./dev/lint-python
   $ ./dev/scalastyle
   ```


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] MaxGekk commented on a diff in pull request #40508: [MINOR][SQL][CONNECT][PYTHON] Clarify the comment of parameterized SQL args

Posted by "MaxGekk (via GitHub)" <gi...@apache.org>.
MaxGekk commented on code in PR #40508:
URL: https://github.com/apache/spark/pull/40508#discussion_r1143569959


##########
connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/SparkSession.scala:
##########
@@ -213,7 +213,9 @@ class SparkSession private[sql] (
    * @param sqlText
    *   A SQL statement with named parameters to execute.
    * @param args
-   *   A map of parameter names to literal values.
+   *   A map of parameter names to string values that are parsed as SQL literal expressions.

Review Comment:
   > should not be treated as fixed values like in prepared statements.
   
   I didn't get why it should be treated in this way. Could elaborate this, please.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] cloud-fan commented on a diff in pull request #40508: [MINOR][SQL][CONNECT][PYTHON] Clarify the comment of parameterized SQL args

Posted by "cloud-fan (via GitHub)" <gi...@apache.org>.
cloud-fan commented on code in PR #40508:
URL: https://github.com/apache/spark/pull/40508#discussion_r1144348965


##########
connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/SparkSession.scala:
##########
@@ -213,7 +213,9 @@ class SparkSession private[sql] (
    * @param sqlText
    *   A SQL statement with named parameters to execute.
    * @param args
-   *   A map of parameter names to literal values.
+   *   A map of parameter names to string values that are parsed as SQL literal expressions.

Review Comment:
   > For the user of the API, it's not clear how the input values are (parsed|interpreted|processed).
   
   I think the SQL standard defines the syntax of literal for each data type, which doesn't require internal knowledge.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] MaxGekk commented on a diff in pull request #40508: [MINOR][SQL][CONNECT][PYTHON] Clarify the comment of parameterized SQL args

Posted by "MaxGekk (via GitHub)" <gi...@apache.org>.
MaxGekk commented on code in PR #40508:
URL: https://github.com/apache/spark/pull/40508#discussion_r1143607484


##########
connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/SparkSession.scala:
##########
@@ -213,7 +213,9 @@ class SparkSession private[sql] (
    * @param sqlText
    *   A SQL statement with named parameters to execute.
    * @param args
-   *   A map of parameter names to literal values.
+   *   A map of parameter names to string values that are parsed as SQL literal expressions.

Review Comment:
   > This means you're interpreting the input, am I mistaken?
   
   For the "interpreting" word means execute the SQL text and produce a date, but we just try to convert the input text to a literal. Roughly speaking, switching by branches to parse the input. Don't think that it can be considered as interpretation (or evaluation, execution and so on).
   
   > The core part is that users of the API will see that it supports parameter substitution and will believe it provides the same guarantees of fixed literals ...
   
   It provides, we checks that all args are literals but arbitrary expressions, see https://github.com/apache/spark/blob/f8966e7eee1d7f2db7b376d557d5ff6658c80653/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/parameters.scala#L92



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] MaxGekk commented on a diff in pull request #40508: [MINOR][SQL][CONNECT][PYTHON] Clarify the comment of parameterized SQL args

Posted by "MaxGekk (via GitHub)" <gi...@apache.org>.
MaxGekk commented on code in PR #40508:
URL: https://github.com/apache/spark/pull/40508#discussion_r1143607484


##########
connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/SparkSession.scala:
##########
@@ -213,7 +213,9 @@ class SparkSession private[sql] (
    * @param sqlText
    *   A SQL statement with named parameters to execute.
    * @param args
-   *   A map of parameter names to literal values.
+   *   A map of parameter names to string values that are parsed as SQL literal expressions.

Review Comment:
   > This means you're interpreting the input, am I mistaken?
   
   For me, the "interpreting" word means execute the SQL text and produce a date, but we just try to convert the input text to a literal. Roughly speaking, switching by branches to parse the input. Don't think that it can be considered as interpretation (or evaluation, execution and so on).
   
   > The core part is that users of the API will see that it supports parameter substitution and will believe it provides the same guarantees of fixed literals ...
   
   It provides, we checks that all args are literals but arbitrary expressions, see https://github.com/apache/spark/blob/f8966e7eee1d7f2db7b376d557d5ff6658c80653/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/parameters.scala#L92



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] grundprinzip commented on a diff in pull request #40508: [MINOR][SQL][CONNECT][PYTHON] Clarify the comment of parameterized SQL args

Posted by "grundprinzip (via GitHub)" <gi...@apache.org>.
grundprinzip commented on code in PR #40508:
URL: https://github.com/apache/spark/pull/40508#discussion_r1143538305


##########
connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/SparkSession.scala:
##########
@@ -213,7 +213,9 @@ class SparkSession private[sql] (
    * @param sqlText
    *   A SQL statement with named parameters to execute.
    * @param args
-   *   A map of parameter names to literal values.
+   *   A map of parameter names to string values that are parsed as SQL literal expressions.

Review Comment:
   ```suggestion
      *   A map of parameter names to string values that are parsed as fragments of SQL text and interpreted as literal expressions. The parameter bindings do not provide any
      safety guarantees on how the SQL fragment is interpreted and should not be treated
      as fixed values like in prepared statements.
   ```
   
   I think we need to be much much clearer on what the actual behavior is.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] MaxGekk commented on a diff in pull request #40508: [MINOR][SQL][CONNECT][PYTHON] Clarify the comment of parameterized SQL args

Posted by "MaxGekk (via GitHub)" <gi...@apache.org>.
MaxGekk commented on code in PR #40508:
URL: https://github.com/apache/spark/pull/40508#discussion_r1143566089


##########
connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/SparkSession.scala:
##########
@@ -213,7 +213,9 @@ class SparkSession private[sql] (
    * @param sqlText
    *   A SQL statement with named parameters to execute.
    * @param args
-   *   A map of parameter names to literal values.
+   *   A map of parameter names to string values that are parsed as SQL literal expressions.

Review Comment:
   > that are parsed as fragments of SQL text and interpreted as literal expressions.
   
   We used `parseExpression()` to parse the input string values to single expression. That the comment describes directly. We don't interpret anything.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] MaxGekk commented on a diff in pull request #40508: [MINOR][SQL][CONNECT][PYTHON] Clarify the comment of parameterized SQL args

Posted by "MaxGekk (via GitHub)" <gi...@apache.org>.
MaxGekk commented on code in PR #40508:
URL: https://github.com/apache/spark/pull/40508#discussion_r1143607484


##########
connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/SparkSession.scala:
##########
@@ -213,7 +213,9 @@ class SparkSession private[sql] (
    * @param sqlText
    *   A SQL statement with named parameters to execute.
    * @param args
-   *   A map of parameter names to literal values.
+   *   A map of parameter names to string values that are parsed as SQL literal expressions.

Review Comment:
   > This means you're interpreting the input, am I mistaken?
   
   For me, the "interpreting" word means to execute the SQL text and produce a date, but we just try to convert the input text to a literal. Roughly speaking, switching by branches to parse the input. Don't think that it can be considered as interpretation (or evaluation, execution and so on).
   
   > The core part is that users of the API will see that it supports parameter substitution and will believe it provides the same guarantees of fixed literals ...
   
   It provides, we checks that all args are literals but arbitrary expressions, see https://github.com/apache/spark/blob/f8966e7eee1d7f2db7b376d557d5ff6658c80653/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/parameters.scala#L92



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] MaxGekk closed pull request #40508: [SPARK-42924][SQL][CONNECT][PYTHON] Clarify the comment of parameterized SQL args

Posted by "MaxGekk (via GitHub)" <gi...@apache.org>.
MaxGekk closed pull request #40508: [SPARK-42924][SQL][CONNECT][PYTHON] Clarify the comment of parameterized SQL args
URL: https://github.com/apache/spark/pull/40508


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] MaxGekk commented on pull request #40508: [MINOR][SQL][CONNECT][PYTHON] Clarify the comment of parameterized SQL args

Posted by "MaxGekk (via GitHub)" <gi...@apache.org>.
MaxGekk commented on PR #40508:
URL: https://github.com/apache/spark/pull/40508#issuecomment-1477781116

   @cloud-fan @grundprinzip @HyukjinKwon Could you review this PR, please.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] grundprinzip commented on a diff in pull request #40508: [MINOR][SQL][CONNECT][PYTHON] Clarify the comment of parameterized SQL args

Posted by "grundprinzip (via GitHub)" <gi...@apache.org>.
grundprinzip commented on code in PR #40508:
URL: https://github.com/apache/spark/pull/40508#discussion_r1143612707


##########
connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/SparkSession.scala:
##########
@@ -213,7 +213,9 @@ class SparkSession private[sql] (
    * @param sqlText
    *   A SQL statement with named parameters to execute.
    * @param args
-   *   A map of parameter names to literal values.
+   *   A map of parameter names to string values that are parsed as SQL literal expressions.

Review Comment:
   What I say remains true, you have to have significant internal knowledge of how the implementation works to understand what happens. For the user of the API, it's not clear how the input values are (parsed|interpreted|processed).
   
   This is a security concern and should be treated as such.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] MaxGekk commented on pull request #40508: [SPARK-42924][SQL][CONNECT][PYTHON] Clarify the comment of parameterized SQL args

Posted by "MaxGekk (via GitHub)" <gi...@apache.org>.
MaxGekk commented on PR #40508:
URL: https://github.com/apache/spark/pull/40508#issuecomment-1484535748

   Merging to master/3.4. Thank you, @cloud-fan @srielau @grundprinzip for review.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] srielau commented on a diff in pull request #40508: [MINOR][SQL][CONNECT][PYTHON] Clarify the comment of parameterized SQL args

Posted by "srielau (via GitHub)" <gi...@apache.org>.
srielau commented on code in PR #40508:
URL: https://github.com/apache/spark/pull/40508#discussion_r1143689845


##########
connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/SparkSession.scala:
##########
@@ -213,7 +213,8 @@ class SparkSession private[sql] (
    * @param sqlText
    *   A SQL statement with named parameters to execute.
    * @param args
-   *   A map of parameter names to literal values.
+   *   A map of parameter names to string values that are parsed as SQL literal expressions. For
+   *   example, map keys: "abc", "def_1", map values: "1", "'abc'", "DATE'2023-03-21'".

Review Comment:
   How about keys that clarify the type, at least implicitly. Also arity didn't match up
   ```suggestion
      *   example, map keys: "rank", "name", "birthdate"; map values: "1", "'Steven'", "DATE'2023-03-21'".
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] MaxGekk commented on a diff in pull request #40508: [MINOR][SQL][CONNECT][PYTHON] Clarify the comment of parameterized SQL args

Posted by "MaxGekk (via GitHub)" <gi...@apache.org>.
MaxGekk commented on code in PR #40508:
URL: https://github.com/apache/spark/pull/40508#discussion_r1143569959


##########
connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/SparkSession.scala:
##########
@@ -213,7 +213,9 @@ class SparkSession private[sql] (
    * @param sqlText
    *   A SQL statement with named parameters to execute.
    * @param args
-   *   A map of parameter names to literal values.
+   *   A map of parameter names to string values that are parsed as SQL literal expressions.

Review Comment:
   > should not be treated as fixed values like in prepared statements.
   
   I didn't get why it should be treated in this way. IMHO, this statement is useless.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] grundprinzip commented on a diff in pull request #40508: [MINOR][SQL][CONNECT][PYTHON] Clarify the comment of parameterized SQL args

Posted by "grundprinzip (via GitHub)" <gi...@apache.org>.
grundprinzip commented on code in PR #40508:
URL: https://github.com/apache/spark/pull/40508#discussion_r1143577084


##########
connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/SparkSession.scala:
##########
@@ -213,7 +213,9 @@ class SparkSession private[sql] (
    * @param sqlText
    *   A SQL statement with named parameters to execute.
    * @param args
-   *   A map of parameter names to literal values.
+   *   A map of parameter names to string values that are parsed as SQL literal expressions.

Review Comment:
   ```
   DATE'2023-03-21'
   ```
   Essentially says parse this literal as a date and the output expression will be a date. This means you're interpreting the input, am I mistaken?
   
   The core part is that users of the API will see that it supports parameter substitution and will believe it provides the same guarantees of fixed literals as for example prepared statements which is not the case. You're not able to perform full SQL injection using this API but it's enough that you can't blindly mix trusted and untrusted input.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org