Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2022/11/18 10:00:10 UTC

[GitHub] [spark] MaxGekk opened a new pull request, #38712: [WIP][SQL] Parameterized SQL queries

MaxGekk opened a new pull request, #38712:
URL: https://github.com/apache/spark/pull/38712

   <!--
   Thanks for sending a pull request!  Here are some tips for you:
     1. If this is your first time, please read our contributor guidelines: https://spark.apache.org/contributing.html
     2. Ensure you have added or run the appropriate tests for your PR: https://spark.apache.org/developer-tools.html
     3. If the PR is unfinished, add '[WIP]' in your PR title, e.g., '[WIP][SPARK-XXXX] Your PR title ...'.
     4. Be sure to keep the PR description updated to reflect all changes.
     5. Please write your PR title to summarize what this PR proposes.
     6. If possible, provide a concise example to reproduce the issue for a faster review.
     7. If you want to add a new configuration, please read the guideline first for naming configurations in
        'core/src/main/scala/org/apache/spark/internal/config/ConfigEntry.scala'.
     8. If you want to add or modify an error type or message, please read the guideline first in
        'core/src/main/resources/error/README.md'.
   -->
   
   ### What changes were proposed in this pull request?
   <!--
   Please clarify what changes you are proposing. The purpose of this section is to outline the changes and how this PR fixes the issue. 
   If possible, please consider writing useful notes for better and faster reviews in your PR. See the examples below.
     1. If you refactor some codes with changing classes, showing the class hierarchy will help reviewers.
     2. If you fix some SQL features, you can provide some references of other DBMSes.
     3. If there is design documentation, please add the link.
     4. If there is a discussion in the mailing list, please add the link.
   -->
   
   
   ### Why are the changes needed?
   <!--
   Please clarify why the changes are needed. For instance,
     1. If you propose a new API, clarify the use case for a new API.
     2. If you fix a bug, you can clarify why it is a bug.
   -->
   
   
   ### Does this PR introduce _any_ user-facing change?
   <!--
   Note that it means *any* user-facing change including all aspects such as the documentation fix.
   If yes, please clarify the previous behavior and the change this PR proposes - provide the console output, description and/or an example to show the behavior difference if possible.
   If possible, please also clarify if this is a user-facing change compared to the released Spark versions or within the unreleased branches such as master.
   If no, write 'No'.
   -->
   
   
   ### How was this patch tested?
   By running new tests:
   ```
   $ build/sbt "test:testOnly *PlanParserSuite"
   ```


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] xkrogen commented on pull request #38712: [WIP][SPARK-41271][SQL] Parameterized SQL queries

Posted by GitBox <gi...@apache.org>.
xkrogen commented on PR #38712:
URL: https://github.com/apache/spark/pull/38712#issuecomment-1329917869

   Can you provide more information on how this proposal compares to the behavior of other popular SQL dialects (e.g. ANSI, Postgres, MySQL, Trino, HiveQL, Snowflake)? One thing that stands out to me is that this PR only supports named parameters; I think positional parameters are more common? For example, AFAIK JDBC only supports positional parameters ([ref](https://docs.oracle.com/javase/tutorial/jdbc/basics/prepared.html)).
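   A minimal sketch of the positional (qmark) style, using Python's stdlib `sqlite3` purely as a stand-in for a JDBC `PreparedStatement` (this is an illustration, not Spark code):
   ```python
   import sqlite3

   # Illustration only: positional "?" placeholders, the same qmark style
   # that JDBC's PreparedStatement uses.
   conn = sqlite3.connect(":memory:")
   conn.execute("CREATE TABLE users (id INTEGER, name TEXT)")
   conn.execute("INSERT INTO users VALUES (?, ?)", (1, "alice"))

   # Parameters bind strictly by position, left to right.
   row = conn.execute("SELECT name FROM users WHERE id = ?", (1,)).fetchone()
   ```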



[GitHub] [spark] xkrogen commented on pull request #38712: [WIP][SPARK-41271][SQL] Parameterized SQL queries

Posted by GitBox <gi...@apache.org>.
xkrogen commented on PR #38712:
URL: https://github.com/apache/spark/pull/38712#issuecomment-1334227405

   > Actually, this is just Oracle's proprietary implementation.
   
   I don't think this is true, but admittedly my knowledge of the SQL standardization process is limited, so please correct me if you find gaps in my reasoning. In the [SQL-92 standard](https://www.contrib.andrew.cmu.edu/~shadow/sql/sql1992.txt) (this is the only reference to the full standard I can find with a few minutes of searching; please let me know if you have a better source), `Dynamic SQL` is defined in section 17, and on page 98 it explicitly defines a parameter name as being prefixed by a colon:
   ```
            <parameter name> ::= <colon> <identifier>
   ```
   
   > The `@` marker has been chosen to make migrating from BigQuery to Spark SQL easier.
   
   This seems, to me, like a pretty shaky argument? I could just as easily say that we should choose `:` to make it easier to migrate from Redshift, or, what seems to be a strong argument based on what I found in my last comment, that we should choose positional argument support (`?`) to make it easier to migrate from Trino, PrestoDB, PostgreSQL, Snowflake, _and_ MySQL. Why are we giving BigQuery precedence here?



[GitHub] [spark] xkrogen commented on pull request #38712: [WIP][SPARK-41271][SQL] Parameterized SQL queries

Posted by GitBox <gi...@apache.org>.
xkrogen commented on PR #38712:
URL: https://github.com/apache/spark/pull/38712#issuecomment-1332434274

   All 3 of the examples you provided use different syntax (`:id` vs `@id` vs `{{ id }}`). Can we provide more motivation for why we are picking the syntax used in this implementation? [SQL-92 defined ANSI Dynamic SQL](https://docs.oracle.com/cd/A81042_01/DOC/appdev.816/a76942/pc_14ady.htm) which uses colons (`:id`). One other example using `:` is [SQLAlchemy](https://docs.sqlalchemy.org/en/14/core/tutorial.html#using-textual-sql). 
   
   Other examples (besides JDBC) only supporting positional arguments:
   * [Trino](https://trino.io/docs/current/sql/execute.html) & [PrestoDB](https://prestodb.io/docs/current/sql/execute.html)
   * [PostgreSQL](https://www.postgresql.org/docs/current/ecpg-dynamic.html)
   * [Snowflake](https://docs.snowflake.com/en/sql-reference/sql/execute-immediate.html)
   * [MySQL](https://dev.mysql.com/doc/refman/8.0/en/sql-prepared-statements.html)
   
   So from what I am seeing, positional arguments (with `?` placeholders) are the most common, and when named parameters are supported, colons are the most common way to denote them (including ANSI support). I feel there is a good argument to be made that named arguments are much easier to work with than positional, so I am supportive of using named args rather than positional, but I feel we should at least follow the ANSI standard and use colons rather than at-signs.
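   A minimal sketch of the colon-prefixed named style, again using Python's stdlib `sqlite3` as a stand-in (it happens to accept `:name` markers; this is not Spark or SQLAlchemy code):
   ```python
   import sqlite3

   # Illustration only: colon-prefixed named parameters (":id"), the style
   # SQL-92 dynamic SQL and SQLAlchemy's textual SQL use.
   conn = sqlite3.connect(":memory:")
   conn.execute("CREATE TABLE users (id INTEGER, name TEXT)")
   conn.execute("INSERT INTO users VALUES (:id, :name)",
                {"name": "alice", "id": 1})

   # Parameters bind by name, so the order of the mapping does not matter.
   row = conn.execute("SELECT name FROM users WHERE id = :id",
                      {"id": 1}).fetchone()
   ```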



[GitHub] [spark] MaxGekk commented on pull request #38712: [WIP][SPARK-41271][SQL] Parameterized SQL queries

Posted by GitBox <gi...@apache.org>.
MaxGekk commented on PR #38712:
URL: https://github.com/apache/spark/pull/38712#issuecomment-1333867601

   > Can we provide more motivation for why we are picking the syntax used in this implementation?
   
   @xkrogen The `@` marker has been chosen to make migrating from BigQuery to Spark SQL easier.
   
   > [SQL-92 defined ANSI Dynamic SQL](https://docs.oracle.com/cd/A81042_01/DOC/appdev.816/a76942/pc_14ady.htm) which uses colons (:id).
   
   Actually, this is just Oracle's proprietary implementation.
   
   > I feel we should at least follow the ANSI standard and use colons rather than at-signs.
   
   I agree that we should follow standards whenever it is possible but there are no standards for this feature so far.
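   A minimal sketch of the `@` marker style under discussion, once more using Python's stdlib `sqlite3` as a stand-in (SQLite happens to accept `@name` alongside `:name`); this is illustrative only, not the Spark API proposed in this PR:
   ```python
   import sqlite3

   # Illustration only: at-sign named parameters ("@id"), the BigQuery-style
   # marker this PR proposes. SQLite accepts "@", ":", and "$" prefixes alike.
   conn = sqlite3.connect(":memory:")
   conn.execute("CREATE TABLE users (id INTEGER, name TEXT)")
   conn.execute("INSERT INTO users VALUES (@id, @name)",
                {"id": 1, "name": "alice"})
   row = conn.execute("SELECT name FROM users WHERE id = @id",
                      {"id": 1}).fetchone()
   ```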



[GitHub] [spark] MaxGekk commented on pull request #38712: [WIP][SPARK-41271][SQL] Parameterized SQL queries

Posted by GitBox <gi...@apache.org>.
MaxGekk commented on PR #38712:
URL: https://github.com/apache/spark/pull/38712#issuecomment-1330110664

   > Can you provide more information ... of other popular SQL dialects
   
   @xkrogen Named parameters are supported by:
   1. Redshift: https://docs.aws.amazon.com/redshift/latest/mgmt/data-api.html#data-api-calling
   2. BigQuery: https://cloud.google.com/bigquery/docs/parameterized-queries#api
   3. MS DBSQL: https://learn.microsoft.com/en-us/azure/databricks/sql/user/queries/query-parameters



[GitHub] [spark] MaxGekk closed pull request #38712: [WIP][SPARK-41271][SQL] Parameterized SQL queries

Posted by GitBox <gi...@apache.org>.
MaxGekk closed pull request #38712: [WIP][SPARK-41271][SQL] Parameterized SQL queries
URL: https://github.com/apache/spark/pull/38712

