Posted to commits@spark.apache.org by gu...@apache.org on 2023/06/19 00:32:02 UTC
[spark] branch master updated: [SPARK-43009][PYTHON][FOLLOWUP] Parameterized `sql_formatter.sql()` with Any constants
This is an automated email from the ASF dual-hosted git repository.
gurwls223 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git
The following commit(s) were added to refs/heads/master by this push:
new 53dae3d0440 [SPARK-43009][PYTHON][FOLLOWUP] Parameterized `sql_formatter.sql()` with Any constants
53dae3d0440 is described below
commit 53dae3d0440f5acad1fd30b17fe27ed208860960
Author: Max Gekk <ma...@gmail.com>
AuthorDate: Mon Jun 19 09:31:50 2023 +0900
[SPARK-43009][PYTHON][FOLLOWUP] Parameterized `sql_formatter.sql()` with Any constants
### What changes were proposed in this pull request?
In the PR, I propose to change the API of parameterized SQL and replace the type of argument values from `string` to `Any` in `sql_formatter`. The language API can accept `Any` objects from which it is possible to construct literal expressions.
### Why are the changes needed?
To align the API to PySpark's `sql()`.
Also, the current implementation of the parameterized `sql()` requires arguments as string values that are parsed to SQL literal expressions, which causes the following issues:
1. SQL comments are skipped while parsing, so some fragments of the input might be dropped. For example, in `'Europe -- Amsterdam'`, the `-- Amsterdam` part is excluded from the input.
2. Special characters in string values must be escaped, for instance `'E\'Twaun Moore'`.
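The two issues above follow from pre-formatting values as SQL text on the caller's side. A minimal sketch of the idea behind the fix (a hypothetical helper, not Spark's actual implementation): when typed Python objects are converted to literals by the library, quoting and escaping are handled in one place, and comment markers inside strings cannot truncate the value.

```python
import datetime

def to_sql_literal(value):
    """Render a Python value as a SQL literal (illustrative only)."""
    if isinstance(value, bool):  # check bool before int: bool is a subclass of int
        return "true" if value else "false"
    if isinstance(value, int):
        return str(value)
    if isinstance(value, str):
        # Escaping happens here, so callers never pre-quote strings and
        # "--" inside a value is just text, not the start of a SQL comment.
        return "'" + value.replace("'", "''") + "'"
    if isinstance(value, datetime.date):
        return f"DATE'{value.isoformat()}'"
    raise TypeError(f"unsupported type: {type(value).__name__}")

print(to_sql_literal("E'Twaun Moore"))            # 'E''Twaun Moore'
print(to_sql_literal("Europe -- Amsterdam"))      # 'Europe -- Amsterdam'
print(to_sql_literal(datetime.date(2023, 3, 21))) # DATE'2023-03-21'
```

With string-typed arguments, the caller had to write `"'E\\'Twaun Moore'"` and hope the parser kept the whole value; with typed arguments, plain `"E'Twaun Moore"` round-trips safely.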
### Does this PR introduce _any_ user-facing change?
Yes.
### How was this patch tested?
By running the affected test suite:
```
$ python/run-tests --parallelism=1 --testnames 'pyspark.pandas.sql_formatter'
```
Closes #41644 from MaxGekk/fix-pandas-sql_formatter.
Authored-by: Max Gekk <ma...@gmail.com>
Signed-off-by: Hyukjin Kwon <gu...@apache.org>
---
python/pyspark/pandas/sql_formatter.py | 16 ++++++++++------
1 file changed, 10 insertions(+), 6 deletions(-)
diff --git a/python/pyspark/pandas/sql_formatter.py b/python/pyspark/pandas/sql_formatter.py
index f87dd3ff29f..4387a1e0909 100644
--- a/python/pyspark/pandas/sql_formatter.py
+++ b/python/pyspark/pandas/sql_formatter.py
@@ -43,7 +43,7 @@ _CAPTURE_SCOPES = 3
def sql(
query: str,
index_col: Optional[Union[str, List[str]]] = None,
- args: Dict[str, str] = {},
+ args: Optional[Dict[str, Any]] = None,
**kwargs: Any,
) -> DataFrame:
"""
@@ -103,10 +103,14 @@ def sql(
Also note that the index name(s) should be matched to the existing name.
args : dict
- A dictionary of parameter names to string values that are parsed as SQL literal
- expressions. For example, dict keys: "rank", "name", "birthdate"; dict values:
- "1", "'Steven'", "DATE'2023-03-21'". The fragments of string values belonged to SQL
- comments are skipped while parsing.
+ A dictionary of parameter names to Python objects that can be converted to
+ SQL literal expressions. See
+ <a href="https://spark.apache.org/docs/latest/sql-ref-datatypes.html">
+ Supported Data Types</a> for supported value types in Python.
+ For example, dictionary keys: "rank", "name", "birthdate";
+ dictionary values: 1, "Steven", datetime.date(2023, 4, 2).
+ Dict value can be also a `Column` of literal expression, in that case it is taken as is.
+
.. versionadded:: 3.4.0
@@ -166,7 +170,7 @@ def sql(
And substitute named parameters with the `:` prefix by SQL literals.
- >>> ps.sql("SELECT * FROM range(10) WHERE id > :bound1", args={"bound1":"7"})
+ >>> ps.sql("SELECT * FROM range(10) WHERE id > :bound1", args={"bound1":7})
id
0 8
1 9
---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@spark.apache.org
For additional commands, e-mail: commits-help@spark.apache.org