Posted to reviews@spark.apache.org by "MaxGekk (via GitHub)" <gi...@apache.org> on 2023/09/20 10:04:20 UTC

[GitHub] [spark] MaxGekk opened a new pull request, #43014: [WIP][CONNECT][PYTHON] Support map and array parameters by `sql()`

MaxGekk opened a new pull request, #43014:
URL: https://github.com/apache/spark/pull/43014

### What changes were proposed in this pull request?

### Why are the changes needed?

### Does this PR introduce _any_ user-facing change?

### How was this patch tested?

### Was this patch authored or co-authored using generative AI tooling?

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] allisonwang-db commented on a diff in pull request #43014: [SPARK-45235][CONNECT][PYTHON] Support map and array parameters by `sql()`

Posted by "allisonwang-db (via GitHub)" <gi...@apache.org>.

allisonwang-db commented on code in PR #43014:
URL: https://github.com/apache/spark/pull/43014#discussion_r1331920397


##########
python/pyspark/sql/tests/connect/test_connect_basic.py:
##########
@@ -1237,13 +1237,23 @@ def test_sql(self):
         self.assertEqual(1, len(pdf.index))
 
     def test_sql_with_named_args(self):
-        df = self.connect.sql("SELECT * FROM range(10) WHERE id > :minId", args={"minId": 7})
-        df2 = self.spark.sql("SELECT * FROM range(10) WHERE id > :minId", args={"minId": 7})
+        from pyspark.sql.functions import create_map, lit
+        from pyspark.sql.connect.functions import lit as clit
+        from pyspark.sql.connect.functions import create_map as ccreate_map

Review Comment:
   @zhengruifeng Just wondering: should `SF` and `CF` be capitalized here?




[GitHub] [spark] zhengruifeng commented on a diff in pull request #43014: [SPARK-45235][CONNECT][PYTHON] Support map and array parameters by `sql()`

Posted by "zhengruifeng (via GitHub)" <gi...@apache.org>.

zhengruifeng commented on code in PR #43014:
URL: https://github.com/apache/spark/pull/43014#discussion_r1332291118


##########
python/pyspark/sql/tests/connect/test_connect_basic.py:
##########
@@ -1237,13 +1237,23 @@ def test_sql(self):
         self.assertEqual(1, len(pdf.index))
 
     def test_sql_with_named_args(self):
-        df = self.connect.sql("SELECT * FROM range(10) WHERE id > :minId", args={"minId": 7})
-        df2 = self.spark.sql("SELECT * FROM range(10) WHERE id > :minId", args={"minId": 7})
+        from pyspark.sql.functions import create_map, lit
+        from pyspark.sql.connect.functions import lit as clit
+        from pyspark.sql.connect.functions import create_map as ccreate_map

Review Comment:
   @allisonwang-db Since they are only used in tests, I guess it doesn't matter.
   
   For new test files, I think we can start with `sf` and `cf`.




[GitHub] [spark] zhengruifeng commented on a diff in pull request #43014: [SPARK-45235][CONNECT][PYTHON] Support map and array parameters by `sql()`

Posted by "zhengruifeng (via GitHub)" <gi...@apache.org>.

zhengruifeng commented on code in PR #43014:
URL: https://github.com/apache/spark/pull/43014#discussion_r1331636427


##########
python/pyspark/sql/connect/plan.py:
##########
@@ -1049,21 +1049,23 @@ def __init__(self, query: str, args: Optional[Union[Dict[str, Any], List]] = Non
         self._query = query
         self._args = args
 
+    def __to_expr(self, session: "SparkConnectClient", v: Any) -> proto.Expression:

Review Comment:
   I think we'd better rename it to `_to_expr`.
   
   [The Python style guide (PEP 8)](https://peps.python.org/pep-0008/#method-names-and-instance-variables) says:
   
   
   > Note: there is some controversy about the use of __names
   
   
   > In addition, the following special forms using leading or trailing underscores are recognized (these can generally be combined with any case convention):
   
   > `_single_leading_underscore`: weak “internal use” indicator. E.g. `from M import *` does not import objects whose names start with an underscore.
   > `single_trailing_underscore_`: used by convention to avoid conflicts with Python keyword, e.g. `tkinter.Toplevel(master, class_='ClassName')`
   > `__double_leading_underscore`: when naming a class attribute, invokes name mangling (inside class FooBar, `__boo` becomes `_FooBar__boo`; see below).
   > `__double_leading_and_trailing_underscore__`: “magic” objects or attributes that live in user-controlled namespaces. E.g. `__init__`, `__import__` or `__file__`. Never invent such names; only use them as documented.
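
To make the name-mangling point concrete, here is a minimal sketch of how a `__to_expr` method behaves compared with `_to_expr`. The `Plan` and `SubPlan` classes and their string return values are hypothetical stand-ins for illustration only; they are not the actual classes in `python/pyspark/sql/connect/plan.py`.

```python
# Hypothetical stand-ins for illustration; these are not the real pyspark classes.
class Plan:
    def __init__(self, args):
        self._args = args

    def __to_expr(self, v):
        # Stored on the class as `_Plan__to_expr` because of name mangling.
        return f"lit({v!r})"

    def _to_expr(self, v):
        # Stored under its own name; the underscore only signals internal use.
        return f"lit({v!r})"

    def first_expr(self):
        # Inside `Plan`, `self.__to_expr` is compiled as `self._Plan__to_expr`.
        return self.__to_expr(self._args[0])


class SubPlan(Plan):
    def first_expr_single(self):
        # Found through normal inheritance.
        return self._to_expr(self._args[0])

    def first_expr_double(self):
        # Compiled as `self._SubPlan__to_expr`, which does not exist.
        return self.__to_expr(self._args[0])


plan = SubPlan([7])
print(plan.first_expr())         # resolved inside Plan
print(plan.first_expr_single())  # plain inherited method
print(hasattr(plan, "_Plan__to_expr"))
try:
    plan.first_expr_double()
except AttributeError as exc:
    print("AttributeError:", exc)
```

With a single leading underscore the helper stays reachable from subclasses and tests under its written name, which is probably the less surprising choice for an internal method.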
   



##########
python/pyspark/sql/tests/connect/test_connect_basic.py:
##########
@@ -1237,13 +1237,23 @@ def test_sql(self):
         self.assertEqual(1, len(pdf.index))
 
     def test_sql_with_named_args(self):
-        df = self.connect.sql("SELECT * FROM range(10) WHERE id > :minId", args={"minId": 7})
-        df2 = self.spark.sql("SELECT * FROM range(10) WHERE id > :minId", args={"minId": 7})
+        from pyspark.sql.functions import create_map, lit
+        from pyspark.sql.connect.functions import lit as clit
+        from pyspark.sql.connect.functions import create_map as ccreate_map

Review Comment:
   https://github.com/apache/spark/blob/8c27de68756d4b0e5940211340a0b323d808aead/python/pyspark/sql/tests/connect/test_connect_basic.py#L78-L79
   
   I think you can use the already imported `SF` and `CF`, to be consistent with the other tests.




[GitHub] [spark] MaxGekk commented on pull request #43014: [SPARK-45235][CONNECT][PYTHON] Support map and array parameters by `sql()`

Posted by "MaxGekk (via GitHub)" <gi...@apache.org>.

MaxGekk commented on PR #43014:
URL: https://github.com/apache/spark/pull/43014#issuecomment-1728909015

   Merging to master. Thank you, @HyukjinKwon, @zhengruifeng and @allisonwang-db, for the review.



[GitHub] [spark] MaxGekk closed pull request #43014: [SPARK-45235][CONNECT][PYTHON] Support map and array parameters by `sql()`

Posted by "MaxGekk (via GitHub)" <gi...@apache.org>.

MaxGekk closed pull request #43014: [SPARK-45235][CONNECT][PYTHON] Support map and array parameters by `sql()`
URL: https://github.com/apache/spark/pull/43014

