You are viewing a plain text version of this content. The canonical link for it is here.

Posted to reviews@spark.apache.org by "dtenedor (via GitHub)" <gi...@apache.org> on 2023/08/23 00:35:11 UTC

[GitHub] [spark] dtenedor commented on a diff in pull request #42617: [SPARK-44918][SQL][PYTHON] Support named arguments in scalar Python/Pandas UDFs

dtenedor commented on code in PR #42617:
URL: https://github.com/apache/spark/pull/42617#discussion_r1302322467


##########
python/pyspark/sql/tests/pandas/test_pandas_udf_scalar.py:
##########
@@ -1467,6 +1467,96 @@ def udf(x):
         finally:
             shutil.rmtree(path)
 
+    def test_named_arguments(self):
+        @pandas_udf("int")
+        def test_udf(a, b):
+            return a + 10 * b
+
+        self.spark.udf.register("test_udf", test_udf)
+
+        for i, df in enumerate(
+            [
+                self.spark.range(2).select(test_udf(a=col("id"), b=col("id") * 10)),
+                self.spark.range(2).select(test_udf(b=col("id") * 10, a=col("id"))),
+                self.spark.sql("SELECT test_udf(a => id, b => id * 10) FROM range(2)"),

Review Comment:
   can we also have a test case with a positional argument first and then a named argument after? We can have a positive version of this test case where the positional argument maps to the first argument of the UDF and the named argument refers to the second, and another negative version where the named argument uses the same name as the first (positional) argument? Same for `test_udf.py`?



##########
sql/core/src/main/scala/org/apache/spark/sql/execution/python/UserDefinedPythonFunction.scala:
##########
@@ -50,6 +50,19 @@ case class UserDefinedPythonFunction(
     udfDeterministic: Boolean) {
 
   def builder(e: Seq[Expression]): Expression = {
+    if (pythonEvalType == PythonEvalType.SQL_BATCHED_UDF
+        || pythonEvalType ==PythonEvalType.SQL_ARROW_BATCHED_UDF
+        || pythonEvalType == PythonEvalType.SQL_SCALAR_PANDAS_UDF) {
+      /*
+       * Check if the named arguments:
+       * - don't have duplicated names
+       * - don't contain positional arguments

Review Comment:
   ```suggestion
          * - don't contain positional arguments after named arguments
          * - all map to valid argument names from the function declaration
   ```



##########
sql/core/src/main/scala/org/apache/spark/sql/execution/python/PythonUDFRunner.scala:
##########
@@ -146,4 +157,30 @@ object PythonUDFRunner {
       }
     }
   }
+
+  def writeUDFs(
+    dataOut: DataOutputStream,

Review Comment:
   please indent each of these function arguments by 2 more spaces (for a total of +4 spaces each) per the style guide?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org