You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2022/08/31 20:15:27 UTC

[GitHub] [spark] khalidmammadov opened a new pull request, #37748: [SPARK-40210][PYTHON][CORE] Fix math atan2, hypot, pow and pmod float argument call

khalidmammadov opened a new pull request, #37748:
URL: https://github.com/apache/spark/pull/37748

   ### What changes were proposed in this pull request?
   PySpark atan2, hypot, pow and pmod functions marked as accepting float type as argument but produce error when called together. For example:
   
   ```
   >>> from pyspark.sql.functions import *
   >>> df = spark.range(1)
   >>> df.select(atan2(1.1, 2.1)).first()
   Traceback (most recent call last):
     File "<stdin>", line 1, in <module>
     File "/home/khalid/dev/p/SparkTest/spark_venv3.10/lib/python3.10/site-packages/pyspark/sql/functions.py", line 962, in atan2
       return _invoke_binary_math_function("atan2", col1, col2)
     File "/home/khalid/dev/p/SparkTest/spark_venv3.10/lib/python3.10/site-packages/pyspark/sql/functions.py", line 111, in _invoke_binary_math_function
       return _invoke_function(
     File "/home/khalid/dev/p/SparkTest/spark_venv3.10/lib/python3.10/site-packages/pyspark/sql/functions.py", line 85, in _invoke_function
       return Column(jf(*args))
     File "/home/khalid/dev/p/SparkTest/spark_venv3.10/lib/python3.10/site-packages/pyspark/python/lib/py4j-0.10.9.5-src.zip/py4j/java_gateway.py", line 1321, in __call__
     File "/home/khalid/dev/p/SparkTest/spark_venv3.10/lib/python3.10/site-packages/pyspark/sql/utils.py", line 190, in deco
       return f(*a, **kw)
     File "/home/khalid/dev/p/SparkTest/spark_venv3.10/lib/python3.10/site-packages/pyspark/python/lib/py4j-0.10.9.5-src.zip/py4j/protocol.py", line 330, in get_return_value
   py4j.protocol.Py4JError: An error occurred while calling z:org.apache.spark.sql.functions.atan2. Trace:
   py4j.Py4JException: Method atan2([class java.lang.Double, class java.lang.Double]) does not exist
           at py4j.reflection.ReflectionEngine.getMethod(ReflectionEngine.java:318)
           at py4j.reflection.ReflectionEngine.getMethod(ReflectionEngine.java:339)
           at py4j.Gateway.invoke(Gateway.java:276)
           at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
           at py4j.commands.CallCommand.execute(CallCommand.java:79)
           at py4j.ClientServerConnection.waitForCommands(ClientServerConnection.java:182)
           at py4j.ClientServerConnection.run(ClientServerConnection.java:106)
           at java.lang.Thread.run(Thread.java:748)
   
   ```
   
   Although, this call could be written outside the Spark and supplied as a literal it's still good to be consistent with SQL API and help user to avoid seeing error.
   
   after the fix:
   
   ```
   >>> from pyspark.sql.functions import *
   >>> df = spark.range(1)
   >>> df.select(atan2(1.1, 2.1)).first()
   Row(ATAN2(1.1, 2.1)=0.4825132950224769)
   ```
   
   
   ### Why are the changes needed?
   Improve user experience. Bug fix
   
   
   ### Does this PR introduce _any_ user-facing change?
   Yes, no errors
   
   
   ### How was this patch tested?
   ```
   ./python/run-tests
   ```


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] HyukjinKwon commented on pull request #37748: [SPARK-40210][PYTHON][CORE] Fix math atan2, hypot, pow and pmod float argument call

Posted by GitBox <gi...@apache.org>.
HyukjinKwon commented on PR #37748:
URL: https://github.com/apache/spark/pull/37748#issuecomment-1233897381

   cc @zero323 fyi


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] zero323 commented on a diff in pull request #37748: [SPARK-40210][PYTHON][CORE] Fix math atan2, hypot, pow and pmod float argument call

Posted by GitBox <gi...@apache.org>.
zero323 commented on code in PR #37748:
URL: https://github.com/apache/spark/pull/37748#discussion_r960952519


##########
python/pyspark/sql/functions.py:
##########
@@ -108,13 +108,10 @@ def _invoke_binary_math_function(name: str, col1: Any, col2: Any) -> Column:
     Invokes binary JVM math function identified by name
     and wraps the result with :class:`~pyspark.sql.Column`.
     """
-    return _invoke_function(
-        name,
-        # For legacy reasons, the arguments here can be implicitly converted into floats,
-        # if they are not columns or strings.
-        _to_java_column(col1) if isinstance(col1, (str, Column)) else float(col1),
-        _to_java_column(col2) if isinstance(col2, (str, Column)) else float(col2),
-    )
+
+    # For legacy reasons, the arguments here can be implicitly converted into column
+    cols = [_to_java_column(c if isinstance(c, (str, Column)) else lit(c)) for c in (col1, col2)]

Review Comment:
   JVM literal column can be created directly with [`_create_column_from_literal`](https://github.com/apache/spark/blob/fdc11ab0494a681444e7a7e13f3f99d25fa6cf2f/python/pyspark/sql/column.py#L47-L50).
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] khalidmammadov commented on a diff in pull request #37748: [SPARK-40210][PYTHON] Fix math atan2, hypot, pow and pmod float argument call

Posted by GitBox <gi...@apache.org>.
khalidmammadov commented on code in PR #37748:
URL: https://github.com/apache/spark/pull/37748#discussion_r961514456


##########
python/pyspark/sql/tests/test_functions.py:
##########
@@ -1029,6 +1034,11 @@ def test_np_scalar_input(self):
             res = df.select(array_position(df.data, dtype(1)).alias("c")).collect()
             self.assertEqual([Row(c=1), Row(c=0)], res)
 
+    def test_binary_math_function(self):
+        for func, res in [(atan2, 0.13664), (hypot, 8.07527), (pow, 2.14359), (pmod, 1.1)]:
+            df = self.spark.range(1).select(round(func(1.1, 8), 5).alias("a"))
+            self.assertEqual(Row(a=res), df.first())

Review Comment:
   Sure, will send followup PR



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] zero323 commented on a diff in pull request #37748: [SPARK-40210][PYTHON] Fix math atan2, hypot, pow and pmod float argument call

Posted by GitBox <gi...@apache.org>.
zero323 commented on code in PR #37748:
URL: https://github.com/apache/spark/pull/37748#discussion_r961479775


##########
python/pyspark/sql/tests/test_functions.py:
##########
@@ -1029,6 +1034,11 @@ def test_np_scalar_input(self):
             res = df.select(array_position(df.data, dtype(1)).alias("c")).collect()
             self.assertEqual([Row(c=1), Row(c=0)], res)
 
+    def test_binary_math_function(self):
+        for func, res in [(atan2, 0.13664), (hypot, 8.07527), (pow, 2.14359), (pmod, 1.1)]:
+            df = self.spark.range(1).select(round(func(1.1, 8), 5).alias("a"))
+            self.assertEqual(Row(a=res), df.first())

Review Comment:
   Nit. We could skip multiple actions here, doing something around these lines (not tested, there might be some typos there):
   
   
   ```python
   funcs, expected = zip(*[(atan2, 0.13664), (hypot, 8.07527), (pow, 2.14359), (pmod, 1.1)])
   df = spark.range(1).select(*(func(1.1, 8) for func in funcs)
   
   for a, e in zip(df.first(), expected):
       self.assertAlmostEqual(a, e, 5)
   ```
   
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] zero323 commented on a diff in pull request #37748: [SPARK-40210][PYTHON] Fix math atan2, hypot, pow and pmod float argument call

Posted by GitBox <gi...@apache.org>.
zero323 commented on code in PR #37748:
URL: https://github.com/apache/spark/pull/37748#discussion_r961494150


##########
python/pyspark/sql/tests/test_functions.py:
##########
@@ -1029,6 +1034,11 @@ def test_np_scalar_input(self):
             res = df.select(array_position(df.data, dtype(1)).alias("c")).collect()
             self.assertEqual([Row(c=1), Row(c=0)], res)
 
+    def test_binary_math_function(self):
+        for func, res in [(atan2, 0.13664), (hypot, 8.07527), (pow, 2.14359), (pmod, 1.1)]:
+            df = self.spark.range(1).select(round(func(1.1, 8), 5).alias("a"))
+            self.assertEqual(Row(a=res), df.first())

Review Comment:
   > oops, I did see this comment when I was merging this ...
   
   No worries @zhengruifeng  :)



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] zhengruifeng commented on pull request #37748: [SPARK-40210][PYTHON] Fix math atan2, hypot, pow and pmod float argument call

Posted by GitBox <gi...@apache.org>.
zhengruifeng commented on PR #37748:
URL: https://github.com/apache/spark/pull/37748#issuecomment-1235276093

   Merged into master, thanks all


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] zero323 commented on a diff in pull request #37748: [SPARK-40210][PYTHON][CORE] Fix math atan2, hypot, pow and pmod float argument call

Posted by GitBox <gi...@apache.org>.
zero323 commented on code in PR #37748:
URL: https://github.com/apache/spark/pull/37748#discussion_r960525054


##########
python/pyspark/sql/functions.py:
##########
@@ -108,12 +108,13 @@ def _invoke_binary_math_function(name: str, col1: Any, col2: Any) -> Column:
     Invokes binary JVM math function identified by name
     and wraps the result with :class:`~pyspark.sql.Column`.
     """
+
+    def align_type(c: Any) -> Union[str, Column]:
+        # For legacy reasons, the arguments here can be implicitly converted into column
+        return c if isinstance(c, (str, Column)) else lit(c)

Review Comment:
   Unless we plan to reuse this in the future, I'd make more senses to just inline the expression.
   
   
   If we do plan to reuse it, this should be marked as internal (`align_type` -> `_align_type`) and moved next to other helper functions (also, a docstring would be a good idea).



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] khalidmammadov commented on pull request #37748: [SPARK-40210][PYTHON][CORE] Fix math atan2, hypot, pow and pmod float argument call

Posted by GitBox <gi...@apache.org>.
khalidmammadov commented on PR #37748:
URL: https://github.com/apache/spark/pull/37748#issuecomment-1233379236

   This is second solution to the issue. 1st was https://github.com/apache/spark/pull/37650


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] zhengruifeng commented on a diff in pull request #37748: [SPARK-40210][PYTHON] Fix math atan2, hypot, pow and pmod float argument call

Posted by GitBox <gi...@apache.org>.
zhengruifeng commented on code in PR #37748:
URL: https://github.com/apache/spark/pull/37748#discussion_r961484532


##########
python/pyspark/sql/tests/test_functions.py:
##########
@@ -1029,6 +1034,11 @@ def test_np_scalar_input(self):
             res = df.select(array_position(df.data, dtype(1)).alias("c")).collect()
             self.assertEqual([Row(c=1), Row(c=0)], res)
 
+    def test_binary_math_function(self):
+        for func, res in [(atan2, 0.13664), (hypot, 8.07527), (pow, 2.14359), (pmod, 1.1)]:
+            df = self.spark.range(1).select(round(func(1.1, 8), 5).alias("a"))
+            self.assertEqual(Row(a=res), df.first())

Review Comment:
   @khalidmammadov if you want to have a try, feel free to send a followup PR



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] zhengruifeng commented on a diff in pull request #37748: [SPARK-40210][PYTHON] Fix math atan2, hypot, pow and pmod float argument call

Posted by GitBox <gi...@apache.org>.
zhengruifeng commented on code in PR #37748:
URL: https://github.com/apache/spark/pull/37748#discussion_r961482449


##########
python/pyspark/sql/tests/test_functions.py:
##########
@@ -1029,6 +1034,11 @@ def test_np_scalar_input(self):
             res = df.select(array_position(df.data, dtype(1)).alias("c")).collect()
             self.assertEqual([Row(c=1), Row(c=0)], res)
 
+    def test_binary_math_function(self):
+        for func, res in [(atan2, 0.13664), (hypot, 8.07527), (pow, 2.14359), (pmod, 1.1)]:
+            df = self.spark.range(1).select(round(func(1.1, 8), 5).alias("a"))
+            self.assertEqual(Row(a=res), df.first())

Review Comment:
   oops, I did see this comment when I was merging this ...



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] zhengruifeng closed pull request #37748: [SPARK-40210][PYTHON] Fix math atan2, hypot, pow and pmod float argument call

Posted by GitBox <gi...@apache.org>.
zhengruifeng closed pull request #37748: [SPARK-40210][PYTHON] Fix math atan2, hypot, pow and pmod float argument call
URL: https://github.com/apache/spark/pull/37748


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] zhengruifeng commented on pull request #37748: [SPARK-40210][PYTHON][CORE] Fix math atan2, hypot, pow and pmod float argument call

Posted by GitBox <gi...@apache.org>.
zhengruifeng commented on PR #37748:
URL: https://github.com/apache/spark/pull/37748#issuecomment-1233689711

   `pmod` was newly added in https://github.com/apache/spark/commit/f8b15395cf347b6c6c6a4a20077fdeb31bfabb24 
   
   do we need to make it support the legacy?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on pull request #37748: [SPARK-40210][PYTHON] Fix math atan2, hypot, pow and pmod float argument call

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on PR #37748:
URL: https://github.com/apache/spark/pull/37748#issuecomment-1235861927

   Can one of the admins verify this patch?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org