Posted to reviews@spark.apache.org by "itholic (via GitHub)" <gi...@apache.org> on 2023/10/04 10:34:45 UTC

[PR] [WIP][SPARK-43656][CONNECT][PS] Enable numpy compat tests for Spark Connect [spark]

itholic opened a new pull request, #43214:
URL: https://github.com/apache/spark/pull/43214

   ### What changes were proposed in this pull request?
   
   This PR proposes to enable `test_np_spark_compat_frame` and `test_np_spark_compat_series` for Spark Connect.
   
   
   ### Why are the changes needed?
   
   To increase test coverage for Spark Connect.
   
   
   ### Does this PR introduce _any_ user-facing change?
   
   No, it's test-only.
   
   ### How was this patch tested?
   
   The existing tests should pass.
   
   ### Was this patch authored or co-authored using generative AI tooling?
   
   No.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [WIP][SPARK-43656][CONNECT][PS] Enable numpy compat tests for Spark Connect [spark]

Posted by "itholic (via GitHub)" <gi...@apache.org>.
itholic commented on PR #43214:
URL: https://github.com/apache/spark/pull/43214#issuecomment-1746610365

   This functionality works fine when tested manually in the Python interpreter:
   ```python
   >>> spark  # check if the current session is Spark Connect session.
   <pyspark.sql.connect.session.SparkSession object at 0x105b3fbe0>
   >>> import pyspark.pandas as ps
   >>> import numpy as np
   >>> psdf = ps.DataFrame({"A": [1, 2, 3]})
   >>> np_name = "arccosh"
   >>> np_func = getattr(np, np_name)
   >>> np_func(psdf)
             A
   0  0.000000
   1  1.316958
   2  1.762747
   ```
   
   But it fails in the unit test:
   ```cmd
   spark % python/run-tests --testnames 'pyspark.pandas.tests.connect.test_parity_numpy_compat NumPyCompatParityTests.test_np_spark_compat_frame'
   
   ...
   
   ======================================================================
   FAIL [3.103s]: test_np_spark_compat_frame (pyspark.pandas.tests.connect.test_parity_numpy_compat.NumPyCompatParityTests)
   ----------------------------------------------------------------------
   ...
   pyspark.errors.exceptions.base.PySparkTypeError: [NOT_COLUMN_OR_STR] Argument `col` should be a Column or str, got Column.
   
   The above exception was the direct cause of the following exception:
   
   Traceback (most recent call last):
   ...
   AssertionError: Test in 'arccosh' function was failed.
   
   ----------------------------------------------------------------------
   Ran 1 test in 5.956s
   ```
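
The message "should be a Column or str, got Column" looks contradictory, but it is what one would expect if the argument is an instance of a different class that happens to share the short name `Column`. The sketch below reproduces that effect in plain Python. `make_column_class`, `ClassicColumn`, `ConnectColumn`, and `check_col` are hypothetical stand-ins for illustration, not PySpark code.

```python
# Hypothetical stand-ins: two unrelated classes that share the name "Column".
# They imitate the situation only; they are not the PySpark classes.
def make_column_class():
    class Column:
        pass

    return Column


ClassicColumn = make_column_class()
ConnectColumn = make_column_class()


def check_col(col, expected=ClassicColumn):
    # The message uses only the short type name, so both classes print as "Column".
    if not isinstance(col, (expected, str)):
        return f"Argument `col` should be a Column or str, got {type(col).__name__}."
    return "ok"


print(check_col(ClassicColumn()))  # ok
print(check_col(ConnectColumn()))  # Argument `col` should be a Column or str, got Column.
```

Under this reading, the check is working as written and the real question is why a column of the other kind reached it.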
   
   Let me test how it works on GitHub Actions.


Re: [PR] [SPARK-43656][CONNECT][PS] Enable numpy compat tests for Spark Connect [spark]

Posted by "itholic (via GitHub)" <gi...@apache.org>.
itholic commented on PR #43214:
URL: https://github.com/apache/spark/pull/43214#issuecomment-1748080577

   cc @HyukjinKwon @zhengruifeng 


Re: [PR] [SPARK-43656][CONNECT][PS][TESTS] Enable numpy compat tests for Spark Connect [spark]

Posted by "zhengruifeng (via GitHub)" <gi...@apache.org>.
zhengruifeng commented on PR #43214:
URL: https://github.com/apache/spark/pull/43214#issuecomment-1748212546

   merged to master


Re: [PR] [WIP][SPARK-43656][CONNECT][PS] Enable numpy compat tests for Spark Connect [spark]

Posted by "itholic (via GitHub)" <gi...@apache.org>.
itholic commented on code in PR #43214:
URL: https://github.com/apache/spark/pull/43214#discussion_r1346778661


##########
python/pyspark/pandas/tests/test_numpy_compat.py:
##########
@@ -20,7 +20,6 @@
 
 from pyspark import pandas as ps
 from pyspark.pandas import set_option, reset_option
-from pyspark.pandas.numpy_compat import unary_np_spark_mappings, binary_np_spark_mappings

Review Comment:
   Some lines of `numpy_compat.py` call `pandas_udf` at import time, and `pandas_udf` [uses `is_remote()` internally](https://github.com/apache/spark/blob/master/python/pyspark/sql/pandas/functions.py#L509-L514). In tests, we should therefore import `numpy_compat` only after the Spark Connect session has been properly initialized.
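
The import-order effect described in this comment can be sketched in plain Python without Spark. Everything in the block is a hypothetical stand-in: `is_remote` and `make_udf` only imitate the pattern of a decision made once at call time, and `load_mappings` stands in for importing a module that builds its mapping at module level.

```python
# Hypothetical stand-ins that imitate the pattern only; not the PySpark functions.
_REMOTE = False  # flipped once the (simulated) Spark Connect session is up


def is_remote():
    return _REMOTE


def make_udf(name):
    # The backend is chosen once, when this is called, not when the UDF is used.
    backend = "connect" if is_remote() else "classic"
    return f"{name}:{backend}"


def load_mappings():
    # Stands in for importing a module that builds its mapping at module level.
    return {"arccosh": make_udf("arccosh")}


early = load_mappings()  # "imported" before the session exists
_REMOTE = True           # session initialized
late = load_mappings()   # "imported" afterwards, e.g. inside the test method

print(early["arccosh"])  # arccosh:classic
print(late["arccosh"])   # arccosh:connect
```

The early mapping keeps the backend that was chosen before the session existed, which is why deferring the import until the session is ready changes the outcome.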



Re: [PR] [SPARK-43656][CONNECT][PS][TESTS] Enable numpy compat tests for Spark Connect [spark]

Posted by "zhengruifeng (via GitHub)" <gi...@apache.org>.
zhengruifeng closed pull request #43214: [SPARK-43656][CONNECT][PS][TESTS] Enable numpy compat tests for Spark Connect
URL: https://github.com/apache/spark/pull/43214

