Posted to reviews@spark.apache.org by "panbingkun (via GitHub)" <gi...@apache.org> on 2024/01/18 03:15:02 UTC

[PR] [SPARK-46753][PYTHON][DOCS] Fix pypy3 python test [spark]

panbingkun opened a new pull request, #44778:
URL: https://github.com/apache/spark/pull/44778

   
   ### What changes were proposed in this pull request?
   
   
   ### Why are the changes needed?
   
   
   ### Does this PR introduce _any_ user-facing change?
   
   
   ### How was this patch tested?
   
   
   ### Was this patch authored or co-authored using generative AI tooling?
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-46753][PYTHON][TESTS] Fix pypy3 python test [spark]

Posted by "panbingkun (via GitHub)" <gi...@apache.org>.
panbingkun commented on PR #44778:
URL: https://github.com/apache/spark/pull/44778#issuecomment-1898227750

   For the record:
   1. The PySpark tests running on `pypy3` have now turned green; the corresponding GitHub Actions (GA) workflow run is:
   https://github.com/panbingkun/spark/runs/20605281376
   
   2.


Re: [PR] [SPARK-46753][PYTHON][TESTS] Fix pypy3 python test [spark]

Posted by "itholic (via GitHub)" <gi...@apache.org>.
itholic commented on code in PR #44778:
URL: https://github.com/apache/spark/pull/44778#discussion_r1464191868


##########
python/pyspark/sql/tests/test_udtf.py:
##########
@@ -69,6 +69,9 @@
 )
 
 
+@unittest.skipIf(
+    not have_pandas or not have_pyarrow, pandas_requirement_message or pyarrow_requirement_message
+)
 class BaseUDTFTestsMixin:

Review Comment:
   Could you check https://github.com/apache/spark/pull/44778#discussion_r1464191332? It seems that `assertDataFrameEqual` does not depend on pandas.



Re: [PR] [SPARK-46753][PYTHON][TESTS] Fix pypy3 python test [spark]

Posted by "panbingkun (via GitHub)" <gi...@apache.org>.
panbingkun commented on code in PR #44778:
URL: https://github.com/apache/spark/pull/44778#discussion_r1464198326


##########
python/pyspark/sql/tests/test_udtf.py:
##########
@@ -69,6 +69,9 @@
 )
 
 
+@unittest.skipIf(
+    not have_pandas or not have_pyarrow, pandas_requirement_message or pyarrow_requirement_message
+)
 class BaseUDTFTestsMixin:

Review Comment:
   > nvm, we just merged the PR to fix it.
   
   Okay, let me rebase it.



Re: [PR] [SPARK-46753][PYTHON][TESTS] Fix pypy3 python test [spark]

Posted by "panbingkun (via GitHub)" <gi...@apache.org>.
panbingkun commented on PR #44778:
URL: https://github.com/apache/spark/pull/44778#issuecomment-1920236218

   https://github.com/apache/spark/actions/runs/7728043635
   <img width="1014" alt="image" src="https://github.com/apache/spark/assets/15246973/52b47ae0-6d34-4475-a2fd-5928e6dcdd0d">
   
   Finally it turned green. 😄


Re: [PR] [SPARK-46753][PYTHON][TESTS] Fix pypy3 python test [spark]

Posted by "HyukjinKwon (via GitHub)" <gi...@apache.org>.
HyukjinKwon commented on code in PR #44778:
URL: https://github.com/apache/spark/pull/44778#discussion_r1467352837


##########
python/pyspark/sql/tests/test_utils.py:
##########
@@ -45,7 +45,14 @@
     IntegerType,
     BooleanType,
 )
-from pyspark.testing.sqlutils import have_pandas
+from pyspark.testing.sqlutils import (
+    have_pandas,
+    have_pyarrow,
+    pyarrow_requirement_message,
+)
+
+
+unittest.skipIf(not have_pyarrow, pyarrow_requirement_message)

Review Comment:
   I think you should apply the decorator to the class below instead:
   
   ```python
   @unittest.skipIf(...)
   class UtilsTestsMixin:
       ...
   ```
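Note on the bare `unittest.skipIf(...)` call at module level in the hunk above: `skipIf` only returns a decorator, so calling it without `@` has no effect on any test. The sketch below contrasts the two forms. `have_pyarrow` and the message are hypothetical stand-ins for the flags from `pyspark.testing.sqlutils`, hard-coded to simulate a missing pyarrow, and the class names are made up for illustration.

```python
import unittest

# Hypothetical stand-ins for the flags from pyspark.testing.sqlutils,
# hard-coded here to simulate an environment without pyarrow.
have_pyarrow = False
pyarrow_requirement_message = "pyarrow is not installed (placeholder message)"

# A bare call just builds a decorator and discards it: nothing is skipped.
unittest.skipIf(not have_pyarrow, pyarrow_requirement_message)


class NotSkippedTests(unittest.TestCase):
    def test_runs(self):
        self.assertTrue(True)


# Decorating the mixin marks it as skipped; TestCase subclasses inherit that.
@unittest.skipIf(not have_pyarrow, pyarrow_requirement_message)
class ArrowTestsMixin:
    def test_needs_arrow(self):
        self.fail("should not run without pyarrow")


class ArrowTests(ArrowTestsMixin, unittest.TestCase):
    pass


def run(case: type) -> unittest.TestResult:
    """Run every test in `case` quietly and return the collected result."""
    result = unittest.TestResult()
    unittest.defaultTestLoader.loadTestsFromTestCase(case).run(result)
    return result


if __name__ == "__main__":
    print("NotSkippedTests skipped:", len(run(NotSkippedTests).skipped))
    print("ArrowTests skipped:", len(run(ArrowTests).skipped))
```

Because the decorator sits on the mixin, every `TestCase` subclass that mixes it in inherits the skip.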



Re: [PR] [SPARK-46753][PYTHON][DOCS] Fix pypy3 python test [spark]

Posted by "HyukjinKwon (via GitHub)" <gi...@apache.org>.
HyukjinKwon commented on code in PR #44778:
URL: https://github.com/apache/spark/pull/44778#discussion_r1456853394


##########
python/pyspark/sql/tests/test_udf.py:
##########
@@ -917,6 +923,7 @@ def test_complex_return_types(self):
         self.assertEqual(row[1], {"a": "b"})
         self.assertEqual(row[2], Row(col1=1, col2=2))
 
+    @unittest.skipIf(not have_pyarrow, pyarrow_requirement_message)
     def test_named_arguments(self):

Review Comment:
   Hmm, I don't think this test uses Arrow.



Re: [PR] [SPARK-46753][PYTHON][TESTS] Fix pypy3 python test [spark]

Posted by "panbingkun (via GitHub)" <gi...@apache.org>.
panbingkun commented on code in PR #44778:
URL: https://github.com/apache/spark/pull/44778#discussion_r1462876376


##########
dev/sparktestsupport/modules.py:
##########
@@ -542,6 +542,10 @@ def __hash__(self):
         "pyspark.testing.utils",
         "pyspark.testing.pandasutils",
     ],
+    excluded_python_implementations=[
+        "PyPy"  # Skip these tests under PyPy since they require numpy, pandas, and pyarrow and

Review Comment:
   Yeah, after adding `# doctest: +SKIP`, GA has passed.
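For reference, the `# doctest: +SKIP` directive mentioned above is part of Python's standard `doctest` module: an example line carrying it is not executed at all, so its imports are never attempted. A minimal sketch, where `describe` is a made-up function used only for illustration:

```python
import doctest


def describe(values):
    """Return a short summary string for a list of numbers.

    This example needs pandas, which may be missing (for instance under PyPy),
    so each of its lines carries the SKIP directive and is never executed:

    >>> import pandas as pd  # doctest: +SKIP
    >>> pd.Series([1, 2, 3]).sum()  # doctest: +SKIP
    6

    This example only uses the standard library and always runs:

    >>> describe([1, 2, 3])
    'n=3, sum=6'
    """
    return f"n={len(values)}, sum={sum(values)}"


def run_doctests():
    """Run the docstring examples of `describe`; return (failed, attempted)."""
    finder = doctest.DocTestFinder()
    runner = doctest.DocTestRunner(verbose=False)
    # Pass globals explicitly so `describe` resolves however this file is run.
    for test in finder.find(describe, globs={"describe": describe}):
        runner.run(test)
    failed, attempted = runner.summarize(verbose=False)
    return failed, attempted


if __name__ == "__main__":
    print(run_doctests())
```

Skipped examples are not counted as attempted, so only the standard-library example is checked.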



Re: [PR] [SPARK-46753][PYTHON][TESTS] Fix pypy3 python test [spark]

Posted by "panbingkun (via GitHub)" <gi...@apache.org>.
panbingkun commented on code in PR #44778:
URL: https://github.com/apache/spark/pull/44778#discussion_r1464201501


##########
python/pyspark/sql/tests/test_udf.py:
##########
@@ -917,6 +923,7 @@ def test_complex_return_types(self):
         self.assertEqual(row[1], {"a": "b"})
         self.assertEqual(row[2], Row(col1=1, col2=2))
 
+    @unittest.skipIf(not have_pyarrow, pyarrow_requirement_message)
     def test_named_arguments(self):

Review Comment:
   Sure, let me test it out now.



Re: [PR] [SPARK-46753][PYTHON][TESTS] Fix pypy3 python test [spark]

Posted by "itholic (via GitHub)" <gi...@apache.org>.
itholic commented on code in PR #44778:
URL: https://github.com/apache/spark/pull/44778#discussion_r1464266537


##########
python/pyspark/sql/tests/test_udf.py:
##########
@@ -917,6 +923,7 @@ def test_complex_return_types(self):
         self.assertEqual(row[1], {"a": "b"})
         self.assertEqual(row[2], Row(col1=1, col2=2))
 
+    @unittest.skipIf(not have_pyarrow, pyarrow_requirement_message)
     def test_named_arguments(self):

Review Comment:
   I just realized that importing `pyspark.pandas` checks for `pandas` and `pyarrow`, as below, and exits the program when they are not installed (if `SPARK_TESTING` is set):
   
   ```python
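   # pyspark/pandas/__init__.py (excerpt)
   import os
   import sys
   import warnings
   
   from pyspark.sql.pandas.utils import require_minimum_pandas_version, require_minimum_pyarrow_version
   
   try:
       require_minimum_pandas_version()
       require_minimum_pyarrow_version()
   except ImportError as e:
       if os.environ.get("SPARK_TESTING"):
           # Under Spark's test runner, warn and exit instead of failing the import.
           warnings.warn(str(e))
           sys.exit()
       else:
           raise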
   ```
   
   @HyukjinKwon Should we add a flag or something similar here so that tests can run without `pandas` and `pyarrow`?
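As an aside on the import-time `sys.exit()` above: a dependency can be probed with `importlib.util.find_spec`, which locates a module without executing it. The helper below is a hypothetical sketch, not how PySpark defines `have_pandas` or `have_pyarrow`. It only reports whether a module is installed, not whether its version meets Spark's minimum, and for dotted names the parent packages are still imported.

```python
import importlib.util


def have_module(name: str) -> bool:
    """Return True if `name` is importable, without executing the module itself.

    For dotted names such as "pyspark.pandas", the parent packages are imported,
    but the target module's own __init__ is not run.
    """
    try:
        return importlib.util.find_spec(name) is not None
    except (ImportError, ValueError):
        # ImportError (ModuleNotFoundError): a parent package is missing.
        # ValueError: the module is in sys.modules with __spec__ set to None.
        return False


if __name__ == "__main__":
    for dep in ("pandas", "pyarrow", "grpc"):
        print(dep, have_module(dep))
```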



Re: [PR] [SPARK-46753][PYTHON][TESTS] Fix pypy3 python test [spark]

Posted by "xinrong-meng (via GitHub)" <gi...@apache.org>.
xinrong-meng commented on code in PR #44778:
URL: https://github.com/apache/spark/pull/44778#discussion_r1468066698


##########
python/pyspark/sql/tests/pandas/test_pandas_udf_scalar.py:
##########
@@ -1320,6 +1322,7 @@ def f4_iter(it):
             self.assertEqual(expected_multi, df_multi_1.collect())
             self.assertEqual(expected_multi, df_multi_2.collect())
 
+    @unittest.skipIf(not have_grpcio, grpcio_requirement_message)

Review Comment:
   Thank you @panbingkun !



Re: [PR] [SPARK-46753][PYTHON][TESTS] Fix pypy3 python test [spark]

Posted by "HyukjinKwon (via GitHub)" <gi...@apache.org>.
HyukjinKwon commented on code in PR #44778:
URL: https://github.com/apache/spark/pull/44778#discussion_r1469111861


##########
python/pyspark/sql/tests/test_udf.py:
##########
@@ -1039,13 +1045,15 @@ def test_udf(a):
         with self.assertRaisesRegex(PythonException, "StopIteration"):
             self.spark.range(10).select(test_udf(col("id"))).show()
 
+    @unittest.skipIf(not have_pyarrow, pyarrow_requirement_message)
     def test_python_udf_segfault(self):
         with self.sql_conf({"spark.sql.execution.pyspark.udf.faulthandler.enabled": True}):
             with self.assertRaisesRegex(Exception, "Segmentation fault"):
                 import ctypes
 
                 self.spark.range(1).select(udf(lambda x: ctypes.string_at(0))("id")).collect()

Review Comment:
   Why does this need Arrow?



Review Comment:
   Mind sharing the test failure message?
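For context on `test_python_udf_segfault`: the `spark.sql.execution.pyspark.udf.faulthandler.enabled` setting is meant to enable Python's standard `faulthandler` module in the Python worker, which on a crash writes a report such as "Fatal Python error: Segmentation fault" to stderr; that is the text the test matches. The stand-alone sketch below reproduces only that mechanism in a child process, outside Spark and without Arrow. It assumes CPython on Linux; the exact message may differ on other platforms or interpreters.

```python
import subprocess
import sys

# Child program: enable faulthandler, then read from address 0 via ctypes,
# which crashes the interpreter with SIGSEGV.
CHILD_CODE = (
    "import ctypes, faulthandler\n"
    "faulthandler.enable()\n"
    "ctypes.string_at(0)\n"
)


def run_crashing_child() -> subprocess.CompletedProcess:
    """Run CHILD_CODE in a fresh interpreter and capture its output."""
    return subprocess.run(
        [sys.executable, "-c", CHILD_CODE],
        capture_output=True,
        text=True,
        timeout=60,
    )


if __name__ == "__main__":
    result = run_crashing_child()
    print("exit code:", result.returncode)
    # faulthandler writes its fatal-error report to stderr.
    print(result.stderr.strip().splitlines()[0])
```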



Re: [PR] [SPARK-46753][PYTHON][TESTS] Fix pypy3 python test [spark]

Posted by "xinrong-meng (via GitHub)" <gi...@apache.org>.
xinrong-meng commented on PR #44778:
URL: https://github.com/apache/spark/pull/44778#issuecomment-1917709106

   Looks great, thank you @panbingkun!


Re: [PR] [SPARK-46753][PYTHON][TESTS] Fix pypy3 python test [spark]

Posted by "panbingkun (via GitHub)" <gi...@apache.org>.
panbingkun commented on code in PR #44778:
URL: https://github.com/apache/spark/pull/44778#discussion_r1458191267


##########
dev/sparktestsupport/modules.py:
##########
@@ -542,6 +542,10 @@ def __hash__(self):
         "pyspark.testing.utils",
         "pyspark.testing.pandasutils",
     ],
+    excluded_python_implementations=[
+        "PyPy"  # Skip these tests under PyPy since they require numpy, pandas, and pyarrow and

Review Comment:
   Okay, let me give it a try.



Re: [PR] [SPARK-46753][PYTHON][TESTS] Fix pypy3 python test [spark]

Posted by "HyukjinKwon (via GitHub)" <gi...@apache.org>.
HyukjinKwon commented on code in PR #44778:
URL: https://github.com/apache/spark/pull/44778#discussion_r1458167119


##########
dev/sparktestsupport/modules.py:
##########
@@ -542,6 +542,10 @@ def __hash__(self):
         "pyspark.testing.utils",
         "pyspark.testing.pandasutils",
     ],
+    excluded_python_implementations=[
+        "PyPy"  # Skip these tests under PyPy since they require numpy, pandas, and pyarrow and

Review Comment:
   cc @itholic 
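To illustrate the `excluded_python_implementations` entry in the hunk above: the idea is that the test runner compares the current interpreter, as reported by `platform.python_implementation()` (for example `"CPython"` or `"PyPy"`), against each module's exclusion list. The sketch below is a simplified, hypothetical model of that filtering; the `Module` class and module names are made up and are not the actual definitions in `dev/sparktestsupport/modules.py` or `python/run-tests.py`.

```python
import platform
from dataclasses import dataclass, field
from typing import List


@dataclass
class Module:
    """Hypothetical, simplified stand-in for a test module definition."""

    name: str
    python_test_goals: List[str] = field(default_factory=list)
    excluded_python_implementations: List[str] = field(default_factory=list)


def goals_to_run(modules: List[Module], implementation: str) -> List[str]:
    """Collect the test goals of every module that does not exclude `implementation`."""
    goals: List[str] = []
    for module in modules:
        if implementation in module.excluded_python_implementations:
            continue  # e.g. skip modules needing numpy/pandas/pyarrow under "PyPy"
        goals.extend(module.python_test_goals)
    return goals


if __name__ == "__main__":
    modules = [
        Module("example_core", ["pyspark.rdd"]),
        Module(
            "example_testing",
            ["pyspark.testing.utils", "pyspark.testing.pandasutils"],
            excluded_python_implementations=["PyPy"],
        ),
    ]
    print(goals_to_run(modules, platform.python_implementation()))
```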



Re: [PR] [SPARK-46753][PYTHON][TESTS] Fix pypy3 python test [spark]

Posted by "panbingkun (via GitHub)" <gi...@apache.org>.
panbingkun commented on code in PR #44778:
URL: https://github.com/apache/spark/pull/44778#discussion_r1464356706


##########
python/pyspark/sql/tests/test_udf.py:
##########
@@ -917,6 +923,7 @@ def test_complex_return_types(self):
         self.assertEqual(row[1], {"a": "b"})
         self.assertEqual(row[2], Row(col1=1, col2=2))
 
+    @unittest.skipIf(not have_pyarrow, pyarrow_requirement_message)
     def test_named_arguments(self):

Review Comment:
   After this is merged, I will rebase and test it again.



Re: [PR] [SPARK-46753][PYTHON][TESTS] Fix pypy3 python test [spark]

Posted by "itholic (via GitHub)" <gi...@apache.org>.
itholic commented on code in PR #44778:
URL: https://github.com/apache/spark/pull/44778#discussion_r1467358772


##########
python/pyspark/sql/tests/test_udf.py:
##########
@@ -935,7 +940,8 @@ def test_udf(a, b):
             ]
         ):
             with self.subTest(query_no=i):
-                assertDataFrameEqual(df, [Row(0), Row(101)])
+                if have_pyarrow:
+                    assertDataFrameEqual(df, [Row(0), Row(101)])

Review Comment:
   Because `assertDataFrameEqual` tries to import `pyspark.pandas`, which requires `pyarrow` and `pandas`.
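A side note on the `if have_pyarrow:` guard in this hunk versus a skip: the two are reported differently. A guarded assertion lets the test pass without checking anything, while `self.skipTest(...)` records the skip in the test report but also stops the rest of the test body. A minimal sketch; `have_pyarrow` and `check_needs_arrow` are hypothetical stand-ins:

```python
import unittest

# Hypothetical flag, hard-coded to simulate an environment without pyarrow.
have_pyarrow = False


def check_needs_arrow():
    """Hypothetical stand-in for an assertion helper that needs pyarrow."""
    raise AssertionError("this check cannot run without pyarrow")


class GuardedAssertion(unittest.TestCase):
    def test_guarded(self):
        # Guard pattern: the check is silently dropped, and the test passes.
        if have_pyarrow:
            check_needs_arrow()


class ExplicitSkip(unittest.TestCase):
    def test_skipped(self):
        # skipTest pattern: the missing dependency shows up as a skip,
        # and nothing after this call in the test body runs.
        if not have_pyarrow:
            self.skipTest("pyarrow is not installed")
        check_needs_arrow()


def run(case: type) -> unittest.TestResult:
    """Run every test in `case` quietly and return the collected result."""
    result = unittest.TestResult()
    unittest.defaultTestLoader.loadTestsFromTestCase(case).run(result)
    return result


if __name__ == "__main__":
    for case in (GuardedAssertion, ExplicitSkip):
        r = run(case)
        print(case.__name__, "passed:", r.wasSuccessful(), "skipped:", len(r.skipped))
```

Which behavior is preferable depends on whether the rest of the test body still checks something useful without the guarded assertion.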



Re: [PR] [SPARK-46753][PYTHON][TESTS] Fix pypy3 python test [spark]

Posted by "panbingkun (via GitHub)" <gi...@apache.org>.
panbingkun commented on PR #44778:
URL: https://github.com/apache/spark/pull/44778#issuecomment-1913876962

   @HyukjinKwon @itholic With the above changes, the build has turned green.
   I will try re-enabling some of the skipped doctests.


Re: [PR] [SPARK-46753][PYTHON][TESTS] Fix pypy3 python test [spark]

Posted by "xinrong-meng (via GitHub)" <gi...@apache.org>.
xinrong-meng commented on code in PR #44778:
URL: https://github.com/apache/spark/pull/44778#discussion_r1463846457


##########
python/pyspark/sql/tests/pandas/test_pandas_udf_scalar.py:
##########
@@ -1320,6 +1322,7 @@ def f4_iter(it):
             self.assertEqual(expected_multi, df_multi_1.collect())
             self.assertEqual(expected_multi, df_multi_2.collect())
 
+    @unittest.skipIf(not have_grpcio, grpcio_requirement_message)

Review Comment:
   Sounds good, I'll adjust that.



Re: [PR] [SPARK-46753][PYTHON][TESTS] Fix pypy3 python test [spark]

Posted by "itholic (via GitHub)" <gi...@apache.org>.
itholic commented on code in PR #44778:
URL: https://github.com/apache/spark/pull/44778#discussion_r1464191332


##########
python/pyspark/sql/tests/test_udf.py:
##########
@@ -917,6 +923,7 @@ def test_complex_return_types(self):
         self.assertEqual(row[1], {"a": "b"})
         self.assertEqual(row[2], Row(col1=1, col2=2))
 
+    @unittest.skipIf(not have_pyarrow, pyarrow_requirement_message)
     def test_named_arguments(self):

Review Comment:
   Hmm... it seems that `assertDataFrameEqual` does not actually depend on pandas?
   
   ```python
   >>> import pandas
   Traceback (most recent call last):
     File "<stdin>", line 1, in <module>
   ModuleNotFoundError: No module named 'pandas'
   >>> from pyspark.testing.utils import assertDataFrameEqual
   >>> df1 = spark.range(10)
   >>> df2 = spark.range(10)
   >>> assertDataFrameEqual(df1, df2)
   ```
   
   It works well without pandas. In the code, we skip using pandas if it's not installed:
   
   ```python
   has_pandas = False
   try:
       # If pandas dependencies are available, allow pandas or pandas-on-Spark DataFrame
       import pyspark.pandas as ps
       import pandas as pd
       from pyspark.testing.pandasutils import PandasOnSparkTestUtils
   
       has_pandas = True
   except ImportError:
       # no pandas, so we won't call pandasutils functions
       pass
   ```
   
   > After testing, we found that the assertDataFrameEqual method used in this UT requires it.
   
   Did you test after cleanly uninstalling `pandas`? We should uninstall `pandas-stubs` as well before uninstalling `pandas`. See https://github.com/apache/spark/pull/44745#issuecomment-1894850179 for more detail.



Re: [PR] [SPARK-46753][PYTHON][DOCS] Fix pypy3 python test [spark]

Posted by "HyukjinKwon (via GitHub)" <gi...@apache.org>.
HyukjinKwon commented on code in PR #44778:
URL: https://github.com/apache/spark/pull/44778#discussion_r1456908005


##########
python/pyspark/sql/tests/pandas/test_pandas_udf_scalar.py:
##########
@@ -1320,6 +1322,7 @@ def f4_iter(it):
             self.assertEqual(expected_multi, df_multi_1.collect())
             self.assertEqual(expected_multi, df_multi_2.collect())
 
+    @unittest.skipIf(not have_grpcio, grpcio_requirement_message)

Review Comment:
   @xinrong-meng and @zhengruifeng this is from https://github.com/apache/spark/commit/7c7b9585a2aba7bbd52c197b07ed0181ae049c75. Can we move the Connect test into the separate connect directory? Ideally, this file should not require Spark Connect dependencies.



Re: [PR] [SPARK-46753][PYTHON][DOCS] Fix pypy3 python test [spark]

Posted by "zhengruifeng (via GitHub)" <gi...@apache.org>.
zhengruifeng commented on code in PR #44778:
URL: https://github.com/apache/spark/pull/44778#discussion_r1456945440


##########
python/pyspark/sql/tests/pandas/test_pandas_udf_scalar.py:
##########
@@ -1320,6 +1322,7 @@ def f4_iter(it):
             self.assertEqual(expected_multi, df_multi_1.collect())
             self.assertEqual(expected_multi, df_multi_2.collect())
 
+    @unittest.skipIf(not have_grpcio, grpcio_requirement_message)

Review Comment:
   +1



Re: [PR] [SPARK-46753][PYTHON][TESTS] Fix pypy3 python test [spark]

Posted by "itholic (via GitHub)" <gi...@apache.org>.
itholic commented on code in PR #44778:
URL: https://github.com/apache/spark/pull/44778#discussion_r1467378305


##########
python/pyspark/sql/tests/test_udf.py:
##########
@@ -935,7 +940,8 @@ def test_udf(a, b):
             ]
         ):
             with self.subTest(query_no=i):
-                assertDataFrameEqual(df, [Row(0), Row(101)])
+                if have_pyarrow:
+                    assertDataFrameEqual(df, [Row(0), Row(101)])

Review Comment:
   On second thought, we can just remove the `pyspark.pandas` dependency from `assertDataFrameEqual`.
   
   I just submitted a PR: https://github.com/apache/spark/pull/44899.



Re: [PR] [SPARK-46753][PYTHON][TESTS] Fix pypy3 python test [spark]

Posted by "panbingkun (via GitHub)" <gi...@apache.org>.
panbingkun commented on code in PR #44778:
URL: https://github.com/apache/spark/pull/44778#discussion_r1467534527


##########
python/pyspark/sql/tests/pandas/test_pandas_udf_scalar.py:
##########
@@ -1320,6 +1322,7 @@ def f4_iter(it):
             self.assertEqual(expected_multi, df_multi_1.collect())
             self.assertEqual(expected_multi, df_multi_2.collect())
 
+    @unittest.skipIf(not have_grpcio, grpcio_requirement_message)

Review Comment:
   According to the test results, the failures caused by `grpcio` not being installed have been fixed.
   Thanks, @xinrong-meng.
   





Re: [PR] [SPARK-46753][PYTHON][TESTS] Fix pypy3 python test [spark]

Posted by "HyukjinKwon (via GitHub)" <gi...@apache.org>.
HyukjinKwon commented on code in PR #44778:
URL: https://github.com/apache/spark/pull/44778#discussion_r1467349424


##########
python/pyspark/sql/tests/test_udtf.py:
##########
@@ -247,7 +247,8 @@ class TestUDTF:
             def eval(self, a: int):
                 yield
 
-        assertDataFrameEqual(TestUDTF(lit(1)), [Row(a=None)])
+        if have_pyarrow:
+            assertDataFrameEqual(TestUDTF(lit(1)), [Row(a=None)])

Review Comment:
   @allisonwang-db why do we need Arrow here?





Re: [PR] [SPARK-46753][PYTHON][TESTS] Fix pypy3 python test [spark]

Posted by "panbingkun (via GitHub)" <gi...@apache.org>.
panbingkun commented on code in PR #44778:
URL: https://github.com/apache/spark/pull/44778#discussion_r1468417360


##########
python/pyspark/testing/utils.py:
##########
@@ -768,6 +772,8 @@ def assertDataFrameEqual(
         pass
 
     if has_pandas:
+        import pyspark.pandas as ps
+

Review Comment:
   Just to verify in advance.





Re: [PR] [SPARK-46753][PYTHON][TESTS] Fix pypy3 python test [spark]

Posted by "panbingkun (via GitHub)" <gi...@apache.org>.
panbingkun commented on code in PR #44778:
URL: https://github.com/apache/spark/pull/44778#discussion_r1462891675


##########
dev/sparktestsupport/modules.py:
##########
@@ -542,6 +542,10 @@ def __hash__(self):
         "pyspark.testing.utils",
         "pyspark.testing.pandasutils",
     ],
+    excluded_python_implementations=[
+        "PyPy"  # Skip these tests under PyPy since they require numpy, pandas, and pyarrow and

Review Comment:
   By the way, I remember testing it locally once after I added `# doctest: +SKIP`.
   Let GA run again. 😄





Re: [PR] [SPARK-46753][PYTHON][TESTS] Fix pypy3 python test [spark]

Posted by "panbingkun (via GitHub)" <gi...@apache.org>.
panbingkun commented on code in PR #44778:
URL: https://github.com/apache/spark/pull/44778#discussion_r1462879102


##########
dev/sparktestsupport/modules.py:
##########
@@ -542,6 +542,10 @@ def __hash__(self):
         "pyspark.testing.utils",
         "pyspark.testing.pandasutils",
     ],
+    excluded_python_implementations=[
+        "PyPy"  # Skip these tests under PyPy since they require numpy, pandas, and pyarrow and

Review Comment:
   Wait a moment, I forgot to remove it. I'll test it.





Re: [PR] [SPARK-46753][PYTHON][TESTS] Fix pypy3 python test [spark]

Posted by "xinrong-meng (via GitHub)" <gi...@apache.org>.
xinrong-meng commented on code in PR #44778:
URL: https://github.com/apache/spark/pull/44778#discussion_r1470108568


##########
python/pyspark/sql/tests/test_udf_profiler.py:
##########
@@ -51,6 +51,7 @@ def add2(x):
     action(df)
 
 
+@unittest.skipIf(not have_pyarrow, pyarrow_requirement_message)

Review Comment:
   Thanks, Hyukjin.
   @panbingkun Would you please skip only the test `UDFProfilerTests.test_unsupported` when Arrow is not installed, instead of skipping the whole class?
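A minimal sketch of the granularity requested here: the skip decorator goes on the single Arrow-dependent method instead of the class. `have_pyarrow` and `pyarrow_requirement_message` below are local stand-ins hard-coded to simulate a missing Arrow install (not the real flags from `pyspark.testing.sqlutils`), and the class and method names are hypothetical.

```python
import unittest

# Hypothetical stand-ins, hard-coded to simulate an environment (e.g. PyPy)
# where PyArrow is not installed.
have_pyarrow = False
pyarrow_requirement_message = "PyArrow is not installed (simulated)."


class ProfilerTestsSketch(unittest.TestCase):
    def test_profile_plain_udf(self):
        # No Arrow needed: keeps running even when PyArrow is missing.
        self.assertEqual(sum(x + 2 for x in range(3)), 9)

    @unittest.skipIf(not have_pyarrow, pyarrow_requirement_message)
    def test_unsupported(self):
        # Only this method is skipped; the rest of the class still runs.
        import pyarrow  # noqa: F401
```

Run through a normal `unittest` runner, this reports one passing test and one skipped test, rather than skipping the whole class.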





Re: [PR] [SPARK-46753][PYTHON][TESTS] Fix pypy3 python test [spark]

Posted by "HyukjinKwon (via GitHub)" <gi...@apache.org>.
HyukjinKwon commented on code in PR #44778:
URL: https://github.com/apache/spark/pull/44778#discussion_r1469112402


##########
python/pyspark/sql/tests/test_utils.py:
##########
@@ -45,9 +45,14 @@
     IntegerType,
     BooleanType,
 )
-from pyspark.testing.sqlutils import have_pandas
+from pyspark.testing.sqlutils import (
+    have_pandas,
+    have_pyarrow,
+    pyarrow_requirement_message,
+)
 
 
+@unittest.skipIf(not have_pyarrow, pyarrow_requirement_message)

Review Comment:
   Can we remove this now?





Re: [PR] [SPARK-46753][PYTHON][TESTS] Fix pypy3 python test [spark]

Posted by "panbingkun (via GitHub)" <gi...@apache.org>.
panbingkun commented on code in PR #44778:
URL: https://github.com/apache/spark/pull/44778#discussion_r1457351394


##########
python/pyspark/sql/tests/test_udf.py:
##########
@@ -917,6 +923,7 @@ def test_complex_return_types(self):
         self.assertEqual(row[1], {"a": "b"})
         self.assertEqual(row[2], Row(col1=1, col2=2))
 
+    @unittest.skipIf(not have_pyarrow, pyarrow_requirement_message)
     def test_named_arguments(self):

Review Comment:
   After testing, we found that the `assertDataFrameEqual` method used in this unit test requires it.





Re: [PR] [SPARK-46753][PYTHON][DOCS] Fix pypy3 python test [spark]

Posted by "HyukjinKwon (via GitHub)" <gi...@apache.org>.
HyukjinKwon commented on code in PR #44778:
URL: https://github.com/apache/spark/pull/44778#discussion_r1456852954


##########
python/pyspark/sql/tests/pandas/test_pandas_udf_scalar.py:
##########
@@ -1320,6 +1322,7 @@ def f4_iter(it):
             self.assertEqual(expected_multi, df_multi_1.collect())
             self.assertEqual(expected_multi, df_multi_2.collect())
 
+    @unittest.skipIf(not have_grpcio, grpcio_requirement_message)

Review Comment:
   @itholic why is a connect test here? We should place it under connect.






Re: [PR] [SPARK-46753][PYTHON][TESTS] Fix pypy3 python test [spark]

Posted by "allisonwang-db (via GitHub)" <gi...@apache.org>.
allisonwang-db commented on code in PR #44778:
URL: https://github.com/apache/spark/pull/44778#discussion_r1464136461


##########
python/pyspark/sql/tests/test_udtf.py:
##########
@@ -69,6 +69,9 @@
 )
 
 
+@unittest.skipIf(
+    not have_pandas or not have_pyarrow, pandas_requirement_message or pyarrow_requirement_message
+)
 class BaseUDTFTestsMixin:

Review Comment:
   BaseUDTFTestsMixin shouldn't require pandas dependency. Is this because of the assertDataFrameEqual usage?
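For reference, the class-level decorator in this hunk combines two conditions and picks its reason with `or`. The sketch below factors that into a hypothetical helper, assuming (as `pyspark.testing.sqlutils` appears to do) that each `*_requirement_message` is `None` when its package imports fine. Because `or` returns the first truthy operand, the reported reason is the pandas message when pandas is missing and the PyArrow message otherwise.

```python
import unittest


def skip_unless_pandas_and_pyarrow(have_pandas, have_pyarrow, pandas_msg, pyarrow_msg):
    """Hypothetical helper equivalent to the decorator in the hunk above.

    pandas_msg / pyarrow_msg are assumed to be None when the package is
    available and an explanatory string when it is not.
    """
    condition = not have_pandas or not have_pyarrow
    # `or` yields pandas_msg if it is a non-empty string, else pyarrow_msg.
    reason = pandas_msg or pyarrow_msg
    return unittest.skipIf(condition, reason)


# Simulated environment: pandas present, PyArrow missing.
@skip_unless_pandas_and_pyarrow(True, False, None, "PyArrow missing (simulated)")
class UDTFTestsSketch(unittest.TestCase):
    def test_something(self):
        pass
```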





Re: [PR] [SPARK-46753][PYTHON][TESTS] Fix pypy3 python test [spark]

Posted by "panbingkun (via GitHub)" <gi...@apache.org>.
panbingkun commented on code in PR #44778:
URL: https://github.com/apache/spark/pull/44778#discussion_r1462876376


##########
dev/sparktestsupport/modules.py:
##########
@@ -542,6 +542,10 @@ def __hash__(self):
         "pyspark.testing.utils",
         "pyspark.testing.pandasutils",
     ],
+    excluded_python_implementations=[
+        "PyPy"  # Skip these tests under PyPy since they require numpy, pandas, and pyarrow and

Review Comment:
   Yeah, after adding `# doctest: +SKIP`, GA has passed.





Re: [PR] [SPARK-46753][PYTHON][TESTS] Fix pypy3 python test [spark]

Posted by "itholic (via GitHub)" <gi...@apache.org>.
itholic commented on code in PR #44778:
URL: https://github.com/apache/spark/pull/44778#discussion_r1464266537


##########
python/pyspark/sql/tests/test_udf.py:
##########
@@ -917,6 +923,7 @@ def test_complex_return_types(self):
         self.assertEqual(row[1], {"a": "b"})
         self.assertEqual(row[2], Row(col1=1, col2=2))
 
+    @unittest.skipIf(not have_pyarrow, pyarrow_requirement_message)
     def test_named_arguments(self):

Review Comment:
   Just realized that we're checking `pandas` and `pyarrow` when importing `pyspark.pandas` as below, and exit the program when it's not installed:
   
   ```python
   try:
       require_minimum_pandas_version()
       require_minimum_pyarrow_version()
   except ImportError as e:
       if os.environ.get("SPARK_TESTING"):
           warnings.warn(str(e))
           sys.exit()
       else:
           raise
   ```
   
   @HyukjinKwon Should we add a flag or something here to allow running tests without `pandas` and `pyarrow`?
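One Python detail that may help explain why `assertDataFrameEqual` appeared to need pandas in CI but not in a plain REPL: under `SPARK_TESTING` the quoted code calls `sys.exit()`, which raises `SystemExit`. `SystemExit` derives from `BaseException`, not `ImportError`, so an `except ImportError:` fallback around `import pyspark.pandas` does not catch it. The sketch reproduces only that control flow with hypothetical functions; it is an illustration, not a confirmed diagnosis.

```python
import sys


def import_pandas_on_spark_like(env):
    """Hypothetical stand-in for the import-time check quoted above."""
    try:
        # Simulate require_minimum_pandas_version() failing.
        raise ImportError("pandas is not installed (simulated)")
    except ImportError:
        if env.get("SPARK_TESTING"):
            sys.exit()  # raises SystemExit, which is not an ImportError
        else:
            raise


def compare_with_optional_pandas(env):
    """Mirrors a `try: import ...` / `except ImportError:` fallback pattern."""
    try:
        import_pandas_on_spark_like(env)
        return "pandas path"
    except ImportError:
        return "fallback path"
```

Without `SPARK_TESTING` the fallback is taken; with it set, `SystemExit` escapes the `except ImportError` clause.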





Re: [PR] [SPARK-46753][PYTHON][TESTS] Fix pypy3 python test [spark]

Posted by "xinrong-meng (via GitHub)" <gi...@apache.org>.
xinrong-meng commented on PR #44778:
URL: https://github.com/apache/spark/pull/44778#issuecomment-1911090919

   Would you please rebase onto master to pick up https://github.com/apache/spark/pull/44886? Then we can drop the proposed changes related to `have_grpcio`. Thanks!




Re: [PR] [SPARK-46753][PYTHON][TESTS] Fix pypy3 python test [spark]

Posted by "HyukjinKwon (via GitHub)" <gi...@apache.org>.
HyukjinKwon commented on code in PR #44778:
URL: https://github.com/apache/spark/pull/44778#discussion_r1462873767


##########
dev/sparktestsupport/modules.py:
##########
@@ -542,6 +542,10 @@ def __hash__(self):
         "pyspark.testing.utils",
         "pyspark.testing.pandasutils",
     ],
+    excluded_python_implementations=[
+        "PyPy"  # Skip these tests under PyPy since they require numpy, pandas, and pyarrow and

Review Comment:
   can we remove `PyPy` here, and make the tests skipped properly?
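To make the trade-off concrete: an entry like `excluded_python_implementations=["PyPy"]` removes the whole module from a PyPy run, including tests that never touch numpy, pandas, or pyarrow. The sketch below is a simplified, hypothetical model of that filtering (not the actual code in `dev/sparktestsupport`), and the module names are made up.

```python
import platform
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ModuleSketch:
    """Hypothetical, simplified stand-in for a test module definition."""
    name: str
    excluded_python_implementations: List[str] = field(default_factory=list)


def modules_for(modules: List[ModuleSketch], implementation: Optional[str] = None) -> List[str]:
    # platform.python_implementation() returns strings such as "CPython" or "PyPy".
    impl = implementation or platform.python_implementation()
    return [m.name for m in modules if impl not in m.excluded_python_implementations]


MODULES = [
    ModuleSketch("needs-numpy-pandas-arrow", excluded_python_implementations=["PyPy"]),
    ModuleSketch("pure-python"),
]
```

Under PyPy every test in the excluded module disappears, which is why per-test skips keep more coverage.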





Re: [PR] [SPARK-46753][PYTHON][TESTS] Fix pypy3 python test [spark]

Posted by "itholic (via GitHub)" <gi...@apache.org>.
itholic commented on code in PR #44778:
URL: https://github.com/apache/spark/pull/44778#discussion_r1464191332


##########
python/pyspark/sql/tests/test_udf.py:
##########
@@ -917,6 +923,7 @@ def test_complex_return_types(self):
         self.assertEqual(row[1], {"a": "b"})
         self.assertEqual(row[2], Row(col1=1, col2=2))
 
+    @unittest.skipIf(not have_pyarrow, pyarrow_requirement_message)
     def test_named_arguments(self):

Review Comment:
   Hmm... it seems like `assertDataFrameEqual` doesn't actually depend on pandas?
   
   ```python
   >>> import pandas
   Traceback (most recent call last):
     File "<stdin>", line 1, in <module>
   ModuleNotFoundError: No module named 'pandas'
   >>> from pyspark.testing.utils import assertDataFrameEqual
   >>> df1 = spark.range(10)
   >>> df2 = spark.range(10)
   >>> assertDataFrameEqual(df1, df2)
   ```
   
   It works well without pandas. In the code, we skip using pandas if it's not installed:
   
   ```python
       has_pandas = False
       try:
           # If pandas dependencies are available, allow pandas or pandas-on-Spark DataFrame
           import pyspark.pandas as ps
           import pandas as pd
           from pyspark.testing.pandasutils import PandasOnSparkTestUtils
   
           has_pandas = True
       except ImportError:
           # no pandas, so we won't call pandasutils functions
           pass
   ```
   
   > After testing, we found that the assertDataFrameEqual  method used in this UT requires it.
   
   Did you make sure `pandas` was completely uninstalled when testing? We should uninstall `pandas-stubs` as well before uninstalling `pandas`. See https://github.com/apache/spark/pull/44745#issuecomment-1894850179 for more detail.





Re: [PR] [SPARK-46753][PYTHON][TESTS] Fix pypy3 python test [spark]

Posted by "itholic (via GitHub)" <gi...@apache.org>.
itholic commented on code in PR #44778:
URL: https://github.com/apache/spark/pull/44778#discussion_r1464191868


##########
python/pyspark/sql/tests/test_udtf.py:
##########
@@ -69,6 +69,9 @@
 )
 
 
+@unittest.skipIf(
+    not have_pandas or not have_pyarrow, pandas_requirement_message or pyarrow_requirement_message
+)
 class BaseUDTFTestsMixin:

Review Comment:
   ~~Could you check https://github.com/apache/spark/pull/44778#discussion_r1464191332? Seems like assertDataFrameEqual does not depend on pandas~~
   
   nvm, we just merged the PR to fix it.





Re: [PR] [SPARK-46753][PYTHON][TESTS] Fix pypy3 python test [spark]

Posted by "xinrong-meng (via GitHub)" <gi...@apache.org>.
xinrong-meng commented on code in PR #44778:
URL: https://github.com/apache/spark/pull/44778#discussion_r1466909038


##########
python/pyspark/sql/tests/pandas/test_pandas_udf_scalar.py:
##########
@@ -1320,6 +1322,7 @@ def f4_iter(it):
             self.assertEqual(expected_multi, df_multi_1.collect())
             self.assertEqual(expected_multi, df_multi_2.collect())
 
+    @unittest.skipIf(not have_grpcio, grpcio_requirement_message)

Review Comment:
   Adjusted in https://github.com/apache/spark/pull/44886.





Re: [PR] [SPARK-46753][PYTHON][TESTS] Fix pypy3 python test [spark]

Posted by "HyukjinKwon (via GitHub)" <gi...@apache.org>.
HyukjinKwon commented on code in PR #44778:
URL: https://github.com/apache/spark/pull/44778#discussion_r1467348900


##########
python/pyspark/sql/tests/test_udf.py:
##########
@@ -935,7 +940,8 @@ def test_udf(a, b):
             ]
         ):
             with self.subTest(query_no=i):
-                assertDataFrameEqual(df, [Row(0), Row(101)])
+                if have_pyarrow:
+                    assertDataFrameEqual(df, [Row(0), Row(101)])

Review Comment:
   Why do we need Arrow for comparing a DataFrame with Rows, @itholic?





Re: [PR] [SPARK-46753][PYTHON][TESTS] Fix pypy3 python test [spark]

Posted by "itholic (via GitHub)" <gi...@apache.org>.
itholic commented on code in PR #44778:
URL: https://github.com/apache/spark/pull/44778#discussion_r1467378305


##########
python/pyspark/sql/tests/test_udf.py:
##########
@@ -935,7 +940,8 @@ def test_udf(a, b):
             ]
         ):
             with self.subTest(query_no=i):
-                assertDataFrameEqual(df, [Row(0), Row(101)])
+                if have_pyarrow:
+                    assertDataFrameEqual(df, [Row(0), Row(101)])

Review Comment:
   > Can we avoid it since we don't need it when we compare PySpark DataFrame and PySpark Rows?
   
   Yes, on second thought we can just remove `pyspark.pandas` dependency from `assertDataFrameEqual`.
   
   Just submitted PR: https://github.com/apache/spark/pull/44899.





Re: [PR] [SPARK-46753][PYTHON][TESTS] Fix pypy3 python test [spark]

Posted by "panbingkun (via GitHub)" <gi...@apache.org>.
panbingkun commented on code in PR #44778:
URL: https://github.com/apache/spark/pull/44778#discussion_r1458388111


##########
dev/sparktestsupport/modules.py:
##########
@@ -542,6 +542,10 @@ def __hash__(self):
         "pyspark.testing.utils",
         "pyspark.testing.pandasutils",
     ],
+    excluded_python_implementations=[
+        "PyPy"  # Skip these tests under PyPy since they require numpy, pandas, and pyarrow and

Review Comment:
   I have added `# doctest: +SKIP` to all the doctests using method `assertDataFrameEqual` in `pyspark.testing.utils`, and based on the current GA results, they have all passed.
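For readers unfamiliar with the directive: `# doctest: +SKIP` tells the `doctest` runner not to execute that one example, so a docstring example that needs an optional dependency no longer fails where the dependency is missing. A minimal, self-contained sketch (the function and the missing module name are hypothetical):

```python
import doctest


def add_one(x):
    """Return x + 1.

    >>> add_one(1)
    2

    The next example would raise ModuleNotFoundError if executed, but the
    SKIP directive means doctest never runs it:

    >>> import module_that_is_not_installed  # doctest: +SKIP
    """
    return x + 1


def run_doctests(func):
    # Collect the docstring examples of `func` and run them, supplying the
    # function itself as the examples' global namespace.
    (test,) = doctest.DocTestFinder().find(func, globs={func.__name__: func})
    return doctest.DocTestRunner(verbose=False).run(test)
```

Skipped examples are not counted as attempted, so only the first example is actually run.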





Re: [PR] [SPARK-46753][PYTHON][DOCS] Fix pypy3 python test [spark]

Posted by "panbingkun (via GitHub)" <gi...@apache.org>.
panbingkun commented on code in PR #44778:
URL: https://github.com/apache/spark/pull/44778#discussion_r1456949739


##########
python/pyspark/sql/tests/test_udf.py:
##########
@@ -917,6 +923,7 @@ def test_complex_return_types(self):
         self.assertEqual(row[1], {"a": "b"})
         self.assertEqual(row[2], Row(col1=1, col2=2))
 
+    @unittest.skipIf(not have_pyarrow, pyarrow_requirement_message)
     def test_named_arguments(self):

Review Comment:
   Okay, let me check again.





Re: [PR] [SPARK-46753][PYTHON][TESTS] Fix pypy3 python test [spark]

Posted by "panbingkun (via GitHub)" <gi...@apache.org>.
panbingkun commented on code in PR #44778:
URL: https://github.com/apache/spark/pull/44778#discussion_r1469235826


##########
python/pyspark/testing/connectutils.py:
##########
@@ -89,7 +89,8 @@ def __getattr__(self, item):
 
 @unittest.skipIf(not should_test_connect, connect_requirement_message)
 class PlanOnlyTestFixture(unittest.TestCase, PySparkErrorTestUtils):
-    from pyspark.sql.connect.dataframe import DataFrame
+    if should_test_connect:
+        from pyspark.sql.connect.dataframe import DataFrame

Review Comment:
   https://github.com/panbingkun/spark/actions/runs/7692459974/job/20959442644#step:12:3887
   (screenshot: https://github.com/apache/spark/assets/15246973/c3b1aa88-1942-4ffa-a9a1-f53ce34a8979)
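A likely reason for the `if should_test_connect:` guard in this hunk (the linked CI output is not reproduced here): a class decorated with `@unittest.skipIf` still has its body executed when the module is imported, so an unguarded import in the class body fails before the skip can take effect. A minimal sketch with hypothetical names:

```python
import unittest

# Hypothetical flag standing in for `should_test_connect`; False simulates
# an environment without the optional dependencies.
should_test_optional = False


@unittest.skipIf(not should_test_optional, "optional dependency missing (simulated)")
class GuardedFixture(unittest.TestCase):
    # The class body runs when the class is defined, before skipIf is applied,
    # so an import placed here must be guarded by the same condition.
    if should_test_optional:
        import module_that_is_not_installed  # noqa: F401

    def test_something(self):
        pass


def define_unguarded_fixture():
    # Same class without the guard: defining it fails immediately,
    # even though every test in it would have been skipped.
    @unittest.skipIf(not should_test_optional, "optional dependency missing (simulated)")
    class UnguardedFixture(unittest.TestCase):
        import module_that_is_not_installed  # noqa: F401

    return UnguardedFixture
```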
   





Re: [PR] [SPARK-46753][PYTHON][TESTS] Fix pypy3 python test [spark]

Posted by "dongjoon-hyun (via GitHub)" <gi...@apache.org>.
dongjoon-hyun commented on code in PR #44778:
URL: https://github.com/apache/spark/pull/44778#discussion_r1457628763


##########
dev/sparktestsupport/modules.py:
##########
@@ -542,6 +542,10 @@ def __hash__(self):
         "pyspark.testing.utils",
         "pyspark.testing.pandasutils",
     ],
+    excluded_python_implementations=[
+        "PyPy"  # Skip these tests under PyPy since they require numpy, pandas, and pyarrow and

Review Comment:
   Well, I'd prefer to handle these missing packages in the test cases.
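A minimal sketch of that approach: the skip is keyed on whether a package can be imported, not on which interpreter is running, so the same file works on CPython and PyPy. `have_module` is a hypothetical helper built on `importlib.util.find_spec`, and numpy is used only as an example dependency.

```python
import importlib.util
import unittest


def have_module(name):
    # find_spec locates a top-level module without importing it;
    # it returns None when the module cannot be found.
    return importlib.util.find_spec(name) is not None


HAVE_NUMPY = have_module("numpy")


class MixedDependencyTests(unittest.TestCase):
    def test_pure_python(self):
        # No optional dependency: runs under any interpreter.
        self.assertEqual(sorted([3, 1, 2]), [1, 2, 3])

    @unittest.skipIf(not HAVE_NUMPY, "numpy is not installed")
    def test_needs_numpy(self):
        import numpy as np

        self.assertEqual(int(np.arange(4).sum()), 6)
```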





Re: [PR] [SPARK-46753][PYTHON][TESTS] Fix pypy3 python test [spark]

Posted by "panbingkun (via GitHub)" <gi...@apache.org>.
panbingkun commented on code in PR #44778:
URL: https://github.com/apache/spark/pull/44778#discussion_r1457349381


##########
.github/workflows/build_and_test.yml:
##########
@@ -367,7 +367,7 @@ jobs:
             pyspark-pandas-connect-part3
     env:
       MODULES_TO_TEST: ${{ matrix.modules }}
-      PYTHON_TO_TEST: 'python3.9'
+      PYTHON_TO_TEST: 'pypy3'

Review Comment:
   I will restore this once the PR is approved.





Re: [PR] [SPARK-46753][PYTHON][TESTS] Fix pypy3 python test [spark]

Posted by "panbingkun (via GitHub)" <gi...@apache.org>.
panbingkun commented on code in PR #44778:
URL: https://github.com/apache/spark/pull/44778#discussion_r1463166644


##########
dev/sparktestsupport/modules.py:
##########
@@ -542,6 +542,10 @@ def __hash__(self):
         "pyspark.testing.utils",
         "pyspark.testing.pandasutils",
     ],
+    excluded_python_implementations=[
+        "PyPy"  # Skip these tests under PyPy since they require numpy, pandas, and pyarrow and

Review Comment:
   @HyukjinKwon `Pyspark tests` have passed.





Re: [PR] [SPARK-46753][PYTHON][TESTS] Fix pypy3 python test [spark]

Posted by "panbingkun (via GitHub)" <gi...@apache.org>.
panbingkun commented on code in PR #44778:
URL: https://github.com/apache/spark/pull/44778#discussion_r1470548114


##########
python/pyspark/sql/tests/test_udf.py:
##########
@@ -1039,13 +1045,15 @@ def test_udf(a):
         with self.assertRaisesRegex(PythonException, "StopIteration"):
             self.spark.range(10).select(test_udf(col("id"))).show()
 
+    @unittest.skipIf(not have_pyarrow, pyarrow_requirement_message)
     def test_python_udf_segfault(self):
         with self.sql_conf({"spark.sql.execution.pyspark.udf.faulthandler.enabled": True}):
             with self.assertRaisesRegex(Exception, "Segmentation fault"):
                 import ctypes
 
                 self.spark.range(1).select(udf(lambda x: ctypes.string_at(0))("id")).collect()

Review Comment:
   I will reproduce this error and share the details this afternoon.





Re: [PR] [SPARK-46753][PYTHON][TESTS] Fix pypy3 python test [spark]

Posted by "panbingkun (via GitHub)" <gi...@apache.org>.
panbingkun commented on code in PR #44778:
URL: https://github.com/apache/spark/pull/44778#discussion_r1470548209


##########
python/pyspark/sql/tests/test_utils.py:
##########
@@ -45,9 +45,14 @@
     IntegerType,
     BooleanType,
 )
-from pyspark.testing.sqlutils import have_pandas
+from pyspark.testing.sqlutils import (
+    have_pandas,
+    have_pyarrow,
+    pyarrow_requirement_message,
+)
 
 
+@unittest.skipIf(not have_pyarrow, pyarrow_requirement_message)

Review Comment:
   Done.





Re: [PR] [SPARK-46753][PYTHON][TESTS] Fix pypy3 python test [spark]

Posted by "itholic (via GitHub)" <gi...@apache.org>.
itholic commented on code in PR #44778:
URL: https://github.com/apache/spark/pull/44778#discussion_r1470582255


##########
python/pyspark/sql/tests/test_utils.py:
##########
@@ -969,6 +984,7 @@ def test_assert_error_pandas_pyspark_df(self):
             },
         )
 
+    @unittest.skipIf(not have_pandas or not have_pyarrow, "no pandas or pyarrow dependency")
     def test_assert_error_non_pyspark_df(self):

Review Comment:
   I believe we can remove `@unittest.skipIf(not have_pandas or not have_pyarrow, "no pandas or pyarrow dependency")`, as `assertDataFrameEqual` no longer requires `pandas` and `pyarrow`.





Re: [PR] [SPARK-46753][PYTHON][TESTS] Fix pypy3 python test [spark]

Posted by "panbingkun (via GitHub)" <gi...@apache.org>.
panbingkun commented on code in PR #44778:
URL: https://github.com/apache/spark/pull/44778#discussion_r1470551225


##########
python/pyspark/testing/connectutils.py:
##########
@@ -89,7 +89,8 @@ def __getattr__(self, item):
 
 @unittest.skipIf(not should_test_connect, connect_requirement_message)
 class PlanOnlyTestFixture(unittest.TestCase, PySparkErrorTestUtils):
-    from pyspark.sql.connect.dataframe import DataFrame
+    if should_test_connect:
+        from pyspark.sql.connect.dataframe import DataFrame

Review Comment:
   Okay.
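A minimal sketch of why the guard in this hunk is needed. A class body executes when the `class` statement runs, before the `skipIf` decorator is applied, so an unguarded import in the body fails even when the class would be skipped. The flag `have_optional_dep` stands in for `should_test_connect`, and `hypothetical_missing_module` is an illustrative name, not a real package:

```python
import unittest

have_optional_dep = False  # hypothetical stand-in for should_test_connect
requirement_message = "optional dependency is not installed"


def define_unguarded_fixture() -> type:
    # The class body runs when the `class` statement executes, before the
    # skipIf decorator is applied, so this import fails regardless of the skip.
    @unittest.skipIf(not have_optional_dep, requirement_message)
    class UnguardedFixture(unittest.TestCase):
        import hypothetical_missing_module  # noqa: F401

    return UnguardedFixture


def define_guarded_fixture() -> type:
    @unittest.skipIf(not have_optional_dep, requirement_message)
    class GuardedFixture(unittest.TestCase):
        # Only import when the dependency is known to be available.
        if have_optional_dep:
            import hypothetical_missing_module  # noqa: F401

        def test_something(self) -> None:
            self.fail("never runs when the dependency is missing")

    return GuardedFixture
```

Calling `define_unguarded_fixture()` raises `ModuleNotFoundError`, while `define_guarded_fixture()` succeeds and its test is reported as skipped.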





Re: [PR] [SPARK-46753][PYTHON][TESTS] Fix pypy3 python test [spark]

Posted by "HyukjinKwon (via GitHub)" <gi...@apache.org>.
HyukjinKwon commented on code in PR #44778:
URL: https://github.com/apache/spark/pull/44778#discussion_r1470549974


##########
python/pyspark/sql/tests/test_utils.py:
##########
@@ -969,6 +984,7 @@ def test_assert_error_pandas_pyspark_df(self):
             },
         )
 
+    @unittest.skipIf(not have_pandas or not have_pyarrow, "no pandas or pyarrow dependency")
     def test_assert_error_non_pyspark_df(self):

Review Comment:
   why does this one need arrow and pandas?





Re: [PR] [SPARK-46753][PYTHON][TESTS] Fix pypy3 python test [spark]

Posted by "itholic (via GitHub)" <gi...@apache.org>.
itholic commented on code in PR #44778:
URL: https://github.com/apache/spark/pull/44778#discussion_r1467358772


##########
python/pyspark/sql/tests/test_udf.py:
##########
@@ -935,7 +940,8 @@ def test_udf(a, b):
             ]
         ):
             with self.subTest(query_no=i):
-                assertDataFrameEqual(df, [Row(0), Row(101)])
+                if have_pyarrow:
+                    assertDataFrameEqual(df, [Row(0), Row(101)])

Review Comment:
   Because `assertDataFrameEqual` [tries to import pyspark.pandas](https://github.com/apache/spark/blob/master/python/pyspark/testing/utils.py#L758-L768), which requires `pyarrow` and `pandas`.





Re: [PR] [SPARK-46753][PYTHON][TESTS] Fix pypy3 python test [spark]

Posted by "allisonwang-db (via GitHub)" <gi...@apache.org>.
allisonwang-db commented on code in PR #44778:
URL: https://github.com/apache/spark/pull/44778#discussion_r1468183382


##########
python/pyspark/sql/tests/test_udtf.py:
##########
@@ -247,7 +247,8 @@ class TestUDTF:
             def eval(self, a: int):
                 yield
 
-        assertDataFrameEqual(TestUDTF(lit(1)), [Row(a=None)])
+        if have_pyarrow:
+            assertDataFrameEqual(TestUDTF(lit(1)), [Row(a=None)])

Review Comment:
   It's not the UDTF; it's `assertDataFrameEqual` that requires Arrow: https://github.com/apache/spark/pull/44778/files#r1467378305
   This should be fixed.





Re: [PR] [SPARK-46753][PYTHON][TESTS] Fix pypy3 python test [spark]

Posted by "HyukjinKwon (via GitHub)" <gi...@apache.org>.
HyukjinKwon commented on code in PR #44778:
URL: https://github.com/apache/spark/pull/44778#discussion_r1469112249


##########
python/pyspark/sql/tests/test_udf_profiler.py:
##########
@@ -51,6 +51,7 @@ def add2(x):
     action(df)
 
 
+@unittest.skipIf(not have_pyarrow, pyarrow_requirement_message)

Review Comment:
   cc @xinrong-meng Mind double checking? It seems weird that it needs Arrow here.





Re: [PR] [SPARK-46753][PYTHON][TESTS] Fix pypy3 python test [spark]

Posted by "panbingkun (via GitHub)" <gi...@apache.org>.
panbingkun commented on code in PR #44778:
URL: https://github.com/apache/spark/pull/44778#discussion_r1458179223


##########
dev/sparktestsupport/modules.py:
##########
@@ -542,6 +542,10 @@ def __hash__(self):
         "pyspark.testing.utils",
         "pyspark.testing.pandasutils",
     ],
+    excluded_python_implementations=[
+        "PyPy"  # Skip these tests under PyPy since they require numpy, pandas, and pyarrow and

Review Comment:
   This module is handled differently from the others because `pyspark.testing.utils` contains some `doctests`. When I tried adding `@unittest.skipIf(not have_pyarrow, pyarrow_requirement_message)` to the main method, it didn't seem to work.
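For readers unfamiliar with the field, here is a simplified, hypothetical model of what an `excluded_python_implementations` list on a test module does: the runner compares each module's exclusions with the interpreter's implementation name (as reported by `platform.python_implementation()`, e.g. `"CPython"` or `"PyPy"`) and drops matching modules. The `TestModule` class, the module names, and the `goals_for` helper are illustrative assumptions, not the actual `dev/sparktestsupport` code:

```python
import platform
from dataclasses import dataclass, field
from typing import List


@dataclass
class TestModule:
    # Hypothetical stand-in for a module definition in modules.py
    name: str
    python_test_goals: List[str]
    excluded_python_implementations: List[str] = field(default_factory=list)


def goals_for(modules: List[TestModule], implementation: str) -> List[str]:
    """Collect test goals, skipping modules that exclude this implementation."""
    goals: List[str] = []
    for module in modules:
        if implementation in module.excluded_python_implementations:
            continue
        goals.extend(module.python_test_goals)
    return goals


modules = [
    TestModule("hypothetical-core", ["pyspark.rdd"]),
    TestModule(
        "hypothetical-testing",
        ["pyspark.testing.utils", "pyspark.testing.pandasutils"],
        excluded_python_implementations=["PyPy"],
    ),
]

current_goals = goals_for(modules, platform.python_implementation())
```

As the review thread notes, this excludes the whole module, doctests included, which is coarser than skipping individual tests.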





Re: [PR] [SPARK-46753][PYTHON][TESTS] Fix pypy3 python test [spark]

Posted by "HyukjinKwon (via GitHub)" <gi...@apache.org>.
HyukjinKwon commented on code in PR #44778:
URL: https://github.com/apache/spark/pull/44778#discussion_r1464319314


##########
python/pyspark/sql/tests/test_udf.py:
##########
@@ -917,6 +923,7 @@ def test_complex_return_types(self):
         self.assertEqual(row[1], {"a": "b"})
         self.assertEqual(row[2], Row(col1=1, col2=2))
 
+    @unittest.skipIf(not have_pyarrow, pyarrow_requirement_message)
     def test_named_arguments(self):

Review Comment:
   yeah we can do





Re: [PR] [SPARK-46753][PYTHON][TESTS] Fix pypy3 python test [spark]

Posted by "xinrong-meng (via GitHub)" <gi...@apache.org>.
xinrong-meng commented on code in PR #44778:
URL: https://github.com/apache/spark/pull/44778#discussion_r1468081211


##########
python/pyspark/sql/tests/test_udf.py:
##########
@@ -935,7 +940,8 @@ def test_udf(a, b):
             ]
         ):
             with self.subTest(query_no=i):
-                assertDataFrameEqual(df, [Row(0), Row(101)])
+                if have_pyarrow:
+                    assertDataFrameEqual(df, [Row(0), Row(101)])

Review Comment:
   Could you confirm why we need `pyarrow`? It works without it for me:
   
   ```py
   >>> import pyarrow
   ...
   ModuleNotFoundError: No module named 'pyarrow'
   >>> df = spark.createDataFrame(data=[("1",), ("2",)])
   >>> assertDataFrameEqual(df, [Row(_1="1"), Row(_1="2")])
   >>> 
   ```





Re: [PR] [SPARK-46753][PYTHON][TESTS] Fix pypy3 python test [spark]

Posted by "HyukjinKwon (via GitHub)" <gi...@apache.org>.
HyukjinKwon commented on code in PR #44778:
URL: https://github.com/apache/spark/pull/44778#discussion_r1464133260


##########
python/pyspark/sql/tests/test_utils.py:
##########
@@ -45,9 +45,14 @@
     IntegerType,
     BooleanType,
 )
-from pyspark.testing.sqlutils import have_pandas
+from pyspark.testing.sqlutils import (
+    have_pandas,
+    have_pyarrow,
+    pyarrow_requirement_message,
+)
 
 
+@unittest.skipIf(not have_pyarrow, pyarrow_requirement_message)
 class UtilsTestsMixin:

Review Comment:
   why does this require pyarrow and pandas? There's something wrong in the code base now.





Re: [PR] [SPARK-46753][PYTHON][TESTS] Fix pypy3 python test [spark]

Posted by "itholic (via GitHub)" <gi...@apache.org>.
itholic commented on code in PR #44778:
URL: https://github.com/apache/spark/pull/44778#discussion_r1464198106


##########
python/pyspark/sql/tests/test_utils.py:
##########
@@ -45,9 +45,14 @@
     IntegerType,
     BooleanType,
 )
-from pyspark.testing.sqlutils import have_pandas
+from pyspark.testing.sqlutils import (
+    have_pandas,
+    have_pyarrow,
+    pyarrow_requirement_message,
+)
 
 
+@unittest.skipIf(not have_pyarrow, pyarrow_requirement_message)
 class UtilsTestsMixin:

Review Comment:
   Hmm, I don't think `assertDataFrameEqual` uses `pyarrow`. It uses `pandas` instead, but pandas is not a required package because we skip using it when it's not installed. Could you check the [comment](https://github.com/apache/spark/pull/44778#discussion_r1464195098) above?








Re: [PR] [SPARK-46753][PYTHON][TESTS] Fix pypy3 python test [spark]

Posted by "itholic (via GitHub)" <gi...@apache.org>.
itholic commented on code in PR #44778:
URL: https://github.com/apache/spark/pull/44778#discussion_r1464195098


##########
python/pyspark/sql/tests/test_udf.py:
##########
@@ -917,6 +923,7 @@ def test_complex_return_types(self):
         self.assertEqual(row[1], {"a": "b"})
         self.assertEqual(row[2], Row(col1=1, col2=2))
 
+    @unittest.skipIf(not have_pyarrow, pyarrow_requirement_message)
     def test_named_arguments(self):

Review Comment:
   We just merged https://github.com/apache/spark/pull/44745, which is a band-aid fix for this issue. Could you try testing again after rebasing onto master?





Re: [PR] [SPARK-46753][PYTHON][TESTS] Fix pypy3 python test [spark]

Posted by "panbingkun (via GitHub)" <gi...@apache.org>.
panbingkun commented on code in PR #44778:
URL: https://github.com/apache/spark/pull/44778#discussion_r1464196615


##########
python/pyspark/sql/tests/test_udtf.py:
##########
@@ -69,6 +69,9 @@
 )
 
 
+@unittest.skipIf(
+    not have_pandas or not have_pyarrow, pandas_requirement_message or pyarrow_requirement_message
+)
 class BaseUDTFTestsMixin:

Review Comment:
   Yeah.
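The skip reason in this hunk relies on `or` evaluating to its first truthy operand. A small sketch, assuming (as the names suggest) that each `*_requirement_message` is `None` when its dependency is installed and a string otherwise; the values and class name below are made up for illustration:

```python
import unittest
from typing import Optional

# Hypothetical values: pandas is installed, pyarrow is not.
pandas_requirement_message: Optional[str] = None
pyarrow_requirement_message: Optional[str] = "pyarrow is not installed (illustrative message)"

have_pandas = pandas_requirement_message is None
have_pyarrow = pyarrow_requirement_message is None

# `a or b` evaluates to `a` if it is truthy, otherwise to `b`, so the skip
# reason is the message of the first missing dependency (pandas checked first).
skip_reason = pandas_requirement_message or pyarrow_requirement_message


@unittest.skipIf(not have_pandas or not have_pyarrow, skip_reason)
class ExampleUDTFTests(unittest.TestCase):
    def test_placeholder(self) -> None:
        self.assertTrue(True)
```

When both dependencies are present, `skip_reason` is `None`, but the condition is then `False`, so the reason is never used.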





Re: [PR] [SPARK-46753][PYTHON][TESTS] Fix pypy3 python test [spark]

Posted by "itholic (via GitHub)" <gi...@apache.org>.
itholic commented on code in PR #44778:
URL: https://github.com/apache/spark/pull/44778#discussion_r1464350475


##########
python/pyspark/sql/tests/test_udf.py:
##########
@@ -917,6 +923,7 @@ def test_complex_return_types(self):
         self.assertEqual(row[1], {"a": "b"})
         self.assertEqual(row[2], Row(col1=1, col2=2))
 
+    @unittest.skipIf(not have_pyarrow, pyarrow_requirement_message)
     def test_named_arguments(self):

Review Comment:
   Made a quick fix here: https://github.com/apache/spark/pull/44864





Re: [PR] [SPARK-46753][PYTHON][TESTS] Fix pypy3 python test [spark]

Posted by "panbingkun (via GitHub)" <gi...@apache.org>.
panbingkun commented on code in PR #44778:
URL: https://github.com/apache/spark/pull/44778#discussion_r1470433726


##########
python/pyspark/sql/tests/test_udf_profiler.py:
##########
@@ -51,6 +51,7 @@ def add2(x):
     action(df)
 
 
+@unittest.skipIf(not have_pyarrow, pyarrow_requirement_message)

Review Comment:
   Okay





Re: [PR] [SPARK-46753][PYTHON][TESTS] Fix pypy3 python test [spark]

Posted by "HyukjinKwon (via GitHub)" <gi...@apache.org>.
HyukjinKwon commented on code in PR #44778:
URL: https://github.com/apache/spark/pull/44778#discussion_r1470550502


##########
python/pyspark/testing/connectutils.py:
##########
@@ -89,7 +89,8 @@ def __getattr__(self, item):
 
 @unittest.skipIf(not should_test_connect, connect_requirement_message)
 class PlanOnlyTestFixture(unittest.TestCase, PySparkErrorTestUtils):
-    from pyspark.sql.connect.dataframe import DataFrame
+    if should_test_connect:
+        from pyspark.sql.connect.dataframe import DataFrame

Review Comment:
   You can move this to the top of the file, at L67:
   
   ```
   if should_test_connect:
       from pyspark.sql.connect.dataframe import DataFrame
   ```





Re: [PR] [SPARK-46753][PYTHON][TESTS] Fix pypy3 python test [spark]

Posted by "HyukjinKwon (via GitHub)" <gi...@apache.org>.
HyukjinKwon commented on code in PR #44778:
URL: https://github.com/apache/spark/pull/44778#discussion_r1470550106


##########
python/pyspark/sql/tests/test_utils.py:
##########
@@ -969,6 +984,7 @@ def test_assert_error_pandas_pyspark_df(self):
             },
         )
 
+    @unittest.skipIf(not have_pandas or not have_pyarrow, "no pandas or pyarrow dependency")
     def test_assert_error_non_pyspark_df(self):

Review Comment:
   @itholic 





Re: [PR] [SPARK-46753][PYTHON][TESTS] Fix pypy3 python test [spark]

Posted by "panbingkun (via GitHub)" <gi...@apache.org>.
panbingkun commented on code in PR #44778:
URL: https://github.com/apache/spark/pull/44778#discussion_r1471004401


##########
python/pyspark/testing/connectutils.py:
##########
@@ -89,7 +89,8 @@ def __getattr__(self, item):
 
 @unittest.skipIf(not should_test_connect, connect_requirement_message)
 class PlanOnlyTestFixture(unittest.TestCase, PySparkErrorTestUtils):
-    from pyspark.sql.connect.dataframe import DataFrame
+    if should_test_connect:
+        from pyspark.sql.connect.dataframe import DataFrame

Review Comment:
   Moving only this line of code seems to cause an error:
   https://github.com/panbingkun/spark/actions/runs/7708511363/job/21009054376#step:12:3888
   <img width="771" alt="image" src="https://github.com/apache/spark/assets/15246973/56ec38b5-9ecd-4c3a-ba3d-3e20534e9a8f">
   
   The previous version instead moved everything under the `if should_test_connect:` guard.
   





Re: [PR] [SPARK-46753][PYTHON][TESTS] Fix pypy3 python test [spark]

Posted by "itholic (via GitHub)" <gi...@apache.org>.
itholic commented on code in PR #44778:
URL: https://github.com/apache/spark/pull/44778#discussion_r1464186119


##########
python/pyspark/sql/tests/test_udf.py:
##########
@@ -917,6 +923,7 @@ def test_complex_return_types(self):
         self.assertEqual(row[1], {"a": "b"})
         self.assertEqual(row[2], Row(col1=1, col2=2))
 
+    @unittest.skipIf(not have_pyarrow, pyarrow_requirement_message)
     def test_named_arguments(self):

Review Comment:
   FYI, in `assertDataFrameEqual` we skip using pandas if it's not installed, but for some reason this is not working properly right now.
   
   ```python
       has_pandas = False
       try:
           # If pandas dependencies are available, allow pandas or pandas-on-Spark DataFrame
           import pyspark.pandas as ps
           import pandas as pd
           from pyspark.testing.pandasutils import PandasOnSparkTestUtils
   
           has_pandas = True
       except ImportError:
           # no pandas, so we won't call pandasutils functions
           pass
   ```
   
   Anyway, I just created SPARK-46821 to resolve this, and I'm working on it.





Re: [PR] [SPARK-46753][PYTHON][TESTS] Fix pypy3 python test [spark]

Posted by "itholic (via GitHub)" <gi...@apache.org>.
itholic commented on code in PR #44778:
URL: https://github.com/apache/spark/pull/44778#discussion_r1464195098


##########
python/pyspark/sql/tests/test_udf.py:
##########
@@ -917,6 +923,7 @@ def test_complex_return_types(self):
         self.assertEqual(row[1], {"a": "b"})
         self.assertEqual(row[2], Row(col1=1, col2=2))
 
+    @unittest.skipIf(not have_pyarrow, pyarrow_requirement_message)
     def test_named_arguments(self):

Review Comment:
   We just merged https://github.com/apache/spark/pull/44745, which is a band-aid fix for the issue of pandas not being installed. Could you try testing again after rebasing onto master?





Re: [PR] [SPARK-46753][PYTHON][TESTS] Fix pypy3 python test [spark]

Posted by "itholic (via GitHub)" <gi...@apache.org>.
itholic commented on code in PR #44778:
URL: https://github.com/apache/spark/pull/44778#discussion_r1464266537


##########
python/pyspark/sql/tests/test_udf.py:
##########
@@ -917,6 +923,7 @@ def test_complex_return_types(self):
         self.assertEqual(row[1], {"a": "b"})
         self.assertEqual(row[2], Row(col1=1, col2=2))
 
+    @unittest.skipIf(not have_pyarrow, pyarrow_requirement_message)
     def test_named_arguments(self):

Review Comment:
   I just realized that importing `pyspark.pandas` checks for `pandas` and `pyarrow`, as shown below, and exits the program when either one is not installed:
   
   ```python
   # pyspark/pandas/__init__.py
   try:
       require_minimum_pandas_version()
       require_minimum_pyarrow_version()
   except ImportError as e:
       if os.environ.get("SPARK_TESTING"):
           warnings.warn(str(e))
           sys.exit()
       else:
           raise
   ```
   
   @HyukjinKwon Should we add a flag or something similar here to allow running tests without `pandas` and `pyarrow` on specific systems?
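A minimal sketch of why the `except ImportError` guard in `assertDataFrameEqual` does not help with the quoted `sys.exit()` branch: `sys.exit()` raises `SystemExit`, which derives from `BaseException` rather than `Exception`, so it passes straight through an `except ImportError` clause. The function names below are hypothetical stand-ins, not PySpark code:

```python
import sys


def hypothetical_pandas_on_spark_init(have_deps: bool) -> None:
    # Mimics the quoted pyspark/pandas/__init__.py branch taken when
    # SPARK_TESTING is set and the dependency check fails.
    if not have_deps:
        sys.exit()


def guarded_import(have_deps: bool) -> bool:
    """Mirrors the has_pandas try/except pattern quoted earlier in this thread."""
    try:
        hypothetical_pandas_on_spark_init(have_deps)
        return True
    except ImportError:
        # Not reached by sys.exit(): SystemExit is not an ImportError.
        return False
```

With dependencies present, `guarded_import(True)` returns `True`; without them, `guarded_import(False)` raises `SystemExit` instead of returning `False`.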





Re: [PR] [SPARK-46753][PYTHON][TESTS] Fix pypy3 python test [spark]

Posted by "itholic (via GitHub)" <gi...@apache.org>.
itholic commented on code in PR #44778:
URL: https://github.com/apache/spark/pull/44778#discussion_r1464264570


##########
python/pyspark/sql/tests/test_udf.py:
##########
@@ -917,6 +923,7 @@ def test_complex_return_types(self):
         self.assertEqual(row[1], {"a": "b"})
         self.assertEqual(row[2], Row(col1=1, col2=2))
 
+    @unittest.skipIf(not have_pyarrow, pyarrow_requirement_message)
     def test_named_arguments(self):

Review Comment:
   Oh, it seems we have another problem: `assertDataFrameEqual` tries to import `pyspark.pandas`. Let me try to make a quick fix.





Re: [PR] [SPARK-46753][PYTHON][TESTS] Fix pypy3 python test [spark]

Posted by "panbingkun (via GitHub)" <gi...@apache.org>.
panbingkun commented on code in PR #44778:
URL: https://github.com/apache/spark/pull/44778#discussion_r1467397361


##########
python/pyspark/sql/tests/test_utils.py:
##########
@@ -45,7 +45,14 @@
     IntegerType,
     BooleanType,
 )
-from pyspark.testing.sqlutils import have_pandas
+from pyspark.testing.sqlutils import (
+    have_pandas,
+    have_pyarrow,
+    pyarrow_requirement_message,
+)
+
+
+unittest.skipIf(not have_pyarrow, pyarrow_requirement_message)

Review Comment:
   Okay.
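Side note on the hunk quoted above: `unittest.skipIf(not have_pyarrow, pyarrow_requirement_message)` appears as a bare statement without a leading `@`. `unittest.skipIf` only returns a decorator, so calling it on its own line has no effect; it skips tests only when applied to a function or class. A self-contained illustration, where `have_dep` is a hard-coded, hypothetical stand-in for `have_pyarrow`:

```python
import unittest

have_dep = False  # hypothetical stand-in for have_pyarrow

# Bare call: builds a decorator and discards it, so nothing is skipped.
unittest.skipIf(not have_dep, "dependency missing")


class BareCallTests(unittest.TestCase):
    def test_runs_anyway(self):
        self.assertTrue(True)


@unittest.skipIf(not have_dep, "dependency missing")
class DecoratedTests(unittest.TestCase):
    def test_is_skipped(self):
        self.fail("should not run without the dependency")


def run(case):
    """Run all tests of a TestCase class and return the TestResult."""
    result = unittest.TestResult()
    unittest.defaultTestLoader.loadTestsFromTestCase(case).run(result)
    return result


bare = run(BareCallTests)
decorated = run(DecoratedTests)
print(bare.testsRun, len(bare.skipped))            # 1 0
print(decorated.testsRun, len(decorated.skipped))  # 1 1
```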





Re: [PR] [SPARK-46753][PYTHON][TESTS] Fix pypy3 python test [spark]

Posted by "itholic (via GitHub)" <gi...@apache.org>.
itholic commented on code in PR #44778:
URL: https://github.com/apache/spark/pull/44778#discussion_r1458186175


##########
dev/sparktestsupport/modules.py:
##########
@@ -542,6 +542,10 @@ def __hash__(self):
         "pyspark.testing.utils",
         "pyspark.testing.pandasutils",
     ],
+    excluded_python_implementations=[
+        "PyPy"  # Skip these tests under PyPy since they require numpy, pandas, and pyarrow and

Review Comment:
   What about using `# doctest: +SKIP`? The functions in `pyspark.testing.utils` also have corresponding unit tests, so I think it should be fine to skip the doctests.
   
   > these are optional dependencies.
   
   Skipping whole tests because of missing optional dependencies sounds a bit unreasonable to me as well.
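For reference, `# doctest: +SKIP` is a standard `doctest` directive that marks a single example as not to be run, while the other examples in the same docstring still execute. A minimal sketch with a made-up function; `some_optional_dependency` is a hypothetical module name standing in for pandas or pyarrow:

```python
import doctest


def add_one(x):
    """Return x + 1.

    This example would need an optional dependency, so it is skipped:

    >>> import some_optional_dependency  # doctest: +SKIP
    >>> some_optional_dependency.frame([1])  # doctest: +SKIP
    [2]

    This example has no extra dependencies and still runs:

    >>> add_one(1)
    2
    """
    return x + 1


finder = doctest.DocTestFinder()
runner = doctest.DocTestRunner(verbose=False)
for test in finder.find(add_one, "add_one", globs={"add_one": add_one}):
    runner.run(test)
results = runner.summarize(verbose=False)
print(results.failed, results.attempted)  # 0 1
```

Skipped examples are not counted as attempted, so only the dependency-free example contributes to the totals.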





Re: [PR] [SPARK-46753][PYTHON][TESTS] Fix pypy3 python test [spark]

Posted by "HyukjinKwon (via GitHub)" <gi...@apache.org>.
HyukjinKwon commented on code in PR #44778:
URL: https://github.com/apache/spark/pull/44778#discussion_r1469111152


##########
python/pyspark/sql/tests/test_session.py:
##########
@@ -213,6 +217,7 @@ def test_active_session_with_None_and_not_None_context(self):
             if sc is not None:
                 sc.stop()
 
+    @unittest.skipIf(not have_pyarrow, pyarrow_requirement_message)

Review Comment:
   Let's fix it as
   
   ```python
   @unittest.skipIf(not should_test_connect, connect_requirement_message)
   ```
   
   See `pyspark.testing.connectutils`.
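The suggestion follows a common pattern: compute an availability flag and a human-readable reason once at module level, then reuse both in `unittest.skipIf`. The sketch below only illustrates that pattern. It is not the actual code of `pyspark.testing.connectutils`; the helper `requirement_message` is hypothetical, and the module names checked (`pandas`, `pyarrow`, `grpc`) are an assumption for the example.

```python
import importlib.util
import unittest


def requirement_message(*modules):
    """Return None if every module is importable, else a reason string.

    Hypothetical helper for illustration; not part of PySpark.
    """
    missing = [m for m in modules if importlib.util.find_spec(m) is None]
    if missing:
        return "Missing optional dependencies: " + ", ".join(missing)
    return None


# Hypothetical flag/message pair mirroring should_test_connect and
# connect_requirement_message; the module list is an assumption.
connect_requirement_message = requirement_message("pandas", "pyarrow", "grpc")
should_test_connect = connect_requirement_message is None


class SessionTests(unittest.TestCase):
    @unittest.skipIf(not should_test_connect, connect_requirement_message)
    def test_needs_connect(self):
        # Only reached when all assumed dependencies are installed.
        self.assertTrue(should_test_connect)
```

Keeping the flag and its message together means every skipped test reports exactly which dependency was missing.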





Re: [PR] [SPARK-46753][PYTHON][TESTS] Fix pypy3 python test [spark]

Posted by "panbingkun (via GitHub)" <gi...@apache.org>.
panbingkun commented on code in PR #44778:
URL: https://github.com/apache/spark/pull/44778#discussion_r1470712670


##########
python/pyspark/sql/tests/test_utils.py:
##########
@@ -969,6 +984,7 @@ def test_assert_error_pandas_pyspark_df(self):
             },
         )
 
+    @unittest.skipIf(not have_pandas or not have_pyarrow, "no pandas or pyarrow dependency")
     def test_assert_error_non_pyspark_df(self):

Review Comment:
   https://github.com/apache/spark/blob/4da7c3d316e3d1340258698e841be370bd16d6fa/python/pyspark/sql/tests/test_utils.py#L977
   https://github.com/apache/spark/blob/4da7c3d316e3d1340258698e841be370bd16d6fa/python/pyspark/testing/utils.py#L923





Re: [PR] [SPARK-46753][PYTHON][TESTS] Fix pypy3 python test [spark]

Posted by "panbingkun (via GitHub)" <gi...@apache.org>.
panbingkun commented on code in PR #44778:
URL: https://github.com/apache/spark/pull/44778#discussion_r1470769491


##########
python/pyspark/sql/tests/test_udf.py:
##########
@@ -1039,13 +1045,15 @@ def test_udf(a):
         with self.assertRaisesRegex(PythonException, "StopIteration"):
             self.spark.range(10).select(test_udf(col("id"))).show()
 
+    @unittest.skipIf(not have_pyarrow, pyarrow_requirement_message)
     def test_python_udf_segfault(self):
         with self.sql_conf({"spark.sql.execution.pyspark.udf.faulthandler.enabled": True}):
             with self.assertRaisesRegex(Exception, "Segmentation fault"):
                 import ctypes
 
                 self.spark.range(1).select(udf(lambda x: ctypes.string_at(0))("id")).collect()

Review Comment:
   Details of the issue:
   https://github.com/panbingkun/spark/actions/runs/7707535613/job/21004937347#step:12:4298
   Screenshot: https://github.com/apache/spark/assets/15246973/d885a462-4faf-4e5d-ad94-df60966fcfd2
   





Re: [PR] [SPARK-46753][PYTHON][TESTS] Fix pypy3 python test [spark]

Posted by "dongjoon-hyun (via GitHub)" <gi...@apache.org>.
dongjoon-hyun commented on code in PR #44778:
URL: https://github.com/apache/spark/pull/44778#discussion_r1457628763


##########
dev/sparktestsupport/modules.py:
##########
@@ -542,6 +542,10 @@ def __hash__(self):
         "pyspark.testing.utils",
         "pyspark.testing.pandasutils",
     ],
+    excluded_python_implementations=[
+        "PyPy"  # Skip these tests under PyPy since they require numpy, pandas, and pyarrow and

Review Comment:
   Well, I'd prefer to handle these missing packages in the test cases because these are `optional` dependencies.
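Handling optional dependencies inside the test cases typically means guarding only the affected tests, so the rest of the module still runs under PyPy. A minimal sketch, where `have_pandas` and `have_pyarrow` are hard-coded stand-ins (in PySpark they come from `pyspark.testing.sqlutils`) and the class name is hypothetical:

```python
import unittest

# Stand-ins for pyspark.testing.sqlutils flags; hard-coded for illustration.
have_pandas = True
have_pyarrow = False


class MixedDependencyTests(unittest.TestCase):
    def test_pure_python(self):
        # Needs no optional dependency, so it always runs.
        self.assertEqual(sorted([3, 1, 2]), [1, 2, 3])

    @unittest.skipIf(not have_pandas or not have_pyarrow, "no pandas or pyarrow dependency")
    def test_needs_pandas_and_pyarrow(self):
        # Only reached when both optional dependencies are available.
        import pandas  # noqa: F401
        import pyarrow  # noqa: F401


suite = unittest.defaultTestLoader.loadTestsFromTestCase(MixedDependencyTests)
result = unittest.TestResult()
suite.run(result)
print(result.testsRun, len(result.skipped), result.wasSuccessful())  # 2 1 True
```

Compared with excluding a whole module per Python implementation, this keeps the dependency-free tests running everywhere and reports the rest as skipped with a reason.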





Re: [PR] [SPARK-46753][PYTHON][TESTS] Fix pypy3 python test [spark]

Posted by "panbingkun (via GitHub)" <gi...@apache.org>.
panbingkun commented on PR #44778:
URL: https://github.com/apache/spark/pull/44778#issuecomment-1911269936

   > Would you please rebase master for #44886? Then we could skip the proposed changes related to "have_grpcio". Thanks!
   
   Sure, let me do it.




Re: [PR] [SPARK-46753][PYTHON][TESTS] Fix pypy3 python test [spark]

Posted by "HyukjinKwon (via GitHub)" <gi...@apache.org>.
HyukjinKwon commented on code in PR #44778:
URL: https://github.com/apache/spark/pull/44778#discussion_r1467362254


##########
python/pyspark/sql/tests/test_udf.py:
##########
@@ -935,7 +940,8 @@ def test_udf(a, b):
             ]
         ):
             with self.subTest(query_no=i):
-                assertDataFrameEqual(df, [Row(0), Row(101)])
+                if have_pyarrow:
+                    assertDataFrameEqual(df, [Row(0), Row(101)])

Review Comment:
   Can we avoid this? We don't need it when comparing a PySpark DataFrame with PySpark Rows.
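Apart from being unnecessary for a DataFrame-vs-Rows comparison, wrapping an assertion in `if have_pyarrow:` has a side effect: when the dependency is missing, the test passes silently without checking anything instead of being reported as skipped. A small illustration with a hard-coded, hypothetical stand-in flag `have_dep`:

```python
import unittest

have_dep = False  # hypothetical stand-in for have_pyarrow


class GuardedAssertionTests(unittest.TestCase):
    def test_guarded(self):
        value = 1 + 1
        if have_dep:
            # Deliberately wrong expectation: it would fail if it ran,
            # but with have_dep False it never runs and the test "passes".
            self.assertEqual(value, 3)


class ExplicitSkipTests(unittest.TestCase):
    @unittest.skipIf(not have_dep, "dependency missing")
    def test_skipped(self):
        self.assertEqual(1 + 1, 3)


def run(case):
    """Run all tests of a TestCase class and return the TestResult."""
    result = unittest.TestResult()
    unittest.defaultTestLoader.loadTestsFromTestCase(case).run(result)
    return result


guarded = run(GuardedAssertionTests)
explicit = run(ExplicitSkipTests)
print(guarded.wasSuccessful(), len(guarded.skipped))    # True 0
print(explicit.wasSuccessful(), len(explicit.skipped))  # True 1
```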





Re: [PR] [SPARK-46753][PYTHON][TESTS] Fix pypy3 python test [spark]

Posted by "panbingkun (via GitHub)" <gi...@apache.org>.
panbingkun commented on code in PR #44778:
URL: https://github.com/apache/spark/pull/44778#discussion_r1464258802


##########
python/pyspark/sql/tests/test_udf.py:
##########
@@ -917,6 +923,7 @@ def test_complex_return_types(self):
         self.assertEqual(row[1], {"a": "b"})
         self.assertEqual(row[2], Row(col1=1, col2=2))
 
+    @unittest.skipIf(not have_pyarrow, pyarrow_requirement_message)
     def test_named_arguments(self):

Review Comment:
   @itholic From the test, it seems that there is still a problem:
   https://github.com/panbingkun/spark/actions/runs/7634418170/job/20798310295#step:12:4443
   Screenshot: https://github.com/apache/spark/assets/15246973/bbbb52e0-2ed9-4694-8b35-0d4fd848b8e7
   





Re: [PR] [SPARK-46753][PYTHON][TESTS] Fix pypy3 python test [spark]

Posted by "itholic (via GitHub)" <gi...@apache.org>.
itholic commented on code in PR #44778:
URL: https://github.com/apache/spark/pull/44778#discussion_r1464186119


##########
python/pyspark/sql/tests/test_udf.py:
##########
@@ -917,6 +923,7 @@ def test_complex_return_types(self):
         self.assertEqual(row[1], {"a": "b"})
         self.assertEqual(row[2], Row(col1=1, col2=2))
 
+    @unittest.skipIf(not have_pyarrow, pyarrow_requirement_message)
     def test_named_arguments(self):

Review Comment:
   FYI, in `assertDataFrameEqual` we skip using pandas if it's not installed, but somehow this is not working properly right now.
   
   ```python
       has_pandas = False
       try:
           # If pandas dependencies are available, allow pandas or pandas-on-Spark DataFrame
           import pyspark.pandas as ps
           import pandas as pd
           from pyspark.testing.pandasutils import PandasOnSparkTestUtils
   
           has_pandas = True
       except ImportError:
           # no pandas, so we won't call pandasutils functions
           pass
   ```
   
   Anyway, I just created ticket SPARK-46821 to resolve this. I'm working on it.
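The snippet above treats `pyspark.pandas`, `pandas`, and the pandas test utilities as one all-or-nothing import, so whether the pandas path is enabled depends on every import in the block, including `pyspark.pandas`, which has its own requirements. One way to make such checks more granular is to probe each module on its own. This is only an illustrative sketch: `optional_modules` is a hypothetical helper, not necessarily how SPARK-46821 was resolved.

```python
import importlib


def optional_modules(*names):
    """Try importing each module independently.

    Returns a dict mapping module name -> module object, or None if the
    import raised ImportError. Hypothetical helper for illustration.
    """
    found = {}
    for name in names:
        try:
            found[name] = importlib.import_module(name)
        except ImportError:
            # Missing or unusable optional dependency: record it and move on.
            found[name] = None
    return found


mods = optional_modules("json", "no_such_module_for_demo")
has_json = mods["json"] is not None
has_demo = mods["no_such_module_for_demo"] is not None
print(has_json, has_demo)  # True False
```

Actually importing each module (rather than only checking whether it is installed) also catches packages that are present but raise `ImportError` at import time, for example because one of their own dependencies is missing.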





Re: [PR] [SPARK-46753][PYTHON][TESTS] Fix pypy3 python test [spark]

Posted by "itholic (via GitHub)" <gi...@apache.org>.
itholic commented on code in PR #44778:
URL: https://github.com/apache/spark/pull/44778#discussion_r1464180610


##########
python/pyspark/sql/tests/test_udf.py:
##########
@@ -917,6 +923,7 @@ def test_complex_return_types(self):
         self.assertEqual(row[1], {"a": "b"})
         self.assertEqual(row[2], Row(col1=1, col2=2))
 
+    @unittest.skipIf(not have_pyarrow, pyarrow_requirement_message)
     def test_named_arguments(self):

Review Comment:
   Let me take a look. Do we want to fix it separately, before or after this PR is merged?





Re: [PR] [SPARK-46753][PYTHON][TESTS] Fix pypy3 python test [spark]

Posted by "panbingkun (via GitHub)" <gi...@apache.org>.
panbingkun commented on code in PR #44778:
URL: https://github.com/apache/spark/pull/44778#discussion_r1464196362


##########
python/pyspark/sql/tests/test_utils.py:
##########
@@ -45,9 +45,14 @@
     IntegerType,
     BooleanType,
 )
-from pyspark.testing.sqlutils import have_pandas
+from pyspark.testing.sqlutils import (
+    have_pandas,
+    have_pyarrow,
+    pyarrow_requirement_message,
+)
 
 
+@unittest.skipIf(not have_pyarrow, pyarrow_requirement_message)
 class UtilsTestsMixin:

Review Comment:
   Because `assertDataFrameEqual` is used inside this mixin, it requires `pyarrow`.
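Decorating the mixin itself works because `unittest.skipIf` applied to a class sets a class attribute that concrete `TestCase` subclasses inherit, so every test method contributed by the mixin is reported as skipped. A minimal sketch with a hard-coded stand-in flag; the test method name is hypothetical:

```python
import unittest

have_pyarrow = False  # stand-in for pyspark.testing.sqlutils.have_pyarrow


@unittest.skipIf(not have_pyarrow, "pyarrow is required")
class UtilsTestsMixin:
    # Not a TestCase itself, so the loader does not collect it directly.
    def test_uses_dataframe_comparison(self):
        self.fail("would need pyarrow")


class UtilsTests(UtilsTestsMixin, unittest.TestCase):
    pass


result = unittest.TestResult()
unittest.defaultTestLoader.loadTestsFromTestCase(UtilsTests).run(result)
print(result.testsRun, len(result.skipped), result.wasSuccessful())  # 1 1 True
```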





Re: [PR] [SPARK-46753][PYTHON][TESTS] Fix pypy3 python test [spark]

Posted by "panbingkun (via GitHub)" <gi...@apache.org>.
panbingkun commented on code in PR #44778:
URL: https://github.com/apache/spark/pull/44778#discussion_r1464199835


##########
python/pyspark/sql/tests/test_utils.py:
##########
@@ -45,9 +45,14 @@
     IntegerType,
     BooleanType,
 )
-from pyspark.testing.sqlutils import have_pandas
+from pyspark.testing.sqlutils import (
+    have_pandas,
+    have_pyarrow,
+    pyarrow_requirement_message,
+)
 
 
+@unittest.skipIf(not have_pyarrow, pyarrow_requirement_message)
 class UtilsTestsMixin:

Review Comment:
   Okay





Re: [PR] [SPARK-46753][PYTHON][TESTS] Fix pypy3 python test [spark]

Posted by "itholic (via GitHub)" <gi...@apache.org>.
itholic commented on code in PR #44778:
URL: https://github.com/apache/spark/pull/44778#discussion_r1464264570


##########
python/pyspark/sql/tests/test_udf.py:
##########
@@ -917,6 +923,7 @@ def test_complex_return_types(self):
         self.assertEqual(row[1], {"a": "b"})
         self.assertEqual(row[2], Row(col1=1, col2=2))
 
+    @unittest.skipIf(not have_pyarrow, pyarrow_requirement_message)
     def test_named_arguments(self):

Review Comment:
   Oh... it seems we have another problem: `assertDataFrameEqual` tries to import not only `pandas` but also `pyspark.pandas`. Let me try to make a quick fix.





Re: [PR] [SPARK-46753][PYTHON][TESTS] Fix pypy3 python test [spark]

Posted by "HyukjinKwon (via GitHub)" <gi...@apache.org>.
HyukjinKwon commented on code in PR #44778:
URL: https://github.com/apache/spark/pull/44778#discussion_r1464133056


##########
python/pyspark/sql/tests/test_udf.py:
##########
@@ -917,6 +923,7 @@ def test_complex_return_types(self):
         self.assertEqual(row[1], {"a": "b"})
         self.assertEqual(row[2], Row(col1=1, col2=2))
 
+    @unittest.skipIf(not have_pyarrow, pyarrow_requirement_message)
     def test_named_arguments(self):

Review Comment:
   @itholic can you make this test not require that method?
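One way to drop the dependency is to compare collected rows with plain `unittest` assertions instead of `assertDataFrameEqual`. The sketch uses `collections.namedtuple` as a hypothetical stand-in for `pyspark.sql.Row` and a hard-coded list in place of `df.collect()`, so it runs without Spark; in the real test the left-hand side would come from the DataFrame.

```python
import unittest
from collections import namedtuple

# Hypothetical stand-in for pyspark.sql.Row with a single column.
Row = namedtuple("Row", ["value"])


class NamedArgumentsSketch(unittest.TestCase):
    def test_rows_match(self):
        collected = [Row(101), Row(0)]  # placeholder for df.collect()
        expected = [Row(0), Row(101)]
        # Sort both sides so the comparison does not depend on row order.
        self.assertEqual(sorted(collected), sorted(expected))
```

Since Rows are tuples, sorting and equality work without pandas or pyarrow.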





Re: [PR] [SPARK-46753][PYTHON][TESTS] Fix pypy3 python test [spark]

Posted by "HyukjinKwon (via GitHub)" <gi...@apache.org>.
HyukjinKwon commented on PR #44778:
URL: https://github.com/apache/spark/pull/44778#issuecomment-1916014430

   Otherwise looks pretty good. Thanks for driving this @panbingkun 




Re: [PR] [SPARK-46753][PYTHON][TESTS] Fix pypy3 python test [spark]

Posted by "panbingkun (via GitHub)" <gi...@apache.org>.
panbingkun commented on code in PR #44778:
URL: https://github.com/apache/spark/pull/44778#discussion_r1470704668


##########
python/pyspark/sql/tests/test_utils.py:
##########
@@ -969,6 +984,7 @@ def test_assert_error_pandas_pyspark_df(self):
             },
         )
 
+    @unittest.skipIf(not have_pandas or not have_pyarrow, "no pandas or pyarrow dependency")
     def test_assert_error_non_pyspark_df(self):

Review Comment:
   Details of the issue:
   Screenshot: https://github.com/apache/spark/assets/15246973/30926a97-5384-429b-b9dc-8edfdb0d6808
   https://github.com/panbingkun/spark/actions/runs/7695647024/job/20969046665#step:12:4274





Re: [PR] [SPARK-46753][PYTHON][TESTS] Fix pypy3 python test [spark]

Posted by "HyukjinKwon (via GitHub)" <gi...@apache.org>.
HyukjinKwon commented on code in PR #44778:
URL: https://github.com/apache/spark/pull/44778#discussion_r1469112025


##########
python/pyspark/sql/tests/test_udf.py:
##########
@@ -1039,13 +1045,15 @@ def test_udf(a):
         with self.assertRaisesRegex(PythonException, "StopIteration"):
             self.spark.range(10).select(test_udf(col("id"))).show()
 
+    @unittest.skipIf(not have_pyarrow, pyarrow_requirement_message)
     def test_python_udf_segfault(self):
         with self.sql_conf({"spark.sql.execution.pyspark.udf.faulthandler.enabled": True}):
             with self.assertRaisesRegex(Exception, "Segmentation fault"):
                 import ctypes
 
                 self.spark.range(1).select(udf(lambda x: ctypes.string_at(0))("id")).collect()
 
+    @unittest.skipIf(not have_pyarrow, pyarrow_requirement_message)
     def test_err_udf_init(self):
         with QuietTest(self.sc):
             self.check_err_udf_init()

Review Comment:
   ditto





Re: [PR] [SPARK-46753][PYTHON][TESTS] Fix pypy3 python test [spark]

Posted by "panbingkun (via GitHub)" <gi...@apache.org>.
panbingkun commented on PR #44778:
URL: https://github.com/apache/spark/pull/44778#issuecomment-1898357361

   cc @dongjoon-hyun 




Re: [PR] [SPARK-46753][PYTHON][TESTS] Fix pypy3 python test [spark]

Posted by "itholic (via GitHub)" <gi...@apache.org>.
itholic commented on code in PR #44778:
URL: https://github.com/apache/spark/pull/44778#discussion_r1458186175


##########
dev/sparktestsupport/modules.py:
##########
@@ -542,6 +542,10 @@ def __hash__(self):
         "pyspark.testing.utils",
         "pyspark.testing.pandasutils",
     ],
+    excluded_python_implementations=[
+        "PyPy"  # Skip these tests under PyPy since they require numpy, pandas, and pyarrow and

Review Comment:
   What about using `# doctest: +SKIP`? The functions in `pyspark.testing.utils` also have corresponding unit tests, so I think it should be fine to skip the doctests.





Re: [PR] [SPARK-46753][PYTHON][TESTS] Fix pypy3 python test [spark]

Posted by "itholic (via GitHub)" <gi...@apache.org>.
itholic commented on code in PR #44778:
URL: https://github.com/apache/spark/pull/44778#discussion_r1458550594


##########
dev/sparktestsupport/modules.py:
##########
@@ -542,6 +542,10 @@ def __hash__(self):
         "pyspark.testing.utils",
         "pyspark.testing.pandasutils",
     ],
+    excluded_python_implementations=[
+        "PyPy"  # Skip these tests under PyPy since they require numpy, pandas, and pyarrow and

Review Comment:
   Cool. Looks good to me.
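For context on the `excluded_python_implementations=["PyPy"]` setting in the hunk above: a module declared this way is meant to be left out when the tests run on PyPy. A minimal sketch of such a filter, assuming the implementation name comes from `platform.python_implementation()` (which returns values such as `"CPython"` or `"PyPy"`). `Module`, `modules_to_run`, and the module names are simplified hypothetical stand-ins, not the real `dev/sparktestsupport` code.

```python
import platform
from dataclasses import dataclass, field
from typing import List


@dataclass
class Module:
    # Simplified, hypothetical stand-in for sparktestsupport's Module.
    name: str
    excluded_python_implementations: List[str] = field(default_factory=list)


def modules_to_run(modules, implementation=None):
    """Drop modules that exclude the given (or current) Python implementation."""
    impl = implementation or platform.python_implementation()
    return [m for m in modules if impl not in m.excluded_python_implementations]


mods = [
    Module("pyspark_sql"),
    Module("pyspark_testing", excluded_python_implementations=["PyPy"]),
]
print([m.name for m in modules_to_run(mods, "PyPy")])     # ['pyspark_sql']
print([m.name for m in modules_to_run(mods, "CPython")])  # ['pyspark_sql', 'pyspark_testing']
```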


