You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by "itholic (via GitHub)" <gi...@apache.org> on 2023/08/08 04:40:01 UTC

[GitHub] [spark] itholic opened a new pull request, #42388: [SPARK-43618][SPARK-43658][CONNECT][PS][TESTS] Enabling more tests

itholic opened a new pull request, #42388:
URL: https://github.com/apache/spark/pull/42388

   ### What changes were proposed in this pull request?
   
   This PR proposes to enable tests for pandas API on Spark with Spark Connect
   
   
   ### Why are the changes needed?
   
   To increate test coverage
   
   
   ### Does this PR introduce _any_ user-facing change?
   
   No, it's test-only.
   
   ### How was this patch tested?
   
   Enabling the existing UTs.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] zhengruifeng commented on pull request #42388: [SPARK-43618][SPARK-43658][CONNECT][PS][TESTS] Enabling more tests

Posted by "zhengruifeng (via GitHub)" <gi...@apache.org>.
zhengruifeng commented on PR #42388:
URL: https://github.com/apache/spark/pull/42388#issuecomment-1681484234

   merged to master


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] zhengruifeng closed pull request #42388: [SPARK-43618][SPARK-43658][CONNECT][PS][TESTS] Enabling more tests

Posted by "zhengruifeng (via GitHub)" <gi...@apache.org>.
zhengruifeng closed pull request #42388: [SPARK-43618][SPARK-43658][CONNECT][PS][TESTS] Enabling more tests
URL: https://github.com/apache/spark/pull/42388


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] zhengruifeng commented on a diff in pull request #42388: [SPARK-43618][SPARK-43658][CONNECT][PS][TESTS] Enabling more tests

Posted by "zhengruifeng (via GitHub)" <gi...@apache.org>.
zhengruifeng commented on code in PR #42388:
URL: https://github.com/apache/spark/pull/42388#discussion_r1286598403


##########
python/pyspark/pandas/tests/computation/test_compute.py:
##########
@@ -101,16 +101,9 @@ def test_mode(self):
         with self.assertRaises(ValueError):
             psdf.mode(axis=2)
 
-        def f(index, iterator):

Review Comment:
   on second thought, I suggest splitting `test_mode` and then always skip this part



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] itholic commented on pull request #42388: [SPARK-43618][SPARK-43658][CONNECT][PS][TESTS] Enabling more tests

Posted by "itholic (via GitHub)" <gi...@apache.org>.
itholic commented on PR #42388:
URL: https://github.com/apache/spark/pull/42388#issuecomment-1676515460

   I don't see any relevant log for the test failure from result as below:
   ```
   Setting default log level to "WARN".
   To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
     test_series_iloc_setitem (pyspark.pandas.tests.connect.diff_frames_ops.test_parity_setitem_series.DiffFramesParitySetItemSeriesTests) ... 
   [Stage 0:>                                                          (0 + 2) / 2]
   
                                                                                   
   /__w/spark/spark/python/pyspark/pandas/utils.py:1016: PandasAPIOnSparkAdviceWarning: `to_pandas` loads all data into the driver's memory. It should only be used if the resulting pandas Series is expected to be small.
     warnings.warn(message, PandasAPIOnSparkAdviceWarning)
   /__w/spark/spark/python/pyspark/pandas/utils.py:1016: PandasAPIOnSparkAdviceWarning: `to_pandas` loads all data into the driver's memory. It should only be used if the resulting pandas DataFrame is expected to be small.
     warnings.warn(message, PandasAPIOnSparkAdviceWarning)
   /__w/spark/spark/python/pyspark/pandas/utils.py:1016: PandasAPIOnSparkAdviceWarning: `to_pandas` loads all data into the driver's memory. It should only be used if the resulting pandas Series is expected to be small.
     warnings.warn(message, PandasAPIOnSparkAdviceWarning)
   /__w/spark/spark/python/pyspark/pandas/utils.py:1016: PandasAPIOnSparkAdviceWarning: `to_pandas` loads all data into the driver's memory. It should only be used if the resulting pandas Series is expected to be small.
     warnings.warn(message, PandasAPIOnSparkAdviceWarning)
   /__w/spark/spark/python/pyspark/pandas/utils.py:1016: PandasAPIOnSparkAdviceWarning: `to_pandas` loads all data into the driver's memory. It should only be used if the resulting pandas DataFrame is expected to be small.
     warnings.warn(message, PandasAPIOnSparkAdviceWarning)
   /__w/spark/spark/python/pyspark/pandas/utils.py:1016: PandasAPIOnSparkAdviceWarning: `to_pandas` loads all data into the driver's memory. It should only be used if the resulting pandas Series is expected to be small.
     warnings.warn(message, PandasAPIOnSparkAdviceWarning)
   
   Had test failures in pyspark.pandas.tests.connect.diff_frames_ops.test_parity_setitem_series with python3.9; see logs.
   
   ```
   
   But I roughly suspect that it's failing with timeout (it's even succeed from my local testing).
   
   Let me try to separate the test and see if it pass.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] zhengruifeng commented on a diff in pull request #42388: [SPARK-43618][SPARK-43658][CONNECT][PS][TESTS] Enabling more tests

Posted by "zhengruifeng (via GitHub)" <gi...@apache.org>.
zhengruifeng commented on code in PR #42388:
URL: https://github.com/apache/spark/pull/42388#discussion_r1286597113


##########
python/pyspark/pandas/tests/computation/test_compute.py:
##########
@@ -101,16 +101,9 @@ def test_mode(self):
         with self.assertRaises(ValueError):
             psdf.mode(axis=2)
 
-        def f(index, iterator):

Review Comment:
   IIRC, this one is by design. Using the specific partitioning to test a `mode` bug, see https://github.com/apache/spark/pull/38385/files
   
   but I'm fine to remove this one now



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] zhengruifeng commented on pull request #42388: [SPARK-43618][SPARK-43658][CONNECT][PS][TESTS] Enabling more tests

Posted by "zhengruifeng (via GitHub)" <gi...@apache.org>.
zhengruifeng commented on PR #42388:
URL: https://github.com/apache/spark/pull/42388#issuecomment-1674472191

   @itholic would you mind checking whether the test failure is relevant?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] itholic commented on a diff in pull request #42388: [SPARK-43618][SPARK-43658][CONNECT][PS][TESTS] Enabling more tests

Posted by "itholic (via GitHub)" <gi...@apache.org>.
itholic commented on code in PR #42388:
URL: https://github.com/apache/spark/pull/42388#discussion_r1295448791


##########
python/pyspark/pandas/tests/connect/diff_frames_ops/test_parity_setitem_series.py:
##########
@@ -24,9 +24,7 @@
 class DiffFramesParitySetItemSeriesTests(
     DiffFramesSetItemSeriesMixin, PandasOnSparkTestUtils, ReusedConnectTestCase
 ):
-    @unittest.skip(
-        "TODO(SPARK-43658): Fix unexpected `SparkConnectGrpcException` from Spark Connect client."
-    )
+    @unittest.skip("TODO(SPARK-44826): Resolve testing timeout issue from Spark Connect.")

Review Comment:
   Seems like we can't just enabling this one because this only fails on Spark Connect due to timeout issue for some reason.
   Addressing this one as a separate issue looks make sense to me, so I created new ticket and updated the comment.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] zhengruifeng commented on a diff in pull request #42388: [SPARK-43618][SPARK-43658][CONNECT][PS][TESTS] Enabling more tests

Posted by "zhengruifeng (via GitHub)" <gi...@apache.org>.
zhengruifeng commented on code in PR #42388:
URL: https://github.com/apache/spark/pull/42388#discussion_r1286597113


##########
python/pyspark/pandas/tests/computation/test_compute.py:
##########
@@ -101,16 +101,9 @@ def test_mode(self):
         with self.assertRaises(ValueError):
             psdf.mode(axis=2)
 
-        def f(index, iterator):

Review Comment:
   IIRC, this one is by design. Using the specific partitioning to test a `mode` bug, see https://github.com/apache/spark/pull/38385/files
   
   ~~but I'm fine to remove this one now~~



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] itholic commented on pull request #42388: [SPARK-43618][SPARK-43658][CONNECT][PS][TESTS] Enabling more tests

Posted by "itholic (via GitHub)" <gi...@apache.org>.
itholic commented on PR #42388:
URL: https://github.com/apache/spark/pull/42388#issuecomment-1668907825

   cc @zhengruifeng 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] itholic commented on a diff in pull request #42388: [SPARK-43618][SPARK-43658][CONNECT][PS][TESTS] Enabling more tests

Posted by "itholic (via GitHub)" <gi...@apache.org>.
itholic commented on code in PR #42388:
URL: https://github.com/apache/spark/pull/42388#discussion_r1295448791


##########
python/pyspark/pandas/tests/connect/diff_frames_ops/test_parity_setitem_series.py:
##########
@@ -24,9 +24,7 @@
 class DiffFramesParitySetItemSeriesTests(
     DiffFramesSetItemSeriesMixin, PandasOnSparkTestUtils, ReusedConnectTestCase
 ):
-    @unittest.skip(
-        "TODO(SPARK-43658): Fix unexpected `SparkConnectGrpcException` from Spark Connect client."
-    )
+    @unittest.skip("TODO(SPARK-44826): Resolve testing timeout issue from Spark Connect.")

Review Comment:
   After trying in several way to enable this one, but seems like we can't just enabling because this keep failing on (only from) Spark Connect due to timeout issue for some reason.
   
   Addressing this one as a separate issue looks make sense to me, so I created new ticket and updated the comment.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] itholic commented on a diff in pull request #42388: [SPARK-43618][SPARK-43658][CONNECT][PS][TESTS] Enabling more tests

Posted by "itholic (via GitHub)" <gi...@apache.org>.
itholic commented on code in PR #42388:
URL: https://github.com/apache/spark/pull/42388#discussion_r1286623971


##########
python/pyspark/pandas/tests/computation/test_compute.py:
##########
@@ -101,16 +101,9 @@ def test_mode(self):
         with self.assertRaises(ValueError):
             psdf.mode(axis=2)
 
-        def f(index, iterator):

Review Comment:
   Thanks for the context! Just updated the tests.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] zhengruifeng commented on a diff in pull request #42388: [SPARK-43618][SPARK-43658][CONNECT][PS][TESTS] Enabling more tests

Posted by "zhengruifeng (via GitHub)" <gi...@apache.org>.
zhengruifeng commented on code in PR #42388:
URL: https://github.com/apache/spark/pull/42388#discussion_r1286597113


##########
python/pyspark/pandas/tests/computation/test_compute.py:
##########
@@ -101,16 +101,9 @@ def test_mode(self):
         with self.assertRaises(ValueError):
             psdf.mode(axis=2)
 
-        def f(index, iterator):

Review Comment:
   IIRC, this one is by design. Using the specific partitioning to test a mode bug, see https://github.com/apache/spark/pull/38385/files
   
   but I'm fine to remove this one now



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org