You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by "juliuszsompolski (via GitHub)" <gi...@apache.org> on 2023/08/16 22:09:16 UTC

[GitHub] [spark] juliuszsompolski opened a new pull request, #42523: [SPARK-44806][CONNECT] Move internal client spark-connect-common to be able to test real in-process server with a real RPC client

juliuszsompolski opened a new pull request, #42523:
URL: https://github.com/apache/spark/pull/42523

   ### What changes were proposed in this pull request?
   
   Currently, Spark Connect has the following types of integration tests:
   * Client unit tests with a mock in process server (DummySparkConnectService with GRPC InProcessServerBuilder)
   * Client unit tests with a mock RPC server (DummySparkConnectService with GRPC NettyServerBuilder)
   * Server unit tests with in-process server and using various mocks
   * E2E tests of real client with a server started in another process (using RemoteSparkSession)
   
   What is lacking are E2E tests with an in-process Server (so that server state can be inspected asserted), and a real RPC client. This is impossible, because classes from `spark-connect-client-jvm` module include the client API which duplicates Spark SQL APIs of SparkSession, Dataset etc. When trying to pull a real client into the server module for testing, these classes clash.
   
   Move the `org.apache.spark.sql.connect.client` code into new `spark-connect-client-jvm-internal` module, so that the internal SparkConnectClient code is separated from the client public API, and can be pulled into testing of the server.
   
   Tried alternative approach in https://github.com/apache/spark/pull/42465. That doesn't work, because it also reorders the maven build in a way so that client is build before server, but client actually requires server to be build first to use tests with `RemoteSparkSession`
   Tried alternative approach to depend on a shaded/relocated version of client in https://github.com/apache/spark/pull/42461, but that's just not possible to do neither in maven nor sbt.
   Tried alternative approach to create client-jvm-internal module in https://github.com/apache/spark/pull/42441, moving things to connect-common was preferred to introducing new module by reviewers.
   Moved it together with tests in https://github.com/apache/spark/pull/42501, but moving tests isn't really needed.
   
   ### Why are the changes needed?
   
   For being able to use the internal client for testing of in-process server with an in-process client, but communicating over real RPC.
   
   ### Does this PR introduce _any_ user-facing change?
   
   No.
   
   ### How was this patch tested?
   
   All modules test and build.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] hvanhovell closed pull request #42523: [SPARK-44806][CONNECT] Move internal client spark-connect-common to be able to test real in-process server with a real RPC client

Posted by "hvanhovell (via GitHub)" <gi...@apache.org>.
hvanhovell closed pull request #42523: [SPARK-44806][CONNECT] Move internal client spark-connect-common to be able to test real in-process server with a real RPC client
URL: https://github.com/apache/spark/pull/42523


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] LuciferYang commented on pull request #42523: [SPARK-44806][CONNECT] Move internal client spark-connect-common to be able to test real in-process server with a real RPC client

Posted by "LuciferYang (via GitHub)" <gi...@apache.org>.
LuciferYang commented on PR #42523:
URL: https://github.com/apache/spark/pull/42523#issuecomment-1681849488

   > Move the org.apache.spark.sql.connect.client code into new spark-connect-client-jvm-internal module, so that the internal SparkConnectClient code is separated from the client public API, and can be pulled into testing of the server.
   
   Should this part of the PR description be fixed? Move the org.apache.spark.sql.connect.client code into spark-connect-common module?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] juliuszsompolski commented on pull request #42523: [SPARK-44806][CONNECT] Move internal client spark-connect-common to be able to test real in-process server with a real RPC client

Posted by "juliuszsompolski (via GitHub)" <gi...@apache.org>.
juliuszsompolski commented on PR #42523:
URL: https://github.com/apache/spark/pull/42523#issuecomment-1681874621

   > > Move the org.apache.spark.sql.connect.client code into new spark-connect-client-jvm-internal module, so that the internal SparkConnectClient code is separated from the client public API, and can be pulled into testing of the server.
   > 
   > Should this part of the PR description be fixed? Move the org.apache.spark.sql.connect.client code into spark-connect-common module?
   
   Thanks. Updated. Too much copying between too many PRs :-).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] juliuszsompolski commented on pull request #42523: [SPARK-44806][CONNECT] Move internal client spark-connect-common to be able to test real in-process server with a real RPC client

Posted by "juliuszsompolski (via GitHub)" <gi...@apache.org>.
juliuszsompolski commented on PR #42523:
URL: https://github.com/apache/spark/pull/42523#issuecomment-1682420456

   The previous CI https://pipelines.actions.githubusercontent.com/serviceHosts/3d13f997-a4d3-4f0e-b2e1-03759b595f61/_apis/pipelines/1/runs/12740/signedlogcontent/115?urlExpires=2023-08-17T14%3A47%3A10.2762629Z&urlSigningMethod=HMACV1&urlSignature=XuslFiQjNtNiOPbVnTmPasCN81B1G4pV2xRvb6edZPw%3D
   hung in
   ```
   2023-08-17T09:32:20.8263039Z Starting test(python3.9): pyspark.mllib.tests.test_algorithms (temp output: /__w/apache-spark/apache-spark/python/target/14cf7d84-bc85-4720-bdff-cb05b1f1691e/python3.9__pyspark.mllib.tests.test_algorithms__m5lzg53x.log)
   2023-08-17T14:43:37.3919476Z ##[error]The operation was canceled.
   2023-08-17T14:43:37.4098177Z ##[group]Run actions/upload-artifact@v3
   ```
   
   Connect tests have finished successfully above that:
   ```
   2023-08-17T09:03:41.3914976Z Starting test(python3.9): pyspark.ml.tests.connect.test_connect_classification (temp output: /__w/apache-spark/apache-spark/python/target/6dc9919c-af23-462f-ab49-0d9d6fbe2d68/python3.9__pyspark.ml.tests.connect.test_connect_classification__9px520ti.log)
   2023-08-17T09:04:45.6838607Z Finished test(python3.9): pyspark.ml.tests.connect.test_connect_classification (64s)
   2023-08-17T09:04:45.6848437Z Starting test(python3.9): pyspark.ml.tests.connect.test_connect_evaluation (temp output: /__w/apache-spark/apache-spark/python/target/ce515363-6b4d-4320-82c8-06142300aa81/python3.9__pyspark.ml.tests.connect.test_connect_evaluation__wt_qdeso.log)
   2023-08-17T09:05:09.4523836Z Finished test(python3.9): pyspark.ml.tests.connect.test_connect_evaluation (23s)
   2023-08-17T09:05:09.4539745Z Starting test(python3.9): pyspark.ml.tests.connect.test_connect_feature (temp output: /__w/apache-spark/apache-spark/python/target/d2e8c01d-cdb7-447b-99a6-627a0fce8299/python3.9__pyspark.ml.tests.connect.test_connect_feature__16us3pls.log)
   2023-08-17T09:05:26.1468648Z Finished test(python3.9): pyspark.ml.tests.connect.test_connect_feature (16s)
   2023-08-17T09:05:26.1479330Z Starting test(python3.9): pyspark.ml.tests.connect.test_connect_function (temp output: /__w/apache-spark/apache-spark/python/target/f3c64a5a-a7f8-4f40-ada3-47cb6ea0c611/python3.9__pyspark.ml.tests.connect.test_connect_function__bu50hu2h.log)
   2023-08-17T09:05:39.3850188Z Finished test(python3.9): pyspark.ml.tests.connect.test_connect_function (13s)
   2023-08-17T09:05:39.3864540Z Starting test(python3.9): pyspark.ml.tests.connect.test_connect_pipeline (temp output: /__w/apache-spark/apache-spark/python/target/8e7a012a-6d3b-45c7-908d-391591b335b4/python3.9__pyspark.ml.tests.connect.test_connect_pipeline__3yrkhzme.log)
   2023-08-17T09:06:24.4221834Z Finished test(python3.9): pyspark.ml.tests.connect.test_connect_pipeline (45s)
   2023-08-17T09:06:24.4262020Z Starting test(python3.9): pyspark.ml.tests.connect.test_connect_summarizer (temp output: /__w/apache-spark/apache-spark/python/target/485548ab-7b34-4a5b-bfaa-858cbee40806/python3.9__pyspark.ml.tests.connect.test_connect_summarizer__0v0nbt0v.log)
   2023-08-17T09:06:37.5242046Z Finished test(python3.9): pyspark.ml.tests.connect.test_connect_summarizer (13s)
   2023-08-17T09:06:37.5253105Z Starting test(python3.9): pyspark.ml.tests.connect.test_connect_tuning (temp output: /__w/apache-spark/apache-spark/python/target/80b15a9a-3144-4ad8-b751-35b14acfd4d1/python3.9__pyspark.ml.tests.connect.test_connect_tuning__9v0aw2p4.log)
   2023-08-17T09:08:38.0268219Z Finished test(python3.9): pyspark.ml.tests.connect.test_connect_tuning (120s)
   2023-08-17T09:08:38.0280513Z Starting test(python3.9): pyspark.ml.tests.connect.test_legacy_mode_classification (temp output: /__w/apache-spark/apache-spark/python/target/41b22175-04ed-47e7-9b19-51fcc6719cdf/python3.9__pyspark.ml.tests.connect.test_legacy_mode_classification__3_tavg70.log)
   2023-08-17T09:09:41.6896481Z Finished test(python3.9): pyspark.ml.tests.connect.test_legacy_mode_classification (63s)
   2023-08-17T09:09:41.6907884Z Starting test(python3.9): pyspark.ml.tests.connect.test_legacy_mode_evaluation (temp output: /__w/apache-spark/apache-spark/python/target/0bdf6c57-1ef3-4b55-9811-dde4561d8716/python3.9__pyspark.ml.tests.connect.test_legacy_mode_evaluation__y4_8ws6c.log)
   2023-08-17T09:10:07.9737213Z Finished test(python3.9): pyspark.ml.tests.connect.test_legacy_mode_evaluation (26s)
   2023-08-17T09:10:07.9751519Z Starting test(python3.9): pyspark.ml.tests.connect.test_legacy_mode_feature (temp output: /__w/apache-spark/apache-spark/python/target/32ffa486-c9c1-4b67-9df8-158bbadec037/python3.9__pyspark.ml.tests.connect.test_legacy_mode_feature__u4ph1xtm.log)
   2023-08-17T09:10:23.6414934Z Finished test(python3.9): pyspark.ml.tests.connect.test_legacy_mode_feature (15s)
   2023-08-17T09:10:23.6425578Z Starting test(python3.9): pyspark.ml.tests.connect.test_legacy_mode_pipeline (temp output: /__w/apache-spark/apache-spark/python/target/3da44b68-92dd-4c39-9f50-e475f26a1077/python3.9__pyspark.ml.tests.connect.test_legacy_mode_pipeline__13ax6wd6.log)
   2023-08-17T09:11:07.3254602Z Finished test(python3.9): pyspark.ml.tests.connect.test_legacy_mode_pipeline (43s)
   2023-08-17T09:11:07.3268238Z Starting test(python3.9): pyspark.ml.tests.connect.test_legacy_mode_summarizer (temp output: /__w/apache-spark/apache-spark/python/target/7e9fef28-e2d9-4594-b21e-183ba3433e9a/python3.9__pyspark.ml.tests.connect.test_legacy_mode_summarizer__qf3td1fg.log)
   2023-08-17T09:11:19.3500145Z Finished test(python3.9): pyspark.ml.tests.connect.test_legacy_mode_summarizer (12s)
   2023-08-17T09:11:19.3511052Z Starting test(python3.9): pyspark.ml.tests.connect.test_legacy_mode_tuning (temp output: /__w/apache-spark/apache-spark/python/target/aa9b1a9f-0b8f-4d77-b60d-351925c94744/python3.9__pyspark.ml.tests.connect.test_legacy_mode_tuning__a2oeb58g.log)
   2023-08-17T09:13:18.4455119Z Finished test(python3.9): pyspark.ml.tests.connect.test_legacy_mode_tuning (119s)
   2023-08-17T09:13:18.4468491Z Starting test(python3.9): pyspark.ml.tests.connect.test_parity_torch_data_loader (temp output: /__w/apache-spark/apache-spark/python/target/d6544899-0640-4a55-a641-21b3f2860cd1/python3.9__pyspark.ml.tests.connect.test_parity_torch_data_loader__828uhmrs.log)
   2023-08-17T09:14:34.5115066Z Finished test(python3.9): pyspark.ml.tests.connect.test_parity_torch_data_loader (76s)
   2023-08-17T09:14:34.5128337Z Starting test(python3.9): pyspark.ml.tests.connect.test_parity_torch_distributor (temp output: /__w/apache-spark/apache-spark/python/target/6cb2c03e-202a-4915-8f87-670a6e5112f1/python3.9__pyspark.ml.tests.connect.test_parity_torch_distributor__9pk1vctu.log)
   2023-08-17T09:18:46.7002605Z Finished test(python3.9): pyspark.ml.tests.connect.test_parity_torch_distributor (252s)
   ```


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] juliuszsompolski commented on pull request #42523: [SPARK-44806][CONNECT] Move internal client spark-connect-common to be able to test real in-process server with a real RPC client

Posted by "juliuszsompolski (via GitHub)" <gi...@apache.org>.
juliuszsompolski commented on PR #42523:
URL: https://github.com/apache/spark/pull/42523#issuecomment-1681341229

   @LuciferYang @grundprinzip @hvanhovell 
   Based on feedback, this version of move to common just moves the main src file that are needed for server testing, and the change is made smaller by keeping the tests in client.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] juliuszsompolski commented on pull request #42523: [SPARK-44806][CONNECT] Move internal client spark-connect-common to be able to test real in-process server with a real RPC client

Posted by "juliuszsompolski (via GitHub)" <gi...@apache.org>.
juliuszsompolski commented on PR #42523:
URL: https://github.com/apache/spark/pull/42523#issuecomment-1682274343

   Some flakiness in CI:
   https://github.com/juliuszsompolski/apache-spark/actions/runs/5888703839/job/15970448024
   ```
   [info] *** 4 TESTS FAILED ***
   [error] Failed tests:
   [error] 	org.apache.spark.deploy.yarn.YarnClusterSuite
   [error] (yarn / Test / test) sbt.TestsFailedException: Tests unsuccessful
   ```
   
   https://github.com/juliuszsompolski/apache-spark/actions/runs/5888703839/job/15970448536
   ```
   [info] *** 1 TEST FAILED ***
   [error] Failed: Total 3035, Failed 1, Errors 0, Passed 3034, Ignored 3
   [error] Failed tests:
   [error] 	org.apache.spark.sql.execution.streaming.state.StateStoreSuite
   ```
   
   https://github.com/juliuszsompolski/apache-spark/actions/runs/5888703839/job/15970448901
   ```
   *** 2 TESTS FAILED ***
   Failed: Total 9533, Failed 2, Errors 0, Passed 9531, Ignored 27
   Failed tests:
   org.apache.spark.sql.execution.python.PythonUDTFSuite
   ```
   
   https://github.com/juliuszsompolski/apache-spark/actions/runs/5888703839/job/15970476639 keeps running for 4.5h...


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] hvanhovell commented on pull request #42523: [SPARK-44806][CONNECT] Move internal client spark-connect-common to be able to test real in-process server with a real RPC client

Posted by "hvanhovell (via GitHub)" <gi...@apache.org>.
hvanhovell commented on PR #42523:
URL: https://github.com/apache/spark/pull/42523#issuecomment-1682448975

   Merging this one. The tests are flaky.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org