You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by "zhenlineo (via GitHub)" <gi...@apache.org> on 2023/03/04 01:06:06 UTC
[GitHub] [spark] zhenlineo opened a new pull request, #40274: [SPARK-42215][CONNECT] Single command to run Scala Client IT tests
zhenlineo opened a new pull request, #40274:
URL: https://github.com/apache/spark/pull/40274
### What changes were proposed in this pull request?
Make use of the new spark-connect script to make the Scala client test to not directly depends on any other modules.
The dependency is still there but hidden by the spark-connect script. When calling the script to start the server, the script performs a `build/sbt package` to ensure all the jars are build and can be found in the correct path.
After the change, we can use the following commands to run the Scala client tests:
```
build/mvn clean
build/mvn compile -pl connector/connect/client/jvm
build/mvn test -pl connector/connect/client/jvm
```
```
build/sbt clean
build/sbt "testOnly org.apache.spark.sql.ClientE2ETestSuite"
```
```
build/sbt clean
build/sbt "connect-client-jvm/test"
```
**Scala 2.13**
```
build/mvn clean
build/mvn -Pscala-2.13 compile -pl connector/connect/client/jvm -am -DskipTests
build/mvn -Pscala-2.13 test -pl connector/connect/client/jvm
// These commands failed with errors to find catalyst ArrowWriter. The error seems unrelated to this change.
```
```
build/sbt clean
build/sbt "testOnly org.apache.spark.sql.ClientE2ETestSuite" -Pscala-2.13
```
```
build/sbt clean
build/sbt "connect-client-jvm/test" -Pscala-2.13
```
After the change, the waiting time to run the E2ESuite is ~3min for a clean build. Then ~1min for subsequent runs. The test is slower only because we moved the build time from many commands to this single command. There is no limitations to add more tests as the delay is only caused by the shared server start time. Once the server is started, the tests run fast.
### Why are the changes needed?
A single command for maven users to mvn clean install to run all tests and build.
### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
Build, Manual tests.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] [spark] LuciferYang commented on a diff in pull request #40274: [SPARK-42215][CONNECT] Simplify Scala Client IT tests
Posted by "LuciferYang (via GitHub)" <gi...@apache.org>.
LuciferYang commented on code in PR #40274:
URL: https://github.com/apache/spark/pull/40274#discussion_r1125214866
##########
connector/connect/client/jvm/src/test/scala/org/apache/spark/sql/ClientE2ETestSuite.scala:
##########
@@ -76,7 +76,8 @@ class ClientE2ETestSuite extends RemoteSparkSession {
assert(result(2) == 2)
}
- test("simple udf") {
+ // Ignore this test until the udf is fully implemented.
+ ignore("simple udf") {
Review Comment:
I created SPARK-42665 yesterday because it still test fail in 3.4.0 RC2 and be reported in the dev mail list
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] [spark] LuciferYang commented on a diff in pull request #40274: [SPARK-42215][CONNECT] Simplify Scala Client IT tests
Posted by "LuciferYang (via GitHub)" <gi...@apache.org>.
LuciferYang commented on code in PR #40274:
URL: https://github.com/apache/spark/pull/40274#discussion_r1126381570
##########
connector/connect/client/jvm/src/test/scala/org/apache/spark/sql/ClientE2ETestSuite.scala:
##########
@@ -76,7 +76,8 @@ class ClientE2ETestSuite extends RemoteSparkSession {
assert(result(2) == 2)
}
- test("simple udf") {
+ // Ignore this test until the udf is fully implemented.
+ ignore("simple udf") {
Review Comment:
cc @hvanhovell should we ignore this test first? Maven test must fail due to the function is `org.apache.spark.sql.ClientE2ETestSuite$$Lambda$XXX`, but there is no `ClientE2ETestSuite` in server side when test using maven.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] [spark] LuciferYang commented on pull request #40274: [SPARK-42215][CONNECT] Simplify Scala Client IT tests
Posted by "LuciferYang (via GitHub)" <gi...@apache.org>.
LuciferYang commented on PR #40274:
URL: https://github.com/apache/spark/pull/40274#issuecomment-1455012036
In the pr description, `build/mvn compile -pl connector/connect/client/jvm` should be `build/mvn compile -pl connector/connect/client/jvm -am` ?
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] [spark] LuciferYang commented on pull request #40274: [SPARK-42215][CONNECT] Simplify Scala Client IT tests
Posted by "LuciferYang (via GitHub)" <gi...@apache.org>.
LuciferYang commented on PR #40274:
URL: https://github.com/apache/spark/pull/40274#issuecomment-1455094955
On the whole, it is good for me. There is only one question. Spark still uses maven for version release and deploy. But after this pr, the E2E test change to use sbt assembly server jar instead of maven shaded server jar for testing, which may weaken the maven test. We may need other ways to ensure the correctness of maven shaded server jar.
In the future, we may use sbt to completely replace maven(should not be in Spark 3.4.0), including version release, deploy and other help tools, which will no longer be a problem at that time.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] [spark] zhenlineo commented on pull request #40274: [SPARK-42215][CONNECT] Simplify Scala Client IT tests
Posted by "zhenlineo (via GitHub)" <gi...@apache.org>.
zhenlineo commented on PR #40274:
URL: https://github.com/apache/spark/pull/40274#issuecomment-1456552298
https://github.com/apache/spark/pull/40304 https://github.com/apache/spark/pull/40303
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] [spark] zhenlineo commented on pull request #40274: [SPARK-42215][CONNECT] Single command to run Scala Client IT tests
Posted by "zhenlineo (via GitHub)" <gi...@apache.org>.
zhenlineo commented on PR #40274:
URL: https://github.com/apache/spark/pull/40274#issuecomment-1454314742
The full error (even with the clean master branch):
```
build/mvn clean
build/mvn -Pscala-2.13 compile -pl connector/connect/client/jvm -am -DskipTests
build/mvn -Pscala-2.13 test -pl connector/connect/client/jvm -> errored here
[ERROR] [Error] /Users/zhen.li/code/spark-sbt/connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/connect/client/util/ConvertToArrow.scala:28: object arrow is not a member of package org.apache.spark.sql.execution
[ERROR] [Error] /Users/zhen.li/code/spark-sbt/connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/connect/client/util/ConvertToArrow.scala:46: not found: type ArrowWriter
[ERROR] [Error] /Users/zhen.li/code/spark-sbt/connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/connect/client/util/ConvertToArrow.scala:46: not found: value ArrowWriter
```
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] [spark] zhenlineo commented on pull request #40274: [SPARK-42215][CONNECT] Simplify Scala Client IT tests
Posted by "zhenlineo (via GitHub)" <gi...@apache.org>.
zhenlineo commented on PR #40274:
URL: https://github.com/apache/spark/pull/40274#issuecomment-1463235387
> seems `SimpleSparkConnectService` startup failed, the error message is
>
> ```
> Error: Missing application resource.
>
> Usage: spark-submit [options] <app jar | python file | R file> [app arguments]
> Usage: spark-submit --kill [submission ID] --master [spark://...]
> Usage: spark-submit --status [submission ID] --master [spark://...]
> Usage: spark-submit run-example [options] example-class [example args]
>
> Options:
> --master MASTER_URL spark://host:port, mesos://host:port, yarn,
> k8s://https://host:port, or local (Default: local[*]).
> --deploy-mode DEPLOY_MODE Whether to launch the driver program locally ("client") or
> on one of the worker machines inside the cluster ("cluster")
> (Default: client).
> --class CLASS_NAME Your application's main class (for Java / Scala apps).
> --name NAME A name of your application.
> --jars JARS Comma-separated list of jars to include on the driver
> ...
> ```
Yeah, this was caused by the bug we had in the scripts.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] [spark] zhenlineo commented on pull request #40274: [SPARK-42215][CONNECT] Single command to run Scala Client IT tests
Posted by "zhenlineo (via GitHub)" <gi...@apache.org>.
zhenlineo commented on PR #40274:
URL: https://github.com/apache/spark/pull/40274#issuecomment-1454313007
@hvanhovell
cc @LuciferYang
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] [spark] LuciferYang commented on pull request #40274: [SPARK-42215][CONNECT] Simplify Scala Client IT tests
Posted by "LuciferYang (via GitHub)" <gi...@apache.org>.
LuciferYang commented on PR #40274:
URL: https://github.com/apache/spark/pull/40274#issuecomment-1454336041
Thanks for your work @zhenlineo
If you don't mind, please give me more time to think about this pr :)
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] [spark] LuciferYang commented on pull request #40274: [SPARK-42215][CONNECT] Simplify Scala Client IT tests
Posted by "LuciferYang (via GitHub)" <gi...@apache.org>.
LuciferYang commented on PR #40274:
URL: https://github.com/apache/spark/pull/40274#issuecomment-1455105130
There is another problem that needs to be confirmed, which may not related to current pr: if other Suites inherit `RemoteSparkSession`, they will share the same connect server, right? (`SparkConnectServerUtils` is an object, so `SparkConnect` will only submit once)
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] [spark] zhenlineo closed pull request #40274: [SPARK-42215][CONNECT] Simplify Scala Client IT tests
Posted by "zhenlineo (via GitHub)" <gi...@apache.org>.
zhenlineo closed pull request #40274: [SPARK-42215][CONNECT] Simplify Scala Client IT tests
URL: https://github.com/apache/spark/pull/40274
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] [spark] LuciferYang commented on a diff in pull request #40274: [SPARK-42215][CONNECT] Simplify Scala Client IT tests
Posted by "LuciferYang (via GitHub)" <gi...@apache.org>.
LuciferYang commented on code in PR #40274:
URL: https://github.com/apache/spark/pull/40274#discussion_r1125212429
##########
connector/connect/client/jvm/src/test/scala/org/apache/spark/sql/ClientE2ETestSuite.scala:
##########
@@ -76,7 +76,8 @@ class ClientE2ETestSuite extends RemoteSparkSession {
assert(result(2) == 2)
}
- test("simple udf") {
+ // Ignore this test until the udf is fully implemented.
+ ignore("simple udf") {
Review Comment:
Would you mind ignore this one in a in an independent pr first?
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] [spark] zhenlineo commented on pull request #40274: [SPARK-42215][CONNECT] Simplify Scala Client IT tests
Posted by "zhenlineo (via GitHub)" <gi...@apache.org>.
zhenlineo commented on PR #40274:
URL: https://github.com/apache/spark/pull/40274#issuecomment-1456456654
@LuciferYang Thanks for your review. This PR was trying to simplify the test running steps. But as you said it make the maven commands to call sbt implicitly. I will split the changes into smaller PRs to allow this PR only deal with the IT command change. Then we can votes if we like this change or not :)
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] [spark] LuciferYang commented on a diff in pull request #40274: [SPARK-42215][CONNECT] Simplify Scala Client IT tests
Posted by "LuciferYang (via GitHub)" <gi...@apache.org>.
LuciferYang commented on code in PR #40274:
URL: https://github.com/apache/spark/pull/40274#discussion_r1126381570
##########
connector/connect/client/jvm/src/test/scala/org/apache/spark/sql/ClientE2ETestSuite.scala:
##########
@@ -76,7 +76,8 @@ class ClientE2ETestSuite extends RemoteSparkSession {
assert(result(2) == 2)
}
- test("simple udf") {
+ // Ignore this test until the udf is fully implemented.
+ ignore("simple udf") {
Review Comment:
cc @hvanhovell should we ignore this test first? Maven test must fail due to the function is `org.apache.spark.sql.ClientE2ETestSuite$$Lambda$XXX`, but there is no `org.apache.spark.sql.ClientE2ETestSuite` in server side when test using maven.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] [spark] LuciferYang commented on pull request #40274: [SPARK-42215][CONNECT] Simplify Scala Client IT tests
Posted by "LuciferYang (via GitHub)" <gi...@apache.org>.
LuciferYang commented on PR #40274:
URL: https://github.com/apache/spark/pull/40274#issuecomment-1461210301
seems `SimpleSparkConnectService` startup failed, the error message is
```
Error: Missing application resource.
Usage: spark-submit [options] <app jar | python file | R file> [app arguments]
Usage: spark-submit --kill [submission ID] --master [spark://...]
Usage: spark-submit --status [submission ID] --master [spark://...]
Usage: spark-submit run-example [options] example-class [example args]
Options:
--master MASTER_URL spark://host:port, mesos://host:port, yarn,
k8s://https://host:port, or local (Default: local[*]).
--deploy-mode DEPLOY_MODE Whether to launch the driver program locally ("client") or
on one of the worker machines inside the cluster ("cluster")
(Default: client).
--class CLASS_NAME Your application's main class (for Java / Scala apps).
--name NAME A name of your application.
--jars JARS Comma-separated list of jars to include on the driver
...
```
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] [spark] zhenlineo commented on pull request #40274: [SPARK-42215][CONNECT] Simplify Scala Client IT tests
Posted by "zhenlineo (via GitHub)" <gi...@apache.org>.
zhenlineo commented on PR #40274:
URL: https://github.com/apache/spark/pull/40274#issuecomment-1463236560
@hvanhovell Want to keep this or shall we skip? It helps a bit when not knowing `build/sbt -Pconnect -Phive package` before running the IT.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org