You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by "zhenlineo (via GitHub)" <gi...@apache.org> on 2023/03/04 01:06:06 UTC

[GitHub] [spark] zhenlineo opened a new pull request, #40274: [SPARK-42215][CONNECT] Single command to run Scala Client IT tests

zhenlineo opened a new pull request, #40274:
URL: https://github.com/apache/spark/pull/40274

   ### What changes were proposed in this pull request?
   Make use of the new spark-connect script to make the Scala client test to not directly depends on any other modules.
   The dependency is still there but hidden by the spark-connect script. When calling the script to start the server, the script performs a `build/sbt package` to ensure all the jars are build and can be found in the correct path. 
   
   After the change, we can use the following commands to run the Scala client tests:
   ```
   build/mvn clean
   build/mvn compile -pl connector/connect/client/jvm
   build/mvn test -pl connector/connect/client/jvm
   ```
   
   ```
   build/sbt clean
   build/sbt "testOnly org.apache.spark.sql.ClientE2ETestSuite" 
   ```
   
   ```
   build/sbt clean
   build/sbt "connect-client-jvm/test"
   ```
   
   **Scala 2.13**
   ```
   build/mvn clean
   build/mvn -Pscala-2.13 compile -pl connector/connect/client/jvm -am -DskipTests
   build/mvn -Pscala-2.13 test -pl connector/connect/client/jvm
   
   // These commands failed with errors to find catalyst ArrowWriter. The error seems unrelated to this change.
   ```
   
   ```
   build/sbt clean
   build/sbt "testOnly org.apache.spark.sql.ClientE2ETestSuite" -Pscala-2.13
   ```
   
   ```
   build/sbt clean
   build/sbt "connect-client-jvm/test" -Pscala-2.13
   ```
   
   After the change, the waiting time to run the E2ESuite is ~3min for a clean build. Then ~1min for subsequent runs. The test is slower only because we moved the build time from many commands to this single command. There is no limitations to add more tests as the delay is only caused by the shared server start time. Once the server is started, the tests run fast.
   
   ### Why are the changes needed?
   A single command for maven users to mvn clean install to run all tests and build.
   
   ### Does this PR introduce _any_ user-facing change?
   No
   
   ### How was this patch tested?
   Build, Manual tests.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] LuciferYang commented on a diff in pull request #40274: [SPARK-42215][CONNECT] Simplify Scala Client IT tests

Posted by "LuciferYang (via GitHub)" <gi...@apache.org>.
LuciferYang commented on code in PR #40274:
URL: https://github.com/apache/spark/pull/40274#discussion_r1125214866


##########
connector/connect/client/jvm/src/test/scala/org/apache/spark/sql/ClientE2ETestSuite.scala:
##########
@@ -76,7 +76,8 @@ class ClientE2ETestSuite extends RemoteSparkSession {
     assert(result(2) == 2)
   }
 
-  test("simple udf") {
+  // Ignore this test until the udf is fully implemented.
+  ignore("simple udf") {

Review Comment:
   I created SPARK-42665 yesterday because it still test fail in 3.4.0 RC2 and be reported in the dev mail list



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] LuciferYang commented on a diff in pull request #40274: [SPARK-42215][CONNECT] Simplify Scala Client IT tests

Posted by "LuciferYang (via GitHub)" <gi...@apache.org>.
LuciferYang commented on code in PR #40274:
URL: https://github.com/apache/spark/pull/40274#discussion_r1126381570


##########
connector/connect/client/jvm/src/test/scala/org/apache/spark/sql/ClientE2ETestSuite.scala:
##########
@@ -76,7 +76,8 @@ class ClientE2ETestSuite extends RemoteSparkSession {
     assert(result(2) == 2)
   }
 
-  test("simple udf") {
+  // Ignore this test until the udf is fully implemented.
+  ignore("simple udf") {

Review Comment:
   cc @hvanhovell should we ignore this test first? Maven test must fail due to the function is `org.apache.spark.sql.ClientE2ETestSuite$$Lambda$XXX`,  but there is no `ClientE2ETestSuite` in server side when test using maven.
   
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] LuciferYang commented on pull request #40274: [SPARK-42215][CONNECT] Simplify Scala Client IT tests

Posted by "LuciferYang (via GitHub)" <gi...@apache.org>.
LuciferYang commented on PR #40274:
URL: https://github.com/apache/spark/pull/40274#issuecomment-1455012036

   In the pr description, `build/mvn compile -pl connector/connect/client/jvm` should be `build/mvn compile -pl connector/connect/client/jvm -am` ?
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] LuciferYang commented on pull request #40274: [SPARK-42215][CONNECT] Simplify Scala Client IT tests

Posted by "LuciferYang (via GitHub)" <gi...@apache.org>.
LuciferYang commented on PR #40274:
URL: https://github.com/apache/spark/pull/40274#issuecomment-1455094955

   On the whole, it is good for me. There is only one question. Spark still uses maven for version release and deploy. But after this pr, the E2E test change to use sbt assembly server jar instead of maven shaded server jar for testing, which may weaken the maven test. We may need other ways to ensure the correctness of maven shaded server jar.
   
   In the future, we may use sbt to completely replace maven(should not be in Spark 3.4.0), including version release, deploy and other help tools, which will no longer be a problem at that time.
   
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] zhenlineo commented on pull request #40274: [SPARK-42215][CONNECT] Simplify Scala Client IT tests

Posted by "zhenlineo (via GitHub)" <gi...@apache.org>.
zhenlineo commented on PR #40274:
URL: https://github.com/apache/spark/pull/40274#issuecomment-1456552298

   https://github.com/apache/spark/pull/40304 https://github.com/apache/spark/pull/40303


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] zhenlineo commented on pull request #40274: [SPARK-42215][CONNECT] Single command to run Scala Client IT tests

Posted by "zhenlineo (via GitHub)" <gi...@apache.org>.
zhenlineo commented on PR #40274:
URL: https://github.com/apache/spark/pull/40274#issuecomment-1454314742

   The full error (even with the clean master branch):
   ```
   build/mvn clean
   build/mvn -Pscala-2.13 compile -pl connector/connect/client/jvm -am -DskipTests
   build/mvn -Pscala-2.13 test -pl connector/connect/client/jvm -> errored here
   
   [ERROR] [Error] /Users/zhen.li/code/spark-sbt/connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/connect/client/util/ConvertToArrow.scala:28: object arrow is not a member of package org.apache.spark.sql.execution
   [ERROR] [Error] /Users/zhen.li/code/spark-sbt/connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/connect/client/util/ConvertToArrow.scala:46: not found: type ArrowWriter
   [ERROR] [Error] /Users/zhen.li/code/spark-sbt/connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/connect/client/util/ConvertToArrow.scala:46: not found: value ArrowWriter
   ```
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] zhenlineo commented on pull request #40274: [SPARK-42215][CONNECT] Simplify Scala Client IT tests

Posted by "zhenlineo (via GitHub)" <gi...@apache.org>.
zhenlineo commented on PR #40274:
URL: https://github.com/apache/spark/pull/40274#issuecomment-1463235387

   > seems `SimpleSparkConnectService` startup failed, the error message is
   > 
   > ```
   > Error: Missing application resource.
   > 
   > Usage: spark-submit [options] <app jar | python file | R file> [app arguments]
   > Usage: spark-submit --kill [submission ID] --master [spark://...]
   > Usage: spark-submit --status [submission ID] --master [spark://...]
   > Usage: spark-submit run-example [options] example-class [example args]
   > 
   > Options:
   >   --master MASTER_URL         spark://host:port, mesos://host:port, yarn,
   >                               k8s://https://host:port, or local (Default: local[*]).
   >   --deploy-mode DEPLOY_MODE   Whether to launch the driver program locally ("client") or
   >                               on one of the worker machines inside the cluster ("cluster")
   >                               (Default: client).
   >   --class CLASS_NAME          Your application's main class (for Java / Scala apps).
   >   --name NAME                 A name of your application.
   >   --jars JARS                 Comma-separated list of jars to include on the driver
   > ...
   > ```
   
   Yeah, this was caused by the bug we had in the scripts.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] zhenlineo commented on pull request #40274: [SPARK-42215][CONNECT] Single command to run Scala Client IT tests

Posted by "zhenlineo (via GitHub)" <gi...@apache.org>.
zhenlineo commented on PR #40274:
URL: https://github.com/apache/spark/pull/40274#issuecomment-1454313007

   @hvanhovell 
   cc @LuciferYang


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] LuciferYang commented on pull request #40274: [SPARK-42215][CONNECT] Simplify Scala Client IT tests

Posted by "LuciferYang (via GitHub)" <gi...@apache.org>.
LuciferYang commented on PR #40274:
URL: https://github.com/apache/spark/pull/40274#issuecomment-1454336041

   Thanks for your work @zhenlineo 
   If you don't mind, please give me more time to think about this pr :)
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] LuciferYang commented on pull request #40274: [SPARK-42215][CONNECT] Simplify Scala Client IT tests

Posted by "LuciferYang (via GitHub)" <gi...@apache.org>.
LuciferYang commented on PR #40274:
URL: https://github.com/apache/spark/pull/40274#issuecomment-1455105130

   There is another problem that needs to be confirmed, which may not related to current pr: if other Suites inherit `RemoteSparkSession`, they will share the same connect server, right? (`SparkConnectServerUtils` is an object, so `SparkConnect` will only submit once)
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] zhenlineo closed pull request #40274: [SPARK-42215][CONNECT] Simplify Scala Client IT tests

Posted by "zhenlineo (via GitHub)" <gi...@apache.org>.
zhenlineo closed pull request #40274: [SPARK-42215][CONNECT] Simplify Scala Client IT tests
URL: https://github.com/apache/spark/pull/40274


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] LuciferYang commented on a diff in pull request #40274: [SPARK-42215][CONNECT] Simplify Scala Client IT tests

Posted by "LuciferYang (via GitHub)" <gi...@apache.org>.
LuciferYang commented on code in PR #40274:
URL: https://github.com/apache/spark/pull/40274#discussion_r1125212429


##########
connector/connect/client/jvm/src/test/scala/org/apache/spark/sql/ClientE2ETestSuite.scala:
##########
@@ -76,7 +76,8 @@ class ClientE2ETestSuite extends RemoteSparkSession {
     assert(result(2) == 2)
   }
 
-  test("simple udf") {
+  // Ignore this test until the udf is fully implemented.
+  ignore("simple udf") {

Review Comment:
   Would you mind ignore this one in a in an independent pr first?
   
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] zhenlineo commented on pull request #40274: [SPARK-42215][CONNECT] Simplify Scala Client IT tests

Posted by "zhenlineo (via GitHub)" <gi...@apache.org>.
zhenlineo commented on PR #40274:
URL: https://github.com/apache/spark/pull/40274#issuecomment-1456456654

   @LuciferYang Thanks for your review. This PR was trying to simplify the test running steps. But as you said it make the maven commands to call sbt implicitly. I will split the changes into smaller PRs to allow this PR only deal with the IT command change. Then we can votes if we like this change or not :)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] LuciferYang commented on a diff in pull request #40274: [SPARK-42215][CONNECT] Simplify Scala Client IT tests

Posted by "LuciferYang (via GitHub)" <gi...@apache.org>.
LuciferYang commented on code in PR #40274:
URL: https://github.com/apache/spark/pull/40274#discussion_r1126381570


##########
connector/connect/client/jvm/src/test/scala/org/apache/spark/sql/ClientE2ETestSuite.scala:
##########
@@ -76,7 +76,8 @@ class ClientE2ETestSuite extends RemoteSparkSession {
     assert(result(2) == 2)
   }
 
-  test("simple udf") {
+  // Ignore this test until the udf is fully implemented.
+  ignore("simple udf") {

Review Comment:
   cc @hvanhovell should we ignore this test first? Maven test must fail due to the function is `org.apache.spark.sql.ClientE2ETestSuite$$Lambda$XXX`,  but there is no `org.apache.spark.sql.ClientE2ETestSuite` in server side when test using maven.
   
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] LuciferYang commented on pull request #40274: [SPARK-42215][CONNECT] Simplify Scala Client IT tests

Posted by "LuciferYang (via GitHub)" <gi...@apache.org>.
LuciferYang commented on PR #40274:
URL: https://github.com/apache/spark/pull/40274#issuecomment-1461210301

   seems `SimpleSparkConnectService` startup failed, the error message is 
   
   ```
   Error: Missing application resource.
   
   Usage: spark-submit [options] <app jar | python file | R file> [app arguments]
   Usage: spark-submit --kill [submission ID] --master [spark://...]
   Usage: spark-submit --status [submission ID] --master [spark://...]
   Usage: spark-submit run-example [options] example-class [example args]
   
   Options:
     --master MASTER_URL         spark://host:port, mesos://host:port, yarn,
                                 k8s://https://host:port, or local (Default: local[*]).
     --deploy-mode DEPLOY_MODE   Whether to launch the driver program locally ("client") or
                                 on one of the worker machines inside the cluster ("cluster")
                                 (Default: client).
     --class CLASS_NAME          Your application's main class (for Java / Scala apps).
     --name NAME                 A name of your application.
     --jars JARS                 Comma-separated list of jars to include on the driver
   ...
   ```


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] zhenlineo commented on pull request #40274: [SPARK-42215][CONNECT] Simplify Scala Client IT tests

Posted by "zhenlineo (via GitHub)" <gi...@apache.org>.
zhenlineo commented on PR #40274:
URL: https://github.com/apache/spark/pull/40274#issuecomment-1463236560

   @hvanhovell Want to keep this or shall we skip? It helps a bit when not knowing `build/sbt -Pconnect -Phive package` before running the IT. 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org