You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by "juliuszsompolski (via GitHub)" <gi...@apache.org> on 2023/08/15 09:31:33 UTC

[GitHub] [spark] juliuszsompolski opened a new pull request, #42465: [CONNECT][POC] Have real server and real simple client in tests - classpath order hack

juliuszsompolski opened a new pull request, #42465:
URL: https://github.com/apache/spark/pull/42465

   ### What changes were proposed in this pull request?
   
   
   ### Why are the changes needed?
   
   
   ### Does this PR introduce _any_ user-facing change?
   
   No
   
   ### How was this patch tested?
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] juliuszsompolski closed pull request #42465: [CONNECT][POC] Have real server and real simple client in tests - classpath order hack

Posted by "juliuszsompolski (via GitHub)" <gi...@apache.org>.
juliuszsompolski closed pull request #42465: [CONNECT][POC] Have real server and real simple client in tests - classpath order hack
URL: https://github.com/apache/spark/pull/42465


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] juliuszsompolski commented on pull request #42465: [CONNECT][POC] Have real server and real simple client in tests - classpath order hack

Posted by "juliuszsompolski (via GitHub)" <gi...@apache.org>.
juliuszsompolski commented on PR #42465:
URL: https://github.com/apache/spark/pull/42465#issuecomment-1677123581

   @LuciferYang so the client build implicitly depends on the Server being build first, so that the server can be started by RemoteSparkSession? I think that's a bigger hack than the CLASSPATH order in https://github.com/apache/spark/pull/42465#discussion_r1292329247 :-).
   If the client doesn't depend on the server right now, isn't that order fragile right now already?
   
   But it should be possible so that the order becomes: client-internal, server, client...
   Can the server be made something like a "build-only" dependency (i.e. a dependency that is build before, but not packaged together or added to classpath) of the client to force that order?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] LuciferYang commented on pull request #42465: [CONNECT][POC] Have real server and real simple client in tests - classpath order hack

Posted by "LuciferYang (via GitHub)" <gi...@apache.org>.
LuciferYang commented on PR #42465:
URL: https://github.com/apache/spark/pull/42465#issuecomment-1677111533

   On the other hand, when build with Maven, this change will change the build order of the `client` module and the `server` module
   
   Before
   
   ```
   [INFO] Spark Project Connect Server ....................... SUCCESS [  0.001 s]
   [INFO] Spark Project Connect Client ....................... SUCCESS [  0.001 s]
   ```
   
   After
   
   ```
   [INFO] Spark Project Connect Client ....................... SUCCESS [  0.066 s]
   [INFO] Spark Project Connect Server ....................... SUCCESS [  0.049 s]
   ```
   
   However, in the client module, test cases that inherit `RemoteSparkSession` depend on the output of the server module. If the build order is changed, more refactoring work is needed.
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] juliuszsompolski commented on a diff in pull request #42465: [CONNECT][POC] Have real server and real simple client in tests - connect-client-jvm-shaded

Posted by "juliuszsompolski (via GitHub)" <gi...@apache.org>.
juliuszsompolski commented on code in PR #42465:
URL: https://github.com/apache/spark/pull/42465#discussion_r1291914360


##########
connector/connect/server/pom.xml:
##########
@@ -140,6 +140,21 @@
         </exclusion>
       </exclusions>
     </dependency>
+    <dependency>
+      <!--
+      We only need classes from org.apache.spark.sql.connect for testing.
+      spark-connect-client-jvm also contain the client versions of org.apache.spark.sql classes
+      that are also present in spark-sql,
+      e.g. org.apache.spark.sql.SparkSession, org.apache.spark.sql.Dataset.
+      However, since Maven 2.0.9, Maven uses the order of dependencies in the POM to build the
+      classpath, so this dependency is below spark-sql dependency is above, the classes from the
+      server should take precedence.
+      -->
+      <groupId>org.apache.spark</groupId>
+      <artifactId>spark-connect-client-jvm_${scala.binary.version}</artifactId>
+      <version>${project.version}</version>
+      <scope>test</scope>
+    </dependency>

Review Comment:
   @LuciferYang @HyukjinKwon @hvanhovell @vicennial @grundprinzip
   Could it just actually work? Seems to work for me.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] LuciferYang commented on a diff in pull request #42465: [CONNECT][POC] Have real server and real simple client in tests - connect-client-jvm-shaded

Posted by "LuciferYang (via GitHub)" <gi...@apache.org>.
LuciferYang commented on code in PR #42465:
URL: https://github.com/apache/spark/pull/42465#discussion_r1292329247


##########
connector/connect/server/pom.xml:
##########
@@ -140,6 +140,21 @@
         </exclusion>
       </exclusions>
     </dependency>
+    <dependency>
+      <!--
+      We only need classes from org.apache.spark.sql.connect for testing.
+      spark-connect-client-jvm also contain the client versions of org.apache.spark.sql classes
+      that are also present in spark-sql,
+      e.g. org.apache.spark.sql.SparkSession, org.apache.spark.sql.Dataset.
+      However, since Maven 2.0.9, Maven uses the order of dependencies in the POM to build the
+      classpath, so this dependency is below spark-sql dependency is above, the classes from the
+      server should take precedence.
+      -->
+      <groupId>org.apache.spark</groupId>
+      <artifactId>spark-connect-client-jvm_${scala.binary.version}</artifactId>
+      <version>${project.version}</version>
+      <scope>test</scope>
+    </dependency>

Review Comment:
   No, that is not a good practice. While there may be some ways to temporarily solve the classpath issue, it is not conducive to sustainable maintenance.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] juliuszsompolski commented on pull request #42465: [CONNECT][POC] Have real server and real simple client in tests - connect-client-jvm-shaded

Posted by "juliuszsompolski (via GitHub)" <gi...@apache.org>.
juliuszsompolski commented on PR #42465:
URL: https://github.com/apache/spark/pull/42465#issuecomment-1675572707

   It looks like maybe it's making the incremental build confused?
   Even if I don't change anything I get:
   ```
   [success] Total time: 176 s (02:56), completed Aug 12, 2023 12:35:16 AM
   sbt:spark-connect> testOnly *SparkConnectReattachableExecuteSuite
   [info] compiling 13 Scala sources to /home/julek/spark-apache/sql/api/target/scala-2.12/classes ...
   [info] compiling 39 Scala sources to /home/julek/spark-apache/core/target/scala-2.12/classes ...
   [info] compiling 1 Scala source to /home/julek/spark-apache/connector/connect/common/target/scala-2.12/classes ...
   [info] compiling 5 Scala sources to /home/julek/spark-apache/connector/connect/client/jvm/target/scala-2.12/classes ...
   [warn] multiple main classes detected: run 'show discoveredMainClasses' to see the list
   [info] compiling 8 Scala sources to /home/julek/spark-apache/streaming/target/scala-2.12/classes ...
   [info] compiling 32 Scala sources to /home/julek/spark-apache/core/target/scala-2.12/test-classes ...
   [info] compiling 41 Scala sources to /home/julek/spark-apache/sql/catalyst/target/scala-2.12/classes ...
   [warn] multiple main classes detected: run 'show discoveredMainClasses' to see the list
   [info] compiling 25 Scala sources to /home/julek/spark-apache/sql/core/target/scala-2.12/classes ...
   [info] compiling 1 Scala source to /home/julek/spark-apache/connector/avro/target/scala-2.12/classes ...
   [info] compiling 1 Scala source to /home/julek/spark-apache/mllib/target/scala-2.12/classes ...
   [warn] multiple main classes detected: run 'show discoveredMainClasses' to see the list
   [info] compiling 2 Scala sources to /home/julek/spark-apache/connector/connect/server/target/scala-2.12/classes ...
   [warn] multiple main classes detected: run 'show discoveredMainClasses' to see the list
   [info] compiling 56 Scala sources to /home/julek/spark-apache/sql/catalyst/target/scala-2.12/test-classes ...
   [info] compiling 227 Scala sources to /home/julek/spark-apache/sql/core/target/scala-2.12/test-classes ...
   [info] compiling 3 Scala sources to /home/julek/spark-apache/connector/connect/server/target/scala-2.12/test-classes ...
   [info] SparkConnectReattachableExecuteSuite:
   ```
   with 2-3 minutes of recompiling.
   Maybe the conflicting artifacts from spark-sql and spark-connect-client-jvm mix in a way that makes the build system confused that there were modifications?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] LuciferYang commented on pull request #42465: [CONNECT][POC] Have real server and real simple client in tests - classpath order hack

Posted by "LuciferYang (via GitHub)" <gi...@apache.org>.
LuciferYang commented on PR #42465:
URL: https://github.com/apache/spark/pull/42465#issuecomment-1680003768

   > @LuciferYang is there anything that guarantees the
   > 
   > ```
   > [INFO] Spark Project Connect Server ....................... SUCCESS [  0.001 s]
   > [INFO] Spark Project Connect Client ....................... SUCCESS [  0.001 s]
   > ```
   > 
   > order now, or is it just a happy accident?
   
   Sorry, I forgot to reply to you, there are roughly two ways to ensure this build order:
   1. If the `pom.xml` of `Connect Client ` explicitly specifies a dependency on `Connect Server `, then `Connect Server` will be compiled before `Connect Client`.
   2. If there is no compile-time dependency between `Connect Client ` and `Connect Server `, then the build order will be determined by the order of the modules in the `<modules>` section. The current build order is determined by the configuration in the Spark parent `pom.xml` as follows:
   
   https://github.com/apache/spark/blob/2be20e54a2222f6cdf64e8486d1910133b43665f/pom.xml#L105-L108
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] juliuszsompolski commented on pull request #42465: [CONNECT][POC] Have real server and real simple client in tests - classpath order hack

Posted by "juliuszsompolski (via GitHub)" <gi...@apache.org>.
juliuszsompolski commented on PR #42465:
URL: https://github.com/apache/spark/pull/42465#issuecomment-1678667380

   @LuciferYang is there anything that guarantees the
   ```
   [INFO] Spark Project Connect Server ....................... SUCCESS [  0.001 s]
   [INFO] Spark Project Connect Client ....................... SUCCESS [  0.001 s]
   ```
   order now, or is it just a happy accident?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] juliuszsompolski closed pull request #42465: [CONNECT][POC] Have real server and real simple client in tests - classpath order hack

Posted by "juliuszsompolski (via GitHub)" <gi...@apache.org>.
juliuszsompolski closed pull request #42465: [CONNECT][POC] Have real server and real simple client in tests - classpath order hack
URL: https://github.com/apache/spark/pull/42465


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org