Posted to reviews@spark.apache.org by "hvanhovell (via GitHub)" <gi...@apache.org> on 2023/08/21 20:50:56 UTC

[GitHub] [spark] hvanhovell opened a new pull request, #42591: [SPARK-44784][CONNECT] Make SBT testing hermetic.

hvanhovell opened a new pull request, #42591:
URL: https://github.com/apache/spark/pull/42591

   ### What changes were proposed in this pull request?
   This PR makes a bunch of changes to connect testing for the Scala client:
   - We no longer start the connect server with the `SPARK_DIST_CLASSPATH` environment variable set. This variable is set by the build system, but its value differs between SBT and Maven; for SBT it also contained the client code.
   - We use dependency upload to add the dependencies needed by the tests. Currently this entails the compiled test classes (class files), the scalatest jars, and the scalactic jars (see the sketch after this list).
   - The use of classfile sync unearthed an issue with stubbing and the `ExecutorClassLoader`. If they load classes in the same namespace, stubbing will generate stubs for classes that could have been loaded by the `ExecutorClassLoader`. Since this is mostly a testing issue, I decided to move the test code to a different namespace. We should definitely fix this later on.
   - A bunch of tiny fixes.
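
   As a rough illustration of the dependency-upload approach (a sketch distilled from the test utilities in this PR; the method name and the test-classes path are illustrative, not the exact code):

   ```scala
   import java.io.File
   import java.nio.file.Paths

   import org.apache.spark.sql.SparkSession

   // Sketch: ship the compiled test classes and the scalatest/scalactic jars
   // to the connect server through the client's artifact manager.
   def uploadTestDependencies(spark: SparkSession): Unit = {
     // Both SBT and Maven produce the test classes as a directory, not a jar.
     spark.client.artifactManager.addClassDir(Paths.get("target/test-classes")) // illustrative path

     // scalatest & scalactic are taken from the test runner's JVM classpath.
     val jars = System
       .getProperty("java.class.path")
       .split(File.pathSeparatorChar)
       .filter(_.endsWith(".jar"))
       .filter(j => j.contains("scalatest") || j.contains("scalactic"))
       .map(j => Paths.get(j).toUri)
     spark.client.artifactManager.addArtifacts(jars)
   }
   ```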
   
   ### Why are the changes needed?
   SBT testing for connect leaked client-side code into the server. This is a problem because tests pass and we sign off on features that do not actually work in a normal environment. Stubbing was an example of this. Maven did not have this problem and was therefore more correct.
   
   ### Does this PR introduce _any_ user-facing change?
   No.
   
   ### How was this patch tested?
   The changes are mostly tests themselves.
   
   ### Was this patch authored or co-authored using generative AI tooling?
   No. I write my own code, thank you...


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] LuciferYang commented on a diff in pull request #42591: [SPARK-44784][CONNECT] Make SBT testing hermetic.

Posted by "LuciferYang (via GitHub)" <gi...@apache.org>.
LuciferYang commented on code in PR #42591:
URL: https://github.com/apache/spark/pull/42591#discussion_r1303969113


##########
connector/connect/client/jvm/src/test/scala/org/apache/spark/sql/test/RemoteSparkSession.scala:
##########
@@ -0,0 +1,241 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.sql.test
+
+import java.io.{File, IOException, OutputStream}
+import java.lang.ProcessBuilder
+import java.lang.ProcessBuilder.Redirect
+import java.nio.file.Paths
+import java.util.concurrent.TimeUnit
+
+import scala.concurrent.duration.FiniteDuration
+
+import org.apache.commons.lang3.{JavaVersion, SystemUtils}
+import org.scalactic.source.Position
+import org.scalatest.{BeforeAndAfterAll, Tag}
+
+import org.apache.spark.SparkBuildInfo
+import org.apache.spark.sql.SparkSession
+import org.apache.spark.sql.connect.client.GrpcRetryHandler.RetryPolicy
+import org.apache.spark.sql.connect.client.SparkConnectClient
+import org.apache.spark.sql.connect.common.config.ConnectCommon
+import org.apache.spark.sql.test.IntegrationTestUtils._
+
+/**
+ * An util class to start a local spark connect server in a different process for local E2E tests.
+ * Pre-running the tests, the spark connect artifact needs to be built using e.g. `build/sbt
+ * package`. It is designed to start the server once but shared by all tests. It is equivalent to
+ * use the following command to start the connect server via command line:
+ *
+ * {{{
+ * bin/spark-shell \
+ * --jars `ls connector/connect/server/target/**/spark-connect*SNAPSHOT.jar | paste -sd ',' -` \
+ * --conf spark.plugins=org.apache.spark.sql.connect.SparkConnectPlugin
+ * }}}
+ *
+ * Set system property `spark.test.home` or env variable `SPARK_HOME` if the test is not executed
+ * from the Spark project top folder. Set system property `spark.debug.sc.jvm.client=true` or
+ * environment variable `SPARK_DEBUG_SC_JVM_CLIENT=true` to print the server process output in the
+ * console to debug server start stop problems.
+ */
+object SparkConnectServerUtils {
+
+  // Server port
+  val port: Int =
+    ConnectCommon.CONNECT_GRPC_BINDING_PORT + util.Random.nextInt(1000)
+
+  @volatile private var stopped = false
+
+  private var consoleOut: OutputStream = _
+  private val serverStopCommand = "q"
+
+  private lazy val sparkConnect: java.lang.Process = {
+    debug("Starting the Spark Connect Server...")
+    val connectJar = findJar(
+      "connector/connect/server",
+      "spark-connect-assembly",
+      "spark-connect").getCanonicalPath
+
+    val command = Seq.newBuilder[String]
+    command += "bin/spark-submit"
+    command += "--driver-class-path" += connectJar
+    command += "--class" += "org.apache.spark.sql.connect.SimpleSparkConnectService"
+    command += "--conf" += s"spark.connect.grpc.binding.port=$port"
+    command ++= testConfigs
+    command ++= debugConfigs
+    command += connectJar
+    val builder = new ProcessBuilder(command.result(): _*)
+    builder.directory(new File(sparkHome))
+    val environment = builder.environment()
+    environment.remove("SPARK_DIST_CLASSPATH")
+    if (isDebug) {
+      builder.redirectError(Redirect.INHERIT)
+      builder.redirectOutput(Redirect.INHERIT)
+    }
+
+    val process = builder.start()
+    consoleOut = process.getOutputStream
+
+    // Adding JVM shutdown hook
+    sys.addShutdownHook(stop())
+    process
+  }
+
+  /**
+   * As one shared spark will be started for all E2E tests, for tests that needs some special
+   * configs, we add them here
+   */
+  private def testConfigs: Seq[String] = {
+    // To find InMemoryTableCatalog for V2 writer tests
+    val catalystTestJar =
+      tryFindJar("sql/catalyst", "spark-catalyst", "spark-catalyst", test = true)
+        .map(clientTestJar => clientTestJar.getCanonicalPath)
+        .get
+
+    val catalogImplementation = if (IntegrationTestUtils.isSparkHiveJarAvailable) {
+      "hive"
+    } else {
+      // scalastyle:off println
+      println(
+        "Will start Spark Connect server with `spark.sql.catalogImplementation=in-memory`, " +
+          "some tests that rely on Hive will be ignored. If you don't want to skip them:\n" +
+          "1. Test with maven: run `build/mvn install -DskipTests -Phive` before testing\n" +
+          "2. Test with sbt: run test with `-Phive` profile")
+      // scalastyle:on println
+      // SPARK-43647: Proactively cleaning the `classes` and `test-classes` dir of hive
+      // module to avoid unexpected loading of `DataSourceRegister` in hive module during
+      // testing without `-Phive` profile.
+      IntegrationTestUtils.cleanUpHiveClassesDirIfNeeded()
+      "in-memory"
+    }
+    val confs = Seq(
+      // Use InMemoryTableCatalog for V2 writer tests
+      "spark.sql.catalog.testcat=org.apache.spark.sql.connector.catalog.InMemoryTableCatalog",
+      // Try to use the hive catalog, fallback to in-memory if it is not there.
+      "spark.sql.catalogImplementation=" + catalogImplementation,
+      // Make the server terminate reattachable streams every 1 second and 123 bytes,
+      // to make the tests exercise reattach.
+      "spark.connect.execute.reattachable.senderMaxStreamDuration=1s",
+      "spark.connect.execute.reattachable.senderMaxStreamSize=123",
+      // Disable UI
+      "spark.ui.enabled=false")
+    Seq("--jars", catalystTestJar) ++ confs.flatMap(v => "--conf" :: v :: Nil)
+  }
+
+  def start(): Unit = {
+    assert(!stopped)
+    sparkConnect
+  }
+
+  def stop(): Int = {
+    stopped = true
+    debug("Stopping the Spark Connect Server...")
+    try {
+      consoleOut.write(serverStopCommand.getBytes)
+      consoleOut.flush()
+      consoleOut.close()
+      if (!sparkConnect.waitFor(2, TimeUnit.SECONDS)) {
+        sparkConnect.destroyForcibly()
+      }
+      val code = sparkConnect.exitValue()
+      debug(s"Spark Connect Server is stopped with exit code: $code")
+      code
+    } catch {
+      case e: IOException if e.getMessage.contains("Stream closed") =>
+        -1
+      case e: Throwable =>
+        debug(e)
+        sparkConnect.destroyForcibly()
+        throw e
+    }
+  }
+
+  def syncTestDependencies(spark: SparkSession): Unit = {
+    // Both SBT & Maven pass the test-classes as a directory instead of a jar.
+    val testClassesPath = Paths.get(IntegrationTestUtils.connectClientTestClassDir)
+    spark.client.artifactManager.addClassDir(testClassesPath)
+
+    // We need scalatest & scalactic on the classpath to make the tests work.
+    val jars = System
+      .getProperty("java.class.path")
+      .split(File.pathSeparatorChar)
+      .filter(_.endsWith(".jar"))
+      .filter(e => e.contains("scalatest") || e.contains("scalactic"))
+      .map(e => Paths.get(e).toUri)
+    spark.client.artifactManager.addArtifacts(jars)
+  }
+
+  def createSparkSession(): SparkSession = {
+    if (!SystemUtils.isJavaVersionAtMost(JavaVersion.JAVA_17)) {

Review Comment:
   This check is no longer needed: https://github.com/apache/spark/pull/42643 has been merged, and, to avoid conflicts in the connect module, it has also been merged into branch-3.5.





[GitHub] [spark] LuciferYang commented on a diff in pull request #42591: [SPARK-44784][CONNECT] Make SBT testing hermetic.

Posted by "LuciferYang (via GitHub)" <gi...@apache.org>.
LuciferYang commented on code in PR #42591:
URL: https://github.com/apache/spark/pull/42591#discussion_r1302617281


##########
connector/connect/client/jvm/src/test/scala/org/apache/spark/sql/test/RemoteSparkSession.scala:
##########
@@ -0,0 +1,241 @@
+  // [... elided: identical to the RemoteSparkSession.scala excerpt quoted in the first comment above ...]
+
+  def createSparkSession(): SparkSession = {
+    if (!SystemUtils.isJavaVersionAtMost(JavaVersion.JAVA_17)) {

Review Comment:
   This should be `if (!SystemUtils.isJavaVersionAtMost(JavaVersion.JAVA_16))` here, otherwise `SparkConnectServer` would not start up with Java 17 either ...
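
   A minimal sketch of the suggested guard (assuming the intent is to skip the E2E suite on any JVM where the connect server cannot start yet):

   ```scala
   // Suggested shape: bail out on Java 17 and newer instead of only on Java 18+.
   if (!SystemUtils.isJavaVersionAtMost(JavaVersion.JAVA_16)) {
     return null
   }
   ```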





[GitHub] [spark] LuciferYang commented on a diff in pull request #42591: [SPARK-44784][CONNECT] Make SBT testing hermetic.

Posted by "LuciferYang (via GitHub)" <gi...@apache.org>.
LuciferYang commented on code in PR #42591:
URL: https://github.com/apache/spark/pull/42591#discussion_r1303969113


##########
connector/connect/client/jvm/src/test/scala/org/apache/spark/sql/test/RemoteSparkSession.scala:
##########
@@ -0,0 +1,241 @@
+  // [... elided: identical to the RemoteSparkSession.scala excerpt quoted in the first comment above ...]
+
+  def createSparkSession(): SparkSession = {
+    if (!SystemUtils.isJavaVersionAtMost(JavaVersion.JAVA_17)) {

Review Comment:
   The check `if (!SystemUtils.isJavaVersionAtMost(JavaVersion.JAVA_17))` is no longer needed: https://github.com/apache/spark/pull/42643 has been merged, and, to avoid conflicts in the connect module, it has also been merged into branch-3.5.





[GitHub] [spark] hvanhovell commented on a diff in pull request #42591: [SPARK-44784][CONNECT] Make SBT testing hermetic.

Posted by "hvanhovell (via GitHub)" <gi...@apache.org>.
hvanhovell commented on code in PR #42591:
URL: https://github.com/apache/spark/pull/42591#discussion_r1305775674


##########
connector/connect/client/jvm/src/test/scala/org/apache/spark/sql/StubbingTestSuite.scala:
##########
@@ -0,0 +1,36 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.sql
+
+import org.apache.spark.sql.test.RemoteSparkSession
+
+class StubbingTestSuite extends RemoteSparkSession {
+  private def eval[T](f: => T): T = f
+
+  test("capture of to-be stubbed class") {

Review Comment:
   I revamped the test.





[GitHub] [spark] hvanhovell commented on a diff in pull request #42591: [SPARK-44784][CONNECT] Make SBT testing hermetic.

Posted by "hvanhovell (via GitHub)" <gi...@apache.org>.
hvanhovell commented on code in PR #42591:
URL: https://github.com/apache/spark/pull/42591#discussion_r1305820478


##########
connector/connect/client/jvm/src/test/scala/org/apache/spark/sql/StubbingTestSuite.scala:
##########
@@ -0,0 +1,36 @@
+  // [... elided: identical to the StubbingTestSuite.scala excerpt quoted in the previous comment ...]
+
+class StubbingTestSuite extends RemoteSparkSession {
+  private def eval[T](f: => T): T = f
+
+  test("capture of to-be stubbed class") {

Review Comment:
   I can confirm it is working with Maven, and that it triggers stubbing on the server side.





[GitHub] [spark] hvanhovell commented on a diff in pull request #42591: [SPARK-44784][CONNECT] Make SBT testing hermetic.

Posted by "hvanhovell (via GitHub)" <gi...@apache.org>.
hvanhovell commented on code in PR #42591:
URL: https://github.com/apache/spark/pull/42591#discussion_r1300848645


##########
connector/connect/client/jvm/src/test/scala/org/apache/spark/sql/test/RemoteSparkSession.scala:
##########
@@ -0,0 +1,241 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.sql.test
+
+import java.io.{File, IOException, OutputStream}
+import java.lang.ProcessBuilder
+import java.lang.ProcessBuilder.Redirect
+import java.nio.file.Paths
+import java.util.concurrent.TimeUnit
+
+import scala.concurrent.duration.FiniteDuration
+
+import org.apache.commons.lang3.{JavaVersion, SystemUtils}
+import org.scalactic.source.Position
+import org.scalatest.{BeforeAndAfterAll, Tag}
+
+import org.apache.spark.SparkBuildInfo
+import org.apache.spark.sql.SparkSession
+import org.apache.spark.sql.connect.client.GrpcRetryHandler.RetryPolicy
+import org.apache.spark.sql.connect.client.SparkConnectClient
+import org.apache.spark.sql.connect.common.config.ConnectCommon
+import org.apache.spark.sql.test.IntegrationTestUtils._
+
+/**
+ * An util class to start a local spark connect server in a different process for local E2E tests.
+ * Pre-running the tests, the spark connect artifact needs to be built using e.g. `build/sbt
+ * package`. It is designed to start the server once but shared by all tests. It is equivalent to
+ * use the following command to start the connect server via command line:
+ *
+ * {{{
+ * bin/spark-shell \
+ * --jars `ls connector/connect/server/target/**/spark-connect*SNAPSHOT.jar | paste -sd ',' -` \
+ * --conf spark.plugins=org.apache.spark.sql.connect.SparkConnectPlugin
+ * }}}
+ *
+ * Set system property `spark.test.home` or env variable `SPARK_HOME` if the test is not executed
+ * from the Spark project top folder. Set system property `spark.debug.sc.jvm.client=true` or
+ * environment variable `SPARK_DEBUG_SC_JVM_CLIENT=true` to print the server process output in the
+ * console to debug server start stop problems.
+ */
+object SparkConnectServerUtils {
+
+  // Server port
+  val port: Int =
+    ConnectCommon.CONNECT_GRPC_BINDING_PORT + util.Random.nextInt(1000)
+
+  @volatile private var stopped = false
+
+  private var consoleOut: OutputStream = _
+  private val serverStopCommand = "q"
+
+  private lazy val sparkConnect: java.lang.Process = {
+    debug("Starting the Spark Connect Server...")
+    val connectJar = findJar(
+      "connector/connect/server",
+      "spark-connect-assembly",
+      "spark-connect").getCanonicalPath
+
+    val command = Seq.newBuilder[String]
+    command += "bin/spark-submit"
+    command += "--driver-class-path" += connectJar
+    command += "--class" += "org.apache.spark.sql.connect.SimpleSparkConnectService"
+    command += "--conf" += s"spark.connect.grpc.binding.port=$port"
+    command ++= testConfigs
+    command ++= debugConfigs
+    command += connectJar
+    val builder = new ProcessBuilder(command.result(): _*)
+    builder.directory(new File(sparkHome))
+    val environment = builder.environment()
+    environment.remove("SPARK_DIST_CLASSPATH")
+    if (isDebug) {
+      builder.redirectError(Redirect.INHERIT)
+      builder.redirectOutput(Redirect.INHERIT)
+    }
+
+    val process = builder.start()
+    consoleOut = process.getOutputStream
+
+    // Adding JVM shutdown hook
+    sys.addShutdownHook(stop())
+    process
+  }
+
+  /**
+   * As one shared spark will be started for all E2E tests, for tests that needs some special
+   * configs, we add them here
+   */
+  private def testConfigs: Seq[String] = {
+    // To find InMemoryTableCatalog for V2 writer tests
+    val catalystTestJar =
+      tryFindJar("sql/catalyst", "spark-catalyst", "spark-catalyst", test = true)
+        .map(clientTestJar => clientTestJar.getCanonicalPath)
+        .get
+
+    val catalogImplementation = if (IntegrationTestUtils.isSparkHiveJarAvailable) {
+      "hive"
+    } else {
+      // scalastyle:off println
+      println(
+        "Will start Spark Connect server with `spark.sql.catalogImplementation=in-memory`, " +
+          "some tests that rely on Hive will be ignored. If you don't want to skip them:\n" +
+          "1. Test with maven: run `build/mvn install -DskipTests -Phive` before testing\n" +
+          "2. Test with sbt: run test with `-Phive` profile")
+      // scalastyle:on println
+      // SPARK-43647: Proactively cleaning the `classes` and `test-classes` dir of hive
+      // module to avoid unexpected loading of `DataSourceRegister` in hive module during
+      // testing without `-Phive` profile.
+      IntegrationTestUtils.cleanUpHiveClassesDirIfNeeded()
+      "in-memory"
+    }
+    val confs = Seq(
+      // Use InMemoryTableCatalog for V2 writer tests
+      "spark.sql.catalog.testcat=org.apache.spark.sql.connector.catalog.InMemoryTableCatalog",
+      // Try to use the hive catalog, fallback to in-memory if it is not there.
+      "spark.sql.catalogImplementation=" + catalogImplementation,
+      // Make the server terminate reattachable streams every 1 second and 123 bytes,
+      // to make the tests exercise reattach.
+      "spark.connect.execute.reattachable.senderMaxStreamDuration=1s",
+      "spark.connect.execute.reattachable.senderMaxStreamSize=123",
+      // Disable UI
+      "spark.ui.enabled=false")
+    Seq("--jars", catalystTestJar) ++ confs.flatMap(v => "--conf" :: v :: Nil)
+  }
+
+  def start(): Unit = {
+    assert(!stopped)
+    sparkConnect
+  }
+
+  def stop(): Int = {
+    stopped = true
+    debug("Stopping the Spark Connect Server...")
+    try {
+      consoleOut.write(serverStopCommand.getBytes)
+      consoleOut.flush()
+      consoleOut.close()
+      if (!sparkConnect.waitFor(2, TimeUnit.SECONDS)) {
+        sparkConnect.destroyForcibly()
+      }
+      val code = sparkConnect.exitValue()
+      debug(s"Spark Connect Server is stopped with exit code: $code")
+      code
+    } catch {
+      case e: IOException if e.getMessage.contains("Stream closed") =>
+        -1
+      case e: Throwable =>
+        debug(e)
+        sparkConnect.destroyForcibly()
+        throw e
+    }
+  }
+
+  def syncTestDependencies(spark: SparkSession): Unit = {
+    // Both SBT & Maven pass the test-classes as a directory instead of a jar.
+    val testClassesPath = Paths.get(IntegrationTestUtils.connectClientTestClassDir)
+    spark.client.artifactManager.addClassDir(testClassesPath)
+
+    // We need scalatest & scalactic on the classpath to make the tests work.
+    val jars = System
+      .getProperty("java.class.path")
+      .split(File.pathSeparatorChar)
+      .filter(_.endsWith(".jar"))
+      .filter(e => e.contains("scalatest") || e.contains("scalactic"))
+      .map(e => Paths.get(e).toUri)
+    spark.client.artifactManager.addArtifacts(jars)
+  }
+
+  def createSparkSession(): SparkSession = {
+    if (!SystemUtils.isJavaVersionAtMost(JavaVersion.JAVA_17)) {
+      return null
+    }
+    SparkConnectServerUtils.start()
+
+    val spark = SparkSession
+      .builder()
+      .client(
+        SparkConnectClient
+          .builder()
+          .userId("test")
+          .port(port)
+          .retryPolicy(RetryPolicy(maxRetries = 7, maxBackoff = FiniteDuration(10, "s")))
+          .build())
+      .create()
+
+    // Execute an RPC which will get retried until the server is up.

Review Comment:
   Retries do not seem to work well for artifact upload, so I am falling back to this approach.
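
   A rough sketch of the idea, assuming a cheap RPC such as `spark.version` goes through the client's `RetryPolicy` and is therefore retried until the server answers (the helper name is illustrative, not the PR's exact code):

   ```scala
   // Sketch: force one retried RPC before uploading artifacts, so the
   // (non-retried) artifact upload only runs against a reachable server.
   def waitForServer(spark: SparkSession): Unit = {
     val serverVersion = spark.version // retried per the RetryPolicy configured above
     assert(serverVersion.nonEmpty)
   }

   // waitForServer(spark)
   // SparkConnectServerUtils.syncTestDependencies(spark)
   ```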





[GitHub] [spark] hvanhovell commented on a diff in pull request #42591: [SPARK-44784][CONNECT] Make SBT testing hermetic.

Posted by "hvanhovell (via GitHub)" <gi...@apache.org>.
hvanhovell commented on code in PR #42591:
URL: https://github.com/apache/spark/pull/42591#discussion_r1300848063


##########
connector/connect/client/jvm/src/test/scala/org/apache/spark/sql/test/RemoteSparkSession.scala:
##########
@@ -0,0 +1,241 @@
+  // [... elided: identical to the RemoteSparkSession.scala excerpt quoted in the first comment above ...]
+
+    val command = Seq.newBuilder[String]
+    command += "bin/spark-submit"
+    command += "--driver-class-path" += connectJar

Review Comment:
   This is needed to make the test work with Maven. The assembly might produce slightly different results.





[GitHub] [spark] hvanhovell commented on a diff in pull request #42591: [SPARK-44784][CONNECT] Make SBT testing hermetic.

Posted by "hvanhovell (via GitHub)" <gi...@apache.org>.
hvanhovell commented on code in PR #42591:
URL: https://github.com/apache/spark/pull/42591#discussion_r1300636608


##########
connector/connect/server/src/main/scala/org/apache/spark/sql/connect/artifact/SparkConnectArtifactManager.scala:
##########
@@ -162,15 +162,37 @@ class SparkConnectArtifactManager(sessionHolder: SessionHolder) extends Logging
    */
   def classloader: ClassLoader = {
     val urls = getSparkConnectAddedJars :+ classDir.toUri.toURL
-    val loader = if (SparkEnv.get.conf.get(CONNECT_SCALA_UDF_STUB_PREFIXES).nonEmpty) {
-      val stubClassLoader =
-        StubClassLoader(null, SparkEnv.get.conf.get(CONNECT_SCALA_UDF_STUB_PREFIXES))
-      new ChildFirstURLClassLoader(
-        urls.toArray,
-        stubClassLoader,
-        Utils.getContextOrSparkClassLoader)
+    val prefixes = SparkEnv.get.conf.get(CONNECT_SCALA_UDF_STUB_PREFIXES)
+    val userClasspathFirst = SparkEnv.get.conf.get(EXECUTOR_USER_CLASS_PATH_FIRST)
+    val loader = if (prefixes.nonEmpty) {

Review Comment:
   For the record, this fixes nothing. It just makes sure we respect the `userClasspathFirst` setting.
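
   A rough sketch of what respecting the setting looks like (the branch bodies are cut off in the quoted hunk, so this leaves out the stub-prefix handling and is only an assumption about the shape, not the PR's literal code):

   ```scala
   // Sketch: pick the delegation order for the session's added jars (`urls`)
   // based on spark.executor.userClassPathFirst.
   val loader: ClassLoader =
     if (userClasspathFirst) {
       // Child-first: classes from session artifacts shadow the Spark classpath.
       new ChildFirstURLClassLoader(urls.toArray, Utils.getContextOrSparkClassLoader)
     } else {
       // Parent-first: the Spark classpath wins over session artifacts.
       new MutableURLClassLoader(urls.toArray, Utils.getContextOrSparkClassLoader)
     }
   ```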





[GitHub] [spark] hvanhovell commented on pull request #42591: [SPARK-44784][CONNECT] Make SBT testing hermetic.

Posted by "hvanhovell (via GitHub)" <gi...@apache.org>.
hvanhovell commented on PR #42591:
URL: https://github.com/apache/spark/pull/42591#issuecomment-1693516807

   @LuciferYang PTAL




[GitHub] [spark] LuciferYang commented on a diff in pull request #42591: [SPARK-44784][CONNECT] Make SBT testing hermetic.

Posted by "LuciferYang (via GitHub)" <gi...@apache.org>.
LuciferYang commented on code in PR #42591:
URL: https://github.com/apache/spark/pull/42591#discussion_r1303969113


##########
connector/connect/client/jvm/src/test/scala/org/apache/spark/sql/test/RemoteSparkSession.scala:
##########
@@ -0,0 +1,241 @@
+  // [... elided: identical to the RemoteSparkSession.scala excerpt quoted in the first comment above ...]
+
+  def createSparkSession(): SparkSession = {
+    if (!SystemUtils.isJavaVersionAtMost(JavaVersion.JAVA_17)) {

Review Comment:
   This check is no longer needed: https://github.com/apache/spark/pull/42643 has been merged, and, to avoid conflicts in the connect module, it has also been merged into branch-3.5.



##########
connector/connect/client/jvm/src/test/scala/org/apache/spark/sql/test/RemoteSparkSession.scala:
##########
@@ -0,0 +1,241 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.sql.test
+
+import java.io.{File, IOException, OutputStream}
+import java.lang.ProcessBuilder
+import java.lang.ProcessBuilder.Redirect
+import java.nio.file.Paths
+import java.util.concurrent.TimeUnit
+
+import scala.concurrent.duration.FiniteDuration
+
+import org.apache.commons.lang3.{JavaVersion, SystemUtils}
+import org.scalactic.source.Position
+import org.scalatest.{BeforeAndAfterAll, Tag}
+
+import org.apache.spark.SparkBuildInfo
+import org.apache.spark.sql.SparkSession
+import org.apache.spark.sql.connect.client.GrpcRetryHandler.RetryPolicy
+import org.apache.spark.sql.connect.client.SparkConnectClient
+import org.apache.spark.sql.connect.common.config.ConnectCommon
+import org.apache.spark.sql.test.IntegrationTestUtils._
+
+/**
+ * An util class to start a local spark connect server in a different process for local E2E tests.
+ * Pre-running the tests, the spark connect artifact needs to be built using e.g. `build/sbt
+ * package`. It is designed to start the server once but shared by all tests. It is equivalent to
+ * use the following command to start the connect server via command line:
+ *
+ * {{{
+ * bin/spark-shell \
+ * --jars `ls connector/connect/server/target/**/spark-connect*SNAPSHOT.jar | paste -sd ',' -` \
+ * --conf spark.plugins=org.apache.spark.sql.connect.SparkConnectPlugin
+ * }}}
+ *
+ * Set system property `spark.test.home` or env variable `SPARK_HOME` if the test is not executed
+ * from the Spark project top folder. Set system property `spark.debug.sc.jvm.client=true` or
+ * environment variable `SPARK_DEBUG_SC_JVM_CLIENT=true` to print the server process output in the
+ * console to debug server start stop problems.
+ */
+object SparkConnectServerUtils {
+
+  // Server port
+  val port: Int =
+    ConnectCommon.CONNECT_GRPC_BINDING_PORT + util.Random.nextInt(1000)
+
+  @volatile private var stopped = false
+
+  private var consoleOut: OutputStream = _
+  private val serverStopCommand = "q"
+
+  private lazy val sparkConnect: java.lang.Process = {
+    debug("Starting the Spark Connect Server...")
+    val connectJar = findJar(
+      "connector/connect/server",
+      "spark-connect-assembly",
+      "spark-connect").getCanonicalPath
+
+    val command = Seq.newBuilder[String]
+    command += "bin/spark-submit"
+    command += "--driver-class-path" += connectJar
+    command += "--class" += "org.apache.spark.sql.connect.SimpleSparkConnectService"
+    command += "--conf" += s"spark.connect.grpc.binding.port=$port"
+    command ++= testConfigs
+    command ++= debugConfigs
+    command += connectJar
+    val builder = new ProcessBuilder(command.result(): _*)
+    builder.directory(new File(sparkHome))
+    val environment = builder.environment()
+    environment.remove("SPARK_DIST_CLASSPATH")
+    if (isDebug) {
+      builder.redirectError(Redirect.INHERIT)
+      builder.redirectOutput(Redirect.INHERIT)
+    }
+
+    val process = builder.start()
+    consoleOut = process.getOutputStream
+
+    // Adding JVM shutdown hook
+    sys.addShutdownHook(stop())
+    process
+  }
+
+  /**
+   * As one shared spark will be started for all E2E tests, for tests that needs some special
+   * configs, we add them here
+   */
+  private def testConfigs: Seq[String] = {
+    // To find InMemoryTableCatalog for V2 writer tests
+    val catalystTestJar =
+      tryFindJar("sql/catalyst", "spark-catalyst", "spark-catalyst", test = true)
+        .map(clientTestJar => clientTestJar.getCanonicalPath)
+        .get
+
+    val catalogImplementation = if (IntegrationTestUtils.isSparkHiveJarAvailable) {
+      "hive"
+    } else {
+      // scalastyle:off println
+      println(
+        "Will start Spark Connect server with `spark.sql.catalogImplementation=in-memory`, " +
+          "some tests that rely on Hive will be ignored. If you don't want to skip them:\n" +
+          "1. Test with maven: run `build/mvn install -DskipTests -Phive` before testing\n" +
+          "2. Test with sbt: run test with `-Phive` profile")
+      // scalastyle:on println
+      // SPARK-43647: Proactively cleaning the `classes` and `test-classes` dir of hive
+      // module to avoid unexpected loading of `DataSourceRegister` in hive module during
+      // testing without `-Phive` profile.
+      IntegrationTestUtils.cleanUpHiveClassesDirIfNeeded()
+      "in-memory"
+    }
+    val confs = Seq(
+      // Use InMemoryTableCatalog for V2 writer tests
+      "spark.sql.catalog.testcat=org.apache.spark.sql.connector.catalog.InMemoryTableCatalog",
+      // Try to use the hive catalog, fallback to in-memory if it is not there.
+      "spark.sql.catalogImplementation=" + catalogImplementation,
+      // Make the server terminate reattachable streams every 1 second and 123 bytes,
+      // to make the tests exercise reattach.
+      "spark.connect.execute.reattachable.senderMaxStreamDuration=1s",
+      "spark.connect.execute.reattachable.senderMaxStreamSize=123",
+      // Disable UI
+      "spark.ui.enabled=false")
+    Seq("--jars", catalystTestJar) ++ confs.flatMap(v => "--conf" :: v :: Nil)
+  }
+
+  def start(): Unit = {
+    assert(!stopped)
+    sparkConnect
+  }
+
+  def stop(): Int = {
+    stopped = true
+    debug("Stopping the Spark Connect Server...")
+    try {
+      consoleOut.write(serverStopCommand.getBytes)
+      consoleOut.flush()
+      consoleOut.close()
+      if (!sparkConnect.waitFor(2, TimeUnit.SECONDS)) {
+        sparkConnect.destroyForcibly()
+      }
+      val code = sparkConnect.exitValue()
+      debug(s"Spark Connect Server is stopped with exit code: $code")
+      code
+    } catch {
+      case e: IOException if e.getMessage.contains("Stream closed") =>
+        -1
+      case e: Throwable =>
+        debug(e)
+        sparkConnect.destroyForcibly()
+        throw e
+    }
+  }
+
+  def syncTestDependencies(spark: SparkSession): Unit = {
+    // Both SBT & Maven pass the test-classes as a directory instead of a jar.
+    val testClassesPath = Paths.get(IntegrationTestUtils.connectClientTestClassDir)
+    spark.client.artifactManager.addClassDir(testClassesPath)
+
+    // We need scalatest & scalactic on the classpath to make the tests work.
+    val jars = System
+      .getProperty("java.class.path")
+      .split(File.pathSeparatorChar)
+      .filter(_.endsWith(".jar"))
+      .filter(e => e.contains("scalatest") || e.contains("scalactic"))
+      .map(e => Paths.get(e).toUri)
+    spark.client.artifactManager.addArtifacts(jars)
+  }
+
+  def createSparkSession(): SparkSession = {
+    if (!SystemUtils.isJavaVersionAtMost(JavaVersion.JAVA_17)) {

Review Comment:
   ~This should be `if (!SystemUtils.isJavaVersionAtMost(JavaVersion.JAVA_16))` here, otherwise `SparkConnectServer` would not start up with Java 17 either ...~



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] LuciferYang commented on a diff in pull request #42591: [SPARK-44784][CONNECT] Make SBT testing hermetic.

Posted by "LuciferYang (via GitHub)" <gi...@apache.org>.
LuciferYang commented on code in PR #42591:
URL: https://github.com/apache/spark/pull/42591#discussion_r1302661412


##########
connector/connect/client/jvm/src/test/scala/org/apache/spark/sql/StubbingTestSuite.scala:
##########
@@ -0,0 +1,36 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.sql
+
+import org.apache.spark.sql.test.RemoteSparkSession
+
+class StubbingTestSuite extends RemoteSparkSession {
+  private def eval[T](f: => T): T = f
+
+  test("capture of to-be stubbed class") {

Review Comment:
   I manually reverted the changes from SPARK-44806 on top of this PR and verified that the test cases in `SparkSessionE2ESuite` pass with Maven. However, it would be best if we could find a new target case.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] LuciferYang commented on pull request #42591: [SPARK-44784][CONNECT] Make SBT testing hermetic.

Posted by "LuciferYang (via GitHub)" <gi...@apache.org>.
LuciferYang commented on PR #42591:
URL: https://github.com/apache/spark/pull/42591#issuecomment-1689510381

   A Java code style issue needs to be fixed.
   ```
   src/test/java/org/apache/spark/sql/JavaEncoderSuite.java:[31,8] (imports) UnusedImports: Unused import - org.apache.spark.sql.connect.client.SparkConnectClient.
   ```


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] hvanhovell commented on a diff in pull request #42591: [SPARK-44784][CONNECT] Make SBT testing hermetic.

Posted by "hvanhovell (via GitHub)" <gi...@apache.org>.
hvanhovell commented on code in PR #42591:
URL: https://github.com/apache/spark/pull/42591#discussion_r1300634916


##########
connector/connect/client/jvm/src/test/scala/org/apache/spark/sql/UserDefinedFunctionE2ETestSuite.scala:
##########
@@ -215,33 +215,31 @@ class UserDefinedFunctionE2ETestSuite extends QueryTest {
   }
 
   test("Dataset foreachPartition") {
-    val sum = new AtomicLong()
     val func: Iterator[JLong] => Unit = f => {
+      val sum = new AtomicLong()
       f.foreach(v => sum.addAndGet(v))
-      // The value should be 45
-      assert(sum.get() == -1)
+      throw new Exception("Success, processed records: " + sum.get())

Review Comment:
   This is a fun one. This was throwing a scalatest-specific exception. The problem is that the error is decoded by the ResultHandler in Spark Core, which does not have the scalatest classes on its classpath, so things fail with a `ClassNotFoundException` instead of the expected exception.
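   
   For illustration, a minimal sketch (inside a suite that uses the RemoteSparkSession harness) of how the client side can then assert on the plain exception's message; the exact exception type surfaced to the client and the wording of the wrapped message are assumptions here, not necessarily what the suite does:
   
   ```scala
   test("Dataset foreachPartition propagates a plain user exception") {
     val e = intercept[Exception] {
       spark.range(10).repartition(1).foreachPartition { (it: Iterator[java.lang.Long]) =>
         val sum = it.map(_.longValue()).sum
         // A plain java.lang.Exception: no scalatest classes are needed on the server.
         throw new Exception("Success, processed records: " + sum)
       }
     }
     // range(10) sums 0..9 = 45; the message is assumed to survive in the error chain.
     assert(e.getMessage.contains("Success, processed records: 45"))
   }
   ```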



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] hvanhovell commented on a diff in pull request #42591: [SPARK-44784][CONNECT] Make SBT testing hermetic.

Posted by "hvanhovell (via GitHub)" <gi...@apache.org>.
hvanhovell commented on code in PR #42591:
URL: https://github.com/apache/spark/pull/42591#discussion_r1300638356


##########
connector/connect/client/jvm/src/test/scala/org/apache/spark/sql/StubbingTestSuite.scala:
##########
@@ -0,0 +1,36 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.sql
+
+import org.apache.spark.sql.test.RemoteSparkSession
+
+class StubbingTestSuite extends RemoteSparkSession {
+  private def eval[T](f: => T): T = f
+
+  test("capture of to-be stubbed class") {

Review Comment:
   This was a working reproduction for the issue. Now that `SparkResult` has moved to common, I guess we need to find a new problem...



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] LuciferYang commented on pull request #42591: [SPARK-44784][CONNECT] Make SBT testing hermetic.

Posted by "LuciferYang (via GitHub)" <gi...@apache.org>.
LuciferYang commented on PR #42591:
URL: https://github.com/apache/spark/pull/42591#issuecomment-1694559597

   GA passed. Merged into master and 3.5. Thanks @hvanhovell 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] LuciferYang closed pull request #42591: [SPARK-44784][CONNECT] Make SBT testing hermetic.

Posted by "LuciferYang (via GitHub)" <gi...@apache.org>.
LuciferYang closed pull request #42591: [SPARK-44784][CONNECT] Make SBT testing hermetic.
URL: https://github.com/apache/spark/pull/42591


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] LuciferYang commented on a diff in pull request #42591: [SPARK-44784][CONNECT] Make SBT testing hermetic.

Posted by "LuciferYang (via GitHub)" <gi...@apache.org>.
LuciferYang commented on code in PR #42591:
URL: https://github.com/apache/spark/pull/42591#discussion_r1301251492


##########
connector/connect/client/jvm/src/test/scala/org/apache/spark/sql/StubbingTestSuite.scala:
##########
@@ -0,0 +1,36 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.sql
+
+import org.apache.spark.sql.test.RemoteSparkSession
+
+class StubbingTestSuite extends RemoteSparkSession {
+  private def eval[T](f: => T): T = f
+
+  test("capture of to-be stubbed class") {

Review Comment:
   Yeah, this test case can't reproduce the problem. The new test requires a class like `SparkResult`



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] LuciferYang commented on a diff in pull request #42591: [SPARK-44784][CONNECT] Make SBT testing hermetic.

Posted by "LuciferYang (via GitHub)" <gi...@apache.org>.
LuciferYang commented on code in PR #42591:
URL: https://github.com/apache/spark/pull/42591#discussion_r1301251492


##########
connector/connect/client/jvm/src/test/scala/org/apache/spark/sql/StubbingTestSuite.scala:
##########
@@ -0,0 +1,36 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.sql
+
+import org.apache.spark.sql.test.RemoteSparkSession
+
+class StubbingTestSuite extends RemoteSparkSession {
+  private def eval[T](f: => T): T = f
+
+  test("capture of to-be stubbed class") {

Review Comment:
   Yeah, this test case can no longer reproduce the problem. The new test requires a class that only exists in the `connect-client-jvm` module and it needs to be in `src/main` instead of `src/test`.
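   
   To make the shape of such a reproduction concrete, a rough sketch follows. `ClientOnlyThing` is hypothetical and would have to live in connect-client-jvm's `src/main` (since `src/test` is now synced to the server), and the exact conditions that trigger stub generation are an assumption here rather than a description of the current `StubbingTestSuite`:
   
   ```scala
   // Hypothetical: imagined to live only in connect-client-jvm/src/main, so its
   // class files are never shipped to the server and a stub has to be generated.
   class ClientOnlyThing extends Serializable {
     def describe(): String = "client only"
   }
   
   class StubbingSketchSuite extends RemoteSparkSession {
     test("closure referencing a client-only class") {
       val session = spark
       import session.implicits._
       // Captured by the closure but never dereferenced on the server, so a
       // generated stub should be enough for the closure to load and run.
       val thing: ClientOnlyThing = null
       val result = session.range(0, 10).map(n => n + (if (thing == null) 0L else 1L)).head()
       assert(result == 0L)
     }
   }
   ```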



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] LuciferYang commented on a diff in pull request #42591: [SPARK-44784][CONNECT] Make SBT testing hermetic.

Posted by "LuciferYang (via GitHub)" <gi...@apache.org>.
LuciferYang commented on code in PR #42591:
URL: https://github.com/apache/spark/pull/42591#discussion_r1301251492


##########
connector/connect/client/jvm/src/test/scala/org/apache/spark/sql/StubbingTestSuite.scala:
##########
@@ -0,0 +1,36 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.sql
+
+import org.apache.spark.sql.test.RemoteSparkSession
+
+class StubbingTestSuite extends RemoteSparkSession {
+  private def eval[T](f: => T): T = f
+
+  test("capture of to-be stubbed class") {

Review Comment:
   Yeah, this test case can no longer reproduce the problem. The new test requires a class that only exists in the `connect-client-jvm` module and it needs to be in `src/main` instead of `src/test`.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] LuciferYang commented on a diff in pull request #42591: [SPARK-44784][CONNECT] Make SBT testing hermetic.

Posted by "LuciferYang (via GitHub)" <gi...@apache.org>.
LuciferYang commented on code in PR #42591:
URL: https://github.com/apache/spark/pull/42591#discussion_r1301418438


##########
connector/connect/client/jvm/src/test/scala/org/apache/spark/sql/test/RemoteSparkSession.scala:
##########
@@ -0,0 +1,241 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.sql.test
+
+import java.io.{File, IOException, OutputStream}
+import java.lang.ProcessBuilder
+import java.lang.ProcessBuilder.Redirect
+import java.nio.file.Paths
+import java.util.concurrent.TimeUnit
+
+import scala.concurrent.duration.FiniteDuration
+
+import org.apache.commons.lang3.{JavaVersion, SystemUtils}
+import org.scalactic.source.Position
+import org.scalatest.{BeforeAndAfterAll, Tag}
+
+import org.apache.spark.SparkBuildInfo
+import org.apache.spark.sql.SparkSession
+import org.apache.spark.sql.connect.client.GrpcRetryHandler.RetryPolicy
+import org.apache.spark.sql.connect.client.SparkConnectClient
+import org.apache.spark.sql.connect.common.config.ConnectCommon
+import org.apache.spark.sql.test.IntegrationTestUtils._
+
+/**
+ * An util class to start a local spark connect server in a different process for local E2E tests.
+ * Pre-running the tests, the spark connect artifact needs to be built using e.g. `build/sbt
+ * package`. It is designed to start the server once but shared by all tests. It is equivalent to
+ * use the following command to start the connect server via command line:
+ *
+ * {{{
+ * bin/spark-shell \
+ * --jars `ls connector/connect/server/target/**/spark-connect*SNAPSHOT.jar | paste -sd ',' -` \
+ * --conf spark.plugins=org.apache.spark.sql.connect.SparkConnectPlugin
+ * }}}
+ *
+ * Set system property `spark.test.home` or env variable `SPARK_HOME` if the test is not executed
+ * from the Spark project top folder. Set system property `spark.debug.sc.jvm.client=true` or
+ * environment variable `SPARK_DEBUG_SC_JVM_CLIENT=true` to print the server process output in the
+ * console to debug server start stop problems.
+ */
+object SparkConnectServerUtils {
+
+  // Server port
+  val port: Int =
+    ConnectCommon.CONNECT_GRPC_BINDING_PORT + util.Random.nextInt(1000)
+
+  @volatile private var stopped = false
+
+  private var consoleOut: OutputStream = _
+  private val serverStopCommand = "q"
+
+  private lazy val sparkConnect: java.lang.Process = {
+    debug("Starting the Spark Connect Server...")
+    val connectJar = findJar(
+      "connector/connect/server",
+      "spark-connect-assembly",
+      "spark-connect").getCanonicalPath
+
+    val command = Seq.newBuilder[String]
+    command += "bin/spark-submit"
+    command += "--driver-class-path" += connectJar
+    command += "--class" += "org.apache.spark.sql.connect.SimpleSparkConnectService"
+    command += "--conf" += s"spark.connect.grpc.binding.port=$port"
+    command ++= testConfigs
+    command ++= debugConfigs
+    command += connectJar
+    val builder = new ProcessBuilder(command.result(): _*)
+    builder.directory(new File(sparkHome))
+    val environment = builder.environment()
+    environment.remove("SPARK_DIST_CLASSPATH")
+    if (isDebug) {
+      builder.redirectError(Redirect.INHERIT)
+      builder.redirectOutput(Redirect.INHERIT)
+    }
+
+    val process = builder.start()
+    consoleOut = process.getOutputStream
+
+    // Adding JVM shutdown hook
+    sys.addShutdownHook(stop())
+    process
+  }
+
+  /**
+   * As one shared spark will be started for all E2E tests, for tests that needs some special
+   * configs, we add them here
+   */
+  private def testConfigs: Seq[String] = {
+    // To find InMemoryTableCatalog for V2 writer tests
+    val catalystTestJar =
+      tryFindJar("sql/catalyst", "spark-catalyst", "spark-catalyst", test = true)

Review Comment:
   If it is certain that `catalystTestJar` will definitely exist (calling `.get` will not throw an exception), can it be changed to use
   
   ```
   findJar("sql/catalyst", "spark-catalyst", "spark-catalyst", test = true)
     .getCanonicalPath
   ```
   
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] LuciferYang commented on a diff in pull request #42591: [SPARK-44784][CONNECT] Make SBT testing hermetic.

Posted by "LuciferYang (via GitHub)" <gi...@apache.org>.
LuciferYang commented on code in PR #42591:
URL: https://github.com/apache/spark/pull/42591#discussion_r1302509829


##########
connector/connect/client/jvm/src/test/scala/org/apache/spark/sql/test/RemoteSparkSession.scala:
##########
@@ -0,0 +1,241 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.sql.test
+
+import java.io.{File, IOException, OutputStream}
+import java.lang.ProcessBuilder
+import java.lang.ProcessBuilder.Redirect
+import java.nio.file.Paths
+import java.util.concurrent.TimeUnit
+
+import scala.concurrent.duration.FiniteDuration
+
+import org.apache.commons.lang3.{JavaVersion, SystemUtils}
+import org.scalactic.source.Position
+import org.scalatest.{BeforeAndAfterAll, Tag}
+
+import org.apache.spark.SparkBuildInfo
+import org.apache.spark.sql.SparkSession
+import org.apache.spark.sql.connect.client.GrpcRetryHandler.RetryPolicy
+import org.apache.spark.sql.connect.client.SparkConnectClient
+import org.apache.spark.sql.connect.common.config.ConnectCommon
+import org.apache.spark.sql.test.IntegrationTestUtils._
+
+/**
+ * An util class to start a local spark connect server in a different process for local E2E tests.
+ * Pre-running the tests, the spark connect artifact needs to be built using e.g. `build/sbt
+ * package`. It is designed to start the server once but shared by all tests. It is equivalent to
+ * use the following command to start the connect server via command line:
+ *
+ * {{{
+ * bin/spark-shell \
+ * --jars `ls connector/connect/server/target/**/spark-connect*SNAPSHOT.jar | paste -sd ',' -` \
+ * --conf spark.plugins=org.apache.spark.sql.connect.SparkConnectPlugin
+ * }}}
+ *
+ * Set system property `spark.test.home` or env variable `SPARK_HOME` if the test is not executed
+ * from the Spark project top folder. Set system property `spark.debug.sc.jvm.client=true` or
+ * environment variable `SPARK_DEBUG_SC_JVM_CLIENT=true` to print the server process output in the
+ * console to debug server start stop problems.
+ */
+object SparkConnectServerUtils {
+
+  // Server port
+  val port: Int =
+    ConnectCommon.CONNECT_GRPC_BINDING_PORT + util.Random.nextInt(1000)
+
+  @volatile private var stopped = false
+
+  private var consoleOut: OutputStream = _
+  private val serverStopCommand = "q"
+
+  private lazy val sparkConnect: java.lang.Process = {
+    debug("Starting the Spark Connect Server...")
+    val connectJar = findJar(
+      "connector/connect/server",
+      "spark-connect-assembly",
+      "spark-connect").getCanonicalPath
+
+    val command = Seq.newBuilder[String]
+    command += "bin/spark-submit"
+    command += "--driver-class-path" += connectJar
+    command += "--class" += "org.apache.spark.sql.connect.SimpleSparkConnectService"
+    command += "--conf" += s"spark.connect.grpc.binding.port=$port"
+    command ++= testConfigs
+    command ++= debugConfigs
+    command += connectJar
+    val builder = new ProcessBuilder(command.result(): _*)
+    builder.directory(new File(sparkHome))
+    val environment = builder.environment()
+    environment.remove("SPARK_DIST_CLASSPATH")
+    if (isDebug) {
+      builder.redirectError(Redirect.INHERIT)
+      builder.redirectOutput(Redirect.INHERIT)
+    }
+
+    val process = builder.start()
+    consoleOut = process.getOutputStream
+
+    // Adding JVM shutdown hook
+    sys.addShutdownHook(stop())
+    process
+  }
+
+  /**
+   * As one shared spark will be started for all E2E tests, for tests that needs some special
+   * configs, we add them here
+   */
+  private def testConfigs: Seq[String] = {
+    // To find InMemoryTableCatalog for V2 writer tests
+    val catalystTestJar =
+      tryFindJar("sql/catalyst", "spark-catalyst", "spark-catalyst", test = true)

Review Comment:
   Yes, it will build.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] LuciferYang commented on a diff in pull request #42591: [SPARK-44784][CONNECT] Make SBT testing hermetic.

Posted by "LuciferYang (via GitHub)" <gi...@apache.org>.
LuciferYang commented on code in PR #42591:
URL: https://github.com/apache/spark/pull/42591#discussion_r1305880806


##########
connector/connect/client/jvm/src/test/scala/org/apache/spark/sql/test/RemoteSparkSession.scala:
##########
@@ -0,0 +1,221 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.sql.test
+
+import java.io.{File, IOException, OutputStream}
+import java.lang.ProcessBuilder
+import java.lang.ProcessBuilder.Redirect
+import java.nio.file.Paths
+import java.util.concurrent.TimeUnit
+
+import scala.concurrent.duration.FiniteDuration
+
+import org.scalatest.BeforeAndAfterAll
+
+import org.apache.spark.SparkBuildInfo
+import org.apache.spark.sql.SparkSession
+import org.apache.spark.sql.connect.client.GrpcRetryHandler.RetryPolicy
+import org.apache.spark.sql.connect.client.SparkConnectClient
+import org.apache.spark.sql.connect.common.config.ConnectCommon
+import org.apache.spark.sql.test.IntegrationTestUtils._
+
+/**
+ * An util class to start a local spark connect server in a different process for local E2E tests.
+ * Pre-running the tests, the spark connect artifact needs to be built using e.g. `build/sbt
+ * package`. It is designed to start the server once but shared by all tests. It is equivalent to
+ * use the following command to start the connect server via command line:
+ *
+ * {{{
+ * bin/spark-shell \
+ * --jars `ls connector/connect/server/target/**/spark-connect*SNAPSHOT.jar | paste -sd ',' -` \
+ * --conf spark.plugins=org.apache.spark.sql.connect.SparkConnectPlugin
+ * }}}
+ *
+ * Set system property `spark.test.home` or env variable `SPARK_HOME` if the test is not executed
+ * from the Spark project top folder. Set system property `spark.debug.sc.jvm.client=true` or
+ * environment variable `SPARK_DEBUG_SC_JVM_CLIENT=true` to print the server process output in the
+ * console to debug server start stop problems.
+ */
+object SparkConnectServerUtils {
+
+  // Server port
+  val port: Int =
+    ConnectCommon.CONNECT_GRPC_BINDING_PORT + util.Random.nextInt(1000)
+
+  @volatile private var stopped = false
+
+  private var consoleOut: OutputStream = _
+  private val serverStopCommand = "q"
+
+  private lazy val sparkConnect: java.lang.Process = {
+    debug("Starting the Spark Connect Server...")
+    val connectJar = findJar(
+      "connector/connect/server",
+      "spark-connect-assembly",
+      "spark-connect").getCanonicalPath
+
+    val command = Seq.newBuilder[String]
+    command += "bin/spark-submit"
+    command += "--driver-class-path" += connectJar
+    command += "--class" += "org.apache.spark.sql.connect.SimpleSparkConnectService"
+    command += "--conf" += s"spark.connect.grpc.binding.port=$port"
+    command ++= testConfigs
+    command ++= debugConfigs
+    command += connectJar
+    val builder = new ProcessBuilder(command.result(): _*)
+    builder.directory(new File(sparkHome))
+    val environment = builder.environment()
+    environment.remove("SPARK_DIST_CLASSPATH")
+    if (isDebug) {
+      builder.redirectError(Redirect.INHERIT)
+      builder.redirectOutput(Redirect.INHERIT)
+    }
+
+    val process = builder.start()
+    consoleOut = process.getOutputStream
+
+    // Adding JVM shutdown hook
+    sys.addShutdownHook(stop())
+    process
+  }
+
+  /**
+   * As one shared spark will be started for all E2E tests, for tests that needs some special
+   * configs, we add them here
+   */
+  private def testConfigs: Seq[String] = {
+    // To find InMemoryTableCatalog for V2 writer tests
+    val catalystTestJar =
+      findJar("sql/catalyst", "spark-catalyst", "spark-catalyst", test = true).getCanonicalPath
+
+    val catalogImplementation = if (IntegrationTestUtils.isSparkHiveJarAvailable) {
+      "hive"
+    } else {
+      // scalastyle:off println
+      println(
+        "Will start Spark Connect server with `spark.sql.catalogImplementation=in-memory`, " +
+          "some tests that rely on Hive will be ignored. If you don't want to skip them:\n" +
+          "1. Test with maven: run `build/mvn install -DskipTests -Phive` before testing\n" +
+          "2. Test with sbt: run test with `-Phive` profile")
+      // scalastyle:on println
+      // SPARK-43647: Proactively cleaning the `classes` and `test-classes` dir of hive
+      // module to avoid unexpected loading of `DataSourceRegister` in hive module during
+      // testing without `-Phive` profile.
+      IntegrationTestUtils.cleanUpHiveClassesDirIfNeeded()
+      "in-memory"
+    }
+    val confs = Seq(
+      // Use InMemoryTableCatalog for V2 writer tests
+      "spark.sql.catalog.testcat=org.apache.spark.sql.connector.catalog.InMemoryTableCatalog",
+      // Try to use the hive catalog, fallback to in-memory if it is not there.
+      "spark.sql.catalogImplementation=" + catalogImplementation,
+      // Make the server terminate reattachable streams every 1 second and 123 bytes,
+      // to make the tests exercise reattach.
+      "spark.connect.execute.reattachable.senderMaxStreamDuration=1s",
+      "spark.connect.execute.reattachable.senderMaxStreamSize=123",
+      // Disable UI
+      "spark.ui.enabled=false")
+    Seq("--jars", catalystTestJar) ++ confs.flatMap(v => "--conf" :: v :: Nil)
+  }
+
+  def start(): Unit = {
+    assert(!stopped)
+    sparkConnect
+  }
+
+  def stop(): Int = {
+    stopped = true
+    debug("Stopping the Spark Connect Server...")
+    try {
+      consoleOut.write(serverStopCommand.getBytes)
+      consoleOut.flush()
+      consoleOut.close()
+      if (!sparkConnect.waitFor(2, TimeUnit.SECONDS)) {
+        sparkConnect.destroyForcibly()
+      }
+      val code = sparkConnect.exitValue()
+      debug(s"Spark Connect Server is stopped with exit code: $code")
+      code
+    } catch {
+      case e: IOException if e.getMessage.contains("Stream closed") =>
+        -1
+      case e: Throwable =>
+        debug(e)
+        sparkConnect.destroyForcibly()
+        throw e
+    }
+  }
+
+  def syncTestDependencies(spark: SparkSession): Unit = {
+    // Both SBT & Maven pass the test-classes as a directory instead of a jar.
+    val testClassesPath = Paths.get(IntegrationTestUtils.connectClientTestClassDir)
+    spark.client.artifactManager.addClassDir(testClassesPath)
+
+    // We need scalatest & scalactic on the classpath to make the tests work.
+    val jars = System
+      .getProperty("java.class.path")
+      .split(File.pathSeparatorChar)
+      .filter(_.endsWith(".jar"))
+      .filter(e => e.contains("scalatest") || e.contains("scalactic"))

Review Comment:
   
   
   Nit:
   ```
   /Users/yangjie01/.m2/repository/org/scalatestplus/scalacheck-1-17_2.12/3.2.16.0/scalacheck-1-17_2.12-3.2.16.0.jar
   /Users/yangjie01/.m2/repository/org/scalatestplus/mockito-4-11_2.12/3.2.16.0/mockito-4-11_2.12-3.2.16.0.jar
   /Users/yangjie01/.m2/repository/org/scalatestplus/selenium-4-9_2.12/3.2.16.0/selenium-4-9_2.12-3.2.16.0.jar
   ```
   
   `jars` also include `scalacheck`, `mockito` and `selenium` because the full path includes `scalatestplus`; is this expected?
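   
   For example, one way to tighten it (just a sketch, assuming the same imports as `syncTestDependencies`) is to match on the jar file name instead of the full path, so jars that merely sit under an `org/scalatestplus` directory are skipped:
   
   ```scala
   val jars = System
     .getProperty("java.class.path")
     .split(File.pathSeparatorChar)
     .filter(_.endsWith(".jar"))
     .filter { e =>
       // Look at the file name only, so the scalacheck/mockito/selenium bridge jars
       // under org/scalatestplus are no longer picked up by accident.
       val name = Paths.get(e).getFileName.toString
       name.contains("scalatest") || name.contains("scalactic")
     }
     .map(e => Paths.get(e).toUri)
   ```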
   
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] hvanhovell commented on a diff in pull request #42591: [SPARK-44784][CONNECT] Make SBT testing hermetic.

Posted by "hvanhovell (via GitHub)" <gi...@apache.org>.
hvanhovell commented on code in PR #42591:
URL: https://github.com/apache/spark/pull/42591#discussion_r1306004731


##########
connector/connect/client/jvm/src/test/scala/org/apache/spark/sql/test/RemoteSparkSession.scala:
##########
@@ -0,0 +1,221 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.sql.test
+
+import java.io.{File, IOException, OutputStream}
+import java.lang.ProcessBuilder
+import java.lang.ProcessBuilder.Redirect
+import java.nio.file.Paths
+import java.util.concurrent.TimeUnit
+
+import scala.concurrent.duration.FiniteDuration
+
+import org.scalatest.BeforeAndAfterAll
+
+import org.apache.spark.SparkBuildInfo
+import org.apache.spark.sql.SparkSession
+import org.apache.spark.sql.connect.client.GrpcRetryHandler.RetryPolicy
+import org.apache.spark.sql.connect.client.SparkConnectClient
+import org.apache.spark.sql.connect.common.config.ConnectCommon
+import org.apache.spark.sql.test.IntegrationTestUtils._
+
+/**
+ * An util class to start a local spark connect server in a different process for local E2E tests.
+ * Pre-running the tests, the spark connect artifact needs to be built using e.g. `build/sbt
+ * package`. It is designed to start the server once but shared by all tests. It is equivalent to
+ * use the following command to start the connect server via command line:
+ *
+ * {{{
+ * bin/spark-shell \
+ * --jars `ls connector/connect/server/target/**/spark-connect*SNAPSHOT.jar | paste -sd ',' -` \
+ * --conf spark.plugins=org.apache.spark.sql.connect.SparkConnectPlugin
+ * }}}
+ *
+ * Set system property `spark.test.home` or env variable `SPARK_HOME` if the test is not executed
+ * from the Spark project top folder. Set system property `spark.debug.sc.jvm.client=true` or
+ * environment variable `SPARK_DEBUG_SC_JVM_CLIENT=true` to print the server process output in the
+ * console to debug server start stop problems.
+ */
+object SparkConnectServerUtils {
+
+  // Server port
+  val port: Int =
+    ConnectCommon.CONNECT_GRPC_BINDING_PORT + util.Random.nextInt(1000)
+
+  @volatile private var stopped = false
+
+  private var consoleOut: OutputStream = _
+  private val serverStopCommand = "q"
+
+  private lazy val sparkConnect: java.lang.Process = {
+    debug("Starting the Spark Connect Server...")
+    val connectJar = findJar(
+      "connector/connect/server",
+      "spark-connect-assembly",
+      "spark-connect").getCanonicalPath
+
+    val command = Seq.newBuilder[String]
+    command += "bin/spark-submit"
+    command += "--driver-class-path" += connectJar
+    command += "--class" += "org.apache.spark.sql.connect.SimpleSparkConnectService"
+    command += "--conf" += s"spark.connect.grpc.binding.port=$port"
+    command ++= testConfigs
+    command ++= debugConfigs
+    command += connectJar
+    val builder = new ProcessBuilder(command.result(): _*)
+    builder.directory(new File(sparkHome))
+    val environment = builder.environment()
+    environment.remove("SPARK_DIST_CLASSPATH")
+    if (isDebug) {
+      builder.redirectError(Redirect.INHERIT)
+      builder.redirectOutput(Redirect.INHERIT)
+    }
+
+    val process = builder.start()
+    consoleOut = process.getOutputStream
+
+    // Adding JVM shutdown hook
+    sys.addShutdownHook(stop())
+    process
+  }
+
+  /**
+   * As one shared spark will be started for all E2E tests, for tests that needs some special
+   * configs, we add them here
+   */
+  private def testConfigs: Seq[String] = {
+    // To find InMemoryTableCatalog for V2 writer tests
+    val catalystTestJar =
+      findJar("sql/catalyst", "spark-catalyst", "spark-catalyst", test = true).getCanonicalPath
+
+    val catalogImplementation = if (IntegrationTestUtils.isSparkHiveJarAvailable) {
+      "hive"
+    } else {
+      // scalastyle:off println
+      println(
+        "Will start Spark Connect server with `spark.sql.catalogImplementation=in-memory`, " +
+          "some tests that rely on Hive will be ignored. If you don't want to skip them:\n" +
+          "1. Test with maven: run `build/mvn install -DskipTests -Phive` before testing\n" +
+          "2. Test with sbt: run test with `-Phive` profile")
+      // scalastyle:on println
+      // SPARK-43647: Proactively cleaning the `classes` and `test-classes` dir of hive
+      // module to avoid unexpected loading of `DataSourceRegister` in hive module during
+      // testing without `-Phive` profile.
+      IntegrationTestUtils.cleanUpHiveClassesDirIfNeeded()
+      "in-memory"
+    }
+    val confs = Seq(
+      // Use InMemoryTableCatalog for V2 writer tests
+      "spark.sql.catalog.testcat=org.apache.spark.sql.connector.catalog.InMemoryTableCatalog",
+      // Try to use the hive catalog, fallback to in-memory if it is not there.
+      "spark.sql.catalogImplementation=" + catalogImplementation,
+      // Make the server terminate reattachable streams every 1 second and 123 bytes,
+      // to make the tests exercise reattach.
+      "spark.connect.execute.reattachable.senderMaxStreamDuration=1s",
+      "spark.connect.execute.reattachable.senderMaxStreamSize=123",
+      // Disable UI
+      "spark.ui.enabled=false")
+    Seq("--jars", catalystTestJar) ++ confs.flatMap(v => "--conf" :: v :: Nil)
+  }
+
+  def start(): Unit = {
+    assert(!stopped)
+    sparkConnect
+  }
+
+  def stop(): Int = {
+    stopped = true
+    debug("Stopping the Spark Connect Server...")
+    try {
+      consoleOut.write(serverStopCommand.getBytes)
+      consoleOut.flush()
+      consoleOut.close()
+      if (!sparkConnect.waitFor(2, TimeUnit.SECONDS)) {
+        sparkConnect.destroyForcibly()
+      }
+      val code = sparkConnect.exitValue()
+      debug(s"Spark Connect Server is stopped with exit code: $code")
+      code
+    } catch {
+      case e: IOException if e.getMessage.contains("Stream closed") =>
+        -1
+      case e: Throwable =>
+        debug(e)
+        sparkConnect.destroyForcibly()
+        throw e
+    }
+  }
+
+  def syncTestDependencies(spark: SparkSession): Unit = {
+    // Both SBT & Maven pass the test-classes as a directory instead of a jar.
+    val testClassesPath = Paths.get(IntegrationTestUtils.connectClientTestClassDir)
+    spark.client.artifactManager.addClassDir(testClassesPath)
+
+    // We need scalatest & scalactic on the classpath to make the tests work.
+    val jars = System
+      .getProperty("java.class.path")
+      .split(File.pathSeparatorChar)
+      .filter(_.endsWith(".jar"))
+      .filter(e => e.contains("scalatest") || e.contains("scalactic"))

Review Comment:
   LOL - I can make it more restrictive.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] hvanhovell commented on a diff in pull request #42591: [SPARK-44784][CONNECT] Make SBT testing hermetic.

Posted by "hvanhovell (via GitHub)" <gi...@apache.org>.
hvanhovell commented on code in PR #42591:
URL: https://github.com/apache/spark/pull/42591#discussion_r1301844060


##########
connector/connect/client/jvm/src/test/scala/org/apache/spark/sql/test/RemoteSparkSession.scala:
##########
@@ -0,0 +1,241 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.sql.test
+
+import java.io.{File, IOException, OutputStream}
+import java.lang.ProcessBuilder
+import java.lang.ProcessBuilder.Redirect
+import java.nio.file.Paths
+import java.util.concurrent.TimeUnit
+
+import scala.concurrent.duration.FiniteDuration
+
+import org.apache.commons.lang3.{JavaVersion, SystemUtils}
+import org.scalactic.source.Position
+import org.scalatest.{BeforeAndAfterAll, Tag}
+
+import org.apache.spark.SparkBuildInfo
+import org.apache.spark.sql.SparkSession
+import org.apache.spark.sql.connect.client.GrpcRetryHandler.RetryPolicy
+import org.apache.spark.sql.connect.client.SparkConnectClient
+import org.apache.spark.sql.connect.common.config.ConnectCommon
+import org.apache.spark.sql.test.IntegrationTestUtils._
+
+/**
+ * An util class to start a local spark connect server in a different process for local E2E tests.
+ * Pre-running the tests, the spark connect artifact needs to be built using e.g. `build/sbt
+ * package`. It is designed to start the server once but shared by all tests. It is equivalent to
+ * use the following command to start the connect server via command line:
+ *
+ * {{{
+ * bin/spark-shell \
+ * --jars `ls connector/connect/server/target/**/spark-connect*SNAPSHOT.jar | paste -sd ',' -` \
+ * --conf spark.plugins=org.apache.spark.sql.connect.SparkConnectPlugin
+ * }}}
+ *
+ * Set system property `spark.test.home` or env variable `SPARK_HOME` if the test is not executed
+ * from the Spark project top folder. Set system property `spark.debug.sc.jvm.client=true` or
+ * environment variable `SPARK_DEBUG_SC_JVM_CLIENT=true` to print the server process output in the
+ * console to debug server start stop problems.
+ */
+object SparkConnectServerUtils {
+
+  // Server port
+  val port: Int =
+    ConnectCommon.CONNECT_GRPC_BINDING_PORT + util.Random.nextInt(1000)
+
+  @volatile private var stopped = false
+
+  private var consoleOut: OutputStream = _
+  private val serverStopCommand = "q"
+
+  private lazy val sparkConnect: java.lang.Process = {
+    debug("Starting the Spark Connect Server...")
+    val connectJar = findJar(
+      "connector/connect/server",
+      "spark-connect-assembly",
+      "spark-connect").getCanonicalPath
+
+    val command = Seq.newBuilder[String]
+    command += "bin/spark-submit"
+    command += "--driver-class-path" += connectJar
+    command += "--class" += "org.apache.spark.sql.connect.SimpleSparkConnectService"
+    command += "--conf" += s"spark.connect.grpc.binding.port=$port"
+    command ++= testConfigs
+    command ++= debugConfigs
+    command += connectJar
+    val builder = new ProcessBuilder(command.result(): _*)
+    builder.directory(new File(sparkHome))
+    val environment = builder.environment()
+    environment.remove("SPARK_DIST_CLASSPATH")
+    if (isDebug) {
+      builder.redirectError(Redirect.INHERIT)
+      builder.redirectOutput(Redirect.INHERIT)
+    }
+
+    val process = builder.start()
+    consoleOut = process.getOutputStream
+
+    // Adding JVM shutdown hook
+    sys.addShutdownHook(stop())
+    process
+  }
+
+  /**
+   * As one shared spark will be started for all E2E tests, for tests that needs some special
+   * configs, we add them here
+   */
+  private def testConfigs: Seq[String] = {
+    // To find InMemoryTableCatalog for V2 writer tests
+    val catalystTestJar =
+      tryFindJar("sql/catalyst", "spark-catalyst", "spark-catalyst", test = true)

Review Comment:
   Well, you tell me: does Maven build the test artifact when you call `mvn clean install -DskipTests`? If it does, then we should be good.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] LuciferYang commented on a diff in pull request #42591: [SPARK-44784][CONNECT] Make SBT testing hermetic.

Posted by "LuciferYang (via GitHub)" <gi...@apache.org>.
LuciferYang commented on code in PR #42591:
URL: https://github.com/apache/spark/pull/42591#discussion_r1301418438


##########
connector/connect/client/jvm/src/test/scala/org/apache/spark/sql/test/RemoteSparkSession.scala:
##########
@@ -0,0 +1,241 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.sql.test
+
+import java.io.{File, IOException, OutputStream}
+import java.lang.ProcessBuilder
+import java.lang.ProcessBuilder.Redirect
+import java.nio.file.Paths
+import java.util.concurrent.TimeUnit
+
+import scala.concurrent.duration.FiniteDuration
+
+import org.apache.commons.lang3.{JavaVersion, SystemUtils}
+import org.scalactic.source.Position
+import org.scalatest.{BeforeAndAfterAll, Tag}
+
+import org.apache.spark.SparkBuildInfo
+import org.apache.spark.sql.SparkSession
+import org.apache.spark.sql.connect.client.GrpcRetryHandler.RetryPolicy
+import org.apache.spark.sql.connect.client.SparkConnectClient
+import org.apache.spark.sql.connect.common.config.ConnectCommon
+import org.apache.spark.sql.test.IntegrationTestUtils._
+
+/**
+ * A utility class that starts a local Spark Connect server in a separate process for local E2E
+ * tests. Before running the tests, the Spark Connect artifact needs to be built, e.g. with
+ * `build/sbt package`. The server is started once and shared by all tests. It is equivalent to
+ * starting the connect server from the command line with:
+ *
+ * {{{
+ * bin/spark-shell \
+ * --jars `ls connector/connect/server/target/**/spark-connect*SNAPSHOT.jar | paste -sd ',' -` \
+ * --conf spark.plugins=org.apache.spark.sql.connect.SparkConnectPlugin
+ * }}}
+ *
+ * Set system property `spark.test.home` or env variable `SPARK_HOME` if the test is not executed
+ * from the Spark project top folder. Set system property `spark.debug.sc.jvm.client=true` or
+ * environment variable `SPARK_DEBUG_SC_JVM_CLIENT=true` to print the server process output in the
+ * console to debug server start/stop problems.
+ */
+object SparkConnectServerUtils {
+
+  // Server port
+  val port: Int =
+    ConnectCommon.CONNECT_GRPC_BINDING_PORT + util.Random.nextInt(1000)
+
+  @volatile private var stopped = false
+
+  private var consoleOut: OutputStream = _
+  private val serverStopCommand = "q"
+
+  private lazy val sparkConnect: java.lang.Process = {
+    debug("Starting the Spark Connect Server...")
+    val connectJar = findJar(
+      "connector/connect/server",
+      "spark-connect-assembly",
+      "spark-connect").getCanonicalPath
+
+    val command = Seq.newBuilder[String]
+    command += "bin/spark-submit"
+    command += "--driver-class-path" += connectJar
+    command += "--class" += "org.apache.spark.sql.connect.SimpleSparkConnectService"
+    command += "--conf" += s"spark.connect.grpc.binding.port=$port"
+    command ++= testConfigs
+    command ++= debugConfigs
+    command += connectJar
+    val builder = new ProcessBuilder(command.result(): _*)
+    builder.directory(new File(sparkHome))
+    val environment = builder.environment()
+    environment.remove("SPARK_DIST_CLASSPATH")
+    if (isDebug) {
+      builder.redirectError(Redirect.INHERIT)
+      builder.redirectOutput(Redirect.INHERIT)
+    }
+
+    val process = builder.start()
+    consoleOut = process.getOutputStream
+
+    // Adding JVM shutdown hook
+    sys.addShutdownHook(stop())
+    process
+  }
+
+  /**
+   * As one shared Spark session is started for all E2E tests, we add the special configs
+   * needed by individual tests here.
+   */
+  private def testConfigs: Seq[String] = {
+    // To find InMemoryTableCatalog for V2 writer tests
+    val catalystTestJar =
+      tryFindJar("sql/catalyst", "spark-catalyst", "spark-catalyst", test = true)

Review Comment:
   If `catalystTestJar` is guaranteed to exist, can this be changed to use
   
   ```
   findJar("sql/catalyst", "spark-catalyst", "spark-catalyst", test = true)
     .getCanonicalPath
   ```
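
   For comparison, a minimal sketch of the two variants under discussion, assuming `tryFindJar`
   returns an `Option[File]` as the diff suggests. How `testConfigs` turns the jar into a
   spark-submit argument is an assumption here, not taken from the PR:

   ```
   // Option-based variant: tolerate a missing test jar and simply omit the flag.
   val maybeJar: Option[String] =
     tryFindJar("sql/catalyst", "spark-catalyst", "spark-catalyst", test = true)
       .map(_.getCanonicalPath)
   val jarsFlag: Seq[String] = maybeJar.toSeq.flatMap(path => Seq("--jars", path))

   // Fail-fast variant, as suggested above, if the test jar is guaranteed to exist.
   val catalystTestJar: String =
     findJar("sql/catalyst", "spark-catalyst", "spark-catalyst", test = true).getCanonicalPath
   ```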





[GitHub] [spark] hvanhovell commented on a diff in pull request #42591: [SPARK-44784][CONNECT] Make SBT testing hermetic.

Posted by "hvanhovell (via GitHub)" <gi...@apache.org>.
hvanhovell commented on code in PR #42591:
URL: https://github.com/apache/spark/pull/42591#discussion_r1300637450


##########
connector/connect/client/jvm/src/test/scala/org/apache/spark/sql/test/RemoteSparkSession.scala:
##########
@@ -55,33 +56,34 @@ object SparkConnectServerUtils {
 
   @volatile private var stopped = false
 
-  private var consoleOut: BufferedOutputStream = _
+  private var consoleOut: OutputStream = _
   private val serverStopCommand = "q"
 
-  private lazy val sparkConnect: Process = {
+  private lazy val sparkConnect: java.lang.Process = {
     debug("Starting the Spark Connect Server...")
     val connectJar = findJar(
       "connector/connect/server",
       "spark-connect-assembly",
       "spark-connect").getCanonicalPath
 
-    val builder = Process(
-      Seq(
-        "bin/spark-submit",
-        "--driver-class-path",
-        connectJar,
-        "--conf",
-        s"spark.connect.grpc.binding.port=$port") ++ testConfigs ++ debugConfigs ++ Seq(
-        "--class",
-        "org.apache.spark.sql.connect.SimpleSparkConnectService",
-        connectJar),
-      new File(sparkHome))
-
-    val io = new ProcessIO(
-      in => consoleOut = new BufferedOutputStream(in),
-      out => Source.fromInputStream(out).getLines.foreach(debug),
-      err => Source.fromInputStream(err).getLines.foreach(debug))
-    val process = builder.run(io)
+    val command = Seq.newBuilder[String]
+    command += "bin/spark-submit"
+    command += "--class" += "org.apache.spark.sql.connect.SimpleSparkConnectService"
+    command += "--conf" += s"spark.connect.grpc.binding.port=$port"
+    command ++= testConfigs
+    command ++= debugConfigs
+    command += connectJar
+    val builder = new ProcessBuilder(command.result(): _*)
+    builder.directory(new File(sparkHome))
+    val environment = builder.environment()
+    environment.remove("SPARK_DIST_CLASSPATH")

Review Comment:
   This is the important bit: here we nuke the `SPARK_DIST_CLASSPATH` env variable.
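
   A minimal standalone sketch of the mechanism, in case it is useful: `ProcessBuilder` copies
   the parent JVM's environment, so the variable has to be removed explicitly before `start()`.
   The spark-submit arguments below are placeholders, not the test's real command line:

   ```
   import java.lang.ProcessBuilder

   // The child process inherits a copy of our environment; dropping the variable from that
   // copy keeps the launched spark-submit from seeing the build system's classpath.
   val builder = new ProcessBuilder("bin/spark-submit", "--version") // placeholder args
   builder.environment().remove("SPARK_DIST_CLASSPATH")
   val process = builder.start()
   ```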





[GitHub] [spark] hvanhovell commented on a diff in pull request #42591: [SPARK-44784][CONNECT] Make SBT testing hermetic.

Posted by "hvanhovell (via GitHub)" <gi...@apache.org>.
hvanhovell commented on code in PR #42591:
URL: https://github.com/apache/spark/pull/42591#discussion_r1300636608


##########
connector/connect/server/src/main/scala/org/apache/spark/sql/connect/artifact/SparkConnectArtifactManager.scala:
##########
@@ -162,15 +162,37 @@ class SparkConnectArtifactManager(sessionHolder: SessionHolder) extends Logging
    */
   def classloader: ClassLoader = {
     val urls = getSparkConnectAddedJars :+ classDir.toUri.toURL
-    val loader = if (SparkEnv.get.conf.get(CONNECT_SCALA_UDF_STUB_PREFIXES).nonEmpty) {
-      val stubClassLoader =
-        StubClassLoader(null, SparkEnv.get.conf.get(CONNECT_SCALA_UDF_STUB_PREFIXES))
-      new ChildFirstURLClassLoader(
-        urls.toArray,
-        stubClassLoader,
-        Utils.getContextOrSparkClassLoader)
+    val prefixes = SparkEnv.get.conf.get(CONNECT_SCALA_UDF_STUB_PREFIXES)
+    val userClasspathFirst = SparkEnv.get.conf.get(EXECUTOR_USER_CLASS_PATH_FIRST)
+    val loader = if (prefixes.nonEmpty) {

Review Comment:
   For the record, this fixes nothing. It just makes sure we respect the `userClasspathFirst` setting. 
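
   To illustrate what respecting the setting means, a hedged sketch of how a
   user-classpath-first flag typically flips the delegation order. It uses Spark's two-argument
   `ChildFirstURLClassLoader` and a plain `URLClassLoader` for brevity; the PR's actual wiring
   with the stub class loader is more involved than this:

   ```
   import java.net.{URL, URLClassLoader}
   import org.apache.spark.util.ChildFirstURLClassLoader

   // Illustrative only: child-first resolves classes from `urls` before delegating to the
   // parent; the plain URLClassLoader keeps the JDK's parent-first delegation.
   def buildLoader(urls: Array[URL], parent: ClassLoader, userClasspathFirst: Boolean): ClassLoader =
     if (userClasspathFirst) new ChildFirstURLClassLoader(urls, parent)
     else new URLClassLoader(urls, parent)
   ```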


