Posted to commits@spark.apache.org by hv...@apache.org on 2023/03/02 18:36:40 UTC
[spark] branch master updated: [SPARK-42599][CONNECT][INFRA] Introduce `dev/connect-jvm-client-mima-check` instead of `CompatibilitySuite`
This is an automated email from the ASF dual-hosted git repository.
hvanhovell pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git
The following commit(s) were added to refs/heads/master by this push:
new 12beadbc052 [SPARK-42599][CONNECT][INFRA] Introduce `dev/connect-jvm-client-mima-check` instead of `CompatibilitySuite`
12beadbc052 is described below
commit 12beadbc0525b473575f83d01c962f7238f9069b
Author: yangjie01 <ya...@baidu.com>
AuthorDate: Thu Mar 2 14:36:17 2023 -0400
[SPARK-42599][CONNECT][INFRA] Introduce `dev/connect-jvm-client-mima-check` instead of `CompatibilitySuite`
### What changes were proposed in this pull request?
The main changes of this PR are as follows:
- Refactor `CompatibilitySuite` into a new tool, `CheckConnectJvmClientCompatibility` (its core MiMa comparison is sketched below, after this list).
- Add a new shell script, `dev/connect-jvm-client-mima-check`, which uses `CheckConnectJvmClientCompatibility` to check the MiMa compatibility of the `connect-jvm-client` module.
- Add `dev/connect-jvm-client-mima-check` to the GitHub Actions workflow.
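For context, the heart of the new tool is a plain MiMa comparison between the client jar and the sql jar. A minimal sketch follows; the jar paths are placeholders, while the real tool locates the jars with `IntegrationTestUtils.findJar` and applies include/exclude rules to the raw problem list:
```
import java.io.File

import com.typesafe.tools.mima.lib.MiMaLib

// Placeholder paths; the real tool resolves these with IntegrationTestUtils.findJar.
val clientJar = new File("connector/connect/client/jvm/target/spark-connect-client-jvm.jar")
val sqlJar = new File("sql/core/target/spark-sql.jar")

// MiMa treats the sql jar as the "old" API and the client jar as the "new" API,
// then reports every binary incompatibility it finds between the two.
val mima = new MiMaLib(Seq(clientJar, sqlJar))
val problems = mima.collectProblems(sqlJar, clientJar, List.empty)
problems.foreach(p => println(p.description("client")))
```
Developers can run the full check locally with `./dev/connect-jvm-client-mima-check`, which builds the required jars first.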
### Why are the changes needed?
To fix the test errors reported on the `[VOTE] Release Apache Spark 3.4.0 (RC1)` mailing list.
Testing `CompatibilitySuite` with Maven requires some pre-work:
```
build/mvn clean install -DskipTests -pl sql/core -am
build/mvn clean install -DskipTests -pl connector/connect/client/jvm -am
```
So if we run `build/mvn package test` to test the whole project as before, `CompatibilitySuite` fails as follows:
```
CompatibilitySuite:
- compatibility MiMa tests *** FAILED ***
java.lang.AssertionError: assertion failed: Failed to find the jar inside folder: /home/bjorn/spark-3.4.0/connector/connect/client/jvm/target
at scala.Predef$.assert(Predef.scala:223)
at org.apache.spark.sql.connect.client.util.IntegrationTestUtils$.findJar(IntegrationTestUtils.scala:67)
at org.apache.spark.sql.connect.client.CompatibilitySuite.clientJar$lzycompute(CompatibilitySuite.scala:57)
at org.apache.spark.sql.connect.client.CompatibilitySuite.clientJar(CompatibilitySuite.scala:53)
at org.apache.spark.sql.connect.client.CompatibilitySuite.$anonfun$new$1(CompatibilitySuite.scala:69)
at org.scalatest.OutcomeOf.outcomeOf(OutcomeOf.scala:85)
at org.scalatest.OutcomeOf.outcomeOf$(OutcomeOf.scala:83)
at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104)
at org.scalatest.Transformer.apply(Transformer.scala:22)
at org.scalatest.Transformer.apply(Transformer.scala:20)
...
- compatibility API tests: Dataset *** FAILED ***
java.lang.AssertionError: assertion failed: Failed to find the jar inside folder: /home/bjorn/spark-3.4.0/connector/connect/client/jvm/target
at scala.Predef$.assert(Predef.scala:223)
at org.apache.spark.sql.connect.client.util.IntegrationTestUtils$.findJar(IntegrationTestUtils.scala:67)
at org.apache.spark.sql.connect.client.CompatibilitySuite.clientJar$lzycompute(CompatibilitySuite.scala:57)
at org.apache.spark.sql.connect.client.CompatibilitySuite.clientJar(CompatibilitySuite.scala:53)
at org.apache.spark.sql.connect.client.CompatibilitySuite.$anonfun$new$7(CompatibilitySuite.scala:110)
at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
at org.scalatest.OutcomeOf.outcomeOf(OutcomeOf.scala:85)
at org.scalatest.OutcomeOf.outcomeOf$(OutcomeOf.scala:83)
at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104)
at org.scalatest.Transformer.apply(Transformer.scala:22)
```
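The assertion in these traces comes from `IntegrationTestUtils.findJar`, which fails fast when the expected jar has not been built. A hypothetical simplification of that lookup (the directory-scanning details below are illustrative, not the actual implementation):
```
import java.io.File

// Hypothetical simplification of IntegrationTestUtils.findJar: scan the module's
// target folder for a jar matching the SBT or Maven artifact name prefix, and
// fail with the assertion seen above when no such jar has been built yet.
def findJar(path: String, sbtName: String, mvnName: String): File = {
  val targetDir = new File(s"${sys.env("SPARK_HOME")}/$path/target")
  val jars = Option(targetDir.listFiles()).getOrElse(Array.empty[File]).filter { f =>
    f.getName.endsWith(".jar") &&
      (f.getName.startsWith(sbtName) || f.getName.startsWith(mvnName))
  }
  assert(jars.nonEmpty, s"Failed to find the jar inside folder: $targetDir")
  jars.head
}
```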
So we need to fix this problem for developers.
### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
Pass GitHub Actions
Closes #40213 from LuciferYang/connect-client-compatibility-backup.
Authored-by: yangjie01 <ya...@baidu.com>
Signed-off-by: Herman van Hovell <he...@databricks.com>
---
.github/workflows/build_and_test.yml | 3 +
...la => CheckConnectJvmClientCompatibility.scala} | 136 +++++++++++++--------
dev/connect-jvm-client-mima-check | 72 +++++++++++
3 files changed, 163 insertions(+), 48 deletions(-)
diff --git a/.github/workflows/build_and_test.yml b/.github/workflows/build_and_test.yml
index 586dff8f093..7ae04a9bd80 100644
--- a/.github/workflows/build_and_test.yml
+++ b/.github/workflows/build_and_test.yml
@@ -564,6 +564,9 @@ jobs:
run: ./dev/lint-scala
- name: Java linter
run: ./dev/lint-java
+ - name: Spark connect jvm client mima check
+ if: inputs.branch != 'branch-3.2' && inputs.branch != 'branch-3.3'
+ run: ./dev/connect-jvm-client-mima-check
- name: Install Python linter dependencies
run: |
# TODO(SPARK-32407): Sphinx 3.1+ does not correctly index nested classes.
diff --git a/connector/connect/client/jvm/src/test/scala/org/apache/spark/sql/connect/client/CompatibilitySuite.scala b/connector/connect/client/jvm/src/test/scala/org/apache/spark/sql/connect/client/CheckConnectJvmClientCompatibility.scala
similarity index 70%
rename from connector/connect/client/jvm/src/test/scala/org/apache/spark/sql/connect/client/CompatibilitySuite.scala
rename to connector/connect/client/jvm/src/test/scala/org/apache/spark/sql/connect/client/CheckConnectJvmClientCompatibility.scala
index a4cb2a13e5b..4f4ca9ad990 100644
--- a/connector/connect/client/jvm/src/test/scala/org/apache/spark/sql/connect/client/CompatibilitySuite.scala
+++ b/connector/connect/client/jvm/src/test/scala/org/apache/spark/sql/connect/client/CheckConnectJvmClientCompatibility.scala
@@ -16,49 +16,83 @@
*/
package org.apache.spark.sql.connect.client
-import java.io.File
+import java.io.{File, Writer}
import java.net.URLClassLoader
+import java.nio.charset.StandardCharsets
+import java.nio.file.{Files, Paths}
import java.util.regex.Pattern
-import com.typesafe.tools.mima.core._
+import scala.reflect.runtime.universe.runtimeMirror
+
+import com.typesafe.tools.mima.core.{Problem, ProblemFilter, ProblemFilters}
import com.typesafe.tools.mima.lib.MiMaLib
-import org.apache.spark.sql.connect.client.util.ConnectFunSuite
import org.apache.spark.sql.connect.client.util.IntegrationTestUtils._
+import org.apache.spark.util.ChildFirstURLClassLoader
/**
- * This test checks the binary compatibility of the connect client API against the spark SQL API
- * using MiMa. We did not write this check using a SBT build rule as the rule cannot provide the
- * same level of freedom as a test. With a test we can:
+ * A tool for checking the binary compatibility of the connect client API against the Spark SQL
+ * API using MiMa. We did not write this check using an SBT build rule as the rule cannot provide
+ * the same level of freedom as a test. With a test we can:
* 1. Specify any two jars to run the compatibility check.
* 1. Easily make the test automatically pick up all new methods added while the client is being
* built.
*
- * The test requires the following artifacts built before running:
- * {{{
- * spark-sql
- * spark-connect-client-jvm
- * }}}
- * To build the above artifact, use e.g. `build/sbt package` or `build/mvn clean install
- * -DskipTests`.
- *
- * When debugging this test, if any changes to the client API, the client jar need to be built
- * before running the test. An example workflow with SBT for this test:
- * 1. Compatibility test has reported an unexpected client API change.
- * 1. Fix the wrong client API.
- * 1. Build the client jar: `build/sbt package`
- * 1. Run the test again: `build/sbt "testOnly
- * org.apache.spark.sql.connect.client.CompatibilitySuite"`
+ * We can run this check by executing the `dev/connect-jvm-client-mima-check` script.
*/
-class CompatibilitySuite extends ConnectFunSuite {
+// scalastyle:off println
+object CheckConnectJvmClientCompatibility {
- private lazy val clientJar: File =
- findJar(
- "connector/connect/client/jvm",
- "spark-connect-client-jvm-assembly",
- "spark-connect-client-jvm")
+ private lazy val sparkHome: String = {
+ if (!sys.env.contains("SPARK_HOME")) {
+ throw new IllegalArgumentException("SPARK_HOME is not set.")
+ }
+ sys.env("SPARK_HOME")
+ }
- private lazy val sqlJar: File = findJar("sql/core", "spark-sql", "spark-sql")
+ def main(args: Array[String]): Unit = {
+ var resultWriter: Writer = null
+ try {
+ resultWriter = Files.newBufferedWriter(
+ Paths.get(s"$sparkHome/.connect-mima-check-result"),
+ StandardCharsets.UTF_8)
+ val clientJar: File =
+ findJar(
+ "connector/connect/client/jvm",
+ "spark-connect-client-jvm-assembly",
+ "spark-connect-client-jvm")
+ val sqlJar: File = findJar("sql/core", "spark-sql", "spark-sql")
+ val problems = checkMiMaCompatibility(clientJar, sqlJar)
+ if (problems.nonEmpty) {
+ resultWriter.write(s"ERROR: Comparing client jar: $clientJar and sql jar: $sqlJar \n")
+ resultWriter.write(s"problems: \n")
+ resultWriter.write(s"${problems.map(p => p.description("client")).mkString("\n")}")
+ resultWriter.write("\n")
+ resultWriter.write(
+ "Exceptions to binary compatibility can be added in " +
+ "'CheckConnectJvmClientCompatibility#checkMiMaCompatibility'\n")
+ }
+ val incompatibleApis = checkDatasetApiCompatibility(clientJar, sqlJar)
+ if (incompatibleApis.nonEmpty) {
+ resultWriter.write(
+ "ERROR: The Dataset apis only exist in the connect client " +
+ "module and not belong to the sql module include: \n")
+ resultWriter.write(incompatibleApis.mkString("\n"))
+ resultWriter.write("\n")
+ resultWriter.write(
+ "Exceptions can be added to exceptionMethods in " +
+ "'CheckConnectJvmClientCompatibility#checkDatasetApiCompatibility'\n")
+ }
+ } catch {
+ case e: Throwable =>
+ println(e.getMessage)
+ resultWriter.write(s"ERROR: ${e.getMessage}")
+ } finally {
+ if (resultWriter != null) {
+ resultWriter.close()
+ }
+ }
+ }
/**
* MiMa takes an old jar (sql jar) and a new jar (client jar) as inputs and then reports all
@@ -67,7 +101,7 @@ class CompatibilitySuite extends ConnectFunSuite {
* need to be checked. Then exclude rules are applied to filter out all unsupported methods in
* the client classes.
*/
- test("compatibility MiMa tests") {
+ private def checkMiMaCompatibility(clientJar: File, sqlJar: File): List[Problem] = {
val mima = new MiMaLib(Seq(clientJar, sqlJar))
val allProblems = mima.collectProblems(sqlJar, clientJar, List.empty)
val includedRules = Seq(
@@ -184,30 +218,36 @@ class CompatibilitySuite extends ConnectFunSuite {
.filter { p =>
excludeRules.forall(rule => rule(p))
}
-
- if (problems.nonEmpty) {
- fail(
- s"\nComparing client jar: $clientJar\nand sql jar: $sqlJar\n" +
- problems.map(p => p.description("client")).mkString("\n"))
- }
+ problems
}
- test("compatibility API tests: Dataset") {
- val clientClassLoader: URLClassLoader = new URLClassLoader(Seq(clientJar.toURI.toURL).toArray)
- val sqlClassLoader: URLClassLoader = new URLClassLoader(Seq(sqlJar.toURI.toURL).toArray)
+ private def checkDatasetApiCompatibility(clientJar: File, sqlJar: File): Seq[String] = {
- val clientClass = clientClassLoader.loadClass("org.apache.spark.sql.Dataset")
- val sqlClass = sqlClassLoader.loadClass("org.apache.spark.sql.Dataset")
+ def methods(jar: File, className: String): Seq[String] = {
+ val classLoader: URLClassLoader =
+ new ChildFirstURLClassLoader(Seq(jar.toURI.toURL).toArray, this.getClass.getClassLoader)
+ val mirror = runtimeMirror(classLoader)
+ // scalastyle:off classforname
+ val classSymbol =
+ mirror.classSymbol(Class.forName(className, false, classLoader))
+ // scalastyle:on classforname
+ classSymbol.typeSignature.members
+ .filter(_.isMethod)
+ .map(_.asMethod)
+ .filter(m => m.isPublic)
+ .map(_.fullName)
+ .toSeq
+ }
- val newMethods = clientClass.getMethods
- val oldMethods = sqlClass.getMethods
+ val className = "org.apache.spark.sql.Dataset"
+ val clientMethods = methods(clientJar, className)
+ val sqlMethods = methods(sqlJar, className)
+ // Exclude client-only public methods, which must be allow-listed in `exceptionMethods`.
+ val exceptionMethods =
+ Seq("org.apache.spark.sql.Dataset.collectResult", "org.apache.spark.sql.Dataset.plan")
- // For now we simply check the new methods is a subset of the old methods.
- newMethods
- .map(m => m.toString)
- .foreach(method => {
- assert(oldMethods.map(m => m.toString).contains(method))
- })
+ // Find new public functions that are not in sql module `Dataset`.
+ clientMethods.diff(sqlMethods).diff(exceptionMethods)
}
private case class IncludeByName(name: String) extends ProblemFilter {
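For reference, the reflection-based method listing used by `checkDatasetApiCompatibility` above can be reproduced in isolation. A minimal sketch (the jar paths are placeholders, and a plain `URLClassLoader` stands in for Spark's `ChildFirstURLClassLoader` to keep the example self-contained):
```
import java.io.File
import java.net.URLClassLoader

import scala.reflect.runtime.universe.runtimeMirror

// List the fully qualified names of all public methods of a class inside a jar.
def publicMethods(jar: File, className: String): Seq[String] = {
  val classLoader = new URLClassLoader(Array(jar.toURI.toURL), getClass.getClassLoader)
  val mirror = runtimeMirror(classLoader)
  val classSymbol = mirror.classSymbol(Class.forName(className, false, classLoader))
  classSymbol.typeSignature.members
    .filter(_.isMethod)
    .map(_.asMethod)
    .filter(_.isPublic)
    .map(_.fullName)
    .toSeq
}

// Placeholder jars; methods present only in the client Dataset are flagged.
val clientJar = new File("spark-connect-client-jvm.jar")
val sqlJar = new File("spark-sql.jar")
val unexpected = publicMethods(clientJar, "org.apache.spark.sql.Dataset")
  .diff(publicMethods(sqlJar, "org.apache.spark.sql.Dataset"))
```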
diff --git a/dev/connect-jvm-client-mima-check b/dev/connect-jvm-client-mima-check
new file mode 100755
index 00000000000..2dbbdaf8764
--- /dev/null
+++ b/dev/connect-jvm-client-mima-check
@@ -0,0 +1,72 @@
+#!/usr/bin/env bash
+
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements. See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License. You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+set -o pipefail
+set -e
+
+# Go to the Spark project root directory
+FWDIR="$(cd "`dirname "$0"`"/..; pwd)"
+cd "$FWDIR"
+export SPARK_HOME=$FWDIR
+echo $SPARK_HOME
+
+if [[ -x "$JAVA_HOME/bin/java" ]]; then
+ JAVA_CMD="$JAVA_HOME/bin/java"
+else
+ JAVA_CMD=java
+fi
+
+rm -f .connect-mima-check-result
+
+echo "Build sql module, connect-client-jvm module and connect-client-jvm test jar..."
+build/sbt "sql/package;connect-client-jvm/assembly;connect-client-jvm/Test/package"
+
+CONNECT_TEST_CLASSPATH="$(build/sbt -DcopyDependencies=false "export connect-client-jvm/Test/fullClasspath" | grep jar | tail -n1)"
+CONNECT_CLASSPATH="$(build/sbt -DcopyDependencies=false "export connect-client-jvm/fullClasspath" | grep jar | tail -n1)"
+SQL_CLASSPATH="$(build/sbt -DcopyDependencies=false "export sql/fullClasspath" | grep jar | tail -n1)"
+
+
+echo "Do connect-client-jvm module mima check ..."
+
+$JAVA_CMD \
+ -Xmx2g \
+ -XX:+IgnoreUnrecognizedVMOptions --add-opens=java.base/java.util.jar=ALL-UNNAMED \
+ -cp "$CONNECT_CLASSPATH:$CONNECT_TEST_CLASSPATH:$SQL_CLASSPATH" \
+ org.apache.spark.sql.connect.client.CheckConnectJvmClientCompatibility
+
+echo "finish connect-client-jvm module mima check ..."
+
+RESULT_SIZE=$(wc -l .connect-mima-check-result | awk '{print $1}')
+
+# The file has no content if the check passed.
+if [[ $RESULT_SIZE -eq "0" ]]; then
+ ERRORS=""
+else
+ ERRORS=$(grep ERROR .connect-mima-check-result | tail -n1)
+fi
+
+if test ! -z "$ERRORS"; then
+ cat .connect-mima-check-result
+ echo -e "connect-client-jvm module mima check failed."
+ rm .connect-mima-check-result
+ exit 1
+else
+ rm .connect-mima-check-result
+ echo -e "sql and connect-client-jvm module mima check passed."
+fi