You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by "hvanhovell (via GitHub)" <gi...@apache.org> on 2023/10/02 04:38:03 UTC
[GitHub] [spark] hvanhovell opened a new pull request, #43195: [SPARK-45371][CONNECT] Fix shading issues in the Spark Connect Scala Client
hvanhovell opened a new pull request, #43195:
URL: https://github.com/apache/spark/pull/43195
### What changes were proposed in this pull request?
This PR fixes shading for the Spark Connect Scala Client maven build. The following things are addressed:
- Guava and protobuf are included in the shaded jars. These were missing, and were causing users to see `ClassNotFoundException`s.
- Fixed duplicate shading of guava. We use the parent pom's location now.
- Fixed duplicate Netty dependency (shaded and transitive). One was used for GRPC and the other was needed by Arrow. This was fixed by pulling arrow into the shaded jar.
- Use the same package as the shading defined in the parent package.
### Why are the changes needed?
The maven artifacts for the Spark Conect Scala Client are currently broken.
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
Manual tests.
#### Step 1: Build new shaded library and install it in local maven repository
`build/mvn clean install -pl connector/connect/client/jvm -am -DskipTests`
#### Step 2: Start Connect Server
`connector/connect/bin/spark-connect`
#### Step 3: Launch REPL using the newly created library
This step requires [coursier](https://get-coursier.io/) to be installed.
`cs launch --scala 2.12.17 -r m2Local com.lihaoyi:::ammonite:2.5.8 org.apache.spark::spark-connect-client-jvm:3.5.1-SNAPSHOT -M org.apache.spark.sql.application.ConnectRepl`
#### Step 4: Run a bunch of commands:
```scala
// Check version
spark.version
// Run a simple query
spark.range(1, 10000, 1)
.select($"id", $"id" % 5 as "group", rand(1).as("v1"), rand(2).as("v2"))
.groupBy($"group")
.agg(
avg($"v1").as("v1_avg"),
avg($"v2").as("v2_avg"))
.show()
// Run a streaming query
import org.apache.spark.sql.execution.streaming.ProcessingTimeTrigger
val query_name = "simple_streaming"
val stream = spark.readStream
.format("rate")
.option("numPartitions", "1")
.option("rowsPerSecond", "10")
.load()
.withWatermark("timestamp", "10 milliseconds")
.groupBy(window(col("timestamp"), "10 milliseconds"))
.count()
.selectExpr("window.start as timestamp", "count as num_events")
.writeStream
.format("memory")
.queryName(query_name)
.trigger(ProcessingTimeTrigger.create("10 milliseconds"))
// run for 20 seconds
val query = stream.start()
val start = System.currentTimeMillis()
val end = System.currentTimeMillis() + 20 * 1000
while (System.currentTimeMillis() < end) {
println(s"time: ${System.currentTimeMillis() - start} ms")
println(query.status)
spark.sql(s"select * from ${query_name}").show()
Thread.sleep(500)
}
query.stop()
```
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
Re: [PR] [SPARK-45371][CONNECT] Fix shading issues in the Spark Connect Scala Client [spark]
Posted by "LuciferYang (via GitHub)" <gi...@apache.org>.
LuciferYang commented on PR #43195:
URL: https://github.com/apache/spark/pull/43195#issuecomment-1743317280
Final question, should the results of maven shade and sbt assembly be consistent? This PR seems to only fix maven.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] [spark] hvanhovell commented on a diff in pull request #43195: [SPARK-45371][CONNECT] Fix shading issues in the Spark Connect Scala Client
Posted by "hvanhovell (via GitHub)" <gi...@apache.org>.
hvanhovell commented on code in PR #43195:
URL: https://github.com/apache/spark/pull/43195#discussion_r1342274140
##########
connector/connect/client/jvm/pom.xml:
##########
@@ -85,59 +95,70 @@
<artifactId>maven-shade-plugin</artifactId>
<configuration>
<shadedArtifactAttached>false</shadedArtifactAttached>
+ <promoteTransitiveDependencies>true</promoteTransitiveDependencies>
Review Comment:
This was probably broken as well.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
Re: [PR] [SPARK-45371][CONNECT] Fix shading issues in the Spark Connect Scala Client [spark]
Posted by "hvanhovell (via GitHub)" <gi...@apache.org>.
hvanhovell closed pull request #43195: [SPARK-45371][CONNECT] Fix shading issues in the Spark Connect Scala Client
URL: https://github.com/apache/spark/pull/43195
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
Re: [PR] [SPARK-45371][CONNECT] Fix shading issues in the Spark Connect Scala Client [spark]
Posted by "hvanhovell (via GitHub)" <gi...@apache.org>.
hvanhovell commented on PR #43195:
URL: https://github.com/apache/spark/pull/43195#issuecomment-1743414695
@LuciferYang AFAICT the SBT is working at the moment. However we release using maven and this is very much broken at the moment. I am happy to homogenize the SBT and maven builds in a follow-up.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
Re: [PR] [SPARK-45371][CONNECT] Fix shading issues in the Spark Connect Scala Client [spark]
Posted by "hvanhovell (via GitHub)" <gi...@apache.org>.
hvanhovell commented on PR #43195:
URL: https://github.com/apache/spark/pull/43195#issuecomment-1743414815
Merging.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
Re: [PR] [SPARK-45371][CONNECT] Fix shading issues in the Spark Connect Scala Client [spark]
Posted by "Yikf (via GitHub)" <gi...@apache.org>.
Yikf commented on code in PR #43195:
URL: https://github.com/apache/spark/pull/43195#discussion_r1364888722
##########
connector/connect/client/jvm/pom.xml:
##########
@@ -85,59 +95,70 @@
<artifactId>maven-shade-plugin</artifactId>
<configuration>
<shadedArtifactAttached>false</shadedArtifactAttached>
+ <promoteTransitiveDependencies>true</promoteTransitiveDependencies>
<artifactSet>
<includes>
<include>com.google.android:*</include>
<include>com.google.api.grpc:*</include>
<include>com.google.code.findbugs:*</include>
<include>com.google.code.gson:*</include>
<include>com.google.errorprone:*</include>
- <include>com.google.guava:*</include>
<include>com.google.j2objc:*</include>
<include>com.google.protobuf:*</include>
+ <include>com.google.flatbuffers:*</include>
<include>io.grpc:*</include>
<include>io.netty:*</include>
<include>io.perfmark:*</include>
+ <include>org.apache.arrow:*</include>
<include>org.codehaus.mojo:*</include>
<include>org.checkerframework:*</include>
<include>org.apache.spark:spark-connect-common_${scala.binary.version}</include>
+ <include>org.apache.spark:spark-sql-api_${scala.binary.version}</include>
</includes>
</artifactSet>
<relocations>
<relocation>
<pattern>io.grpc</pattern>
- <shadedPattern>${spark.shade.packageName}.connect.client.io.grpc</shadedPattern>
+ <shadedPattern>${spark.shade.packageName}.io.grpc</shadedPattern>
<includes>
<include>io.grpc.**</include>
</includes>
</relocation>
<relocation>
<pattern>com.google</pattern>
- <shadedPattern>${spark.shade.packageName}.connect.client.com.google</shadedPattern>
+ <shadedPattern>${spark.shade.packageName}.com.google</shadedPattern>
+ <excludes>
+ <!-- Guava is relocated to ${spark.shade.packageName}.guava (see the parent pom.xml) -->
+ <exclude>com.google.common.**</exclude>
Review Comment:
Hi @hvanhovell , connect uses a separate version of guava, is guava directly using the parent pom as expected?
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] [spark] hvanhovell commented on a diff in pull request #43195: [SPARK-45371][CONNECT] Fix shading issues in the Spark Connect Scala Client
Posted by "hvanhovell (via GitHub)" <gi...@apache.org>.
hvanhovell commented on code in PR #43195:
URL: https://github.com/apache/spark/pull/43195#discussion_r1342274068
##########
connector/connect/client/jvm/pom.xml:
##########
@@ -85,59 +95,70 @@
<artifactId>maven-shade-plugin</artifactId>
<configuration>
<shadedArtifactAttached>false</shadedArtifactAttached>
+ <promoteTransitiveDependencies>true</promoteTransitiveDependencies>
<artifactSet>
<includes>
<include>com.google.android:*</include>
<include>com.google.api.grpc:*</include>
<include>com.google.code.findbugs:*</include>
<include>com.google.code.gson:*</include>
<include>com.google.errorprone:*</include>
- <include>com.google.guava:*</include>
Review Comment:
Guava in included by the parent pom.xml.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
Re: [PR] [SPARK-45371][CONNECT] Fix shading issues in the Spark Connect Scala Client [spark]
Posted by "hvanhovell (via GitHub)" <gi...@apache.org>.
hvanhovell commented on PR #43195:
URL: https://github.com/apache/spark/pull/43195#issuecomment-1742999432
@LuciferYang I have updated the code examples, now they will run in Ammonite. I am looking at the Java 17 issue.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
Re: [PR] [SPARK-45371][CONNECT] Fix shading issues in the Spark Connect Scala Client [spark]
Posted by "hvanhovell (via GitHub)" <gi...@apache.org>.
hvanhovell commented on PR #43195:
URL: https://github.com/apache/spark/pull/43195#issuecomment-1743144325
@LuciferYang the Java 17 issue has been fixed.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org