You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by "hvanhovell (via GitHub)" <gi...@apache.org> on 2023/10/02 04:38:03 UTC

[GitHub] [spark] hvanhovell opened a new pull request, #43195: [SPARK-45371][CONNECT] Fix shading issues in the Spark Connect Scala Client

hvanhovell opened a new pull request, #43195:
URL: https://github.com/apache/spark/pull/43195

   ### What changes were proposed in this pull request?
   This PR fixes shading for the Spark Connect Scala Client maven build. The following things are addressed:
   - Guava and protobuf are included in the shaded jars. These were missing, and were causing users to see `ClassNotFoundException`s.
   - Fixed duplicate shading of guava. We use the parent pom's location now.
   - Fixed duplicate Netty dependency (shaded and transitive). One was used for GRPC and the other was needed by Arrow. This was fixed by pulling arrow into the shaded jar.
   - Use the same package as the shading defined in the parent package.
   
   ### Why are the changes needed?
   The maven artifacts for the Spark Conect Scala Client are currently broken.
   
   ### Does this PR introduce _any_ user-facing change?
   No.
   
   ### How was this patch tested?
   Manual tests.
   #### Step 1:  Build new shaded library and install it in local maven repository
   `build/mvn clean install -pl connector/connect/client/jvm -am -DskipTests`
   #### Step 2: Start Connect Server
   `connector/connect/bin/spark-connect`
   #### Step 3: Launch REPL using the newly created library
   This step requires [coursier](https://get-coursier.io/) to be installed.
   `cs launch --scala 2.12.17 -r m2Local com.lihaoyi:::ammonite:2.5.8 org.apache.spark::spark-connect-client-jvm:3.5.1-SNAPSHOT -M org.apache.spark.sql.application.ConnectRepl`
   #### Step 4: Run a bunch of commands:
   ```scala
   // Check version
   spark.version
   
   // Run a simple query
   spark.range(1, 10000, 1)
     .select($"id", $"id" % 5 as "group", rand(1).as("v1"), rand(2).as("v2"))
     .groupBy($"group")
     .agg(
       avg($"v1").as("v1_avg"),
       avg($"v2").as("v2_avg"))
     .show()
   
   // Run a streaming query
   import org.apache.spark.sql.execution.streaming.ProcessingTimeTrigger
   val query_name = "simple_streaming"
   val stream = spark.readStream
     .format("rate")
     .option("numPartitions", "1")
     .option("rowsPerSecond", "10")
     .load()
     .withWatermark("timestamp", "10 milliseconds")
     .groupBy(window(col("timestamp"), "10 milliseconds"))
     .count()
     .selectExpr("window.start as timestamp", "count as num_events")
     .writeStream
     .format("memory")
     .queryName(query_name)
     .trigger(ProcessingTimeTrigger.create("10 milliseconds"))
   // run for 20 seconds
   val query = stream.start()
   val start = System.currentTimeMillis()
   val end = System.currentTimeMillis() + 20 * 1000
   while (System.currentTimeMillis() < end) {
     println(s"time: ${System.currentTimeMillis() - start} ms")
     println(query.status)
     spark.sql(s"select * from ${query_name}").show()
     Thread.sleep(500)
   }
       query.stop()
   
   ```


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-45371][CONNECT] Fix shading issues in the Spark Connect Scala Client [spark]

Posted by "LuciferYang (via GitHub)" <gi...@apache.org>.
LuciferYang commented on PR #43195:
URL: https://github.com/apache/spark/pull/43195#issuecomment-1743317280

   Final question, should the results of maven shade and sbt assembly be consistent? This PR seems to only fix maven.
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] hvanhovell commented on a diff in pull request #43195: [SPARK-45371][CONNECT] Fix shading issues in the Spark Connect Scala Client

Posted by "hvanhovell (via GitHub)" <gi...@apache.org>.
hvanhovell commented on code in PR #43195:
URL: https://github.com/apache/spark/pull/43195#discussion_r1342274140


##########
connector/connect/client/jvm/pom.xml:
##########
@@ -85,59 +95,70 @@
         <artifactId>maven-shade-plugin</artifactId>
         <configuration>
           <shadedArtifactAttached>false</shadedArtifactAttached>
+          <promoteTransitiveDependencies>true</promoteTransitiveDependencies>

Review Comment:
   This was probably broken as well.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-45371][CONNECT] Fix shading issues in the Spark Connect Scala Client [spark]

Posted by "hvanhovell (via GitHub)" <gi...@apache.org>.
hvanhovell closed pull request #43195: [SPARK-45371][CONNECT] Fix shading issues in the Spark Connect Scala Client
URL: https://github.com/apache/spark/pull/43195


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-45371][CONNECT] Fix shading issues in the Spark Connect Scala Client [spark]

Posted by "hvanhovell (via GitHub)" <gi...@apache.org>.
hvanhovell commented on PR #43195:
URL: https://github.com/apache/spark/pull/43195#issuecomment-1743414695

   @LuciferYang AFAICT the SBT is working at the moment. However we release using maven and this is very much broken at the moment. I am happy to homogenize the SBT and maven builds in a follow-up.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-45371][CONNECT] Fix shading issues in the Spark Connect Scala Client [spark]

Posted by "hvanhovell (via GitHub)" <gi...@apache.org>.
hvanhovell commented on PR #43195:
URL: https://github.com/apache/spark/pull/43195#issuecomment-1743414815

   Merging.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-45371][CONNECT] Fix shading issues in the Spark Connect Scala Client [spark]

Posted by "Yikf (via GitHub)" <gi...@apache.org>.
Yikf commented on code in PR #43195:
URL: https://github.com/apache/spark/pull/43195#discussion_r1364888722


##########
connector/connect/client/jvm/pom.xml:
##########
@@ -85,59 +95,70 @@
         <artifactId>maven-shade-plugin</artifactId>
         <configuration>
           <shadedArtifactAttached>false</shadedArtifactAttached>
+          <promoteTransitiveDependencies>true</promoteTransitiveDependencies>
           <artifactSet>
             <includes>
               <include>com.google.android:*</include>
               <include>com.google.api.grpc:*</include>
               <include>com.google.code.findbugs:*</include>
               <include>com.google.code.gson:*</include>
               <include>com.google.errorprone:*</include>
-              <include>com.google.guava:*</include>
               <include>com.google.j2objc:*</include>
               <include>com.google.protobuf:*</include>
+              <include>com.google.flatbuffers:*</include>
               <include>io.grpc:*</include>
               <include>io.netty:*</include>
               <include>io.perfmark:*</include>
+              <include>org.apache.arrow:*</include>
               <include>org.codehaus.mojo:*</include>
               <include>org.checkerframework:*</include>
               <include>org.apache.spark:spark-connect-common_${scala.binary.version}</include>
+              <include>org.apache.spark:spark-sql-api_${scala.binary.version}</include>
             </includes>
           </artifactSet>
           <relocations>
             <relocation>
               <pattern>io.grpc</pattern>
-              <shadedPattern>${spark.shade.packageName}.connect.client.io.grpc</shadedPattern>
+              <shadedPattern>${spark.shade.packageName}.io.grpc</shadedPattern>
               <includes>
                 <include>io.grpc.**</include>
               </includes>
             </relocation>
             <relocation>
               <pattern>com.google</pattern>
-              <shadedPattern>${spark.shade.packageName}.connect.client.com.google</shadedPattern>
+              <shadedPattern>${spark.shade.packageName}.com.google</shadedPattern>
+              <excludes>
+                <!-- Guava is relocated to ${spark.shade.packageName}.guava (see the parent pom.xml) -->
+                <exclude>com.google.common.**</exclude>

Review Comment:
   Hi @hvanhovell , connect uses a separate version of guava, is guava directly using the parent pom as expected?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] hvanhovell commented on a diff in pull request #43195: [SPARK-45371][CONNECT] Fix shading issues in the Spark Connect Scala Client

Posted by "hvanhovell (via GitHub)" <gi...@apache.org>.
hvanhovell commented on code in PR #43195:
URL: https://github.com/apache/spark/pull/43195#discussion_r1342274068


##########
connector/connect/client/jvm/pom.xml:
##########
@@ -85,59 +95,70 @@
         <artifactId>maven-shade-plugin</artifactId>
         <configuration>
           <shadedArtifactAttached>false</shadedArtifactAttached>
+          <promoteTransitiveDependencies>true</promoteTransitiveDependencies>
           <artifactSet>
             <includes>
               <include>com.google.android:*</include>
               <include>com.google.api.grpc:*</include>
               <include>com.google.code.findbugs:*</include>
               <include>com.google.code.gson:*</include>
               <include>com.google.errorprone:*</include>
-              <include>com.google.guava:*</include>

Review Comment:
   Guava in included by the parent pom.xml.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-45371][CONNECT] Fix shading issues in the Spark Connect Scala Client [spark]

Posted by "hvanhovell (via GitHub)" <gi...@apache.org>.
hvanhovell commented on PR #43195:
URL: https://github.com/apache/spark/pull/43195#issuecomment-1742999432

   @LuciferYang I have updated the code examples, now they will run in Ammonite. I am looking at the Java 17 issue.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-45371][CONNECT] Fix shading issues in the Spark Connect Scala Client [spark]

Posted by "hvanhovell (via GitHub)" <gi...@apache.org>.
hvanhovell commented on PR #43195:
URL: https://github.com/apache/spark/pull/43195#issuecomment-1743144325

   @LuciferYang the Java 17 issue has been fixed.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org