Posted to reviews@spark.apache.org by "zhenlineo (via GitHub)" <gi...@apache.org> on 2023/02/03 02:48:05 UTC

[GitHub] [spark] zhenlineo opened a new pull request, #39866: [WIP] Fix the client dependency jars

zhenlineo opened a new pull request, #39866:
URL: https://github.com/apache/spark/pull/39866

   The common jar should not be shaded; the shading should be done in the client or the server.
   The common jar should depend on a minimal set of dependencies where possible.
   The client should provide all dependencies out of the box, including netty etc.
   The client's sbt and mvn builds should produce the same shading result.
   To avoid dependency conflicts when deploying the client and the server together, the client relocates all dependencies under "org.apache.spark.connect.client".
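   
   As a rough editorial sketch of the relocation idea (hedged: abbreviated from the sbt-assembly rules this PR touches in `project/SparkBuild.scala`; the actual rule set is longer):
   
   ```
   // In the sbt build definition (sbt-assembly plugin). Each rule rewrites a
   // package prefix into the client-private namespace, so the client's copies
   // cannot collide with the server's on one classpath.
   (assembly / assemblyShadeRules) := Seq(
     ShadeRule.rename("io.grpc.**" -> "org.sparkproject.connect.client.io.grpc.@1").inAll,
     ShadeRule.rename("com.google.**" -> "org.sparkproject.connect.client.com.google.@1").inAll,
     ShadeRule.rename("io.netty.**" -> "org.sparkproject.connect.client.io.netty.@1").inAll
   )
   ```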
   




[GitHub] [spark] LuciferYang commented on a diff in pull request #39866: [SPARK-42287][CONNECT][BUILD] Fix the client dependency jars

Posted by "LuciferYang (via GitHub)" <gi...@apache.org>.
LuciferYang commented on code in PR #39866:
URL: https://github.com/apache/spark/pull/39866#discussion_r1098878514


##########
connector/connect/client/jvm/pom.xml:
##########
@@ -112,40 +124,57 @@
           <shadedArtifactAttached>false</shadedArtifactAttached>
           <artifactSet>
             <includes>
+              <include>com.google.android:*</include>
+              <include>com.google.api.grpc:*</include>
+              <include>com.google.code.findbugs:*</include>
+              <include>com.google.code.gson:*</include>
+              <include>com.google.errorprone:*</include>
               <include>com.google.guava:*</include>
-              <include>io.grpc:*</include>
+              <include>com.google.j2objc:*</include>
               <include>com.google.protobuf:*</include>
+              <include>io.grpc:*</include>
+              <include>io.netty:*</include>
+              <include>io.perfmark:*</include>
+              <include>org.codehaus.mojo:*</include>
+              <include>org.checkerframework:*</include>
               <include>org.apache.spark:spark-connect-common_${scala.binary.version}</include>
             </includes>
           </artifactSet>
           <relocations>
             <relocation>
               <pattern>io.grpc</pattern>
-              <shadedPattern>${spark.shade.packageName}.connect.client.grpc</shadedPattern>
+              <shadedPattern>${spark.shade.packageName}.connect.client.io.grpc</shadedPattern>
               <includes>
                 <include>io.grpc.**</include>
               </includes>
             </relocation>
             <relocation>
-              <pattern>com.google.protobuf</pattern>
-              <shadedPattern>${spark.shade.packageName}.connect.protobuf</shadedPattern>
-              <includes>
-                <include>com.google.protobuf.**</include>
-              </includes>
+              <pattern>com.google</pattern>
+              <shadedPattern>${spark.shade.packageName}.connect.client.com.google</shadedPattern>
             </relocation>
             <relocation>
-              <pattern>com.google.common</pattern>
-              <shadedPattern>${spark.shade.packageName}.connect.client.guava</shadedPattern>
-              <includes>
-                <include>com.google.common.**</include>
-              </includes>
+              <pattern>io.netty</pattern>

Review Comment:
   https://github.com/grpc/grpc-java/blob/61f19d707a4c6fd4f5e327b8fceb09658b9cb452/netty/shaded/build.gradle#L139-L176





[GitHub] [spark] LuciferYang commented on a diff in pull request #39866: [WIP] Fix the client dependency jars

Posted by "LuciferYang (via GitHub)" <gi...@apache.org>.
LuciferYang commented on code in PR #39866:
URL: https://github.com/apache/spark/pull/39866#discussion_r1096910584


##########
connector/connect/client/jvm/pom.xml:
##########
@@ -104,39 +106,70 @@
           <shadedArtifactAttached>false</shadedArtifactAttached>
           <artifactSet>
             <includes>
-              <include>com.google.guava:*</include>
-              <include>io.grpc:*</include>
-              <include>com.google.protobuf:*</include>
               <include>org.apache.spark:spark-connect-common_${scala.binary.version}</include>
+              <include>com.google.android:*</include>
+              <include>com.google.api.grpc:*</include>
+              <include>com.google.code.findbugs:*</include>
+              <include>com.google.code.gson:*</include>
+              <include>com.google.errorprone:*</include>
+              <include>com.google.j2objc:*</include>
+              <include>com.google.protobuf:*</include>
+              <include>io.grpc:*</include>
+              <include>io.netty:*</include>
+              <include>io.perfmark:*</include>
+              <include>org.codehaus.mojo:*</include>
+              <include>org.objenesis:*</include>
             </includes>
           </artifactSet>
           <relocations>
             <relocation>
               <pattern>io.grpc</pattern>
-              <shadedPattern>${spark.shade.packageName}.connect.client.grpc</shadedPattern>
+              <shadedPattern>${spark.shade.packageName}.connect.client.io.grpc</shadedPattern>
               <includes>
                 <include>io.grpc.**</include>
               </includes>
             </relocation>
             <relocation>
-              <pattern>com.google.protobuf</pattern>
-              <shadedPattern>${spark.shade.packageName}.connect.protobuf</shadedPattern>
-              <includes>
-                <include>com.google.protobuf.**</include>
-              </includes>
+              <pattern>com.google</pattern>
+              <shadedPattern>${spark.shade.packageName}.connect.client.com.google</shadedPattern>
+            </relocation>
+            <relocation>
+              <pattern>io.netty</pattern>
+              <shadedPattern>${spark.shade.packageName}.connect.client.io.netty</shadedPattern>
+            </relocation>
+            <relocation>
+              <pattern>org.checkerframework</pattern>
+              <shadedPattern>${spark.shade.packageName}.connect.client.checkerframework</shadedPattern>
+            </relocation>
+            <relocation>
+              <pattern>javax.annotation</pattern>
+              <shadedPattern>${spark.shade.packageName}.connect.client.javax.annotation</shadedPattern>
+            </relocation>
+            <relocation>
+              <pattern>io.perfmark</pattern>
+              <shadedPattern>${spark.shade.packageName}.connect.client.io.perfmark</shadedPattern>
+            </relocation>
+            <relocation>
+              <pattern>org.codehaus</pattern>
+              <shadedPattern>${spark.shade.packageName}.connect.client.org.codehaus</shadedPattern>
+            </relocation>
+            <relocation>
+              <pattern>android.annotation</pattern>
+              <shadedPattern>${spark.shade.packageName}.connect.client.android.annotation</shadedPattern>
             </relocation>
+            <!-- Relocate the protos from spark-common into a client only package -->
             <relocation>
-              <pattern>com.google.common</pattern>
-              <shadedPattern>${spark.shade.packageName}.connect.client.guava</shadedPattern>
+              <pattern>org.apache.spark.connect.proto</pattern>
+              <shadedPattern>${spark.shade.packageName}.connect.client.proto</shadedPattern>
               <includes>
-                <include>com.google.common.**</include>
+                <include>org.apache.spark.connect.proto.**</include>
               </includes>
             </relocation>
             <relocation>
-              <pattern>com.google.thirdparty</pattern>
-              <shadedPattern>${spark.shade.packageName}.connect.client.guava</shadedPattern>
+              <pattern>org.apache.spark.sql.connect.common</pattern>

Review Comment:
   `grpc`? `connect-common`?





[GitHub] [spark] LuciferYang commented on a diff in pull request #39866: [WIP] Fix the client dependency jars

Posted by "LuciferYang (via GitHub)" <gi...@apache.org>.
LuciferYang commented on code in PR #39866:
URL: https://github.com/apache/spark/pull/39866#discussion_r1096022999


##########
connector/connect/client/jvm/pom.xml:
##########
@@ -104,39 +106,70 @@
           <shadedArtifactAttached>false</shadedArtifactAttached>
           <artifactSet>
             <includes>
-              <include>com.google.guava:*</include>
-              <include>io.grpc:*</include>
-              <include>com.google.protobuf:*</include>
               <include>org.apache.spark:spark-connect-common_${scala.binary.version}</include>
+              <include>com.google.android:*</include>
+              <include>com.google.api.grpc:*</include>
+              <include>com.google.code.findbugs:*</include>
+              <include>com.google.code.gson:*</include>
+              <include>com.google.errorprone:*</include>
+              <include>com.google.j2objc:*</include>
+              <include>com.google.protobuf:*</include>
+              <include>io.grpc:*</include>
+              <include>io.netty:*</include>
+              <include>io.perfmark:*</include>
+              <include>org.codehaus.mojo:*</include>
+              <include>org.objenesis:*</include>
             </includes>
           </artifactSet>
           <relocations>
             <relocation>
               <pattern>io.grpc</pattern>
-              <shadedPattern>${spark.shade.packageName}.connect.client.grpc</shadedPattern>
+              <shadedPattern>${spark.shade.packageName}.connect.client.io.grpc</shadedPattern>
               <includes>
                 <include>io.grpc.**</include>
               </includes>
             </relocation>
             <relocation>
-              <pattern>com.google.protobuf</pattern>
-              <shadedPattern>${spark.shade.packageName}.connect.protobuf</shadedPattern>
-              <includes>
-                <include>com.google.protobuf.**</include>
-              </includes>
+              <pattern>com.google</pattern>
+              <shadedPattern>${spark.shade.packageName}.connect.client.com.google</shadedPattern>
+            </relocation>
+            <relocation>
+              <pattern>io.netty</pattern>
+              <shadedPattern>${spark.shade.packageName}.connect.client.io.netty</shadedPattern>
+            </relocation>
+            <relocation>
+              <pattern>org.checkerframework</pattern>
+              <shadedPattern>${spark.shade.packageName}.connect.client.checkerframework</shadedPattern>
+            </relocation>
+            <relocation>
+              <pattern>javax.annotation</pattern>
+              <shadedPattern>${spark.shade.packageName}.connect.client.javax.annotation</shadedPattern>
+            </relocation>
+            <relocation>
+              <pattern>io.perfmark</pattern>
+              <shadedPattern>${spark.shade.packageName}.connect.client.io.perfmark</shadedPattern>
+            </relocation>
+            <relocation>
+              <pattern>org.codehaus</pattern>
+              <shadedPattern>${spark.shade.packageName}.connect.client.org.codehaus</shadedPattern>
+            </relocation>
+            <relocation>
+              <pattern>android.annotation</pattern>
+              <shadedPattern>${spark.shade.packageName}.connect.client.android.annotation</shadedPattern>
             </relocation>
+            <!-- Relocate the protos from spark-common into a client only package -->
             <relocation>
-              <pattern>com.google.common</pattern>
-              <shadedPattern>${spark.shade.packageName}.connect.client.guava</shadedPattern>
+              <pattern>org.apache.spark.connect.proto</pattern>
+              <shadedPattern>${spark.shade.packageName}.connect.client.proto</shadedPattern>
               <includes>
-                <include>com.google.common.**</include>
+                <include>org.apache.spark.connect.proto.**</include>
               </includes>
             </relocation>
             <relocation>
-              <pattern>com.google.thirdparty</pattern>
-              <shadedPattern>${spark.shade.packageName}.connect.client.guava</shadedPattern>
+              <pattern>org.apache.spark.sql.connect.common</pattern>

Review Comment:
   > It allows client and server in the same JVM
   
   `client` and `server` should not be in the same JVM, because `server` depends on `sql` at runtime, and `client` and `sql` have many classes with the same `packageName + className`; putting `client` and `sql` in the same JVM will cause class-loading issues. For example, when `client` and `sql` are on the same classpath, the JVM will only load one `org.apache.spark.sql.SparkSession`, which may come from either `client` or `sql`. Unless we manually use different `ClassLoader`s to load them, the JVM cannot load and use two different `org.apache.spark.sql.SparkSession` classes in the same process.
   
   > client 3.4 + server 4.0 in the same JVM
   
   I don't understand how relocating `common` can support this. The classes in `common` would still be in the same package; only adding the Spark version into the package name would really make them different. And even then, the problems mentioned above would remain.
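   
   (Editorial illustration of the class-identity point above; a self-contained sketch, not Spark code, where the jar path and class name are hypothetical:)
   
   ```
   import java.net.{URL, URLClassLoader}
   
   object TwoLoaders {
     def main(args: Array[String]): Unit = {
       // Hypothetical jar containing a self-contained class com.example.Greeter.
       val jar = new URL("file:/tmp/greeter.jar")
       // parent = null isolates each loader from the application classpath.
       val a = new URLClassLoader(Array(jar), null)
       val b = new URLClassLoader(Array(jar), null)
       val ca = a.loadClass("com.example.Greeter")
       val cb = b.loadClass("com.example.Greeter")
       println(ca.getName == cb.getName) // true: same fully qualified name
       println(ca == cb)                 // false: different defining loaders
       // JVM class identity is (name, defining loader), which is why two
       // different SparkSession classes need two loaders to coexist.
     }
   }
   ```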
   
   
   





[GitHub] [spark] hvanhovell commented on pull request #39866: [SPARK-42287][CONNECT][BUILD] Fix shading so that the JVM client jar can include all 3rd-party dependencies in the runtime.

Posted by "hvanhovell (via GitHub)" <gi...@apache.org>.
hvanhovell commented on PR #39866:
URL: https://github.com/apache/spark/pull/39866#issuecomment-1433525341

   Merging




[GitHub] [spark] HyukjinKwon commented on pull request #39866: [WIP] Fix the client dependency jars

Posted by "HyukjinKwon (via GitHub)" <gi...@apache.org>.
HyukjinKwon commented on PR #39866:
URL: https://github.com/apache/spark/pull/39866#issuecomment-1415166009

   cc @LuciferYang FYI




[GitHub] [spark] LuciferYang commented on a diff in pull request #39866: [WIP] Fix the client dependency jars

Posted by "LuciferYang (via GitHub)" <gi...@apache.org>.
LuciferYang commented on code in PR #39866:
URL: https://github.com/apache/spark/pull/39866#discussion_r1095454194


##########
connector/connect/client/jvm/pom.xml:
##########
@@ -104,39 +106,70 @@
           <shadedArtifactAttached>false</shadedArtifactAttached>
           <artifactSet>
             <includes>
-              <include>com.google.guava:*</include>
-              <include>io.grpc:*</include>
-              <include>com.google.protobuf:*</include>
               <include>org.apache.spark:spark-connect-common_${scala.binary.version}</include>
+              <include>com.google.android:*</include>
+              <include>com.google.api.grpc:*</include>
+              <include>com.google.code.findbugs:*</include>
+              <include>com.google.code.gson:*</include>
+              <include>com.google.errorprone:*</include>
+              <include>com.google.j2objc:*</include>
+              <include>com.google.protobuf:*</include>
+              <include>io.grpc:*</include>
+              <include>io.netty:*</include>
+              <include>io.perfmark:*</include>
+              <include>org.codehaus.mojo:*</include>
+              <include>org.objenesis:*</include>
             </includes>
           </artifactSet>
           <relocations>
             <relocation>
               <pattern>io.grpc</pattern>
-              <shadedPattern>${spark.shade.packageName}.connect.client.grpc</shadedPattern>
+              <shadedPattern>${spark.shade.packageName}.connect.client.io.grpc</shadedPattern>
               <includes>
                 <include>io.grpc.**</include>
               </includes>
             </relocation>
             <relocation>
-              <pattern>com.google.protobuf</pattern>
-              <shadedPattern>${spark.shade.packageName}.connect.protobuf</shadedPattern>
-              <includes>
-                <include>com.google.protobuf.**</include>
-              </includes>
+              <pattern>com.google</pattern>
+              <shadedPattern>${spark.shade.packageName}.connect.client.com.google</shadedPattern>
+            </relocation>
+            <relocation>
+              <pattern>io.netty</pattern>
+              <shadedPattern>${spark.shade.packageName}.connect.client.io.netty</shadedPattern>
+            </relocation>
+            <relocation>
+              <pattern>org.checkerframework</pattern>
+              <shadedPattern>${spark.shade.packageName}.connect.client.checkerframework</shadedPattern>
+            </relocation>
+            <relocation>
+              <pattern>javax.annotation</pattern>
+              <shadedPattern>${spark.shade.packageName}.connect.client.javax.annotation</shadedPattern>
+            </relocation>
+            <relocation>
+              <pattern>io.perfmark</pattern>
+              <shadedPattern>${spark.shade.packageName}.connect.client.io.perfmark</shadedPattern>
+            </relocation>
+            <relocation>
+              <pattern>org.codehaus</pattern>
+              <shadedPattern>${spark.shade.packageName}.connect.client.org.codehaus</shadedPattern>
+            </relocation>
+            <relocation>
+              <pattern>android.annotation</pattern>
+              <shadedPattern>${spark.shade.packageName}.connect.client.android.annotation</shadedPattern>
             </relocation>
+            <!-- Relocate the protos from spark-common into a client only package -->
             <relocation>
-              <pattern>com.google.common</pattern>
-              <shadedPattern>${spark.shade.packageName}.connect.client.guava</shadedPattern>
+              <pattern>org.apache.spark.connect.proto</pattern>
+              <shadedPattern>${spark.shade.packageName}.connect.client.proto</shadedPattern>
               <includes>
-                <include>com.google.common.**</include>
+                <include>org.apache.spark.connect.proto.**</include>
               </includes>
             </relocation>
             <relocation>
-              <pattern>com.google.thirdparty</pattern>
-              <shadedPattern>${spark.shade.packageName}.connect.client.guava</shadedPattern>
+              <pattern>org.apache.spark.sql.connect.common</pattern>

Review Comment:
   Why do we need to relocate `connect-common`? What conflicts would happen if we only shade it? How about just shading it without relocation?
   
   





[GitHub] [spark] LuciferYang commented on a diff in pull request #39866: [SPARK-42287][CONNECT][BUILD] Fix the client dependency jars

Posted by "LuciferYang (via GitHub)" <gi...@apache.org>.
LuciferYang commented on code in PR #39866:
URL: https://github.com/apache/spark/pull/39866#discussion_r1098349865


##########
connector/connect/client/jvm/pom.xml:
##########
@@ -60,27 +67,22 @@
       <scope>compile</scope>
     </dependency>
     <dependency>
-      <groupId>com.google.guava</groupId>
-      <artifactId>guava</artifactId>
-      <version>${guava.version}</version>
+      <groupId>io.netty</groupId>
+      <artifactId>netty-codec-http2</artifactId>
+      <version>${netty.version}</version>
       <scope>compile</scope>
     </dependency>
-    <!--
-      SPARK-42213: Add `repl` as test dependency to solve the issue that
-      `ClientE2ETestSuite` cannot find `repl.Main` when maven test.
-    -->
     <dependency>
-      <groupId>org.apache.spark</groupId>
-      <artifactId>spark-repl_${scala.binary.version}</artifactId>

Review Comment:
   OK, I checked too





[GitHub] [spark] LuciferYang commented on a diff in pull request #39866: [SPARK-42287][CONNECT][BUILD] Fix the client dependency jars

Posted by "LuciferYang (via GitHub)" <gi...@apache.org>.
LuciferYang commented on code in PR #39866:
URL: https://github.com/apache/spark/pull/39866#discussion_r1098791651


##########
project/SparkBuild.scala:
##########
@@ -873,10 +873,14 @@ object SparkConnectClient {
     },
 
     (assembly / assemblyShadeRules) := Seq(
-      ShadeRule.rename("io.grpc.**" -> "org.sparkproject.connect.client.grpc.@0").inAll,
-      ShadeRule.rename("com.google.protobuf.**" -> "org.sparkproject.connect.protobuf.@1").inAll,
-      ShadeRule.rename("com.google.common.**" -> "org.sparkproject.connect.client.guava.@1").inAll,
-      ShadeRule.rename("com.google.thirdparty.**" -> "org.sparkproject.connect.client.guava.@1").inAll,
+      ShadeRule.rename("io.grpc.**" -> "org.sparkproject.connect.client.io.grpc.@1").inAll,
+      ShadeRule.rename("com.google.**" -> "org.sparkproject.connect.client.com.google.@1").inAll,
+      ShadeRule.rename("io.netty.**" -> "org.sparkproject.connect.client.io.netty.@1").inAll,

Review Comment:
   There are still some differences between the packaging results of maven-shade and sbt-assembly. For example, the netty classes are missing from `spark-connect-client-jvm-assembly-3.5.0-SNAPSHOT.jar`.
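   
   (A hedged way to reproduce this check; editorial sketch, using the jar name above:)
   
   ```
   // List entries in the assembly jar and keep those mentioning netty;
   // an empty result confirms netty was not packaged by sbt-assembly.
   import java.util.jar.JarFile
   import scala.collection.JavaConverters._
   
   val jar = new JarFile("spark-connect-client-jvm-assembly-3.5.0-SNAPSHOT.jar")
   jar.entries().asScala.map(_.getName).filter(_.contains("netty")).foreach(println)
   ```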
   
   





[GitHub] [spark] LuciferYang commented on a diff in pull request #39866: [SPARK-42287][CONNECT][BUILD] Fix the client dependency jars

Posted by "LuciferYang (via GitHub)" <gi...@apache.org>.
LuciferYang commented on code in PR #39866:
URL: https://github.com/apache/spark/pull/39866#discussion_r1098833915


##########
connector/connect/client/jvm/pom.xml:
##########
@@ -112,40 +124,57 @@
           <shadedArtifactAttached>false</shadedArtifactAttached>
           <artifactSet>
             <includes>
+              <include>com.google.android:*</include>
+              <include>com.google.api.grpc:*</include>
+              <include>com.google.code.findbugs:*</include>
+              <include>com.google.code.gson:*</include>
+              <include>com.google.errorprone:*</include>
               <include>com.google.guava:*</include>
-              <include>io.grpc:*</include>
+              <include>com.google.j2objc:*</include>
               <include>com.google.protobuf:*</include>
+              <include>io.grpc:*</include>
+              <include>io.netty:*</include>
+              <include>io.perfmark:*</include>
+              <include>org.codehaus.mojo:*</include>
+              <include>org.checkerframework:*</include>
               <include>org.apache.spark:spark-connect-common_${scala.binary.version}</include>
             </includes>
           </artifactSet>
           <relocations>
             <relocation>
               <pattern>io.grpc</pattern>
-              <shadedPattern>${spark.shade.packageName}.connect.client.grpc</shadedPattern>
+              <shadedPattern>${spark.shade.packageName}.connect.client.io.grpc</shadedPattern>
               <includes>
                 <include>io.grpc.**</include>
               </includes>
             </relocation>
             <relocation>
-              <pattern>com.google.protobuf</pattern>
-              <shadedPattern>${spark.shade.packageName}.connect.protobuf</shadedPattern>
-              <includes>
-                <include>com.google.protobuf.**</include>
-              </includes>
+              <pattern>com.google</pattern>
+              <shadedPattern>${spark.shade.packageName}.connect.client.com.google</shadedPattern>
             </relocation>
             <relocation>
-              <pattern>com.google.common</pattern>
-              <shadedPattern>${spark.shade.packageName}.connect.client.guava</shadedPattern>
-              <includes>
-                <include>com.google.common.**</include>
-              </includes>
+              <pattern>io.netty</pattern>

Review Comment:
   If we shade and relocate netty, the content in `META-INF/native-image/io.netty` should also be relocated, just as is done in `grpc-netty-shaded`.
   
   Now, the content of `META-INF/native-image/io.netty/netty-codec-http2/native-image.properties` is
   
   ```
   Args = --initialize-at-build-time=io.netty \
          --initialize-at-run-time=io.netty.handler.codec.http2.Http2CodecUtil,io.netty.handler.codec.http2.Http2ClientUpgradeCodec,io.netty.handler.codec.http2.Http2ConnectionHandler,io.netty.handler.codec.http2.DefaultHttp2FrameWriter
   ```
   
   but I think it should be
   
   ```
   Args = --initialize-at-build-time=org.sparkproject.connect.client.io.netty \
          --initialize-at-run-time=org.sparkproject.connect.client.io.netty.handler.codec.http2.Http2CodecUtil,org.sparkproject.connect.client.io.netty.handler.codec.http2.Http2ClientUpgradeCodec,org.sparkproject.connect.client.io.netty.handler.codec.http2.Http2ConnectionHandler,org.sparkproject.connect.client.io.netty.handler.codec.http2.DefaultHttp2FrameWriter
   ```
   
   Another `Transformer` may need to be added.
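   
   (Editorial sketch: the rewrite described above is a mechanical prefix substitution over the properties text; the shaded prefix is the one this PR uses:)
   
   ```
   // `props` is the original native-image.properties content read as a string;
   // every io.netty prefix must be moved into the shaded namespace.
   val relocated = props.replace("io.netty", "org.sparkproject.connect.client.io.netty")
   ```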
   
   
   





[GitHub] [spark] LuciferYang commented on a diff in pull request #39866: [WIP] Fix the client dependency jars

Posted by "LuciferYang (via GitHub)" <gi...@apache.org>.
LuciferYang commented on code in PR #39866:
URL: https://github.com/apache/spark/pull/39866#discussion_r1095998446


##########
connector/connect/common/pom.xml:
##########
@@ -109,22 +88,6 @@
             <version>${netty.version}</version>
             <scope>provided</scope>
         </dependency>
-        <dependency> <!-- necessary for Java 9+ -->

Review Comment:
   No. This part was first added to the connect module by @HyukjinKwon 
   
   





[GitHub] [spark] LuciferYang commented on a diff in pull request #39866: [WIP] Fix the client dependency jars

Posted by "LuciferYang (via GitHub)" <gi...@apache.org>.
LuciferYang commented on code in PR #39866:
URL: https://github.com/apache/spark/pull/39866#discussion_r1096029945


##########
connector/connect/client/jvm/pom.xml:
##########
@@ -60,27 +67,22 @@
       <scope>compile</scope>
     </dependency>
     <dependency>
-      <groupId>com.google.guava</groupId>
-      <artifactId>guava</artifactId>
-      <version>${guava.version}</version>
+      <groupId>io.netty</groupId>
+      <artifactId>netty-codec-http2</artifactId>
+      <version>${netty.version}</version>
       <scope>compile</scope>
     </dependency>
-    <!--
-      SPARK-42213: Add `repl` as test dependency to solve the issue that
-      `ClientE2ETestSuite` cannot find `repl.Main` when maven test.
-    -->
     <dependency>
-      <groupId>org.apache.spark</groupId>
-      <artifactId>spark-repl_${scala.binary.version}</artifactId>

Review Comment:
   > Can you give me a link to the issue? I am wondering whether we still need the REPL now that we've changed to spark-submit.
   
   Maybe it is; we can check it manually.
   
   





[GitHub] [spark] zhenlineo commented on a diff in pull request #39866: [WIP] Fix the client dependency jars

Posted by "zhenlineo (via GitHub)" <gi...@apache.org>.
zhenlineo commented on code in PR #39866:
URL: https://github.com/apache/spark/pull/39866#discussion_r1095311239


##########
connector/connect/common/pom.xml:
##########
@@ -54,77 +54,22 @@
             <groupId>org.scala-lang</groupId>
             <artifactId>scala-library</artifactId>
         </dependency>
-        <dependency>
-            <groupId>com.google.guava</groupId>
-            <artifactId>guava</artifactId>
-            <version>${guava.version}</version>
-            <scope>compile</scope>
-        </dependency>
-        <dependency>
-            <groupId>com.google.guava</groupId>
-            <artifactId>failureaccess</artifactId>
-            <version>${guava.failureaccess.version}</version>
-        </dependency>
         <dependency>
             <groupId>com.google.protobuf</groupId>
             <artifactId>protobuf-java</artifactId>
             <version>${protobuf.version}</version>
             <scope>compile</scope>
         </dependency>
-        <dependency>
-            <groupId>io.grpc</groupId>
-            <artifactId>grpc-netty</artifactId>
-            <version>${io.grpc.version}</version>
-        </dependency>
         <dependency>
             <groupId>io.grpc</groupId>
             <artifactId>grpc-protobuf</artifactId>
             <version>${io.grpc.version}</version>
         </dependency>
-        <dependency>
-            <groupId>io.grpc</groupId>
-            <artifactId>grpc-services</artifactId>
-            <version>${io.grpc.version}</version>
-        </dependency>
         <dependency>
             <groupId>io.grpc</groupId>
             <artifactId>grpc-stub</artifactId>
             <version>${io.grpc.version}</version>
         </dependency>
-        <dependency>

Review Comment:
   Restored the netty dependency as `provided`, since we use the unshaded grpc dependencies.
   See https://github.com/grpc/grpc-java for more.
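   
   (For the sbt side, a provided-scope netty dependency would look roughly like this; editorial sketch, with an illustrative version number:)
   
   ```
   // Compile against netty without bundling it, matching Maven's `provided` scope.
   // Illustrative version; the real build resolves ${netty.version} from the parent pom.
   val nettyVersion = "4.1.87.Final"
   libraryDependencies += "io.netty" % "netty-codec-http2" % nettyVersion % Provided
   ```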





[GitHub] [spark] zhenlineo commented on a diff in pull request #39866: [WIP] Fix the client dependency jars

Posted by "zhenlineo (via GitHub)" <gi...@apache.org>.
zhenlineo commented on code in PR #39866:
URL: https://github.com/apache/spark/pull/39866#discussion_r1096818870


##########
connector/connect/client/jvm/pom.xml:
##########
@@ -104,39 +106,70 @@
           <shadedArtifactAttached>false</shadedArtifactAttached>
           <artifactSet>
             <includes>
-              <include>com.google.guava:*</include>
-              <include>io.grpc:*</include>
-              <include>com.google.protobuf:*</include>
               <include>org.apache.spark:spark-connect-common_${scala.binary.version}</include>
+              <include>com.google.android:*</include>
+              <include>com.google.api.grpc:*</include>
+              <include>com.google.code.findbugs:*</include>
+              <include>com.google.code.gson:*</include>
+              <include>com.google.errorprone:*</include>
+              <include>com.google.j2objc:*</include>
+              <include>com.google.protobuf:*</include>
+              <include>io.grpc:*</include>
+              <include>io.netty:*</include>
+              <include>io.perfmark:*</include>
+              <include>org.codehaus.mojo:*</include>
+              <include>org.objenesis:*</include>
             </includes>
           </artifactSet>
           <relocations>
             <relocation>
               <pattern>io.grpc</pattern>
-              <shadedPattern>${spark.shade.packageName}.connect.client.grpc</shadedPattern>
+              <shadedPattern>${spark.shade.packageName}.connect.client.io.grpc</shadedPattern>
               <includes>
                 <include>io.grpc.**</include>
               </includes>
             </relocation>
             <relocation>
-              <pattern>com.google.protobuf</pattern>
-              <shadedPattern>${spark.shade.packageName}.connect.protobuf</shadedPattern>
-              <includes>
-                <include>com.google.protobuf.**</include>
-              </includes>
+              <pattern>com.google</pattern>
+              <shadedPattern>${spark.shade.packageName}.connect.client.com.google</shadedPattern>
+            </relocation>
+            <relocation>
+              <pattern>io.netty</pattern>
+              <shadedPattern>${spark.shade.packageName}.connect.client.io.netty</shadedPattern>
+            </relocation>
+            <relocation>
+              <pattern>org.checkerframework</pattern>
+              <shadedPattern>${spark.shade.packageName}.connect.client.checkerframework</shadedPattern>
+            </relocation>
+            <relocation>
+              <pattern>javax.annotation</pattern>
+              <shadedPattern>${spark.shade.packageName}.connect.client.javax.annotation</shadedPattern>
+            </relocation>
+            <relocation>
+              <pattern>io.perfmark</pattern>
+              <shadedPattern>${spark.shade.packageName}.connect.client.io.perfmark</shadedPattern>
+            </relocation>
+            <relocation>
+              <pattern>org.codehaus</pattern>
+              <shadedPattern>${spark.shade.packageName}.connect.client.org.codehaus</shadedPattern>
+            </relocation>
+            <relocation>
+              <pattern>android.annotation</pattern>
+              <shadedPattern>${spark.shade.packageName}.connect.client.android.annotation</shadedPattern>
             </relocation>
+            <!-- Relocate the protos from spark-common into a client only package -->
             <relocation>
-              <pattern>com.google.common</pattern>
-              <shadedPattern>${spark.shade.packageName}.connect.client.guava</shadedPattern>
+              <pattern>org.apache.spark.connect.proto</pattern>
+              <shadedPattern>${spark.shade.packageName}.connect.client.proto</shadedPattern>
               <includes>
-                <include>com.google.common.**</include>
+                <include>org.apache.spark.connect.proto.**</include>
               </includes>
             </relocation>
             <relocation>
-              <pattern>com.google.thirdparty</pattern>
-              <shadedPattern>${spark.shade.packageName}.connect.client.guava</shadedPattern>
+              <pattern>org.apache.spark.sql.connect.common</pattern>

Review Comment:
   I skipped the shading of grpc classes for now.



##########
project/SparkBuild.scala:
##########
@@ -862,21 +862,27 @@ object SparkConnectClient {
     (assembly / assemblyPackageScala / assembleArtifact) := false,
 
     // Exclude `pmml-model-*.jar`, `scala-collection-compat_*.jar`,`jsr305-*.jar` and
-    // `netty-*.jar` and `unused-1.0.0.jar` from assembly.
+    // `unused-1.0.0.jar` and checker-qual from assembly.
     (assembly / assemblyExcludedJars) := {
       val cp = (assembly / fullClasspath).value
       cp filter { v =>
         val name = v.data.getName
         name.startsWith("pmml-model-") || name.startsWith("scala-collection-compat_") ||
-          name.startsWith("jsr305-") || name.startsWith("netty-") || name == "unused-1.0.0.jar"
+          name.startsWith("jsr305-") || name == "unused-1.0.0.jar" || name.startsWith("checker-qual")
       }
     },
 
     (assembly / assemblyShadeRules) := Seq(
-      ShadeRule.rename("io.grpc.**" -> "org.sparkproject.connect.client.grpc.@0").inAll,
-      ShadeRule.rename("com.google.protobuf.**" -> "org.sparkproject.connect.protobuf.@1").inAll,
-      ShadeRule.rename("com.google.common.**" -> "org.sparkproject.connect.client.guava.@1").inAll,
-      ShadeRule.rename("com.google.thirdparty.**" -> "org.sparkproject.connect.client.guava.@1").inAll,
+      ShadeRule.rename("io.grpc.**" -> "org.sparkproject.connect.client.io.grpc.@1").inAll,
+      ShadeRule.rename("com.google.**" -> "org.sparkproject.connect.client.com.google.@1").inAll,
+      ShadeRule.rename("io.netty.**" -> "org.sparkproject.connect.client.io.netty.@1").inAll,
+      ShadeRule.rename("org.checkerframework.**" -> "org.sparkproject.connect.client.org.checkerframework.@1").inAll,
+      ShadeRule.rename("javax.annotation.**" -> "org.sparkproject.connect.client.javax.annotation.@1").inAll,

Review Comment:
   Fixed.





[GitHub] [spark] LuciferYang commented on a diff in pull request #39866: [WIP] Fix the client dependency jars

Posted by "LuciferYang (via GitHub)" <gi...@apache.org>.
LuciferYang commented on code in PR #39866:
URL: https://github.com/apache/spark/pull/39866#discussion_r1095995560


##########
connector/connect/client/jvm/pom.xml:
##########
@@ -60,27 +67,22 @@
       <scope>compile</scope>
     </dependency>
     <dependency>
-      <groupId>com.google.guava</groupId>
-      <artifactId>guava</artifactId>
-      <version>${guava.version}</version>
+      <groupId>io.netty</groupId>
+      <artifactId>netty-codec-http2</artifactId>
+      <version>${netty.version}</version>
       <scope>compile</scope>
     </dependency>
-    <!--
-      SPARK-42213: Add `repl` as test dependency to solve the issue that
-      `ClientE2ETestSuite` cannot find `repl.Main` when maven test.
-    -->
     <dependency>
-      <groupId>org.apache.spark</groupId>
-      <artifactId>spark-repl_${scala.binary.version}</artifactId>

Review Comment:
   The comment in `jvm/pom.xml` says SPARK-42213
   
   





[GitHub] [spark] LuciferYang commented on a diff in pull request #39866: [SPARK-42287][CONNECT][BUILD] Fix the client dependency jars

Posted by "LuciferYang (via GitHub)" <gi...@apache.org>.
LuciferYang commented on code in PR #39866:
URL: https://github.com/apache/spark/pull/39866#discussion_r1098883805


##########
connector/connect/client/jvm/pom.xml:
##########
@@ -112,40 +124,57 @@
           <shadedArtifactAttached>false</shadedArtifactAttached>
           <artifactSet>
             <includes>
+              <include>com.google.android:*</include>
+              <include>com.google.api.grpc:*</include>
+              <include>com.google.code.findbugs:*</include>
+              <include>com.google.code.gson:*</include>
+              <include>com.google.errorprone:*</include>
               <include>com.google.guava:*</include>
-              <include>io.grpc:*</include>
+              <include>com.google.j2objc:*</include>
               <include>com.google.protobuf:*</include>
+              <include>io.grpc:*</include>
+              <include>io.netty:*</include>
+              <include>io.perfmark:*</include>
+              <include>org.codehaus.mojo:*</include>
+              <include>org.checkerframework:*</include>
               <include>org.apache.spark:spark-connect-common_${scala.binary.version}</include>
             </includes>
           </artifactSet>
           <relocations>
             <relocation>
               <pattern>io.grpc</pattern>
-              <shadedPattern>${spark.shade.packageName}.connect.client.grpc</shadedPattern>
+              <shadedPattern>${spark.shade.packageName}.connect.client.io.grpc</shadedPattern>
               <includes>
                 <include>io.grpc.**</include>
               </includes>
             </relocation>
             <relocation>
-              <pattern>com.google.protobuf</pattern>
-              <shadedPattern>${spark.shade.packageName}.connect.protobuf</shadedPattern>
-              <includes>
-                <include>com.google.protobuf.**</include>
-              </includes>
+              <pattern>com.google</pattern>
+              <shadedPattern>${spark.shade.packageName}.connect.client.com.google</shadedPattern>
             </relocation>
             <relocation>
-              <pattern>com.google.common</pattern>
-              <shadedPattern>${spark.shade.packageName}.connect.client.guava</shadedPattern>
-              <includes>
-                <include>com.google.common.**</include>
-              </includes>
+              <pattern>io.netty</pattern>

Review Comment:
   I suggest not shading the netty-related content: grpc has implemented a `NettyResourceTransformer` to relocate `native-image.properties`, but I haven't found a similar implementation in `maven-shade-plugin`.
   
   Or do you have a better solution? @zhenlineo 
   
   
   
   
   





[GitHub] [spark] LuciferYang commented on a diff in pull request #39866: [SPARK-42287][CONNECT][BUILD] Fix the client dependency jars

Posted by "LuciferYang (via GitHub)" <gi...@apache.org>.
LuciferYang commented on code in PR #39866:
URL: https://github.com/apache/spark/pull/39866#discussion_r1100468326


##########
connector/connect/client/jvm/pom.xml:
##########
@@ -112,40 +124,57 @@
           <shadedArtifactAttached>false</shadedArtifactAttached>
           <artifactSet>
             <includes>
+              <include>com.google.android:*</include>
+              <include>com.google.api.grpc:*</include>
+              <include>com.google.code.findbugs:*</include>
+              <include>com.google.code.gson:*</include>
+              <include>com.google.errorprone:*</include>
               <include>com.google.guava:*</include>
-              <include>io.grpc:*</include>
+              <include>com.google.j2objc:*</include>
               <include>com.google.protobuf:*</include>
+              <include>io.grpc:*</include>
+              <include>io.netty:*</include>
+              <include>io.perfmark:*</include>
+              <include>org.codehaus.mojo:*</include>
+              <include>org.checkerframework:*</include>
               <include>org.apache.spark:spark-connect-common_${scala.binary.version}</include>
             </includes>
           </artifactSet>
           <relocations>
             <relocation>
               <pattern>io.grpc</pattern>
-              <shadedPattern>${spark.shade.packageName}.connect.client.grpc</shadedPattern>
+              <shadedPattern>${spark.shade.packageName}.connect.client.io.grpc</shadedPattern>
               <includes>
                 <include>io.grpc.**</include>
               </includes>
             </relocation>
             <relocation>
-              <pattern>com.google.protobuf</pattern>
-              <shadedPattern>${spark.shade.packageName}.connect.protobuf</shadedPattern>
-              <includes>
-                <include>com.google.protobuf.**</include>
-              </includes>
+              <pattern>com.google</pattern>
+              <shadedPattern>${spark.shade.packageName}.connect.client.com.google</shadedPattern>
             </relocation>
             <relocation>
-              <pattern>com.google.common</pattern>
-              <shadedPattern>${spark.shade.packageName}.connect.client.guava</shadedPattern>
-              <includes>
-                <include>com.google.common.**</include>
-              </includes>
+              <pattern>io.netty</pattern>

Review Comment:
   Yes, you are right.
   
   To clarify: I mean we need to ensure that the `native-image.properties` files in `META-INF/native-image/io.netty/netty-xxx/` are rewritten correctly if we shade netty.





[GitHub] [spark] zhenlineo commented on a diff in pull request #39866: [WIP] Fix the client dependency jars

Posted by "zhenlineo (via GitHub)" <gi...@apache.org>.
zhenlineo commented on code in PR #39866:
URL: https://github.com/apache/spark/pull/39866#discussion_r1095940834


##########
connector/connect/client/jvm/pom.xml:
##########
@@ -60,27 +67,22 @@
       <scope>compile</scope>
     </dependency>
     <dependency>
-      <groupId>com.google.guava</groupId>
-      <artifactId>guava</artifactId>
-      <version>${guava.version}</version>
+      <groupId>io.netty</groupId>
+      <artifactId>netty-codec-http2</artifactId>
+      <version>${netty.version}</version>
       <scope>compile</scope>
     </dependency>
-    <!--
-      SPARK-42213: Add `repl` as test dependency to solve the issue that
-      `ClientE2ETestSuite` cannot find `repl.Main` when maven test.
-    -->
     <dependency>
-      <groupId>org.apache.spark</groupId>
-      <artifactId>spark-repl_${scala.binary.version}</artifactId>

Review Comment:
   Can you give me a link to the issue? I am wondering whether we still need the REPL now that we've changed to spark-submit.





[GitHub] [spark] LuciferYang commented on a diff in pull request #39866: [WIP] Fix the client dependency jars

Posted by "LuciferYang (via GitHub)" <gi...@apache.org>.
LuciferYang commented on code in PR #39866:
URL: https://github.com/apache/spark/pull/39866#discussion_r1096022999


##########
connector/connect/client/jvm/pom.xml:
##########
@@ -104,39 +106,70 @@
           <shadedArtifactAttached>false</shadedArtifactAttached>
           <artifactSet>
             <includes>
-              <include>com.google.guava:*</include>
-              <include>io.grpc:*</include>
-              <include>com.google.protobuf:*</include>
               <include>org.apache.spark:spark-connect-common_${scala.binary.version}</include>
+              <include>com.google.android:*</include>
+              <include>com.google.api.grpc:*</include>
+              <include>com.google.code.findbugs:*</include>
+              <include>com.google.code.gson:*</include>
+              <include>com.google.errorprone:*</include>
+              <include>com.google.j2objc:*</include>
+              <include>com.google.protobuf:*</include>
+              <include>io.grpc:*</include>
+              <include>io.netty:*</include>
+              <include>io.perfmark:*</include>
+              <include>org.codehaus.mojo:*</include>
+              <include>org.objenesis:*</include>
             </includes>
           </artifactSet>
           <relocations>
             <relocation>
               <pattern>io.grpc</pattern>
-              <shadedPattern>${spark.shade.packageName}.connect.client.grpc</shadedPattern>
+              <shadedPattern>${spark.shade.packageName}.connect.client.io.grpc</shadedPattern>
               <includes>
                 <include>io.grpc.**</include>
               </includes>
             </relocation>
             <relocation>
-              <pattern>com.google.protobuf</pattern>
-              <shadedPattern>${spark.shade.packageName}.connect.protobuf</shadedPattern>
-              <includes>
-                <include>com.google.protobuf.**</include>
-              </includes>
+              <pattern>com.google</pattern>
+              <shadedPattern>${spark.shade.packageName}.connect.client.com.google</shadedPattern>
+            </relocation>
+            <relocation>
+              <pattern>io.netty</pattern>
+              <shadedPattern>${spark.shade.packageName}.connect.client.io.netty</shadedPattern>
+            </relocation>
+            <relocation>
+              <pattern>org.checkerframework</pattern>
+              <shadedPattern>${spark.shade.packageName}.connect.client.checkerframework</shadedPattern>
+            </relocation>
+            <relocation>
+              <pattern>javax.annotation</pattern>
+              <shadedPattern>${spark.shade.packageName}.connect.client.javax.annotation</shadedPattern>
+            </relocation>
+            <relocation>
+              <pattern>io.perfmark</pattern>
+              <shadedPattern>${spark.shade.packageName}.connect.client.io.perfmark</shadedPattern>
+            </relocation>
+            <relocation>
+              <pattern>org.codehaus</pattern>
+              <shadedPattern>${spark.shade.packageName}.connect.client.org.codehaus</shadedPattern>
+            </relocation>
+            <relocation>
+              <pattern>android.annotation</pattern>
+              <shadedPattern>${spark.shade.packageName}.connect.client.android.annotation</shadedPattern>
             </relocation>
+            <!-- Relocate the protos from spark-common into a client only package -->
             <relocation>
-              <pattern>com.google.common</pattern>
-              <shadedPattern>${spark.shade.packageName}.connect.client.guava</shadedPattern>
+              <pattern>org.apache.spark.connect.proto</pattern>
+              <shadedPattern>${spark.shade.packageName}.connect.client.proto</shadedPattern>
               <includes>
-                <include>com.google.common.**</include>
+                <include>org.apache.spark.connect.proto.**</include>
               </includes>
             </relocation>
             <relocation>
-              <pattern>com.google.thirdparty</pattern>
-              <shadedPattern>${spark.shade.packageName}.connect.client.guava</shadedPattern>
+              <pattern>org.apache.spark.sql.connect.common</pattern>

Review Comment:
   > It allows client and server in the same JVM
   
   `client` and `server` should not run in the same JVM, because `server` depends on `sql` at runtime, while `client` and `sql` have many classes with the same `packageName + className`; putting `client` and `sql` in the same JVM will cause class-loading issues. For example, when `client` and `sql` are on the same classpath, the JVM will only load one `org.apache.spark.sql.SparkSession`, which may come from either `client` or `sql`. Unless we manually use different `ClassLoader`s to load them, the JVM cannot load and use two different `org.apache.spark.sql.SparkSession` classes in the same process.
   
   > client 3.4 + server 4.0 in the same JVM
   
   I don't understand how relocating `common` can support this. The classes in `common` are still in the same package, and some classes have the same class name; maybe adding the Spark version to the package name would make them really different. Of course, even then, the problems mentioned above would remain.
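   
   To make the class-loading point concrete, here is a minimal sketch; the jar paths are hypothetical placeholders, and it glosses over the transitive dependencies `SparkSession` would need at resolution time:
   
   ```scala
   import java.net.{URL, URLClassLoader}
   
   object SameNameClassDemo {
     def main(args: Array[String]): Unit = {
       // Hypothetical jars that both contain org.apache.spark.sql.SparkSession.
       val clientJar = new URL("file:/tmp/client.jar")
       val sqlJar    = new URL("file:/tmp/sql.jar")
   
       // Parent is null, so neither loader delegates to the application
       // classpath; each one sees only its own jar.
       val clientLoader = new URLClassLoader(Array(clientJar), null)
       val sqlLoader    = new URLClassLoader(Array(sqlJar), null)
   
       val fromClient = clientLoader.loadClass("org.apache.spark.sql.SparkSession")
       val fromSql    = sqlLoader.loadClass("org.apache.spark.sql.SparkSession")
   
       // Same fully qualified name, but two distinct Class objects, because
       // the defining loaders differ. On one flat classpath only one wins.
       println(fromClient eq fromSql) // false
     }
   }
   ```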
   
   
   





[GitHub] [spark] LuciferYang commented on a diff in pull request #39866: [SPARK-42287][CONNECT][BUILD] Fix the client dependency jars

Posted by "LuciferYang (via GitHub)" <gi...@apache.org>.
LuciferYang commented on code in PR #39866:
URL: https://github.com/apache/spark/pull/39866#discussion_r1098883805


##########
connector/connect/client/jvm/pom.xml:
##########
@@ -112,40 +124,57 @@
           <shadedArtifactAttached>false</shadedArtifactAttached>
           <artifactSet>
             <includes>
+              <include>com.google.android:*</include>
+              <include>com.google.api.grpc:*</include>
+              <include>com.google.code.findbugs:*</include>
+              <include>com.google.code.gson:*</include>
+              <include>com.google.errorprone:*</include>
               <include>com.google.guava:*</include>
-              <include>io.grpc:*</include>
+              <include>com.google.j2objc:*</include>
               <include>com.google.protobuf:*</include>
+              <include>io.grpc:*</include>
+              <include>io.netty:*</include>
+              <include>io.perfmark:*</include>
+              <include>org.codehaus.mojo:*</include>
+              <include>org.checkerframework:*</include>
               <include>org.apache.spark:spark-connect-common_${scala.binary.version}</include>
             </includes>
           </artifactSet>
           <relocations>
             <relocation>
               <pattern>io.grpc</pattern>
-              <shadedPattern>${spark.shade.packageName}.connect.client.grpc</shadedPattern>
+              <shadedPattern>${spark.shade.packageName}.connect.client.io.grpc</shadedPattern>
               <includes>
                 <include>io.grpc.**</include>
               </includes>
             </relocation>
             <relocation>
-              <pattern>com.google.protobuf</pattern>
-              <shadedPattern>${spark.shade.packageName}.connect.protobuf</shadedPattern>
-              <includes>
-                <include>com.google.protobuf.**</include>
-              </includes>
+              <pattern>com.google</pattern>
+              <shadedPattern>${spark.shade.packageName}.connect.client.com.google</shadedPattern>
             </relocation>
             <relocation>
-              <pattern>com.google.common</pattern>
-              <shadedPattern>${spark.shade.packageName}.connect.client.guava</shadedPattern>
-              <includes>
-                <include>com.google.common.**</include>
-              </includes>
+              <pattern>io.netty</pattern>

Review Comment:
   I suggest not shading the netty-related content: grpc has implemented a `NettyResourceTransformer` to relocate `native-image.properties` and rewrite its content, but I haven't found a similar implementation in `maven-shade-plugin`.
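   
   For illustration only, a minimal sketch of the kind of rewrite such a transformer performs; this is not grpc's actual `NettyResourceTransformer`, and the properties path/content below are made-up examples:
   
   ```scala
   object NativeImagePropertiesRewrite {
     // Hypothetical relocation applied to netty by the shade step.
     private val From = "io.netty"
     private val To   = "org.sparkproject.connect.client.io.netty"
   
     // Both the resource path and the package names referenced inside the
     // properties content must be rewritten; otherwise the native-image
     // metadata points at class names that no longer exist after relocation.
     def rewrite(path: String, content: String): (String, String) =
       (path.replace(From, To), content.replace(From, To))
   
     def main(args: Array[String]): Unit = {
       val (newPath, newContent) = rewrite(
         "META-INF/native-image/io.netty/native-image.properties",
         "Args = --initialize-at-run-time=io.netty.handler.ssl.OpenSsl")
       println(newPath)
       println(newContent)
     }
   }
   ```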
   
   Or do you have a better solution? @zhenlineo 
   
   
   
   
   





[GitHub] [spark] zhenlineo commented on a diff in pull request #39866: [WIP] Fix the client dependency jars

Posted by "zhenlineo (via GitHub)" <gi...@apache.org>.
zhenlineo commented on code in PR #39866:
URL: https://github.com/apache/spark/pull/39866#discussion_r1096818522


##########
connector/connect/client/jvm/pom.xml:
##########
@@ -104,39 +106,70 @@
           <shadedArtifactAttached>false</shadedArtifactAttached>
           <artifactSet>
             <includes>
-              <include>com.google.guava:*</include>
-              <include>io.grpc:*</include>
-              <include>com.google.protobuf:*</include>
               <include>org.apache.spark:spark-connect-common_${scala.binary.version}</include>
+              <include>com.google.android:*</include>
+              <include>com.google.api.grpc:*</include>
+              <include>com.google.code.findbugs:*</include>

Review Comment:
   It is in the maven dependency tree. I am not sure if we can exclude it.





[GitHub] [spark] zhenlineo commented on a diff in pull request #39866: [SPARK-42287][CONNECT][BUILD] Fix the client dependency jars

Posted by "zhenlineo (via GitHub)" <gi...@apache.org>.
zhenlineo commented on code in PR #39866:
URL: https://github.com/apache/spark/pull/39866#discussion_r1105136902


##########
project/SparkBuild.scala:
##########
@@ -873,10 +873,14 @@ object SparkConnectClient {
     },
 
     (assembly / assemblyShadeRules) := Seq(
-      ShadeRule.rename("io.grpc.**" -> "org.sparkproject.connect.client.grpc.@0").inAll,
-      ShadeRule.rename("com.google.protobuf.**" -> "org.sparkproject.connect.protobuf.@1").inAll,
-      ShadeRule.rename("com.google.common.**" -> "org.sparkproject.connect.client.guava.@1").inAll,
-      ShadeRule.rename("com.google.thirdparty.**" -> "org.sparkproject.connect.client.guava.@1").inAll,
+      ShadeRule.rename("io.grpc.**" -> "org.sparkproject.connect.client.io.grpc.@1").inAll,
+      ShadeRule.rename("com.google.**" -> "org.sparkproject.connect.client.com.google.@1").inAll,
+      ShadeRule.rename("io.netty.**" -> "org.sparkproject.connect.client.io.netty.@1").inAll,

Review Comment:
   Fixed the netty issue. I still feel it is best to use an allowlist so that we do not miss anything. Maven cannot use an allowlist because the shading rules are defined in the parent pom.
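   
   A quick sanity check could look like this sketch; the jar path and the allowed prefixes are placeholders:
   
   ```scala
   import java.util.jar.JarFile
   import scala.collection.JavaConverters._
   
   object UnrelocatedClassScan {
     def main(args: Array[String]): Unit = {
       // Placeholder path to the shaded client jar.
       val jar = new JarFile("connector/connect/client/jvm/target/client-shaded.jar")
       // Prefixes we expect after relocation; anything else escaped the rules.
       val allowed = Seq("org/sparkproject/", "org/apache/spark/")
       jar.entries().asScala
         .map(_.getName)
         .filter(_.endsWith(".class"))
         .filterNot(name => allowed.exists(p => name.startsWith(p)))
         .foreach(println)
     }
   }
   ```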





[GitHub] [spark] zhenlineo commented on a diff in pull request #39866: [SPARK-42287][CONNECT][BUILD] Fix the client dependency jars

Posted by "zhenlineo (via GitHub)" <gi...@apache.org>.
zhenlineo commented on code in PR #39866:
URL: https://github.com/apache/spark/pull/39866#discussion_r1105161173


##########
connector/connect/client/jvm/pom.xml:
##########
@@ -112,40 +124,57 @@
           <shadedArtifactAttached>false</shadedArtifactAttached>
           <artifactSet>
             <includes>
+              <include>com.google.android:*</include>
+              <include>com.google.api.grpc:*</include>
+              <include>com.google.code.findbugs:*</include>
+              <include>com.google.code.gson:*</include>
+              <include>com.google.errorprone:*</include>
               <include>com.google.guava:*</include>
-              <include>io.grpc:*</include>
+              <include>com.google.j2objc:*</include>
               <include>com.google.protobuf:*</include>
+              <include>io.grpc:*</include>
+              <include>io.netty:*</include>
+              <include>io.perfmark:*</include>
+              <include>org.codehaus.mojo:*</include>
+              <include>org.checkerframework:*</include>
               <include>org.apache.spark:spark-connect-common_${scala.binary.version}</include>
             </includes>
           </artifactSet>
           <relocations>
             <relocation>
               <pattern>io.grpc</pattern>
-              <shadedPattern>${spark.shade.packageName}.connect.client.grpc</shadedPattern>
+              <shadedPattern>${spark.shade.packageName}.connect.client.io.grpc</shadedPattern>
               <includes>
                 <include>io.grpc.**</include>
               </includes>
             </relocation>
             <relocation>
-              <pattern>com.google.protobuf</pattern>
-              <shadedPattern>${spark.shade.packageName}.connect.protobuf</shadedPattern>
-              <includes>
-                <include>com.google.protobuf.**</include>
-              </includes>
+              <pattern>com.google</pattern>
+              <shadedPattern>${spark.shade.packageName}.connect.client.com.google</shadedPattern>
             </relocation>
             <relocation>
-              <pattern>com.google.common</pattern>
-              <shadedPattern>${spark.shade.packageName}.connect.client.guava</shadedPattern>
-              <includes>
-                <include>com.google.common.**</include>
-              </includes>
+              <pattern>io.netty</pattern>

Review Comment:
   The maven shading does not require the `native-image.properties` file. It is a GraalVM file, isn't it? From a normal maven project's point of view, the current solution is already complete. See the [maven shading examples](https://maven.apache.org/plugins/maven-shade-plugin/examples/class-relocation.html). Or did I miss anything here?





[GitHub] [spark] LuciferYang commented on a diff in pull request #39866: [SPARK-42287][CONNECT][BUILD] Fix the client dependency jars

Posted by "LuciferYang (via GitHub)" <gi...@apache.org>.
LuciferYang commented on code in PR #39866:
URL: https://github.com/apache/spark/pull/39866#discussion_r1105443947


##########
connector/connect/client/jvm/pom.xml:
##########
@@ -112,40 +124,57 @@
           <shadedArtifactAttached>false</shadedArtifactAttached>
           <artifactSet>
             <includes>
+              <include>com.google.android:*</include>
+              <include>com.google.api.grpc:*</include>
+              <include>com.google.code.findbugs:*</include>
+              <include>com.google.code.gson:*</include>
+              <include>com.google.errorprone:*</include>
               <include>com.google.guava:*</include>
-              <include>io.grpc:*</include>
+              <include>com.google.j2objc:*</include>
               <include>com.google.protobuf:*</include>
+              <include>io.grpc:*</include>
+              <include>io.netty:*</include>
+              <include>io.perfmark:*</include>
+              <include>org.codehaus.mojo:*</include>
+              <include>org.checkerframework:*</include>
               <include>org.apache.spark:spark-connect-common_${scala.binary.version}</include>
             </includes>
           </artifactSet>
           <relocations>
             <relocation>
               <pattern>io.grpc</pattern>
-              <shadedPattern>${spark.shade.packageName}.connect.client.grpc</shadedPattern>
+              <shadedPattern>${spark.shade.packageName}.connect.client.io.grpc</shadedPattern>
               <includes>
                 <include>io.grpc.**</include>
               </includes>
             </relocation>
             <relocation>
-              <pattern>com.google.protobuf</pattern>
-              <shadedPattern>${spark.shade.packageName}.connect.protobuf</shadedPattern>
-              <includes>
-                <include>com.google.protobuf.**</include>
-              </includes>
+              <pattern>com.google</pattern>
+              <shadedPattern>${spark.shade.packageName}.connect.client.com.google</shadedPattern>
             </relocation>
             <relocation>
-              <pattern>com.google.common</pattern>
-              <shadedPattern>${spark.shade.packageName}.connect.client.guava</shadedPattern>
-              <includes>
-                <include>com.google.common.**</include>
-              </includes>
+              <pattern>io.netty</pattern>

Review Comment:
   OK,  I see. 





[GitHub] [spark] zhenlineo commented on a diff in pull request #39866: [WIP] Fix the client dependency jars

Posted by "zhenlineo (via GitHub)" <gi...@apache.org>.
zhenlineo commented on code in PR #39866:
URL: https://github.com/apache/spark/pull/39866#discussion_r1095939209


##########
project/SparkBuild.scala:
##########
@@ -862,21 +862,27 @@ object SparkConnectClient {
     (assembly / assemblyPackageScala / assembleArtifact) := false,
 
     // Exclude `pmml-model-*.jar`, `scala-collection-compat_*.jar`,`jsr305-*.jar` and
-    // `netty-*.jar` and `unused-1.0.0.jar` from assembly.
+    // `unused-1.0.0.jar` and checker-qual from assembly.
     (assembly / assemblyExcludedJars) := {
       val cp = (assembly / fullClasspath).value
       cp filter { v =>
         val name = v.data.getName
         name.startsWith("pmml-model-") || name.startsWith("scala-collection-compat_") ||
-          name.startsWith("jsr305-") || name.startsWith("netty-") || name == "unused-1.0.0.jar"
+          name.startsWith("jsr305-") || name == "unused-1.0.0.jar" || name.startsWith("checker-qual")
       }
     },
 
     (assembly / assemblyShadeRules) := Seq(
-      ShadeRule.rename("io.grpc.**" -> "org.sparkproject.connect.client.grpc.@0").inAll,
-      ShadeRule.rename("com.google.protobuf.**" -> "org.sparkproject.connect.protobuf.@1").inAll,
-      ShadeRule.rename("com.google.common.**" -> "org.sparkproject.connect.client.guava.@1").inAll,
-      ShadeRule.rename("com.google.thirdparty.**" -> "org.sparkproject.connect.client.guava.@1").inAll,
+      ShadeRule.rename("io.grpc.**" -> "org.sparkproject.connect.client.io.grpc.@1").inAll,
+      ShadeRule.rename("com.google.**" -> "org.sparkproject.connect.client.com.google.@1").inAll,
+      ShadeRule.rename("io.netty.**" -> "org.sparkproject.connect.client.io.netty.@1").inAll,
+      ShadeRule.rename("org.checkerframework.**" -> "org.sparkproject.connect.client.org.checkerframework.@1").inAll,
+      ShadeRule.rename("javax.annotation.**" -> "org.sparkproject.connect.client.javax.annotation.@1").inAll,

Review Comment:
   Yeah, currently the sbt and maven results are not exactly the same. SBT seems to have skipped all runtime dependencies. I need to look into it a bit more. I assume you got the SBT result, as all of the above are "runtime" dependencies.
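   
   As a rough way to compare the two artifacts, a sketch that diffs their entry lists (both jar paths are placeholders):
   
   ```scala
   import java.util.jar.JarFile
   import scala.collection.JavaConverters._
   
   object ShadedJarDiff {
     private def entries(path: String): Set[String] =
       new JarFile(path).entries().asScala.map(_.getName).toSet
   
     def main(args: Array[String]): Unit = {
       val sbtJar = entries("sbt-assembly-output.jar")  // placeholder
       val mvnJar = entries("maven-shaded-output.jar")  // placeholder
       (sbtJar -- mvnJar).toSeq.sorted.foreach(e => println(s"only in sbt: $e"))
       (mvnJar -- sbtJar).toSeq.sorted.foreach(e => println(s"only in mvn: $e"))
     }
   }
   ```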



##########
connector/connect/client/jvm/pom.xml:
##########
@@ -104,39 +106,70 @@
           <shadedArtifactAttached>false</shadedArtifactAttached>
           <artifactSet>
             <includes>
-              <include>com.google.guava:*</include>
-              <include>io.grpc:*</include>
-              <include>com.google.protobuf:*</include>
               <include>org.apache.spark:spark-connect-common_${scala.binary.version}</include>
+              <include>com.google.android:*</include>
+              <include>com.google.api.grpc:*</include>
+              <include>com.google.code.findbugs:*</include>
+              <include>com.google.code.gson:*</include>
+              <include>com.google.errorprone:*</include>
+              <include>com.google.j2objc:*</include>
+              <include>com.google.protobuf:*</include>
+              <include>io.grpc:*</include>
+              <include>io.netty:*</include>
+              <include>io.perfmark:*</include>
+              <include>org.codehaus.mojo:*</include>
+              <include>org.objenesis:*</include>
             </includes>
           </artifactSet>
           <relocations>
             <relocation>
               <pattern>io.grpc</pattern>
-              <shadedPattern>${spark.shade.packageName}.connect.client.grpc</shadedPattern>
+              <shadedPattern>${spark.shade.packageName}.connect.client.io.grpc</shadedPattern>
               <includes>
                 <include>io.grpc.**</include>
               </includes>
             </relocation>
             <relocation>
-              <pattern>com.google.protobuf</pattern>
-              <shadedPattern>${spark.shade.packageName}.connect.protobuf</shadedPattern>
-              <includes>
-                <include>com.google.protobuf.**</include>
-              </includes>
+              <pattern>com.google</pattern>
+              <shadedPattern>${spark.shade.packageName}.connect.client.com.google</shadedPattern>
+            </relocation>
+            <relocation>
+              <pattern>io.netty</pattern>
+              <shadedPattern>${spark.shade.packageName}.connect.client.io.netty</shadedPattern>
+            </relocation>
+            <relocation>
+              <pattern>org.checkerframework</pattern>
+              <shadedPattern>${spark.shade.packageName}.connect.client.checkerframework</shadedPattern>
+            </relocation>
+            <relocation>
+              <pattern>javax.annotation</pattern>
+              <shadedPattern>${spark.shade.packageName}.connect.client.javax.annotation</shadedPattern>
+            </relocation>
+            <relocation>
+              <pattern>io.perfmark</pattern>
+              <shadedPattern>${spark.shade.packageName}.connect.client.io.perfmark</shadedPattern>
+            </relocation>
+            <relocation>
+              <pattern>org.codehaus</pattern>
+              <shadedPattern>${spark.shade.packageName}.connect.client.org.codehaus</shadedPattern>
+            </relocation>
+            <relocation>
+              <pattern>android.annotation</pattern>
+              <shadedPattern>${spark.shade.packageName}.connect.client.android.annotation</shadedPattern>
             </relocation>
+            <!-- Relocate the protos from spark-common into a client only package -->
             <relocation>
-              <pattern>com.google.common</pattern>
-              <shadedPattern>${spark.shade.packageName}.connect.client.guava</shadedPattern>
+              <pattern>org.apache.spark.connect.proto</pattern>
+              <shadedPattern>${spark.shade.packageName}.connect.client.proto</shadedPattern>
               <includes>
-                <include>com.google.common.**</include>
+                <include>org.apache.spark.connect.proto.**</include>
               </includes>
             </relocation>
             <relocation>
-              <pattern>com.google.thirdparty</pattern>
-              <shadedPattern>${spark.shade.packageName}.connect.client.guava</shadedPattern>
+              <pattern>org.apache.spark.sql.connect.common</pattern>

Review Comment:
   It allows the client and the server to run in the same JVM, e.g. client 3.4 + server 4.0 in the same JVM. I saw the work was not that much, so I included this anyway.





[GitHub] [spark] zhenlineo commented on a diff in pull request #39866: [WIP] Fix the client dependency jars

Posted by "zhenlineo (via GitHub)" <gi...@apache.org>.
zhenlineo commented on code in PR #39866:
URL: https://github.com/apache/spark/pull/39866#discussion_r1096818667


##########
connector/connect/client/jvm/pom.xml:
##########
@@ -60,27 +67,22 @@
       <scope>compile</scope>
     </dependency>
     <dependency>
-      <groupId>com.google.guava</groupId>
-      <artifactId>guava</artifactId>
-      <version>${guava.version}</version>
+      <groupId>io.netty</groupId>
+      <artifactId>netty-codec-http2</artifactId>
+      <version>${netty.version}</version>
       <scope>compile</scope>
     </dependency>
-    <!--
-      SPARK-42213: Add `repl` as test dependency to solve the issue that
-      `ClientE2ETestSuite` cannot find `repl.Main` when maven test.
-    -->
     <dependency>
-      <groupId>org.apache.spark</groupId>
-      <artifactId>spark-repl_${scala.binary.version}</artifactId>

Review Comment:
   I checked manually. It should be okay now without it.





[GitHub] [spark] zhenlineo commented on a diff in pull request #39866: [WIP] Fix the client dependency jars

Posted by "zhenlineo (via GitHub)" <gi...@apache.org>.
zhenlineo commented on code in PR #39866:
URL: https://github.com/apache/spark/pull/39866#discussion_r1095311239


##########
connector/connect/common/pom.xml:
##########
@@ -54,77 +54,22 @@
             <groupId>org.scala-lang</groupId>
             <artifactId>scala-library</artifactId>
         </dependency>
-        <dependency>
-            <groupId>com.google.guava</groupId>
-            <artifactId>guava</artifactId>
-            <version>${guava.version}</version>
-            <scope>compile</scope>
-        </dependency>
-        <dependency>
-            <groupId>com.google.guava</groupId>
-            <artifactId>failureaccess</artifactId>
-            <version>${guava.failureaccess.version}</version>
-        </dependency>
         <dependency>
             <groupId>com.google.protobuf</groupId>
             <artifactId>protobuf-java</artifactId>
             <version>${protobuf.version}</version>
             <scope>compile</scope>
         </dependency>
-        <dependency>
-            <groupId>io.grpc</groupId>
-            <artifactId>grpc-netty</artifactId>
-            <version>${io.grpc.version}</version>
-        </dependency>
         <dependency>
             <groupId>io.grpc</groupId>
             <artifactId>grpc-protobuf</artifactId>
             <version>${io.grpc.version}</version>
         </dependency>
-        <dependency>
-            <groupId>io.grpc</groupId>
-            <artifactId>grpc-services</artifactId>
-            <version>${io.grpc.version}</version>
-        </dependency>
         <dependency>
             <groupId>io.grpc</groupId>
             <artifactId>grpc-stub</artifactId>
             <version>${io.grpc.version}</version>
         </dependency>
-        <dependency>

Review Comment:
   Restored the netty dependency as `provided`, since we use the unshaded grpc dependencies.
   See https://github.com/grpc/grpc-java for more.





[GitHub] [spark] zhenlineo commented on a diff in pull request #39866: [SPARK-42287][CONNECT][BUILD] Fix the client dependency jars

Posted by "zhenlineo (via GitHub)" <gi...@apache.org>.
zhenlineo commented on code in PR #39866:
URL: https://github.com/apache/spark/pull/39866#discussion_r1100451807


##########
connector/connect/client/jvm/pom.xml:
##########
@@ -112,40 +124,57 @@
           <shadedArtifactAttached>false</shadedArtifactAttached>
           <artifactSet>
             <includes>
+              <include>com.google.android:*</include>
+              <include>com.google.api.grpc:*</include>
+              <include>com.google.code.findbugs:*</include>
+              <include>com.google.code.gson:*</include>
+              <include>com.google.errorprone:*</include>
               <include>com.google.guava:*</include>
-              <include>io.grpc:*</include>
+              <include>com.google.j2objc:*</include>
               <include>com.google.protobuf:*</include>
+              <include>io.grpc:*</include>
+              <include>io.netty:*</include>
+              <include>io.perfmark:*</include>
+              <include>org.codehaus.mojo:*</include>
+              <include>org.checkerframework:*</include>
               <include>org.apache.spark:spark-connect-common_${scala.binary.version}</include>
             </includes>
           </artifactSet>
           <relocations>
             <relocation>
               <pattern>io.grpc</pattern>
-              <shadedPattern>${spark.shade.packageName}.connect.client.grpc</shadedPattern>
+              <shadedPattern>${spark.shade.packageName}.connect.client.io.grpc</shadedPattern>
               <includes>
                 <include>io.grpc.**</include>
               </includes>
             </relocation>
             <relocation>
-              <pattern>com.google.protobuf</pattern>
-              <shadedPattern>${spark.shade.packageName}.connect.protobuf</shadedPattern>
-              <includes>
-                <include>com.google.protobuf.**</include>
-              </includes>
+              <pattern>com.google</pattern>
+              <shadedPattern>${spark.shade.packageName}.connect.client.com.google</shadedPattern>
             </relocation>
             <relocation>
-              <pattern>com.google.common</pattern>
-              <shadedPattern>${spark.shade.packageName}.connect.client.guava</shadedPattern>
-              <includes>
-                <include>com.google.common.**</include>
-              </includes>
+              <pattern>io.netty</pattern>

Review Comment:
   Let me get back to you. grpc publishes a variant with shaded netty and one without; we used the one without shaded netty.
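   
   For reference, a sketch of the two variants in sbt notation; the version value is a placeholder for `${io.grpc.version}`:
   
   ```scala
   val grpcVersion = "1.47.0" // placeholder
   
   // What we use: plain grpc-netty, so netty is an ordinary dependency
   // that we relocate ourselves during shading.
   libraryDependencies += "io.grpc" % "grpc-netty" % grpcVersion
   
   // Alternative: grpc-netty-shaded bundles netty pre-relocated under
   // io.grpc.netty.shaded, so no separate netty shading would be needed.
   // libraryDependencies += "io.grpc" % "grpc-netty-shaded" % grpcVersion
   ```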





[GitHub] [spark] hvanhovell closed pull request #39866: [SPARK-42287][CONNECT][BUILD] Fix shading so that the JVM client jar can include all 3rd-party dependencies in the runtime.

Posted by "hvanhovell (via GitHub)" <gi...@apache.org>.
hvanhovell closed pull request #39866: [SPARK-42287][CONNECT][BUILD] Fix shading so that the JVM client jar can include all 3rd-party dependencies in the runtime.
URL: https://github.com/apache/spark/pull/39866




[GitHub] [spark] zhenlineo commented on a diff in pull request #39866: [SPARK-42287][CONNECT][BUILD] Fix the client dependency jars

Posted by "zhenlineo (via GitHub)" <gi...@apache.org>.
zhenlineo commented on code in PR #39866:
URL: https://github.com/apache/spark/pull/39866#discussion_r1105140661


##########
connector/connect/client/jvm/pom.xml:
##########
@@ -112,40 +124,57 @@
           <shadedArtifactAttached>false</shadedArtifactAttached>
           <artifactSet>
             <includes>
+              <include>com.google.android:*</include>
+              <include>com.google.api.grpc:*</include>
+              <include>com.google.code.findbugs:*</include>
+              <include>com.google.code.gson:*</include>
+              <include>com.google.errorprone:*</include>
               <include>com.google.guava:*</include>
-              <include>io.grpc:*</include>
+              <include>com.google.j2objc:*</include>
               <include>com.google.protobuf:*</include>
+              <include>io.grpc:*</include>
+              <include>io.netty:*</include>
+              <include>io.perfmark:*</include>
+              <include>org.codehaus.mojo:*</include>
+              <include>org.checkerframework:*</include>
               <include>org.apache.spark:spark-connect-common_${scala.binary.version}</include>
             </includes>
           </artifactSet>
           <relocations>
             <relocation>
               <pattern>io.grpc</pattern>
-              <shadedPattern>${spark.shade.packageName}.connect.client.grpc</shadedPattern>
+              <shadedPattern>${spark.shade.packageName}.connect.client.io.grpc</shadedPattern>
               <includes>
                 <include>io.grpc.**</include>
               </includes>
             </relocation>
             <relocation>
-              <pattern>com.google.protobuf</pattern>
-              <shadedPattern>${spark.shade.packageName}.connect.protobuf</shadedPattern>
-              <includes>
-                <include>com.google.protobuf.**</include>
-              </includes>
+              <pattern>com.google</pattern>
+              <shadedPattern>${spark.shade.packageName}.connect.client.com.google</shadedPattern>
             </relocation>
             <relocation>
-              <pattern>com.google.common</pattern>
-              <shadedPattern>${spark.shade.packageName}.connect.client.guava</shadedPattern>
-              <includes>
-                <include>com.google.common.**</include>
-              </includes>
+              <pattern>io.netty</pattern>

Review Comment:
   I understand this file is for GraalVM support. If so, can we leave it to another PR? I do not have a good idea of how to use GraalVM and verify that it works as expected.





[GitHub] [spark] LuciferYang commented on a diff in pull request #39866: [SPARK-42287][CONNECT][BUILD] Fix the client dependency jars

Posted by "LuciferYang (via GitHub)" <gi...@apache.org>.
LuciferYang commented on code in PR #39866:
URL: https://github.com/apache/spark/pull/39866#discussion_r1105447214


##########
connector/connect/client/jvm/pom.xml:
##########
@@ -112,40 +124,57 @@
           <shadedArtifactAttached>false</shadedArtifactAttached>
           <artifactSet>
             <includes>
+              <include>com.google.android:*</include>
+              <include>com.google.api.grpc:*</include>
+              <include>com.google.code.findbugs:*</include>
+              <include>com.google.code.gson:*</include>
+              <include>com.google.errorprone:*</include>
               <include>com.google.guava:*</include>
-              <include>io.grpc:*</include>
+              <include>com.google.j2objc:*</include>
               <include>com.google.protobuf:*</include>
+              <include>io.grpc:*</include>
+              <include>io.netty:*</include>
+              <include>io.perfmark:*</include>
+              <include>org.codehaus.mojo:*</include>
+              <include>org.checkerframework:*</include>
               <include>org.apache.spark:spark-connect-common_${scala.binary.version}</include>
             </includes>
           </artifactSet>
           <relocations>
             <relocation>
               <pattern>io.grpc</pattern>
-              <shadedPattern>${spark.shade.packageName}.connect.client.grpc</shadedPattern>
+              <shadedPattern>${spark.shade.packageName}.connect.client.io.grpc</shadedPattern>
               <includes>
                 <include>io.grpc.**</include>
               </includes>
             </relocation>
             <relocation>
-              <pattern>com.google.protobuf</pattern>
-              <shadedPattern>${spark.shade.packageName}.connect.protobuf</shadedPattern>
-              <includes>
-                <include>com.google.protobuf.**</include>
-              </includes>
+              <pattern>com.google</pattern>
+              <shadedPattern>${spark.shade.packageName}.connect.client.com.google</shadedPattern>
             </relocation>
             <relocation>
-              <pattern>com.google.common</pattern>
-              <shadedPattern>${spark.shade.packageName}.connect.client.guava</shadedPattern>
-              <includes>
-                <include>com.google.common.**</include>
-              </includes>
+              <pattern>io.netty</pattern>

Review Comment:
   Please wait for me to do some more checks; I will give feedback tomorrow. Thanks @zhenlineo 
   
   





[GitHub] [spark] LuciferYang commented on pull request #39866: [WIP] Fix the client dependency jars

Posted by "LuciferYang (via GitHub)" <gi...@apache.org>.
LuciferYang commented on PR #39866:
URL: https://github.com/apache/spark/pull/39866#issuecomment-1415251213

   I will close https://github.com/apache/spark/pull/39857 since you are trying to solve a similar issue; feel free to reuse the JIRA id if needed.
   
   




[GitHub] [spark] LuciferYang commented on a diff in pull request #39866: [WIP] Fix the client dependency jars

Posted by "LuciferYang (via GitHub)" <gi...@apache.org>.
LuciferYang commented on code in PR #39866:
URL: https://github.com/apache/spark/pull/39866#discussion_r1096027436


##########
project/SparkBuild.scala:
##########
@@ -862,21 +862,27 @@ object SparkConnectClient {
     (assembly / assemblyPackageScala / assembleArtifact) := false,
 
     // Exclude `pmml-model-*.jar`, `scala-collection-compat_*.jar`,`jsr305-*.jar` and
-    // `netty-*.jar` and `unused-1.0.0.jar` from assembly.
+    // `unused-1.0.0.jar` and checker-qual from assembly.
     (assembly / assemblyExcludedJars) := {
       val cp = (assembly / fullClasspath).value
       cp filter { v =>
         val name = v.data.getName
         name.startsWith("pmml-model-") || name.startsWith("scala-collection-compat_") ||
-          name.startsWith("jsr305-") || name.startsWith("netty-") || name == "unused-1.0.0.jar"
+          name.startsWith("jsr305-") || name == "unused-1.0.0.jar" || name.startsWith("checker-qual")
       }
     },
 
     (assembly / assemblyShadeRules) := Seq(
-      ShadeRule.rename("io.grpc.**" -> "org.sparkproject.connect.client.grpc.@0").inAll,
-      ShadeRule.rename("com.google.protobuf.**" -> "org.sparkproject.connect.protobuf.@1").inAll,
-      ShadeRule.rename("com.google.common.**" -> "org.sparkproject.connect.client.guava.@1").inAll,
-      ShadeRule.rename("com.google.thirdparty.**" -> "org.sparkproject.connect.client.guava.@1").inAll,
+      ShadeRule.rename("io.grpc.**" -> "org.sparkproject.connect.client.io.grpc.@1").inAll,
+      ShadeRule.rename("com.google.**" -> "org.sparkproject.connect.client.com.google.@1").inAll,
+      ShadeRule.rename("io.netty.**" -> "org.sparkproject.connect.client.io.netty.@1").inAll,
+      ShadeRule.rename("org.checkerframework.**" -> "org.sparkproject.connect.client.org.checkerframework.@1").inAll,
+      ShadeRule.rename("javax.annotation.**" -> "org.sparkproject.connect.client.javax.annotation.@1").inAll,

Review Comment:
   > ```
   > [error] /home/runner/work/spark/spark/connector/connect/client/jvm/src/test/scala/org/apache/spark/sql/DatasetSuite.scala:22:16: object netty is not a member of package io.grpc
   > [error] import io.grpc.netty.NettyServerBuilder
   > [error]                ^
   > [error] /home/runner/work/spark/spark/connector/connect/client/jvm/src/test/scala/org/apache/spark/sql/DatasetSuite.scala:52:14: not found: value NettyServerBuilder
   > [error]     val sb = NettyServerBuilder
   > [error]              ^
   > [error] /home/runner/work/spark/spark/connector/connect/client/jvm/src/test/scala/org/apache/spark/sql/SparkSessionSuite.scala:20:16: object netty is not a member of package io.grpc
   > [error] import io.grpc.netty.NettyServerBuilder
   > [error]                ^
   > [error] /home/runner/work/spark/spark/connector/connect/client/jvm/src/test/scala/org/apache/spark/sql/SparkSessionSuite.scala:38:14: not found: value NettyServerBuilder
   > [error]     val sb = NettyServerBuilder
   > [error]              ^
   > [error] /home/runner/work/spark/spark/connector/connect/client/jvm/src/test/scala/org/apache/spark/sql/connect/client/SparkConnectClientSuite.scala:22:16: object netty is not a member of package io.grpc
   > [error] import io.grpc.netty.NettyServerBuilder
   > [error]                ^
   > [error] /home/runner/work/spark/spark/connector/connect/client/jvm/src/test/scala/org/apache/spark/sql/connect/client/SparkConnectClientSuite.scala:39:14: not found: value NettyServerBuilder
   > [error]     val sb = NettyServerBuilder
   > [error]              ^
   > [error] 6 errors found
   > ```
   > 
   > GA failed as above
   
   I can't check the maven result right now because of the maven compilation failure.
   
   





[GitHub] [spark] LuciferYang commented on a diff in pull request #39866: [WIP] Fix the client dependency jars

Posted by "LuciferYang (via GitHub)" <gi...@apache.org>.
LuciferYang commented on code in PR #39866:
URL: https://github.com/apache/spark/pull/39866#discussion_r1096032655


##########
project/SparkBuild.scala:
##########
@@ -862,21 +862,27 @@ object SparkConnectClient {
     (assembly / assemblyPackageScala / assembleArtifact) := false,
 
     // Exclude `pmml-model-*.jar`, `scala-collection-compat_*.jar`,`jsr305-*.jar` and
-    // `netty-*.jar` and `unused-1.0.0.jar` from assembly.
+    // `unused-1.0.0.jar` and checker-qual from assembly.
     (assembly / assemblyExcludedJars) := {
       val cp = (assembly / fullClasspath).value
       cp filter { v =>

Review Comment:
   Maybe change it to use `filterNot` and phrase the filter condition as an include list; that looks clearer.
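   
   Something like this sketch, where the include prefixes are illustrative rather than the final list:
   
   ```scala
   (assembly / assemblyExcludedJars) := {
     val cp = (assembly / fullClasspath).value
     // Jars we actually want in the assembly...
     val included = Seq("spark-connect-common", "grpc-", "guava-", "protobuf-java-", "netty-")
     // ...and everything else is excluded.
     cp.filterNot(v => included.exists(prefix => v.data.getName.startsWith(prefix)))
   }
   ```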
   
    





[GitHub] [spark] zhenlineo commented on a diff in pull request #39866: [WIP] Fix the client dependency jars

Posted by "zhenlineo (via GitHub)" <gi...@apache.org>.
zhenlineo commented on code in PR #39866:
URL: https://github.com/apache/spark/pull/39866#discussion_r1096818823


##########
connector/connect/client/jvm/pom.xml:
##########
@@ -52,6 +52,13 @@
       <groupId>org.apache.spark</groupId>
       <artifactId>spark-catalyst_${scala.binary.version}</artifactId>
       <version>${project.version}</version>
+      <scope>provided</scope>
+      <exclusions>

Review Comment:
   I added Guava. The tests do not run correctly without it. I assume we still need this guava dependency internally.





[GitHub] [spark] LuciferYang commented on pull request #39866: [WIP] Fix the client dependency jars

Posted by "LuciferYang (via GitHub)" <gi...@apache.org>.
LuciferYang commented on PR #39866:
URL: https://github.com/apache/spark/pull/39866#issuecomment-1418813688

   Sorry, I have some other things to do today. I will review this PR further tomorrow.
   
   




[GitHub] [spark] LuciferYang commented on a diff in pull request #39866: [WIP] Fix the client dependency jars

Posted by "LuciferYang (via GitHub)" <gi...@apache.org>.
LuciferYang commented on code in PR #39866:
URL: https://github.com/apache/spark/pull/39866#discussion_r1095427590


##########
connector/connect/client/jvm/pom.xml:
##########
@@ -104,39 +106,70 @@
           <shadedArtifactAttached>false</shadedArtifactAttached>
           <artifactSet>
             <includes>
-              <include>com.google.guava:*</include>
-              <include>io.grpc:*</include>
-              <include>com.google.protobuf:*</include>
               <include>org.apache.spark:spark-connect-common_${scala.binary.version}</include>
+              <include>com.google.android:*</include>
+              <include>com.google.api.grpc:*</include>
+              <include>com.google.code.findbugs:*</include>

Review Comment:
   `findbugs` ... do we really need it?
   
   



##########
connector/connect/client/jvm/pom.xml:
##########
@@ -60,27 +67,22 @@
       <scope>compile</scope>
     </dependency>
     <dependency>
-      <groupId>com.google.guava</groupId>
-      <artifactId>guava</artifactId>
-      <version>${guava.version}</version>
+      <groupId>io.netty</groupId>
+      <artifactId>netty-codec-http2</artifactId>
+      <version>${netty.version}</version>
       <scope>compile</scope>
     </dependency>
-    <!--
-      SPARK-42213: Add `repl` as test dependency to solve the issue that
-      `ClientE2ETestSuite` cannot find `repl.Main` when maven test.
-    -->
     <dependency>
-      <groupId>org.apache.spark</groupId>
-      <artifactId>spark-repl_${scala.binary.version}</artifactId>

Review Comment:
   This is a workaround for the Maven test, and it is just a test dependency. Please do not delete it without a specific reason.
   
   



##########
connector/connect/common/pom.xml:
##########
@@ -109,22 +88,6 @@
             <version>${netty.version}</version>
             <scope>provided</scope>
         </dependency>
-        <dependency> <!-- necessary for Java 9+ -->

Review Comment:
   Does the client not need this dependency when using Java 9+?



##########
project/SparkBuild.scala:
##########
@@ -862,21 +862,27 @@ object SparkConnectClient {
     (assembly / assemblyPackageScala / assembleArtifact) := false,
 
     // Exclude `pmml-model-*.jar`, `scala-collection-compat_*.jar`,`jsr305-*.jar` and
-    // `netty-*.jar` and `unused-1.0.0.jar` from assembly.
+    // `unused-1.0.0.jar` and checker-qual from assembly.
     (assembly / assemblyExcludedJars) := {
       val cp = (assembly / fullClasspath).value
       cp filter { v =>
         val name = v.data.getName
         name.startsWith("pmml-model-") || name.startsWith("scala-collection-compat_") ||
-          name.startsWith("jsr305-") || name.startsWith("netty-") || name == "unused-1.0.0.jar"
+          name.startsWith("jsr305-") || name == "unused-1.0.0.jar" || name.startsWith("checker-qual")
       }
     },
 
     (assembly / assemblyShadeRules) := Seq(
-      ShadeRule.rename("io.grpc.**" -> "org.sparkproject.connect.client.grpc.@0").inAll,
-      ShadeRule.rename("com.google.protobuf.**" -> "org.sparkproject.connect.protobuf.@1").inAll,
-      ShadeRule.rename("com.google.common.**" -> "org.sparkproject.connect.client.guava.@1").inAll,
-      ShadeRule.rename("com.google.thirdparty.**" -> "org.sparkproject.connect.client.guava.@1").inAll,
+      ShadeRule.rename("io.grpc.**" -> "org.sparkproject.connect.client.io.grpc.@1").inAll,
+      ShadeRule.rename("com.google.**" -> "org.sparkproject.connect.client.com.google.@1").inAll,
+      ShadeRule.rename("io.netty.**" -> "org.sparkproject.connect.client.io.netty.@1").inAll,
+      ShadeRule.rename("org.checkerframework.**" -> "org.sparkproject.connect.client.org.checkerframework.@1").inAll,
+      ShadeRule.rename("javax.annotation.**" -> "org.sparkproject.connect.client.javax.annotation.@1").inAll,

Review Comment:
   There are only 4 sub dirs in `org.sparkproject.connect.client`:
   
   - com
   - common
   - io
   - proto
   
   Some expected sub dirs are missing, such as `javax`, `org/codehaus`, or `perfmark`.
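   
   One way to double-check what actually lands in the shaded jar (a sketch; the jar path is a placeholder):
   
   ```scala
   import java.util.jar.JarFile
   
   // Sketch: list the distinct top-level sub dirs under the shaded package prefix.
   val jar = new JarFile("path/to/spark-connect-client-jvm-assembly.jar") // placeholder
   val prefix = "org/sparkproject/connect/client/"
   val entries = jar.entries()
   var subDirs = Set.empty[String]
   while (entries.hasMoreElements) {
     val name = entries.nextElement().getName
     if (name.startsWith(prefix)) {
       subDirs += name.stripPrefix(prefix).takeWhile(_ != '/')
     }
   }
   jar.close()
   subDirs.filter(_.nonEmpty).foreach(println) // currently prints only: com, common, io, proto
   ```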
   
   



##########
connector/connect/client/jvm/pom.xml:
##########
@@ -104,39 +106,70 @@
           <shadedArtifactAttached>false</shadedArtifactAttached>
           <artifactSet>
             <includes>
-              <include>com.google.guava:*</include>
-              <include>io.grpc:*</include>
-              <include>com.google.protobuf:*</include>
               <include>org.apache.spark:spark-connect-common_${scala.binary.version}</include>
+              <include>com.google.android:*</include>
+              <include>com.google.api.grpc:*</include>
+              <include>com.google.code.findbugs:*</include>
+              <include>com.google.code.gson:*</include>
+              <include>com.google.errorprone:*</include>
+              <include>com.google.j2objc:*</include>
+              <include>com.google.protobuf:*</include>
+              <include>io.grpc:*</include>
+              <include>io.netty:*</include>
+              <include>io.perfmark:*</include>
+              <include>org.codehaus.mojo:*</include>
+              <include>org.objenesis:*</include>
             </includes>
           </artifactSet>
           <relocations>
             <relocation>
               <pattern>io.grpc</pattern>
-              <shadedPattern>${spark.shade.packageName}.connect.client.grpc</shadedPattern>
+              <shadedPattern>${spark.shade.packageName}.connect.client.io.grpc</shadedPattern>
               <includes>
                 <include>io.grpc.**</include>
               </includes>
             </relocation>
             <relocation>
-              <pattern>com.google.protobuf</pattern>
-              <shadedPattern>${spark.shade.packageName}.connect.protobuf</shadedPattern>
-              <includes>
-                <include>com.google.protobuf.**</include>
-              </includes>
+              <pattern>com.google</pattern>
+              <shadedPattern>${spark.shade.packageName}.connect.client.com.google</shadedPattern>
+            </relocation>
+            <relocation>
+              <pattern>io.netty</pattern>
+              <shadedPattern>${spark.shade.packageName}.connect.client.io.netty</shadedPattern>
+            </relocation>
+            <relocation>
+              <pattern>org.checkerframework</pattern>
+              <shadedPattern>${spark.shade.packageName}.connect.client.checkerframework</shadedPattern>
+            </relocation>
+            <relocation>
+              <pattern>javax.annotation</pattern>
+              <shadedPattern>${spark.shade.packageName}.connect.client.javax.annotation</shadedPattern>
+            </relocation>
+            <relocation>
+              <pattern>io.perfmark</pattern>
+              <shadedPattern>${spark.shade.packageName}.connect.client.io.perfmark</shadedPattern>
+            </relocation>
+            <relocation>
+              <pattern>org.codehaus</pattern>
+              <shadedPattern>${spark.shade.packageName}.connect.client.org.codehaus</shadedPattern>
+            </relocation>
+            <relocation>
+              <pattern>android.annotation</pattern>
+              <shadedPattern>${spark.shade.packageName}.connect.client.android.annotation</shadedPattern>
             </relocation>
+            <!-- Relocate the protos from spark-common into a client only package -->
             <relocation>
-              <pattern>com.google.common</pattern>
-              <shadedPattern>${spark.shade.packageName}.connect.client.guava</shadedPattern>
+              <pattern>org.apache.spark.connect.proto</pattern>
+              <shadedPattern>${spark.shade.packageName}.connect.client.proto</shadedPattern>
               <includes>
-                <include>com.google.common.**</include>
+                <include>org.apache.spark.connect.proto.**</include>
               </includes>
             </relocation>
             <relocation>
-              <pattern>com.google.thirdparty</pattern>
-              <shadedPattern>${spark.shade.packageName}.connect.client.guava</shadedPattern>
+              <pattern>org.apache.spark.sql.connect.common</pattern>

Review Comment:
   Why do we need to relocate `common`? What conflicts would happen if we only shaded it without relocation?
   
   



##########
connector/connect/client/jvm/pom.xml:
##########
@@ -52,6 +52,13 @@
       <groupId>org.apache.spark</groupId>
       <artifactId>spark-catalyst_${scala.binary.version}</artifactId>
       <version>${project.version}</version>
+      <scope>provided</scope>
+      <exclusions>

Review Comment:
   If the client does not depend on Guava, do we need to exclude it explicitly? Or what will happen if we do not explicitly exclude Guava?
   
   





[GitHub] [spark] LuciferYang commented on a diff in pull request #39866: [WIP] Fix the client dependency jars

Posted by "LuciferYang (via GitHub)" <gi...@apache.org>.
LuciferYang commented on code in PR #39866:
URL: https://github.com/apache/spark/pull/39866#discussion_r1096022999


##########
connector/connect/client/jvm/pom.xml:
##########
@@ -104,39 +106,70 @@
           <shadedArtifactAttached>false</shadedArtifactAttached>
           <artifactSet>
             <includes>
-              <include>com.google.guava:*</include>
-              <include>io.grpc:*</include>
-              <include>com.google.protobuf:*</include>
               <include>org.apache.spark:spark-connect-common_${scala.binary.version}</include>
+              <include>com.google.android:*</include>
+              <include>com.google.api.grpc:*</include>
+              <include>com.google.code.findbugs:*</include>
+              <include>com.google.code.gson:*</include>
+              <include>com.google.errorprone:*</include>
+              <include>com.google.j2objc:*</include>
+              <include>com.google.protobuf:*</include>
+              <include>io.grpc:*</include>
+              <include>io.netty:*</include>
+              <include>io.perfmark:*</include>
+              <include>org.codehaus.mojo:*</include>
+              <include>org.objenesis:*</include>
             </includes>
           </artifactSet>
           <relocations>
             <relocation>
               <pattern>io.grpc</pattern>
-              <shadedPattern>${spark.shade.packageName}.connect.client.grpc</shadedPattern>
+              <shadedPattern>${spark.shade.packageName}.connect.client.io.grpc</shadedPattern>
               <includes>
                 <include>io.grpc.**</include>
               </includes>
             </relocation>
             <relocation>
-              <pattern>com.google.protobuf</pattern>
-              <shadedPattern>${spark.shade.packageName}.connect.protobuf</shadedPattern>
-              <includes>
-                <include>com.google.protobuf.**</include>
-              </includes>
+              <pattern>com.google</pattern>
+              <shadedPattern>${spark.shade.packageName}.connect.client.com.google</shadedPattern>
+            </relocation>
+            <relocation>
+              <pattern>io.netty</pattern>
+              <shadedPattern>${spark.shade.packageName}.connect.client.io.netty</shadedPattern>
+            </relocation>
+            <relocation>
+              <pattern>org.checkerframework</pattern>
+              <shadedPattern>${spark.shade.packageName}.connect.client.checkerframework</shadedPattern>
+            </relocation>
+            <relocation>
+              <pattern>javax.annotation</pattern>
+              <shadedPattern>${spark.shade.packageName}.connect.client.javax.annotation</shadedPattern>
+            </relocation>
+            <relocation>
+              <pattern>io.perfmark</pattern>
+              <shadedPattern>${spark.shade.packageName}.connect.client.io.perfmark</shadedPattern>
+            </relocation>
+            <relocation>
+              <pattern>org.codehaus</pattern>
+              <shadedPattern>${spark.shade.packageName}.connect.client.org.codehaus</shadedPattern>
+            </relocation>
+            <relocation>
+              <pattern>android.annotation</pattern>
+              <shadedPattern>${spark.shade.packageName}.connect.client.android.annotation</shadedPattern>
             </relocation>
+            <!-- Relocate the protos from spark-common into a client only package -->
             <relocation>
-              <pattern>com.google.common</pattern>
-              <shadedPattern>${spark.shade.packageName}.connect.client.guava</shadedPattern>
+              <pattern>org.apache.spark.connect.proto</pattern>
+              <shadedPattern>${spark.shade.packageName}.connect.client.proto</shadedPattern>
               <includes>
-                <include>com.google.common.**</include>
+                <include>org.apache.spark.connect.proto.**</include>
               </includes>
             </relocation>
             <relocation>
-              <pattern>com.google.thirdparty</pattern>
-              <shadedPattern>${spark.shade.packageName}.connect.client.guava</shadedPattern>
+              <pattern>org.apache.spark.sql.connect.common</pattern>

Review Comment:
   > It allows client and server in the same JVM
   
   `client` and `server` should not be in the same JVM, because `server` depends on `sql` at runtime, and `client` and `sql` have many classes with the same `packageName + className`; putting `client` and `sql` in the same JVM will cause class-loading issues. For example, when `client` and `sql` are on the same classpath, the JVM will load only one `org.apache.spark.sql.SparkSession`, which may come from either `client` or `sql`. Unless we manually use different `ClassLoader`s to load them, the JVM cannot load and use two different `org.apache.spark.sql.SparkSession` classes in the same process.
   
   
   
   > client 3.4 + server 4.0 in the same JVM
   
   I don't understand how relocating `common` can support this. The classes in `common` are still in the same package, and some classes have the same class name; maybe adding the Spark version to the package name would make them genuinely different. Of course, even then, the problems mentioned above would remain.
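   
   A minimal sketch of the class-loading point (the jar paths are placeholders, and each jar's own dependencies must also be reachable from its loader for class definition to succeed):
   
   ```scala
   import java.net.{URL, URLClassLoader}
   
   // Placeholder jars, both of which define org.apache.spark.sql.SparkSession.
   val clientJar = new URL("file:/path/to/spark-connect-client-jvm-assembly.jar")
   val sqlJar    = new URL("file:/path/to/spark-sql.jar")
   
   // One flat classpath: the name resolves once; whichever jar comes first
   // shadows the other definition.
   val flat = new URLClassLoader(Array(clientJar, sqlJar))
   val only = flat.loadClass("org.apache.spark.sql.SparkSession")
   
   // Two isolated loaders: each defines its own SparkSession, and the two
   // Class objects are distinct and not interchangeable.
   val fromClient = new URLClassLoader(Array(clientJar), null).loadClass("org.apache.spark.sql.SparkSession")
   val fromSql    = new URLClassLoader(Array(sqlJar), null).loadClass("org.apache.spark.sql.SparkSession")
   assert(fromClient ne fromSql) // same name, different classes
   ```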
   
   
   





[GitHub] [spark] LuciferYang commented on a diff in pull request #39866: [WIP] Fix the client dependency jars

Posted by "LuciferYang (via GitHub)" <gi...@apache.org>.
LuciferYang commented on code in PR #39866:
URL: https://github.com/apache/spark/pull/39866#discussion_r1095459727


##########
project/SparkBuild.scala:
##########
@@ -862,21 +862,27 @@ object SparkConnectClient {
     (assembly / assemblyPackageScala / assembleArtifact) := false,
 
     // Exclude `pmml-model-*.jar`, `scala-collection-compat_*.jar`,`jsr305-*.jar` and
-    // `netty-*.jar` and `unused-1.0.0.jar` from assembly.
+    // `unused-1.0.0.jar` and checker-qual from assembly.
     (assembly / assemblyExcludedJars) := {
       val cp = (assembly / fullClasspath).value
       cp filter { v =>
         val name = v.data.getName
         name.startsWith("pmml-model-") || name.startsWith("scala-collection-compat_") ||
-          name.startsWith("jsr305-") || name.startsWith("netty-") || name == "unused-1.0.0.jar"
+          name.startsWith("jsr305-") || name == "unused-1.0.0.jar" || name.startsWith("checker-qual")
       }
     },
 
     (assembly / assemblyShadeRules) := Seq(
-      ShadeRule.rename("io.grpc.**" -> "org.sparkproject.connect.client.grpc.@0").inAll,
-      ShadeRule.rename("com.google.protobuf.**" -> "org.sparkproject.connect.protobuf.@1").inAll,
-      ShadeRule.rename("com.google.common.**" -> "org.sparkproject.connect.client.guava.@1").inAll,
-      ShadeRule.rename("com.google.thirdparty.**" -> "org.sparkproject.connect.client.guava.@1").inAll,
+      ShadeRule.rename("io.grpc.**" -> "org.sparkproject.connect.client.io.grpc.@1").inAll,
+      ShadeRule.rename("com.google.**" -> "org.sparkproject.connect.client.com.google.@1").inAll,
+      ShadeRule.rename("io.netty.**" -> "org.sparkproject.connect.client.io.netty.@1").inAll,
+      ShadeRule.rename("org.checkerframework.**" -> "org.sparkproject.connect.client.org.checkerframework.@1").inAll,
+      ShadeRule.rename("javax.annotation.**" -> "org.sparkproject.connect.client.javax.annotation.@1").inAll,

Review Comment:
   There are only 4 sub dirs in `org.sparkproject.connect.client`:
   
   - com
   - common
   - io
   - proto
   
   Some expected sub dirs are missing, such as `javax`, `org/codehaus`, or `perfmark`.
   
   



##########
project/SparkBuild.scala:
##########
@@ -862,21 +862,27 @@ object SparkConnectClient {
     (assembly / assemblyPackageScala / assembleArtifact) := false,
 
     // Exclude `pmml-model-*.jar`, `scala-collection-compat_*.jar`,`jsr305-*.jar` and
-    // `netty-*.jar` and `unused-1.0.0.jar` from assembly.
+    // `unused-1.0.0.jar` and checker-qual from assembly.
     (assembly / assemblyExcludedJars) := {
       val cp = (assembly / fullClasspath).value
       cp filter { v =>
         val name = v.data.getName
         name.startsWith("pmml-model-") || name.startsWith("scala-collection-compat_") ||
-          name.startsWith("jsr305-") || name.startsWith("netty-") || name == "unused-1.0.0.jar"
+          name.startsWith("jsr305-") || name == "unused-1.0.0.jar" || name.startsWith("checker-qual")
       }
     },
 
     (assembly / assemblyShadeRules) := Seq(
-      ShadeRule.rename("io.grpc.**" -> "org.sparkproject.connect.client.grpc.@0").inAll,
-      ShadeRule.rename("com.google.protobuf.**" -> "org.sparkproject.connect.protobuf.@1").inAll,
-      ShadeRule.rename("com.google.common.**" -> "org.sparkproject.connect.client.guava.@1").inAll,
-      ShadeRule.rename("com.google.thirdparty.**" -> "org.sparkproject.connect.client.guava.@1").inAll,
+      ShadeRule.rename("io.grpc.**" -> "org.sparkproject.connect.client.io.grpc.@1").inAll,
+      ShadeRule.rename("com.google.**" -> "org.sparkproject.connect.client.com.google.@1").inAll,
+      ShadeRule.rename("io.netty.**" -> "org.sparkproject.connect.client.io.netty.@1").inAll,
+      ShadeRule.rename("org.checkerframework.**" -> "org.sparkproject.connect.client.org.checkerframework.@1").inAll,
+      ShadeRule.rename("javax.annotation.**" -> "org.sparkproject.connect.client.javax.annotation.@1").inAll,

Review Comment:
   There are only 4 sub dirs in `org.sparkproject.connect.client`:
   
   - com
   - common
   - io
   - proto
   
   Some expected sub dirs are missing, such as `javax`, `org/codehaus`, or `io/perfmark`.
   
   





[GitHub] [spark] zhenlineo commented on a diff in pull request #39866: [WIP] Fix the client dependency jars

Posted by "zhenlineo (via GitHub)" <gi...@apache.org>.
zhenlineo commented on code in PR #39866:
URL: https://github.com/apache/spark/pull/39866#discussion_r1095940056


##########
connector/connect/common/pom.xml:
##########
@@ -109,22 +88,6 @@
             <version>${netty.version}</version>
             <scope>provided</scope>
         </dependency>
-        <dependency> <!-- necessary for Java 9+ -->

Review Comment:
   I am building with Java 11. Do you have a command to run the related tests?





[GitHub] [spark] LuciferYang commented on a diff in pull request #39866: [SPARK-42287][CONNECT][BUILD] Fix the client dependency jars

Posted by "LuciferYang (via GitHub)" <gi...@apache.org>.
LuciferYang commented on code in PR #39866:
URL: https://github.com/apache/spark/pull/39866#discussion_r1098883805


##########
connector/connect/client/jvm/pom.xml:
##########
@@ -112,40 +124,57 @@
           <shadedArtifactAttached>false</shadedArtifactAttached>
           <artifactSet>
             <includes>
+              <include>com.google.android:*</include>
+              <include>com.google.api.grpc:*</include>
+              <include>com.google.code.findbugs:*</include>
+              <include>com.google.code.gson:*</include>
+              <include>com.google.errorprone:*</include>
               <include>com.google.guava:*</include>
-              <include>io.grpc:*</include>
+              <include>com.google.j2objc:*</include>
               <include>com.google.protobuf:*</include>
+              <include>io.grpc:*</include>
+              <include>io.netty:*</include>
+              <include>io.perfmark:*</include>
+              <include>org.codehaus.mojo:*</include>
+              <include>org.checkerframework:*</include>
               <include>org.apache.spark:spark-connect-common_${scala.binary.version}</include>
             </includes>
           </artifactSet>
           <relocations>
             <relocation>
               <pattern>io.grpc</pattern>
-              <shadedPattern>${spark.shade.packageName}.connect.client.grpc</shadedPattern>
+              <shadedPattern>${spark.shade.packageName}.connect.client.io.grpc</shadedPattern>
               <includes>
                 <include>io.grpc.**</include>
               </includes>
             </relocation>
             <relocation>
-              <pattern>com.google.protobuf</pattern>
-              <shadedPattern>${spark.shade.packageName}.connect.protobuf</shadedPattern>
-              <includes>
-                <include>com.google.protobuf.**</include>
-              </includes>
+              <pattern>com.google</pattern>
+              <shadedPattern>${spark.shade.packageName}.connect.client.com.google</shadedPattern>
             </relocation>
             <relocation>
-              <pattern>com.google.common</pattern>
-              <shadedPattern>${spark.shade.packageName}.connect.client.guava</shadedPattern>
-              <includes>
-                <include>com.google.common.**</include>
-              </includes>
+              <pattern>io.netty</pattern>

Review Comment:
   I suggest not shading Netty-related content: gRPC has implemented `NettyResourceTransformer` to relocate `native-image.properties` and rewrite the file content, but I haven't found a similar implementation in `maven-shade-plugin`.
   
   Or do you have a better solution? @zhenlineo
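   
   For context, the kind of rewrite this would need, as a hand-written sketch of the idea (not gRPC's actual transformer):
   
   ```scala
   // Sketch: relocate a netty native-image resource and rewrite its contents so
   // the GraalVM config still matches the shaded class names.
   val shaded = "org.sparkproject.connect.client."
   
   def relocate(path: String, content: String): (String, String) = {
     val newPath = path.replace(
       "META-INF/native-image/io.netty/",
       s"META-INF/native-image/${shaded}io.netty/")
     // e.g. --initialize-at-build-time=io.netty.... arguments must be rewritten too.
     (newPath, content.replace("io.netty.", shaded + "io.netty."))
   }
   
   val (p, c) = relocate(
     "META-INF/native-image/io.netty/netty-codec-http2/native-image.properties",
     "Args = --initialize-at-build-time=io.netty.handler.codec.http2.Http2CodecUtil")
   // p: META-INF/native-image/org.sparkproject.connect.client.io.netty/netty-codec-http2/native-image.properties
   // c: Args = --initialize-at-build-time=org.sparkproject.connect.client.io.netty.handler.codec.http2.Http2CodecUtil
   ```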
   
   
   
   
   





[GitHub] [spark] LuciferYang commented on a diff in pull request #39866: [SPARK-42287][CONNECT][BUILD] Fix the client dependency jars

Posted by "LuciferYang (via GitHub)" <gi...@apache.org>.
LuciferYang commented on code in PR #39866:
URL: https://github.com/apache/spark/pull/39866#discussion_r1100468326


##########
connector/connect/client/jvm/pom.xml:
##########
@@ -112,40 +124,57 @@
           <shadedArtifactAttached>false</shadedArtifactAttached>
           <artifactSet>
             <includes>
+              <include>com.google.android:*</include>
+              <include>com.google.api.grpc:*</include>
+              <include>com.google.code.findbugs:*</include>
+              <include>com.google.code.gson:*</include>
+              <include>com.google.errorprone:*</include>
               <include>com.google.guava:*</include>
-              <include>io.grpc:*</include>
+              <include>com.google.j2objc:*</include>
               <include>com.google.protobuf:*</include>
+              <include>io.grpc:*</include>
+              <include>io.netty:*</include>
+              <include>io.perfmark:*</include>
+              <include>org.codehaus.mojo:*</include>
+              <include>org.checkerframework:*</include>
               <include>org.apache.spark:spark-connect-common_${scala.binary.version}</include>
             </includes>
           </artifactSet>
           <relocations>
             <relocation>
               <pattern>io.grpc</pattern>
-              <shadedPattern>${spark.shade.packageName}.connect.client.grpc</shadedPattern>
+              <shadedPattern>${spark.shade.packageName}.connect.client.io.grpc</shadedPattern>
               <includes>
                 <include>io.grpc.**</include>
               </includes>
             </relocation>
             <relocation>
-              <pattern>com.google.protobuf</pattern>
-              <shadedPattern>${spark.shade.packageName}.connect.protobuf</shadedPattern>
-              <includes>
-                <include>com.google.protobuf.**</include>
-              </includes>
+              <pattern>com.google</pattern>
+              <shadedPattern>${spark.shade.packageName}.connect.client.com.google</shadedPattern>
             </relocation>
             <relocation>
-              <pattern>com.google.common</pattern>
-              <shadedPattern>${spark.shade.packageName}.connect.client.guava</shadedPattern>
-              <includes>
-                <include>com.google.common.**</include>
-              </includes>
+              <pattern>io.netty</pattern>

Review Comment:
   Yes, you are right.
   
   To clarify, I mean we need to ensure that the `native-image.properties` files in `META-INF/native-image/io.netty/netty-codec-xxx/` are rewritten correctly if we shade Netty.





[GitHub] [spark] LuciferYang commented on pull request #39866: [SPARK-42287][CONNECT][BUILD] Fix the client dependency jars

Posted by "LuciferYang (via GitHub)" <gi...@apache.org>.
LuciferYang commented on PR #39866:
URL: https://github.com/apache/spark/pull/39866#issuecomment-1427299936

   @zhenlineo any updates?




[GitHub] [spark] LuciferYang commented on a diff in pull request #39866: [SPARK-42287][CONNECT][BUILD] Fix the client dependency jars

Posted by "LuciferYang (via GitHub)" <gi...@apache.org>.
LuciferYang commented on code in PR #39866:
URL: https://github.com/apache/spark/pull/39866#discussion_r1105147048


##########
connector/connect/client/jvm/pom.xml:
##########
@@ -112,40 +124,57 @@
           <shadedArtifactAttached>false</shadedArtifactAttached>
           <artifactSet>
             <includes>
+              <include>com.google.android:*</include>
+              <include>com.google.api.grpc:*</include>
+              <include>com.google.code.findbugs:*</include>
+              <include>com.google.code.gson:*</include>
+              <include>com.google.errorprone:*</include>
               <include>com.google.guava:*</include>
-              <include>io.grpc:*</include>
+              <include>com.google.j2objc:*</include>
               <include>com.google.protobuf:*</include>
+              <include>io.grpc:*</include>
+              <include>io.netty:*</include>
+              <include>io.perfmark:*</include>
+              <include>org.codehaus.mojo:*</include>
+              <include>org.checkerframework:*</include>
               <include>org.apache.spark:spark-connect-common_${scala.binary.version}</include>
             </includes>
           </artifactSet>
           <relocations>
             <relocation>
               <pattern>io.grpc</pattern>
-              <shadedPattern>${spark.shade.packageName}.connect.client.grpc</shadedPattern>
+              <shadedPattern>${spark.shade.packageName}.connect.client.io.grpc</shadedPattern>
               <includes>
                 <include>io.grpc.**</include>
               </includes>
             </relocation>
             <relocation>
-              <pattern>com.google.protobuf</pattern>
-              <shadedPattern>${spark.shade.packageName}.connect.protobuf</shadedPattern>
-              <includes>
-                <include>com.google.protobuf.**</include>
-              </includes>
+              <pattern>com.google</pattern>
+              <shadedPattern>${spark.shade.packageName}.connect.client.com.google</shadedPattern>
             </relocation>
             <relocation>
-              <pattern>com.google.common</pattern>
-              <shadedPattern>${spark.shade.packageName}.connect.client.guava</shadedPattern>
-              <includes>
-                <include>com.google.common.**</include>
-              </includes>
+              <pattern>io.netty</pattern>

Review Comment:
   No, we should make sure the Maven shade result is correct, because Spark currently uses Maven, not SBT, to build the distribution and deploy jars to the Maven Central repository. If there is no clear solution yet, I think we should temporarily not shade Netty.


