You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by "rangadi (via GitHub)" <gi...@apache.org> on 2023/08/08 17:43:15 UTC

[GitHub] [spark] rangadi commented on a diff in pull request #42236: [SPARK-43646][CONNECT][TESTS] Make both SBT and Maven use `spark-proto` uber jar to test the `connect` module

rangadi commented on code in PR #42236:
URL: https://github.com/apache/spark/pull/42236#discussion_r1287455898


##########
connector/connect/client/jvm/src/test/scala/org/apache/spark/sql/PlanGenerationTestSuite.scala:
##########
@@ -3233,14 +3233,15 @@ class PlanGenerationTestSuite
     "connect/common/src/test/resources/protobuf-tests/common.desc"
 
   test("from_protobuf messageClassName") {
-    binary.select(pbFn.from_protobuf(fn.col("bytes"), classOf[StorageLevel].getName))
+    binary.select(
+      pbFn.from_protobuf(fn.col("bytes"), "org.apache.spark.sql.protobuf.protos.TestProtoObj"))

Review Comment:
   > In order to use this sql function, users have no choice but to relocate the content of the Java PB description files used in their business according to Spark's project rules(replcace all com.google.protobuf.
   toorg.sparkproject.spark_protobuf.protobuf.
   
   Moving the discussion about "why do users need to shade their java classes?" here in a comment thread. cc: @LuciferYang, @advancedxy, @HyukjinKwon.
   
   There is no other option. Spark at runtime includes unshaded Protobuf library (2.5.x) in its class path. This is was as of a few months back. Don't know if that changed. Its been this way for years. There is no way to support protobuf Java classes for current versions of Protobuf library without shading. 
   
   I agree it is not convenient for the customers. That is why descriptor file is documented as the primary API (both for Scala and python).
   We will soon support Python classes that will help Python customers. 
   FWIW I have a small gitbuf project to help customers create shaded Java classes : https://github.com/rangadi/shaded-protobuf-classes
   



##########
connector/connect/client/jvm/src/test/scala/org/apache/spark/sql/PlanGenerationTestSuite.scala:
##########
@@ -3233,14 +3233,15 @@ class PlanGenerationTestSuite
     "connect/common/src/test/resources/protobuf-tests/common.desc"
 
   test("from_protobuf messageClassName") {

Review Comment:
   This is great! Thank you. 



##########
connector/connect/server/src/test/java/org/apache/spark/sql/protobuf/protos/TestProto.java:
##########
@@ -0,0 +1,57 @@
+// Generated by the protocol buffer compiler.  DO NOT EDIT!
+// source: test.proto
+
+package org.apache.spark.sql.protobuf.protos;
+
+/**
+ * SPARK-43646: This is generated by the `connector/connect/server/src/test/resources/test.proto`
+ * to test the spark protobuf uber jar. If you need to regenerate this file:
+ * 1. Modify `connector/connect/server/src/test/resources/test.proto` to generate Java files
+ * and replace the current file.
+ * 2. Replace all `com.google.protobuf` in the file with `org.sparkproject.spark_protobuf.protobuf.`
+ *  * <p>
+ *  * TODO(SPARK-44606): Generate this file and replace the package names in the file when testing.
+ */
+public final class TestProto {

Review Comment:
   Could you update the doc comment to say why this is included (the current comment talks about a fix, but not the problem). Looks like this uses shaded version of protobuf library. 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org