You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by "juliuszsompolski (via GitHub)" <gi...@apache.org> on 2023/07/29 13:30:37 UTC

[GitHub] [spark] juliuszsompolski commented on a diff in pull request #42069: [SPARK-43744][CONNECT] Fix class loading problem caused by stub user classes not found on the server classpath

juliuszsompolski commented on code in PR #42069:
URL: https://github.com/apache/spark/pull/42069#discussion_r1278299618


##########
core/src/main/scala/org/apache/spark/internal/config/package.scala:
##########
@@ -2547,4 +2547,18 @@ package object config {
       .version("3.5.0")
       .booleanConf
       .createWithDefault(false)
+
+  private[spark] val CONNECT_SCALA_UDF_STUB_CLASSES =
+    ConfigBuilder("spark.connect.scalaUdf.stubClasses")
+      .internal()
+      .doc("""
+          |Comma-separated list of binary names of classes/packages that should be stubbed during
+          |the Scala UDF serde and execution if not found on the server classpath.
+          |An empty list effectively disables stubbing for all missing classes.
+          |By default, the server stubs classes from the Scala client package.
+          |""".stripMargin)

Review Comment:
   So by default we will be stubbing if some Spark Connect client code is pulled into the UDF, but not if the serialization pulls some other class, unrelated to the client and not needed by the UDF, but just referenced in the contained class in a way that will make it pulled in?
   In that case the user would also get an error about ClassNotFound? 
   Do we in that case want the user to add that using an addArtifact, even though it might be unclear to the user why is that relevant to the UDF?
   What are the disadvantages of just stubbing everything?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org