You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by "hvanhovell (via GitHub)" <gi...@apache.org> on 2023/08/14 04:00:46 UTC

[GitHub] [spark] hvanhovell opened a new pull request, #42478: [SPARK-44795][CONNECT] CodeGenerator Cache should be classloader specific

hvanhovell opened a new pull request, #42478:
URL: https://github.com/apache/spark/pull/42478

   ### What changes were proposed in this pull request?
   When you currently use a REPL generated class in a UDF you can get an error saying that that class is not equal to that class. This error is thrown in a code generated class. The problem is that the classes have been loaded by different classloaders. We cache generated code and use the textual code as the string. The problem with this is that in Spark Connect users are free in supplying user classes that can have arbitrary names, a name can point to an entirely different class, or it can point to the same class (in bytes) but loaded by a different classloader. We need to add the classloader to the key in the code gen cache to make this safe.
   
   There are roughly two ways how this problem can arise:
   1. Two sessions use the same class names. This is particularly easy when you use the REPL because this  always generates the same names.
   2. You run in single process mode. In this case wholestage codegen will test compile the class using a different classloader then the 'executor', while sharing the same code generator cache.
   
   ### Why are the changes needed?
   We want to be able to use REPL (and other)
   
   ### Does this PR introduce _any_ user-facing change?
   No.
   
   ### How was this patch tested?
   I added a test to the `ReplE2ESuite`.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] grundprinzip commented on a diff in pull request #42478: [SPARK-44795][CONNECT] CodeGenerator Cache should be classloader specific

Posted by "grundprinzip (via GitHub)" <gi...@apache.org>.
grundprinzip commented on code in PR #42478:
URL: https://github.com/apache/spark/pull/42478#discussion_r1292973359


##########
sql/api/src/main/scala/org/apache/spark/sql/catalyst/encoders/OuterScopes.scala:
##########
@@ -26,28 +26,9 @@ import org.apache.spark.util.SparkClassUtils
 
 object OuterScopes {
   private[this] val queue = new ReferenceQueue[AnyRef]
-  private class HashableWeakReference(v: AnyRef) extends WeakReference[AnyRef](v, queue) {

Review Comment:
   Didn't you introduce this class just previously? Out of curiosity what prompted the change to move this out of the scope here?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] hvanhovell commented on pull request #42478: [SPARK-44795][CONNECT] CodeGenerator Cache should be classloader specific

Posted by "hvanhovell (via GitHub)" <gi...@apache.org>.
hvanhovell commented on PR #42478:
URL: https://github.com/apache/spark/pull/42478#issuecomment-1678288387

   Merging this to master/3.5. The failing test is flaky.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] hvanhovell commented on a diff in pull request #42478: [SPARK-44795][CONNECT] CodeGenerator Cache should be classloader specific

Posted by "hvanhovell (via GitHub)" <gi...@apache.org>.
hvanhovell commented on code in PR #42478:
URL: https://github.com/apache/spark/pull/42478#discussion_r1294098016


##########
sql/api/src/main/scala/org/apache/spark/sql/catalyst/encoders/OuterScopes.scala:
##########
@@ -26,28 +26,9 @@ import org.apache.spark.util.SparkClassUtils
 
 object OuterScopes {
   private[this] val queue = new ReferenceQueue[AnyRef]
-  private class HashableWeakReference(v: AnyRef) extends WeakReference[AnyRef](v, queue) {

Review Comment:
   It was a bit weird to use it as part of the OuterScopes object in a different class since the class - conceptually - itself has very little to do with OuterScopes. The other reason was the I didn't want the other refs to pollute the reference queue we are using here.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] hvanhovell closed pull request #42478: [SPARK-44795][CONNECT] CodeGenerator Cache should be classloader specific

Posted by "hvanhovell (via GitHub)" <gi...@apache.org>.
hvanhovell closed pull request #42478: [SPARK-44795][CONNECT] CodeGenerator Cache should be classloader specific
URL: https://github.com/apache/spark/pull/42478


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org