Posted to reviews@spark.apache.org by "LuciferYang (via GitHub)" <gi...@apache.org> on 2024/02/01 06:27:53 UTC

[PR] [SPARK-46939][CORE] Simplify IndylambdaScalaClosures#getSerializationProxy [spark]

LuciferYang opened a new pull request, #44977:
URL: https://github.com/apache/spark/pull/44977

   ### What changes were proposed in this pull request?
   This PR simplifies the `getSerializationProxy` function in `IndylambdaScalaClosures`. The main change is to remove the `isClosureCandidate` check used when determining whether a class is a closure, because Apache Spark 4.0 no longer supports Scala 2.11.
   
   
   ### Why are the changes needed?
   The `isClosureCandidate` check can be removed because in Scala 2.12 and later, closures are implemented as synthetic classes generated by Java's `LambdaMetafactory`. These synthetic classes are not required to implement a `scala.Function` interface, which is what `isClosureCandidate` tests for, so the check is no longer necessary for identifying closures in Scala 2.12 and later versions.
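
   A minimal stand-alone sketch of the idea (not the Spark implementation): a serializable indylambda closure is a synthetic class whose generated `writeReplace` method returns a `java.lang.invoke.SerializedLambda`, which is what `getSerializationProxy` ultimately hands back.

   ```scala
   import java.lang.invoke.SerializedLambda

   object SerializationProxySketch {
     // Sketch only: mirrors the shape of the check, not Spark's exact code.
     def serializationProxyOf(maybeClosure: AnyRef): Option[SerializedLambda] = {
       val cls = maybeClosure.getClass
       // Indylambda closure classes are generated by LambdaMetafactory and are always
       // synthetic; only serializable closures are of interest.
       if (!cls.isSynthetic || !maybeClosure.isInstanceOf[java.io.Serializable]) {
         None
       } else {
         try {
           // Serializable lambdas carry a generated writeReplace method that
           // returns their SerializedLambda proxy.
           val writeReplace = cls.getDeclaredMethod("writeReplace")
           writeReplace.setAccessible(true)
           writeReplace.invoke(maybeClosure) match {
             case proxy: SerializedLambda => Some(proxy)
             case _ => None
           }
         } catch {
           case _: NoSuchMethodException => None
         }
       }
     }

     def main(args: Array[String]): Unit = {
       val f: Int => Int = x => x + 1               // compiles to an indylambda closure
       println(serializationProxyOf(f).isDefined)   // true
       println(serializationProxyOf(new Object))    // None: java.lang.Object is not synthetic
     }
   }
   ```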
   
   
   
   ### Does this PR introduce _any_ user-facing change?
   No
   
   
   ### How was this patch tested?
   Pass GitHub Actions
   
   
   ### Was this patch authored or co-authored using generative AI tooling?
   No
   



Re: [PR] [SPARK-46939][CORE] Remove `isClosureCandidate` check from `getSerializationProxy` function in IndylambdaScalaClosures [spark]

Posted by "LuciferYang (via GitHub)" <gi...@apache.org>.
LuciferYang commented on PR #44977:
URL: https://github.com/apache/spark/pull/44977#issuecomment-1920764991

   @rednaxelafx I hadn't considered many of the issues you mentioned; thank you very much for the explanation. I think it's better to keep the current state, so let me close this pull request for now. Thanks ~



Re: [PR] [SPARK-46939][CORE] Remove `isClosureCandidate` check from `getSerializationProxy` function in IndylambdaScalaClosures [spark]

Posted by "LuciferYang (via GitHub)" <gi...@apache.org>.
LuciferYang closed pull request #44977: [SPARK-46939][CORE] Remove `isClosureCandidate` check from `getSerializationProxy` function in IndylambdaScalaClosures
URL: https://github.com/apache/spark/pull/44977



Re: [PR] [SPARK-46939][CORE] Remove `isClosureCandidate` check from `getSerializationProxy` function in IndylambdaScalaClosures [spark]

Posted by "rednaxelafx (via GitHub)" <gi...@apache.org>.
rednaxelafx commented on PR #44977:
URL: https://github.com/apache/spark/pull/44977#issuecomment-1920739267

   I'm neutral on lifting the `isClosureCandidate` restriction (as I already mentioned in the comment, it was expected that someone would lift it in the future ;-) What I had in mind at the time was something like "what if we need to also clean Java lambdas?", which only matters if we add JShell support to Spark.
   All I'd ask is to implement it right instead of doing it half-baked. 
   
   My original intent with the `isClosureCandidate` check was to use a "quick" check to reduce the number of candidates flowing into the later stages. It had nothing to do with Scala 2.11 compatibility/support (so please correct that in your PR description). I had only been developing and testing with Scala 2.12's simple lambda cases, and I knew that the ones I was testing with all implemented the `scala.FunctionN` traits, so just to be conservative and limit support to the cases I had tested, I added `isClosureCandidate` and a few other restrictions (like the `$anonfun$` check that immediately followed, which again isn't a perfect filter, but one that likely filters out regular named functions rather than lambdas).
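
   Roughly, and only as a sketch (the real filters live in `ClosureCleaner.scala`, and the exact name predicate there may differ), the two quick filters described above amount to something like the following, assuming Scala 2.13 collection converters and Commons Lang's `ClassUtils`:

   ```scala
   import java.lang.invoke.SerializedLambda
   import scala.jdk.CollectionConverters._
   import org.apache.commons.lang3.ClassUtils

   object QuickFilterSketch {
     // Quick filter 1: does the class implement one of the scala.FunctionN traits?
     def isClosureCandidate(cls: Class[_]): Boolean =
       ClassUtils.getAllInterfaces(cls).asScala.exists(_.getName.startsWith("scala.Function"))

     // Quick filter 2: Scala lifts lambda bodies into methods named `$anonfun$...`,
     // so the implementation method recorded in the SerializedLambda should look lifted.
     def implMethodLooksLifted(proxy: SerializedLambda): Boolean =
       proxy.getImplMethodName.contains("$anonfun$")
   }
   ```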
   
   The special thing about language-level lambdas is that you can have higher confidence that the class is generated by the compiler (or at least that the recipe for the class comes from the compiler), so they'll follow certain patterns and won't have surprises such as hand-added fields.
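
   As a hypothetical illustration (not code from this PR): a language-level lambda and a hand-written `Function1` subclass can both pass an interface-based check, but only the former is the compiler-shaped synthetic class the cleaner can safely reason about.

   ```scala
   object LambdaVsHandWritten {
     // Compiler-generated: a synthetic LambdaMetafactory class whose captured state is
     // exactly what the compiler recorded in its SerializedLambda.
     val increment: Int => Int = x => x + 1

     // Hand-written: also implements scala.Function1 and Serializable, so an
     // interface-based quick check alone would accept it, yet it is not synthetic
     // and carries a field the compiler never put there.
     class HandWritten extends (Int => Int) with Serializable {
       val surprise = new Array[Byte](1024 * 1024)
       def apply(x: Int): Int = x + 1
     }

     def main(args: Array[String]): Unit = {
       println(increment.getClass.isSynthetic)         // true
       println((new HandWritten).getClass.isSynthetic) // false
     }
   }
   ```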
   
   It certainly is possible to abuse the imperfect checks/filters I had put in place and trick the cleaner into trusting a non-compiler-generated class as a Scala lambda... there's quite a bit of room for improvement here.
   
   @vsevolodstep-db Seva has been working on ClosureCleaner improvements like this one (https://github.com/apache/spark/pull/42995), which uncovered quite a few cases that I didn't handle correctly...



Re: [PR] [SPARK-46939][CORE] Simplify IndylambdaScalaClosures#getSerializationProxy [spark]

Posted by "LuciferYang (via GitHub)" <gi...@apache.org>.
LuciferYang commented on code in PR #44977:
URL: https://github.com/apache/spark/pull/44977#discussion_r1473858121


##########
common/utils/src/main/scala/org/apache/spark/util/ClosureCleaner.scala:
##########
@@ -610,36 +609,26 @@ private[spark] object IndylambdaScalaClosures extends Logging {
 
   /**
    * Check if the given reference is a indylambda style Scala closure.
-   * If so (e.g. for Scala 2.12+ closures), return a non-empty serialization proxy
-   * (SerializedLambda) of the closure;
-   * otherwise (e.g. for Scala 2.11 closures) return None.
+   * Return a non-empty serialization proxy (SerializedLambda) of the closure.
    *
    * @param maybeClosure the closure to check.
    */
   def getSerializationProxy(maybeClosure: AnyRef): Option[SerializedLambda] = {
-    def isClosureCandidate(cls: Class[_]): Boolean = {
-      // TODO: maybe lift this restriction to support other functional interfaces in the future
-      val implementedInterfaces = ClassUtils.getAllInterfaces(cls).asScala
-      implementedInterfaces.exists(_.getName.startsWith("scala.Function"))
-    }
-
     maybeClosure.getClass match {
       // shortcut the fast check:
       // 1. indylambda closure classes are generated by Java's LambdaMetafactory, and they're
       //    always synthetic.
       // 2. We only care about Serializable closures, so let's check that as well
       case c if !c.isSynthetic || !maybeClosure.isInstanceOf[Serializable] => None
 
-      case c if isClosureCandidate(c) =>
+      case _ =>

Review Comment:
   cc @rednaxelafx May I ask if this modification attempt is correct? Thanks


