You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2021/10/05 05:40:56 UTC

[GitHub] [spark] rmcyang opened a new pull request #34158: [SPARK-36705][FOLLOW-UP] Support the case when user's classes need to register for Kryo serialization

rmcyang opened a new pull request #34158:
URL: https://github.com/apache/spark/pull/34158


   <!--
   Thanks for sending a pull request!  Here are some tips for you:
     1. If this is your first time, please read our contributor guidelines: https://spark.apache.org/contributing.html
     2. Ensure you have added or run the appropriate tests for your PR: https://spark.apache.org/developer-tools.html
     3. If the PR is unfinished, add '[WIP]' in your PR title, e.g., '[WIP][SPARK-XXXX] Your PR title ...'.
     4. Be sure to keep the PR description updated to reflect all changes.
     5. Please write your PR title to summarize what this PR proposes.
     6. If possible, provide a concise example to reproduce the issue for a faster review.
     7. If you want to add a new configuration, please read the guideline first for naming configurations in
        'core/src/main/scala/org/apache/spark/internal/config/ConfigEntry.scala'.
     8. If you want to add or modify an error type or message, please read the guideline first in
        'core/src/main/resources/error/README.md'.
   -->
   
   ### What changes were proposed in this pull request?
   <!--
   Please clarify what changes you are proposing. The purpose of this section is to outline the changes and how this PR fixes the issue. 
   If possible, please consider writing useful notes for better and faster reviews in your PR. See the examples below.
     1. If you refactor some codes with changing classes, showing the class hierarchy will help reviewers.
     2. If you fix some SQL features, you can provide some references of other DBMSes.
     3. If there is design documentation, please add the link.
     4. If there is a discussion in the mailing list, please add the link.
   -->
   
   - Make the val lazy wherever `isPushBasedShuffleEnabled` is invoked when it is a class instance variable, so it can happen after user-defined jars/classes in `spark.kryo.classesToRegister` are downloaded and available on executor-side, as part of the fix for the exception mentioned below.
   
   - Add a flag `checkSerializer` to control whether we need to check a serializer is `supportsRelocationOfSerializedObjects` or not within `isPushBasedShuffleEnabled` as part of the fix for the exception mentioned below. Specifically, we don't check this in `registerWithExternalShuffleServer()` in `BlockManager` and `createLocalDirsForMergedShuffleBlocks()` in `DiskBlockManager.scala` as the same issue would raise otherwise.
   
   - Move `instantiateClassFromConf` and `instantiateClass` from `SparkEnv` into `Utils`, in order to let `isPushBasedShuffleEnabled` to leverage them for instantiating serializer instances.
   
   ### Why are the changes needed?
   <!--
   Please clarify why the changes are needed. For instance,
     1. If you propose a new API, clarify the use case for a new API.
     2. If you fix a bug, you can clarify why it is a bug.
   -->
   When user tries to set classes for Kryo Serialization by `spark.kryo.classesToRegister`, below exception(or similar) would be encountered in `isPushBasedShuffleEnabled` as indicated below.
   Reproduced the issue in our internal branch by launching spark-shell as:
   ```
   spark-shell --spark-version 3.1.1 --packages ml.dmlc:xgboost4j_2.12:1.3.1 --conf spark.kryo.classesToRegister=ml.dmlc.xgboost4j.scala.Booster
   ```
   
   ```
   Exception in thread "main" java.lang.reflect.UndeclaredThrowableException
   	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1911)
   	at org.apache.spark.deploy.SparkHadoopUtil.runAsSparkUser(SparkHadoopUtil.scala:61)
   	at org.apache.spark.executor.CoarseGrainedExecutorBackend$.run(CoarseGrainedExecutorBackend.scala:393)
   	at org.apache.spark.executor.YarnCoarseGrainedExecutorBackend$.main(YarnCoarseGrainedExecutorBackend.scala:83)
   	at org.apache.spark.executor.YarnCoarseGrainedExecutorBackend.main(YarnCoarseGrainedExecutorBackend.scala)
   Caused by: org.apache.spark.SparkException: Failed to register classes with Kryo
   	at org.apache.spark.serializer.KryoSerializer.$anonfun$newKryo$5(KryoSerializer.scala:183)
   	at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
   	at org.apache.spark.util.Utils$.withContextClassLoader(Utils.scala:230)
   	at org.apache.spark.serializer.KryoSerializer.newKryo(KryoSerializer.scala:171)
   	at org.apache.spark.serializer.KryoSerializer$$anon$1.create(KryoSerializer.scala:102)
   	at com.esotericsoftware.kryo.pool.KryoPoolQueueImpl.borrow(KryoPoolQueueImpl.java:48)
   	at org.apache.spark.serializer.KryoSerializer$PoolWrapper.borrow(KryoSerializer.scala:109)
   	at org.apache.spark.serializer.KryoSerializerInstance.borrowKryo(KryoSerializer.scala:346)
   	at org.apache.spark.serializer.KryoSerializerInstance.getAutoReset(KryoSerializer.scala:446)
   	at org.apache.spark.serializer.KryoSerializer.supportsRelocationOfSerializedObjects$lzycompute(KryoSerializer.scala:253)
   	at org.apache.spark.serializer.KryoSerializer.supportsRelocationOfSerializedObjects(KryoSerializer.scala:249)
   	at org.apache.spark.util.Utils$.isPushBasedShuffleEnabled(Utils.scala:2584)
   	at org.apache.spark.MapOutputTrackerWorker.<init>(MapOutputTracker.scala:1109)
   	at org.apache.spark.SparkEnv$.create(SparkEnv.scala:322)
   	at org.apache.spark.SparkEnv$.createExecutorEnv(SparkEnv.scala:205)
   	at org.apache.spark.executor.CoarseGrainedExecutorBackend$.$anonfun$run$7(CoarseGrainedExecutorBackend.scala:442)
   	at org.apache.spark.deploy.SparkHadoopUtil$$anon$1.run(SparkHadoopUtil.scala:62)
   	at org.apache.spark.deploy.SparkHadoopUtil$$anon$1.run(SparkHadoopUtil.scala:61)
   	at java.security.AccessController.doPrivileged(Native Method)
   	at javax.security.auth.Subject.doAs(Subject.java:422)
   	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1893)
   	... 4 more
   Caused by: java.lang.ClassNotFoundException: ml.dmlc.xgboost4j.scala.Booster
   	at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
   	at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
   	at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:349)
   	at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
   	at java.lang.Class.forName0(Native Method)
   	at java.lang.Class.forName(Class.java:348)
   	at org.apache.spark.util.Utils$.classForName(Utils.scala:217)
   	at org.apache.spark.serializer.KryoSerializer.$anonfun$newKryo$6(KryoSerializer.scala:174)
   	at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62)
   	at scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55)
   	at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49)
   	at org.apache.spark.serializer.KryoSerializer.$anonfun$newKryo$5(KryoSerializer.scala:173)
   	... 24 more
   ```
   Registering user class for kryo serialization is happening after serializer creation in SparkEnv. Serializer creation can happen in `isPushBasedShuffleEnabled`, which can be called in some places prior to SparkEnv is created. Also, as per analysis by @JoshRosen, this is probably due to Kryo instantiation was failing because added packages hadn't been downloaded to the executor yet (because this code is running during executor startup, not task startup). The proposed change helps fix this issue.
   
   ### Does this PR introduce _any_ user-facing change?
   <!--
   Note that it means *any* user-facing change including all aspects such as the documentation fix.
   If yes, please clarify the previous behavior and the change this PR proposes - provide the console output, description and/or an example to show the behavior difference if possible.
   If possible, please also clarify if this is a user-facing change compared to the released Spark versions or within the unreleased branches such as master.
   If no, write 'No'.
   -->
   No.
   
   ### How was this patch tested?
   <!--
   If tests were added, say they were added here. Please make sure to add some test cases that check the changes thoroughly including negative and positive cases if possible.
   If it was tested in a way different from regular unit tests, please clarify how you tested step by step, ideally copy and paste-able, so that other reviewers can test and check, and descendants can verify in the future.
   If tests were not added, please describe why they were not added and/or why it was difficult to add.
   -->
   Passed existing tests.
   Tested this patch in our internal branch where user reported the issue. Issue is now not reproducible with this patch.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] mridulm commented on pull request #34158: [SPARK-36705][FOLLOW-UP] Support the case when user's classes need to register for Kryo serialization

Posted by GitBox <gi...@apache.org>.
mridulm commented on pull request #34158:
URL: https://github.com/apache/spark/pull/34158#issuecomment-934595982


   The test failures are unrelated to the PR, merging to master/branch-3.2


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] mridulm commented on pull request #34158: [SPARK-36705][FOLLOW-UP] Support the case when user's classes need to register for Kryo serialization

Posted by GitBox <gi...@apache.org>.
mridulm commented on pull request #34158:
URL: https://github.com/apache/spark/pull/34158#issuecomment-931776691


   Agree with your analysis @JoshRosen. That was my interpretation as well when I relooked at it again - I have synced with @rmcyang to make sure we address the lack of jars on executor issue - which is the actual cause of the current problem.
   (Making serializer initialization consistent is required - but is not the immediate cause).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on pull request #34158: [SPARK-36705][FOLLOW-UP] Support the case when user's classes need to register for Kryo serialization

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #34158:
URL: https://github.com/apache/spark/pull/34158#issuecomment-932645924






-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] mridulm edited a comment on pull request #34158: [SPARK-36705][FOLLOW-UP] Support the case when user's classes need to register for Kryo serialization

Posted by GitBox <gi...@apache.org>.
mridulm edited a comment on pull request #34158:
URL: https://github.com/apache/spark/pull/34158#issuecomment-934595982


   The test failures are unrelated to the PR (HiveClient related), merging to master/branch-3.2


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] rmcyang commented on a change in pull request #34158: [WIP][SPARK-36705][FOLLOW-UP] Support the case when user's classes need to register for Kryo serialization

Posted by GitBox <gi...@apache.org>.
rmcyang commented on a change in pull request #34158:
URL: https://github.com/apache/spark/pull/34158#discussion_r720742585



##########
File path: core/src/main/scala/org/apache/spark/util/Utils.scala
##########
@@ -2603,18 +2603,28 @@ private[spark] object Utils extends Logging {
    *   - IO encryption disabled
    *   - serializer(such as KryoSerializer) supports relocation of serialized objects
    */
-  def isPushBasedShuffleEnabled(conf: SparkConf): Boolean = {
+  def isPushBasedShuffleEnabled(conf: SparkConf,
+      isDriver: Boolean,
+      checkSerializer: Boolean = true): Boolean = {
     val pushBasedShuffleEnabled = conf.get(PUSH_BASED_SHUFFLE_ENABLED)
     if (pushBasedShuffleEnabled) {
-      val serializer = Utils.classForName(conf.get(SERIALIZER)).getConstructor(classOf[SparkConf])
-        .newInstance(conf).asInstanceOf[Serializer]
-      val canDoPushBasedShuffle = conf.get(IS_TESTING).getOrElse(false) ||
-        (conf.get(SHUFFLE_SERVICE_ENABLED) &&
-          conf.get(SparkLauncher.SPARK_MASTER, null) == "yarn" &&
-          // TODO: [SPARK-36744] needs to support IO encryption for push-based shuffle
-          !conf.get(IO_ENCRYPTION_ENABLED) &&
-          serializer.supportsRelocationOfSerializedObjects)
-
+      val canDoPushBasedShuffle = {
+        val isTesting = conf.get(IS_TESTING).getOrElse(false)
+        val isShuffleServiceAndYarn = conf.get(SHUFFLE_SERVICE_ENABLED) &&
+            conf.get(SparkLauncher.SPARK_MASTER, null) == "yarn"
+        lazy val serializerIsSupported = {
+          if (checkSerializer) {
+            instantiateSerializerFromConf[Serializer](SERIALIZER, conf, isDriver)
+              .supportsRelocationOfSerializedObjects

Review comment:
       ```
   Option(SparkEnv.get)
     .map(_.serializer)
      .getOrElse(instantiateSerializerFromConf[Serializer](SERIALIZER, conf, isDriver))
   ```
   thrown NPE after the test `test migration of shuffle blocks during decommissioning` succeeded in `BlockManagerSuite.beforeEach(BlockManagerSuite.scala:188)`. Seems SparkEnv wasn't configured properly before next test starts. Adding `SparkEnv.set(null)` in `afterEach()` seems fixed the issue:
   ```
   org.apache.spark.storage.BlockManagerSuite *** ABORTED *** (41 seconds, 221 milliseconds)
   [info]   java.lang.NullPointerException:
   [info]   at org.apache.spark.util.Utils$.isPushBasedShuffleEnabled(Utils.scala:2619)
   [info]   at org.apache.spark.storage.BlockManagerMasterEndpoint.<init>(BlockManagerMasterEndpoint.scala:104)
   [info]   at org.apache.spark.storage.BlockManagerSuite.beforeEach(BlockManagerSuite.scala:188)
   [info]   at org.scalatest.BeforeAndAfterEach.runTest(BeforeAndAfterEach.scala:233)
   [info]   at org.scalatest.BeforeAndAfterEach.runTest$(BeforeAndAfterEach.scala:227)
   [info]   at org.apache.spark.SparkFunSuite.runTest(SparkFunSuite.scala:62)
   [info]   at org.scalatest.funsuite.AnyFunSuiteLike.$anonfun$runTests$1(AnyFunSuiteLike.scala:269)
   [info]   at org.scalatest.SuperEngine.$anonfun$runTestsInBranch$1(Engine.scala:413)
   [info]   at scala.collection.immutable.List.foreach(List.scala:431)
   [info]   at org.scalatest.SuperEngine.traverseSubNodes$1(Engine.scala:401)
   [info]   at org.scalatest.SuperEngine.runTestsInBranch(Engine.scala:396)
   [info]   at org.scalatest.SuperEngine.runTestsImpl(Engine.scala:475)
   [info]   at org.scalatest.funsuite.AnyFunSuiteLike.runTests(AnyFunSuiteLike.scala:269)
   [info]   at org.scalatest.funsuite.AnyFunSuiteLike.runTests$(AnyFunSuiteLike.scala:268)
   [info]   at org.scalatest.funsuite.AnyFunSuite.runTests(AnyFunSuite.scala:1563)
   [info]   at org.scalatest.Suite.run(Suite.scala:1112)
   [info]   at org.scalatest.Suite.run$(Suite.scala:1094)
   [info]   at org.scalatest.funsuite.AnyFunSuite.org$scalatest$funsuite$AnyFunSuiteLike$$super$run(AnyFunSuite.scala:1563)
   [info]   at org.scalatest.funsuite.AnyFunSuiteLike.$anonfun$run$1(AnyFunSuiteLike.scala:273)
   [info]   at org.scalatest.SuperEngine.runImpl(Engine.scala:535)
   [info]   at org.scalatest.funsuite.AnyFunSuiteLike.run(AnyFunSuiteLike.scala:273)
   [info]   at org.scalatest.funsuite.AnyFunSuiteLike.run$(AnyFunSuiteLike.scala:272)
   [info]   at org.apache.spark.SparkFunSuite.org$scalatest$BeforeAndAfterAll$$super$run(SparkFunSuite.scala:62)
   [info]   at org.scalatest.BeforeAndAfterAll.liftedTree1$1(BeforeAndAfterAll.scala:213)
   [info]   at org.scalatest.BeforeAndAfterAll.run(BeforeAndAfterAll.scala:210)
   [info]   at org.scalatest.BeforeAndAfterAll.run$(BeforeAndAfterAll.scala:208)
   [info]   at org.apache.spark.SparkFunSuite.run(SparkFunSuite.scala:62)
   [info]   at org.scalatest.tools.Framework.org$scalatest$tools$Framework$$runSuite(Framework.scala:318)
   [info]   at org.scalatest.tools.Framework$ScalaTestTask.execute(Framework.scala:513)
   [info]   at sbt.ForkMain$Run.lambda$runTest$1(ForkMain.java:413)
   [info]   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
   [info]   at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
   [info]   at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
   [info]   at java.lang.Thread.run(Thread.java:748) 
   ```




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #34158: [SPARK-36705][FOLLOW-UP] Support the case when user's classes need to register for Kryo serialization

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #34158:
URL: https://github.com/apache/spark/pull/34158#issuecomment-932640613


   **[Test build #143793 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/143793/testReport)** for PR 34158 at commit [`ca4de24`](https://github.com/apache/spark/commit/ca4de242e88c74134d5ef740ed9ab30a8c68125b).
    * This patch passes all tests.
    * This patch merges cleanly.
    * This patch adds no public classes.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] rmcyang closed pull request #34158: [SPARK-36705][FOLLOW-UP] Support the case when user's classes need to register for Kryo serialization

Posted by GitBox <gi...@apache.org>.
rmcyang closed pull request #34158:
URL: https://github.com/apache/spark/pull/34158


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] rmcyang commented on a change in pull request #34158: [SPARK-36705][FOLLOW-UP] Support the case when user's classes need to register for Kryo serialization

Posted by GitBox <gi...@apache.org>.
rmcyang commented on a change in pull request #34158:
URL: https://github.com/apache/spark/pull/34158#discussion_r719712384



##########
File path: core/src/main/scala/org/apache/spark/util/Utils.scala
##########
@@ -2627,6 +2627,37 @@ private[spark] object Utils extends Logging {
     }
   }
 
+  // Create an instance of the class with the given name, possibly initializing it with our conf
+  def instantiateClass[T](className: String, conf: SparkConf, isDriver: Boolean): T = {

Review comment:
       Fixed.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] rmcyang commented on a change in pull request #34158: [SPARK-36705][FOLLOW-UP] Support the case when user's classes need to register for Kryo serialization

Posted by GitBox <gi...@apache.org>.
rmcyang commented on a change in pull request #34158:
URL: https://github.com/apache/spark/pull/34158#discussion_r721906973



##########
File path: core/src/main/scala/org/apache/spark/util/Utils.scala
##########
@@ -2603,17 +2603,19 @@ private[spark] object Utils extends Logging {
    *   - IO encryption disabled
    *   - serializer(such as KryoSerializer) supports relocation of serialized objects
    */
-  def isPushBasedShuffleEnabled(conf: SparkConf): Boolean = {
+  def isPushBasedShuffleEnabled(conf: SparkConf,
+      isDriver: Boolean,
+      checkSerializer: Boolean = true): Boolean = {
     val pushBasedShuffleEnabled = conf.get(PUSH_BASED_SHUFFLE_ENABLED)
     if (pushBasedShuffleEnabled) {
-      val serializer = Utils.classForName(conf.get(SERIALIZER)).getConstructor(classOf[SparkConf])
-        .newInstance(conf).asInstanceOf[Serializer]
+      lazy val serializer = Option(SparkEnv.get).map(_.serializer)
+        .getOrElse(instantiateClassFromConf[Serializer](SERIALIZER, conf, isDriver))
       val canDoPushBasedShuffle = conf.get(IS_TESTING).getOrElse(false) ||

Review comment:
       @JoshRosen Just posted 82e2e07bd46d13390d1d6c571f001cee41ce3224 which addressed the remaining tests failures. Lmk if reverting is still necessary. Thanks. 




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] rmcyang commented on a change in pull request #34158: [SPARK-36705][FOLLOW-UP] Support the case when user's classes need to register for Kryo serialization

Posted by GitBox <gi...@apache.org>.
rmcyang commented on a change in pull request #34158:
URL: https://github.com/apache/spark/pull/34158#discussion_r720570913



##########
File path: core/src/main/scala/org/apache/spark/util/Utils.scala
##########
@@ -2603,17 +2603,18 @@ private[spark] object Utils extends Logging {
    *   - IO encryption disabled
    *   - serializer(such as KryoSerializer) supports relocation of serialized objects
    */
-  def isPushBasedShuffleEnabled(conf: SparkConf): Boolean = {
+  def isPushBasedShuffleEnabled(conf: SparkConf, checkSerializer: Boolean = true): Boolean = {
     val pushBasedShuffleEnabled = conf.get(PUSH_BASED_SHUFFLE_ENABLED)
     if (pushBasedShuffleEnabled) {
-      val serializer = Utils.classForName(conf.get(SERIALIZER)).getConstructor(classOf[SparkConf])
-        .newInstance(conf).asInstanceOf[Serializer]
+      lazy val serializer = Option(SparkEnv.get).map(_.serializer)
+        .getOrElse(Utils.classForName(conf.get(SERIALIZER)).getConstructor(classOf[SparkConf])

Review comment:
       Ack




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on pull request #34158: [SPARK-36705][FOLLOW-UP] Support the case when user's classes need to register for Kryo serialization

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #34158:
URL: https://github.com/apache/spark/pull/34158#issuecomment-934525553


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/48369/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on pull request #34158: [WIP][SPARK-36705][FOLLOW-UP] Support the case when user's classes need to register for Kryo serialization

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #34158:
URL: https://github.com/apache/spark/pull/34158#issuecomment-932736037


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/48313/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on pull request #34158: [SPARK-36705][FOLLOW-UP] Support the case when user's classes need to register for Kryo serialization

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #34158:
URL: https://github.com/apache/spark/pull/34158#issuecomment-934036295


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/143827/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #34158: [SPARK-36705][FOLLOW-UP] Support the case when user's classes need to register for Kryo serialization

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #34158:
URL: https://github.com/apache/spark/pull/34158#issuecomment-934035823


   **[Test build #143827 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/143827/testReport)** for PR 34158 at commit [`585a7ff`](https://github.com/apache/spark/commit/585a7ff2384b0dfe9f69e96387229e6fe87a564e).
    * This patch **fails Spark unit tests**.
    * This patch merges cleanly.
    * This patch adds no public classes.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] rmcyang commented on pull request #34158: [SPARK-36705][FOLLOW-UP] Support the case when user's classes need to register for Kryo serialization

Posted by GitBox <gi...@apache.org>.
rmcyang commented on pull request #34158:
URL: https://github.com/apache/spark/pull/34158#issuecomment-934611349


   Thanks all for the reviews! @mridulm @srowen @JoshRosen @gengliangwang 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA removed a comment on pull request #34158: [WIP][SPARK-36705][FOLLOW-UP] Support the case when user's classes need to register for Kryo serialization

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #34158:
URL: https://github.com/apache/spark/pull/34158#issuecomment-932431436


   **[Test build #143789 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/143789/testReport)** for PR 34158 at commit [`ac35545`](https://github.com/apache/spark/commit/ac355451961372cc251706fb783ae2740b9945b8).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] JoshRosen commented on a change in pull request #34158: [SPARK-36705][FOLLOW-UP] Support the case when user's classes need to register for Kryo serialization

Posted by GitBox <gi...@apache.org>.
JoshRosen commented on a change in pull request #34158:
URL: https://github.com/apache/spark/pull/34158#discussion_r721806159



##########
File path: core/src/main/scala/org/apache/spark/storage/DiskBlockManager.scala
##########
@@ -45,8 +45,8 @@ import org.apache.spark.util.{ShutdownHookManager, Utils}
  *
  * ShuffleDataIO also can change the behavior of deleteFilesOnStop.
  */
-private[spark] class DiskBlockManager(conf: SparkConf, var deleteFilesOnStop: Boolean)
-  extends Logging {
+private[spark] class DiskBlockManager(conf: SparkConf, var deleteFilesOnStop: Boolean,
+    isDriver: Boolean) extends Logging {

Review comment:
       Tiny style nit: wrapping long line to conform with [style guide](https://github.com/databricks/scala-style-guide#spacing-and-indentation)
   
   ```suggestion
   private[spark] class DiskBlockManager(
       conf: SparkConf,
       var deleteFilesOnStop: Boolean,
       isDriver: Boolean)
     extends Logging {
   ```




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #34158: [SPARK-36705][FOLLOW-UP] Support the case when user's classes need to register for Kryo serialization

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #34158:
URL: https://github.com/apache/spark/pull/34158#issuecomment-934530688


   **[Test build #143856 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/143856/testReport)** for PR 34158 at commit [`82e2e07`](https://github.com/apache/spark/commit/82e2e07bd46d13390d1d6c571f001cee41ce3224).
    * This patch **fails Spark unit tests**.
    * This patch merges cleanly.
    * This patch adds no public classes.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on pull request #34158: [SPARK-36705][FOLLOW-UP] Support the case when user's classes need to register for Kryo serialization

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #34158:
URL: https://github.com/apache/spark/pull/34158#issuecomment-931797688


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/48286/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #34158: [SPARK-36705][FOLLOW-UP] Support the case when user's classes need to register for Kryo serialization

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #34158:
URL: https://github.com/apache/spark/pull/34158#issuecomment-933854216


   **[Test build #143818 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/143818/testReport)** for PR 34158 at commit [`6f936b1`](https://github.com/apache/spark/commit/6f936b18e520031a152db3ab6bfccefb4bd52e5d).
    * This patch passes all tests.
    * This patch merges cleanly.
    * This patch adds no public classes.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] mridulm commented on a change in pull request #34158: [SPARK-33781][FOLLOW-UP] Support the case when user's classes need to register for Kryo serialization

Posted by GitBox <gi...@apache.org>.
mridulm commented on a change in pull request #34158:
URL: https://github.com/apache/spark/pull/34158#discussion_r719682418



##########
File path: core/src/main/scala/org/apache/spark/shuffle/ShuffleBlockPusher.scala
##########
@@ -463,7 +463,7 @@ private[spark] object ShuffleBlockPusher {
 
   private val BLOCK_PUSHER_POOL: ExecutorService = {
     val conf = SparkEnv.get.conf
-    if (Utils.isPushBasedShuffleEnabled(conf)) {
+    if (Utils.isPushBasedShuffleEnabled(conf, isDriver = false)) {

Review comment:
       We can probably change this to `, isDriver = SparkContext.DRIVER_IDENTIFIER == SparkEnv.get.executorId` to be forward compatible

##########
File path: core/src/main/scala/org/apache/spark/util/Utils.scala
##########
@@ -2627,6 +2627,37 @@ private[spark] object Utils extends Logging {
     }
   }
 
+  // Create an instance of the class with the given name, possibly initializing it with our conf
+  def instantiateClass[T](className: String, conf: SparkConf, isDriver: Boolean): T = {
+    val cls = classForName(className)
+    // Look for a constructor taking a SparkConf and a boolean isDriver, then one taking just
+    // SparkConf, then one taking no arguments
+    try {
+      cls.getConstructor(classOf[SparkConf], java.lang.Boolean.TYPE)
+        .newInstance(conf, java.lang.Boolean.valueOf(isDriver))
+        .asInstanceOf[T]
+    } catch {
+      case _: NoSuchMethodException =>
+        try {
+          cls.getConstructor(classOf[SparkConf]).newInstance(conf).asInstanceOf[T]
+        } catch {
+          case _: NoSuchMethodException =>
+            cls.getConstructor().newInstance().asInstanceOf[T]
+        }
+    }
+  }
+
+  // Create an instance of the class named by the given SparkConf property
+  // if the property is not set, possibly initializing it with our conf
+  def instantiateClassFromConf[T](propertyName: ConfigEntry[String],
+      conf: SparkConf, isDriver: Boolean): T = {
+    instantiateClass[T](conf.get(propertyName), conf, isDriver)
+  }
+
+  def instantiateSerializer(conf: SparkConf, isDriver: Boolean): Serializer = {
+    instantiateClassFromConf[Serializer](SERIALIZER, conf, isDriver)
+  }

Review comment:
       Remove this method and inline this ?

##########
File path: core/src/main/scala/org/apache/spark/util/Utils.scala
##########
@@ -2627,6 +2627,37 @@ private[spark] object Utils extends Logging {
     }
   }
 
+  // Create an instance of the class with the given name, possibly initializing it with our conf
+  def instantiateClass[T](className: String, conf: SparkConf, isDriver: Boolean): T = {

Review comment:
       Make this private ?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on pull request #34158: [WIP][SPARK-36705][FOLLOW-UP] Support the case when user's classes need to register for Kryo serialization

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #34158:
URL: https://github.com/apache/spark/pull/34158#issuecomment-932494595


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/48301/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on pull request #34158: [SPARK-36705][FOLLOW-UP] Support the case when user's classes need to register for Kryo serialization

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #34158:
URL: https://github.com/apache/spark/pull/34158#issuecomment-934194434


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/143834/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #34158: [SPARK-36705][FOLLOW-UP] Support the case when user's classes need to register for Kryo serialization

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #34158:
URL: https://github.com/apache/spark/pull/34158#issuecomment-934025582


   Kubernetes integration test status failure
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/48340/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA removed a comment on pull request #34158: [SPARK-36705][FOLLOW-UP] Support the case when user's classes need to register for Kryo serialization

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #34158:
URL: https://github.com/apache/spark/pull/34158#issuecomment-933967238


   **[Test build #143826 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/143826/testReport)** for PR 34158 at commit [`4ada043`](https://github.com/apache/spark/commit/4ada04335d0d365b044ca4513484f9ce12b5bdd7).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #34158: [SPARK-36705][FOLLOW-UP] Support the case when user's classes need to register for Kryo serialization

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #34158:
URL: https://github.com/apache/spark/pull/34158#issuecomment-931849318


   **[Test build #143775 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/143775/testReport)** for PR 34158 at commit [`054e631`](https://github.com/apache/spark/commit/054e631f95b2339a98db433c7730a98aa88afd38).
    * This patch passes all tests.
    * This patch merges cleanly.
    * This patch adds no public classes.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #34158: [SPARK-36705][FOLLOW-UP] Support the case when user's classes need to register for Kryo serialization

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #34158:
URL: https://github.com/apache/spark/pull/34158#issuecomment-932633147


   **[Test build #143796 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/143796/testReport)** for PR 34158 at commit [`af5c6cb`](https://github.com/apache/spark/commit/af5c6cbffea5e9f805379401b8fd806f45b9aa0b).
    * This patch **fails Scala style tests**.
    * This patch merges cleanly.
    * This patch adds no public classes.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA removed a comment on pull request #34158: [WIP][SPARK-36705][FOLLOW-UP] Support the case when user's classes need to register for Kryo serialization

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #34158:
URL: https://github.com/apache/spark/pull/34158#issuecomment-932714634


   **[Test build #143800 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/143800/testReport)** for PR 34158 at commit [`40efd99`](https://github.com/apache/spark/commit/40efd9939a933c1bcaa90592f919aa9d0c43147a).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #34158: [WIP][SPARK-36705][FOLLOW-UP] Support the case when user's classes need to register for Kryo serialization

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #34158:
URL: https://github.com/apache/spark/pull/34158#issuecomment-932834253


   **[Test build #143806 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/143806/testReport)** for PR 34158 at commit [`79f132a`](https://github.com/apache/spark/commit/79f132a16de11b5c7198fe8f783d82919f4f4d13).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA removed a comment on pull request #34158: [SPARK-36705][FOLLOW-UP] Support the case when user's classes need to register for Kryo serialization

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #34158:
URL: https://github.com/apache/spark/pull/34158#issuecomment-933752051


   **[Test build #143818 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/143818/testReport)** for PR 34158 at commit [`6f936b1`](https://github.com/apache/spark/commit/6f936b18e520031a152db3ab6bfccefb4bd52e5d).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] mridulm commented on pull request #34158: [SPARK-36705][FOLLOW-UP] Support the case when user's classes need to register for Kryo serialization

Posted by GitBox <gi...@apache.org>.
mridulm commented on pull request #34158:
URL: https://github.com/apache/spark/pull/34158#issuecomment-932617201


   
   > 3. In `BlockManager.registerWithExternalShuffleServer`:
   >    When `serializer.supportsRelocationOfSerializedObjects` returns `false` then the executor will register with the push-based shuffle version of `shuffleManagerMeta` even though push-based shuffle won't be used.
   >    Does this result in correct behavior?❓
   >    I'm worried about potentially hard-to-debug issues in the rare scenario where push-based shuffle would be used _except_ for the fact that the user-configured `KryoSerializer` doesn't support relocation of serialized objects. In our test code we have a `org.apache.spark.serializer.RegistratorWithoutAutoReset` which can be used to test the corner-case where that might occur.
   
   For push based shuffle, we pass additional metadata as suffix to shuffleManager id when registering with external shuffle manager (ESS).
   If present, it initializes ESS to support receiving block pushes. But if the application does not push blocks subsequently (because push based shuffle was disabled due to insufficient mergers/serializer relocation issue/etc), it is just functionality that is enabled but not used.
   
   So strictly speaking, this should not change things.
   
   > 
   > Just brainstorming here, but as a possible alternative fix: what if the driver evaluated whether push based shuffle could really be enabled, stored the result of that evaluation into a private internal configuration, then passed that configuration to executors (letting us skip the computation on the executors)? I wonder if there's any precedence for doing this elsewhere in the Spark code or whether we'd run into different initialization order issues.
   
   This is definitely a good idea - and we do use spark conf to pass `spark.app.attempt.id` to executors this way for use in push based shuffle.
   We can add something like `spark.shuffle.push.internal.enabled` - which can be used for this case.
   
   Thoughts ?
   
   +CC @Victsm, @zhouyejoe, @otterc, @venkata91 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on pull request #34158: [SPARK-36705][FOLLOW-UP] Support the case when user's classes need to register for Kryo serialization

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #34158:
URL: https://github.com/apache/spark/pull/34158#issuecomment-934149538


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/48347/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] JoshRosen commented on a change in pull request #34158: [SPARK-36705][FOLLOW-UP] Support the case when user's classes need to register for Kryo serialization

Posted by GitBox <gi...@apache.org>.
JoshRosen commented on a change in pull request #34158:
URL: https://github.com/apache/spark/pull/34158#discussion_r720604929



##########
File path: core/src/main/scala/org/apache/spark/util/Utils.scala
##########
@@ -2627,6 +2627,37 @@ private[spark] object Utils extends Logging {
     }
   }
 
+  // Create an instance of the class with the given name, possibly initializing it with our conf
+  def instantiateClass[T](className: String, conf: SparkConf, isDriver: Boolean): T = {

Review comment:
       Whoops, I overlooked that: it looks like neither of these methods can be made private.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on pull request #34158: [SPARK-36705][FOLLOW-UP] Support the case when user's classes need to register for Kryo serialization

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #34158:
URL: https://github.com/apache/spark/pull/34158#issuecomment-932667287


   Can one of the admins verify this patch?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] mridulm commented on a change in pull request #34158: [SPARK-36705][FOLLOW-UP] Support the case when user's classes need to register for Kryo serialization

Posted by GitBox <gi...@apache.org>.
mridulm commented on a change in pull request #34158:
URL: https://github.com/apache/spark/pull/34158#discussion_r721456252



##########
File path: core/src/main/scala/org/apache/spark/util/Utils.scala
##########
@@ -2603,18 +2603,28 @@ private[spark] object Utils extends Logging {
    *   - IO encryption disabled
    *   - serializer(such as KryoSerializer) supports relocation of serialized objects
    */
-  def isPushBasedShuffleEnabled(conf: SparkConf): Boolean = {
+  def isPushBasedShuffleEnabled(conf: SparkConf,
+      isDriver: Boolean,
+      checkSerializer: Boolean = true): Boolean = {
     val pushBasedShuffleEnabled = conf.get(PUSH_BASED_SHUFFLE_ENABLED)
     if (pushBasedShuffleEnabled) {
-      val serializer = Utils.classForName(conf.get(SERIALIZER)).getConstructor(classOf[SparkConf])
-        .newInstance(conf).asInstanceOf[Serializer]
-      val canDoPushBasedShuffle = conf.get(IS_TESTING).getOrElse(false) ||
-        (conf.get(SHUFFLE_SERVICE_ENABLED) &&
-          conf.get(SparkLauncher.SPARK_MASTER, null) == "yarn" &&
-          // TODO: [SPARK-36744] needs to support IO encryption for push-based shuffle
-          !conf.get(IO_ENCRYPTION_ENABLED) &&
-          serializer.supportsRelocationOfSerializedObjects)
-
+      val canDoPushBasedShuffle = {
+        val isTesting = conf.get(IS_TESTING).getOrElse(false)
+        val isShuffleServiceAndYarn = conf.get(SHUFFLE_SERVICE_ENABLED) &&
+            conf.get(SparkLauncher.SPARK_MASTER, null) == "yarn"
+        lazy val serializerIsSupported = {
+          if (checkSerializer) {
+            instantiateSerializerFromConf[Serializer](SERIALIZER, conf, isDriver)
+              .supportsRelocationOfSerializedObjects

Review comment:
       nit: One alternative to hunting down all cases of `SparkEnv.get.serializer == null` is to simply filter it here.
   ```
   Option(SparkEnv.get)
                 .map(_.serializer)
                 .filter(_ != null)
   ```




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on pull request #34158: [SPARK-33781][FOLLOW-UP] Support the case when user's classes need to register for Kryo serialization

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #34158:
URL: https://github.com/apache/spark/pull/34158#issuecomment-931559050


   Can one of the admins verify this patch?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on pull request #34158: [SPARK-36705][FOLLOW-UP] Support the case when user's classes need to register for Kryo serialization

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #34158:
URL: https://github.com/apache/spark/pull/34158#issuecomment-933794903


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/48331/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA removed a comment on pull request #34158: [WIP][SPARK-36705][FOLLOW-UP] Support the case when user's classes need to register for Kryo serialization

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #34158:
URL: https://github.com/apache/spark/pull/34158#issuecomment-932834253


   **[Test build #143806 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/143806/testReport)** for PR 34158 at commit [`79f132a`](https://github.com/apache/spark/commit/79f132a16de11b5c7198fe8f783d82919f4f4d13).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on pull request #34158: [WIP][SPARK-36705][FOLLOW-UP] Support the case when user's classes need to register for Kryo serialization

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #34158:
URL: https://github.com/apache/spark/pull/34158#issuecomment-932845819


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/48318/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on pull request #34158: [SPARK-36705][FOLLOW-UP] Support the case when user's classes need to register for Kryo serialization

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #34158:
URL: https://github.com/apache/spark/pull/34158#issuecomment-931849961


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/143775/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] JoshRosen edited a comment on pull request #34158: [SPARK-36705][FOLLOW-UP] Support the case when user's classes need to register for Kryo serialization

Posted by GitBox <gi...@apache.org>.
JoshRosen edited a comment on pull request #34158:
URL: https://github.com/apache/spark/pull/34158#issuecomment-934039150


   Stepping back a bit, I have a more fundamental conceptual question about the push-based shuffle configurations: could the decision of whether to use push-based shuffle be performed on a per-ShuffleDependency basis rather than a per-app basis?
   
   The `spark.serializer` configuration controls the default serializer, but under the hood `ShuffleDependency` instances are sometimes automatically configured with non-default serializers:
   
   - Pure SQL / DataFrame / Dataset code [uses `UnsafeRowSerializer`](https://github.com/apache/spark/blob/41a16ebf1196bec86aec104e72fd7fb1597c0073/sql/core/src/main/scala/org/apache/spark/sql/execution/exchange/ShuffleExchangeExec.scala#L132-L133).
   - The RDD shuffle path [will automatically use Kryo](https://github.com/apache/spark/blame/41a16ebf1196bec86aec104e72fd7fb1597c0073/core/src/main/scala/org/apache/spark/serializer/SerializerManager.scala#L99-L102) when shuffling RDDs containing only primitive types.
   
   Spark's shuffle code contains a [serialized sorting mode](https://github.com/apache/spark/blame/fa1805db48ca53ece4cbbe42ebb2a9811a142ed2/core/src/main/scala/org/apache/spark/shuffle/sort/SortShuffleManager.scala#L37-L71) which can only be used when  `serializer. supportsRelocationOfSerializedObjects == true`. The decision of whether to use serialized sorting mode is performed on a per-ShuffleDependency (not per-app) basis (by [using a different ShuffleHandle](https://github.com/apache/spark/blame/fa1805db48ca53ece4cbbe42ebb2a9811a142ed2/core/src/main/scala/org/apache/spark/shuffle/sort/SortShuffleManager.scala#L106-L109)).
   
   If we did something similar for push-based shuffle then users could enable PBS and still get some benefit even if they couldn't / didn't want to use Kryo as the default serializer. With that approach we wouldn't need to construct a default serializer instance: during `SparkEnv` creation time or executor startup time we'd only need to know whether the other prerequisites of PBS were met (configuration enabled, shuffle service enabled, IO encryption disabled, etc). We'd still need to check the serializer properties, but that check would happen on the driver and its result would somehow be encoded in the `ShuffleHandle`.
   
   Is that possible given the architecture of push-based shuffle?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #34158: [SPARK-36705][FOLLOW-UP] Support the case when user's classes need to register for Kryo serialization

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #34158:
URL: https://github.com/apache/spark/pull/34158#issuecomment-934429007


   **[Test build #143856 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/143856/testReport)** for PR 34158 at commit [`82e2e07`](https://github.com/apache/spark/commit/82e2e07bd46d13390d1d6c571f001cee41ce3224).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #34158: [SPARK-36705][FOLLOW-UP] Support the case when user's classes need to register for Kryo serialization

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #34158:
URL: https://github.com/apache/spark/pull/34158#issuecomment-934024751


   **[Test build #143826 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/143826/testReport)** for PR 34158 at commit [`4ada043`](https://github.com/apache/spark/commit/4ada04335d0d365b044ca4513484f9ce12b5bdd7).
    * This patch passes all tests.
    * This patch merges cleanly.
    * This patch adds no public classes.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA removed a comment on pull request #34158: [SPARK-36705][FOLLOW-UP] Support the case when user's classes need to register for Kryo serialization

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #34158:
URL: https://github.com/apache/spark/pull/34158#issuecomment-934429007


   **[Test build #143856 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/143856/testReport)** for PR 34158 at commit [`82e2e07`](https://github.com/apache/spark/commit/82e2e07bd46d13390d1d6c571f001cee41ce3224).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA removed a comment on pull request #34158: [SPARK-36705][FOLLOW-UP] Support the case when user's classes need to register for Kryo serialization

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #34158:
URL: https://github.com/apache/spark/pull/34158#issuecomment-933989744


   **[Test build #143827 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/143827/testReport)** for PR 34158 at commit [`585a7ff`](https://github.com/apache/spark/commit/585a7ff2384b0dfe9f69e96387229e6fe87a564e).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] JoshRosen edited a comment on pull request #34158: [SPARK-36705][FOLLOW-UP] Support the case when user's classes need to register for Kryo serialization

Posted by GitBox <gi...@apache.org>.
JoshRosen edited a comment on pull request #34158:
URL: https://github.com/apache/spark/pull/34158#issuecomment-934039150


   Stepping back a bit, I have a more fundamental conceptual question about the push-based shuffle configurations: could the decision of whether to use push-based shuffle be performed on a per-ShuffleDependency basis rather than a per-app basis?
   
   The `spark.serializer` configuration controls the default serializer, but under the hood `ShuffleDependency` instances are sometimes automatically configured with non-default serializers:
   
   - Pure SQL / DataFrame / Dataset code [uses `UnsafeRowSerializer`](https://github.com/apache/spark/blob/41a16ebf1196bec86aec104e72fd7fb1597c0073/sql/core/src/main/scala/org/apache/spark/sql/execution/exchange/ShuffleExchangeExec.scala#L132-L133).
   - The RDD shuffle path [will automatically use Kryo](https://github.com/apache/spark/blame/41a16ebf1196bec86aec104e72fd7fb1597c0073/core/src/main/scala/org/apache/spark/serializer/SerializerManager.scala#L99-L102) when shuffling RDDs containing only primitive types.
   
   Spark's shuffle code contains a [serialized sorting mode](https://github.com/apache/spark/blame/fa1805db48ca53ece4cbbe42ebb2a9811a142ed2/core/src/main/scala/org/apache/spark/shuffle/sort/SortShuffleManager.scala#L37-L71) which can only be used when  `serializer. supportsRelocationOfSerializedObjects == true`. The decision of whether to use serialized sorting mode is performed on a per-ShuffleDependency (not per-app) basis (by [using a different ShuffleHandle](https://github.com/apache/spark/blame/fa1805db48ca53ece4cbbe42ebb2a9811a142ed2/core/src/main/scala/org/apache/spark/shuffle/sort/SortShuffleManager.scala#L106-L109)).
   
   If we did something similar for push-based shuffle then users could enable PBS and still get some benefit even if they couldn't / didn't want to use Kryo as the default serializer. With that approach we wouldn't need to construct a default serializer instance: during `SparkEnv` creation time or executor startup time we'd only need to know whether the other prerequisites of PBS were met (configuration enabled, shuffle service enabled, IO encryption disabled, etc). We'd still need to check the serializer properties, but that check would happen on the driver and its result would somehow be encoded in the `ShuffleHandle`.
   
   Is that possible given the architecture of push-based shuffle?
   
   **Edit**: to clarify, I'm posing this as a longer-term question (not for Spark 3.2). If we end up supporting this then my suggestion of a driver configuration that's propagated to the executor wouldn't make sense.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on pull request #34158: [SPARK-36705][FOLLOW-UP] Support the case when user's classes need to register for Kryo serialization

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #34158:
URL: https://github.com/apache/spark/pull/34158#issuecomment-934525553


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/48369/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on pull request #34158: [WIP][SPARK-36705][FOLLOW-UP] Support the case when user's classes need to register for Kryo serialization

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #34158:
URL: https://github.com/apache/spark/pull/34158#issuecomment-932730964


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/143800/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on pull request #34158: [WIP][SPARK-36705][FOLLOW-UP] Support the case when user's classes need to register for Kryo serialization

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #34158:
URL: https://github.com/apache/spark/pull/34158#issuecomment-932816487


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/48316/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #34158: [SPARK-36705][FOLLOW-UP] Support the case when user's classes need to register for Kryo serialization

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #34158:
URL: https://github.com/apache/spark/pull/34158#issuecomment-932632803


   **[Test build #143796 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/143796/testReport)** for PR 34158 at commit [`af5c6cb`](https://github.com/apache/spark/commit/af5c6cbffea5e9f805379401b8fd806f45b9aa0b).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #34158: [SPARK-36705][FOLLOW-UP] Support the case when user's classes need to register for Kryo serialization

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #34158:
URL: https://github.com/apache/spark/pull/34158#issuecomment-932727353


   Kubernetes integration test starting
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/48313/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #34158: [SPARK-36705][FOLLOW-UP] Support the case when user's classes need to register for Kryo serialization

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #34158:
URL: https://github.com/apache/spark/pull/34158#issuecomment-933987166


   Kubernetes integration test starting
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/48338/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] JoshRosen commented on a change in pull request #34158: [SPARK-36705][FOLLOW-UP] Support the case when user's classes need to register for Kryo serialization

Posted by GitBox <gi...@apache.org>.
JoshRosen commented on a change in pull request #34158:
URL: https://github.com/apache/spark/pull/34158#discussion_r721933113



##########
File path: core/src/main/scala/org/apache/spark/util/Utils.scala
##########
@@ -2603,17 +2603,19 @@ private[spark] object Utils extends Logging {
    *   - IO encryption disabled
    *   - serializer(such as KryoSerializer) supports relocation of serialized objects
    */
-  def isPushBasedShuffleEnabled(conf: SparkConf): Boolean = {
+  def isPushBasedShuffleEnabled(conf: SparkConf,
+      isDriver: Boolean,
+      checkSerializer: Boolean = true): Boolean = {
     val pushBasedShuffleEnabled = conf.get(PUSH_BASED_SHUFFLE_ENABLED)
     if (pushBasedShuffleEnabled) {
-      val serializer = Utils.classForName(conf.get(SERIALIZER)).getConstructor(classOf[SparkConf])
-        .newInstance(conf).asInstanceOf[Serializer]
+      lazy val serializer = Option(SparkEnv.get).map(_.serializer)
+        .getOrElse(instantiateClassFromConf[Serializer](SERIALIZER, conf, isDriver))
       val canDoPushBasedShuffle = conf.get(IS_TESTING).getOrElse(false) ||

Review comment:
       @rmcyang, I'm good with that fix as long as it passes tests 👍 




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on pull request #34158: [SPARK-36705][FOLLOW-UP] Support the case when user's classes need to register for Kryo serialization

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #34158:
URL: https://github.com/apache/spark/pull/34158#issuecomment-932633159


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/143796/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #34158: [SPARK-36705][FOLLOW-UP] Support the case when user's classes need to register for Kryo serialization

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #34158:
URL: https://github.com/apache/spark/pull/34158#issuecomment-934137969


   Kubernetes integration test status failure
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/48347/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #34158: [WIP][SPARK-36705][FOLLOW-UP] Support the case when user's classes need to register for Kryo serialization

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #34158:
URL: https://github.com/apache/spark/pull/34158#issuecomment-932459223


   Kubernetes integration test starting
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/48301/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] srowen commented on pull request #34158: [SPARK-36705][FOLLOW-UP] Support the case when user's classes need to register for Kryo serialization

Posted by GitBox <gi...@apache.org>.
srowen commented on pull request #34158:
URL: https://github.com/apache/spark/pull/34158#issuecomment-931730192


   I don't quite get how this is related to push-based shuffle being enabled - that seems pretty different from questions of user classes and serialization? this feels like a hacky solution but I also don't know much about this path


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #34158: [WIP][SPARK-36705][FOLLOW-UP] Support the case when user's classes need to register for Kryo serialization

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #34158:
URL: https://github.com/apache/spark/pull/34158#issuecomment-932586716


   **[Test build #143793 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/143793/testReport)** for PR 34158 at commit [`ca4de24`](https://github.com/apache/spark/commit/ca4de242e88c74134d5ef740ed9ab30a8c68125b).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #34158: [WIP][SPARK-36705][FOLLOW-UP] Support the case when user's classes need to register for Kryo serialization

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #34158:
URL: https://github.com/apache/spark/pull/34158#issuecomment-932486793


   Kubernetes integration test status failure
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/48301/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on pull request #34158: [SPARK-36705][FOLLOW-UP] Support the case when user's classes need to register for Kryo serialization

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #34158:
URL: https://github.com/apache/spark/pull/34158#issuecomment-933794903


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/48331/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on pull request #34158: [WIP][SPARK-36705][FOLLOW-UP] Support the case when user's classes need to register for Kryo serialization

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #34158:
URL: https://github.com/apache/spark/pull/34158#issuecomment-932529164


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/143789/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #34158: [WIP][SPARK-36705][FOLLOW-UP] Support the case when user's classes need to register for Kryo serialization

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #34158:
URL: https://github.com/apache/spark/pull/34158#issuecomment-932736130


   **[Test build #143801 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/143801/testReport)** for PR 34158 at commit [`4f3a712`](https://github.com/apache/spark/commit/4f3a7126688907b3261041fae002d659ceba5e7d).
    * This patch **fails Spark unit tests**.
    * This patch merges cleanly.
    * This patch adds no public classes.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #34158: [SPARK-36705][FOLLOW-UP] Support the case when user's classes need to register for Kryo serialization

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #34158:
URL: https://github.com/apache/spark/pull/34158#issuecomment-932644054


   Kubernetes integration test status failure
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/48308/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] mridulm commented on pull request #34158: [SPARK-36705][FOLLOW-UP] Support the case when user's classes need to register for Kryo serialization

Posted by GitBox <gi...@apache.org>.
mridulm commented on pull request #34158:
URL: https://github.com/apache/spark/pull/34158#issuecomment-934019383


   > If the approach described above was easy to implement then I'd prefer it since I think it's easier to reason about: the approach taken in this PR requires me to reason about push-based shuffle internals to argue that it's okay for the executor and driver to have divergent effective configurations in certain places (as discussed upthread).
   > 
   
   @rmcyang Can you file a jira and link it to the improvements jira ?
   Given the correctness, we can get it addressed as follow up after release (this PR is blocking RC). Does that sound fine @JoshRosen  ?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #34158: [SPARK-36705][FOLLOW-UP] Support the case when user's classes need to register for Kryo serialization

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #34158:
URL: https://github.com/apache/spark/pull/34158#issuecomment-933967238


   **[Test build #143826 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/143826/testReport)** for PR 34158 at commit [`4ada043`](https://github.com/apache/spark/commit/4ada04335d0d365b044ca4513484f9ce12b5bdd7).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] mridulm commented on pull request #34158: [SPARK-36705][FOLLOW-UP] Support the case when user's classes need to register for Kryo serialization

Posted by GitBox <gi...@apache.org>.
mridulm commented on pull request #34158:
URL: https://github.com/apache/spark/pull/34158#issuecomment-934461767


   Thanks for the reviews @JoshRosen and @gengliangwang !
   Will wait for the tests to complete and merge to master/branch-3.2


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #34158: [SPARK-36705][FOLLOW-UP] Support the case when user's classes need to register for Kryo serialization

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #34158:
URL: https://github.com/apache/spark/pull/34158#issuecomment-932625549


   Kubernetes integration test status failure
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/48305/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] mridulm commented on a change in pull request #34158: [WIP][SPARK-36705][FOLLOW-UP] Support the case when user's classes need to register for Kryo serialization

Posted by GitBox <gi...@apache.org>.
mridulm commented on a change in pull request #34158:
URL: https://github.com/apache/spark/pull/34158#discussion_r720679847



##########
File path: core/src/main/scala/org/apache/spark/util/Utils.scala
##########
@@ -2603,18 +2603,30 @@ private[spark] object Utils extends Logging {
    *   - IO encryption disabled
    *   - serializer(such as KryoSerializer) supports relocation of serialized objects
    */
-  def isPushBasedShuffleEnabled(conf: SparkConf): Boolean = {
+  def isPushBasedShuffleEnabled(conf: SparkConf,
+      isDriver: Boolean,
+      checkSerializer: Boolean = true): Boolean = {
     val pushBasedShuffleEnabled = conf.get(PUSH_BASED_SHUFFLE_ENABLED)
     if (pushBasedShuffleEnabled) {
-      val serializer = Utils.classForName(conf.get(SERIALIZER)).getConstructor(classOf[SparkConf])
-        .newInstance(conf).asInstanceOf[Serializer]
-      val canDoPushBasedShuffle = conf.get(IS_TESTING).getOrElse(false) ||
-        (conf.get(SHUFFLE_SERVICE_ENABLED) &&
-          conf.get(SparkLauncher.SPARK_MASTER, null) == "yarn" &&
-          // TODO: [SPARK-36744] needs to support IO encryption for push-based shuffle
-          !conf.get(IO_ENCRYPTION_ENABLED) &&
-          serializer.supportsRelocationOfSerializedObjects)
-
+      val canDoPushBasedShuffle = {
+        val isTesting = conf.get(IS_TESTING).getOrElse(false)
+        val isShuffleServiceAndYarn = conf.get(SHUFFLE_SERVICE_ENABLED) &&
+            conf.get(SparkLauncher.SPARK_MASTER, null) == "yarn"
+        val serializerIsSupported = {

Review comment:
       make this `lazy val`




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #34158: [SPARK-36705][FOLLOW-UP] Support the case when user's classes need to register for Kryo serialization

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #34158:
URL: https://github.com/apache/spark/pull/34158#issuecomment-931774620


   Kubernetes integration test starting
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/48286/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] asfgit closed pull request #34158: [SPARK-36705][FOLLOW-UP] Support the case when user's classes need to register for Kryo serialization

Posted by GitBox <gi...@apache.org>.
asfgit closed pull request #34158:
URL: https://github.com/apache/spark/pull/34158


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] gengliangwang commented on pull request #34158: [SPARK-36705][FOLLOW-UP] Support the case when user's classes need to register for Kryo serialization

Posted by GitBox <gi...@apache.org>.
gengliangwang commented on pull request #34158:
URL: https://github.com/apache/spark/pull/34158#issuecomment-934390683


   retest this please


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] rmcyang commented on a change in pull request #34158: [SPARK-36705][FOLLOW-UP] Support the case when user's classes need to register for Kryo serialization

Posted by GitBox <gi...@apache.org>.
rmcyang commented on a change in pull request #34158:
URL: https://github.com/apache/spark/pull/34158#discussion_r719742176



##########
File path: core/src/main/scala/org/apache/spark/util/Utils.scala
##########
@@ -2627,6 +2627,37 @@ private[spark] object Utils extends Logging {
     }
   }
 
+  // Create an instance of the class with the given name, possibly initializing it with our conf
+  def instantiateClass[T](className: String, conf: SparkConf, isDriver: Boolean): T = {

Review comment:
       Oh, we cannot really make it private since it is used in SparkEnv.scala:314.
   ```
   [error] /home/runner/work/spark/spark/core/src/main/scala/org/apache/spark/SparkEnv.scala:314:32: method instantiateClass in object Utils cannot be accessed in object org.apache.spark.util.Utils
   [error]     val shuffleManager = Utils.instantiateClass[ShuffleManager](shuffleMgrClass, conf, isDriver)
   ```




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #34158: [SPARK-36705][FOLLOW-UP] Support the case when user's classes need to register for Kryo serialization

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #34158:
URL: https://github.com/apache/spark/pull/34158#issuecomment-933782164


   Kubernetes integration test status failure
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/48331/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #34158: [WIP][SPARK-36705][FOLLOW-UP] Support the case when user's classes need to register for Kryo serialization

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #34158:
URL: https://github.com/apache/spark/pull/34158#issuecomment-932848988


   **[Test build #143806 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/143806/testReport)** for PR 34158 at commit [`79f132a`](https://github.com/apache/spark/commit/79f132a16de11b5c7198fe8f783d82919f4f4d13).
    * This patch passes all tests.
    * This patch merges cleanly.
    * This patch adds no public classes.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] rmcyang commented on a change in pull request #34158: [WIP][SPARK-36705][FOLLOW-UP] Support the case when user's classes need to register for Kryo serialization

Posted by GitBox <gi...@apache.org>.
rmcyang commented on a change in pull request #34158:
URL: https://github.com/apache/spark/pull/34158#discussion_r720742585



##########
File path: core/src/main/scala/org/apache/spark/util/Utils.scala
##########
@@ -2603,18 +2603,28 @@ private[spark] object Utils extends Logging {
    *   - IO encryption disabled
    *   - serializer(such as KryoSerializer) supports relocation of serialized objects
    */
-  def isPushBasedShuffleEnabled(conf: SparkConf): Boolean = {
+  def isPushBasedShuffleEnabled(conf: SparkConf,
+      isDriver: Boolean,
+      checkSerializer: Boolean = true): Boolean = {
     val pushBasedShuffleEnabled = conf.get(PUSH_BASED_SHUFFLE_ENABLED)
     if (pushBasedShuffleEnabled) {
-      val serializer = Utils.classForName(conf.get(SERIALIZER)).getConstructor(classOf[SparkConf])
-        .newInstance(conf).asInstanceOf[Serializer]
-      val canDoPushBasedShuffle = conf.get(IS_TESTING).getOrElse(false) ||
-        (conf.get(SHUFFLE_SERVICE_ENABLED) &&
-          conf.get(SparkLauncher.SPARK_MASTER, null) == "yarn" &&
-          // TODO: [SPARK-36744] needs to support IO encryption for push-based shuffle
-          !conf.get(IO_ENCRYPTION_ENABLED) &&
-          serializer.supportsRelocationOfSerializedObjects)
-
+      val canDoPushBasedShuffle = {
+        val isTesting = conf.get(IS_TESTING).getOrElse(false)
+        val isShuffleServiceAndYarn = conf.get(SHUFFLE_SERVICE_ENABLED) &&
+            conf.get(SparkLauncher.SPARK_MASTER, null) == "yarn"
+        lazy val serializerIsSupported = {
+          if (checkSerializer) {
+            instantiateSerializerFromConf[Serializer](SERIALIZER, conf, isDriver)
+              .supportsRelocationOfSerializedObjects

Review comment:
       ```
   Option(SparkEnv.get)
     .map(_.serializer)
      .getOrElse(instantiateSerializerFromConf[Serializer](SERIALIZER, conf, isDriver))
   ```
   throw NPE after the test `test migration of shuffle blocks during decommissioning` succeeded in `BlockManagerSuite.beforeEach(BlockManagerSuite.scala:188)`. Seems SparkEnv wasn't configured properly. Adding `SparkEnv.set(null)` in `afterEach()` seems fixed the issue:
   ```
   org.apache.spark.storage.BlockManagerSuite *** ABORTED *** (41 seconds, 221 milliseconds)
   [info]   java.lang.NullPointerException:
   [info]   at org.apache.spark.util.Utils$.isPushBasedShuffleEnabled(Utils.scala:2619)
   [info]   at org.apache.spark.storage.BlockManagerMasterEndpoint.<init>(BlockManagerMasterEndpoint.scala:104)
   [info]   at org.apache.spark.storage.BlockManagerSuite.beforeEach(BlockManagerSuite.scala:188)
   [info]   at org.scalatest.BeforeAndAfterEach.runTest(BeforeAndAfterEach.scala:233)
   [info]   at org.scalatest.BeforeAndAfterEach.runTest$(BeforeAndAfterEach.scala:227)
   [info]   at org.apache.spark.SparkFunSuite.runTest(SparkFunSuite.scala:62)
   [info]   at org.scalatest.funsuite.AnyFunSuiteLike.$anonfun$runTests$1(AnyFunSuiteLike.scala:269)
   [info]   at org.scalatest.SuperEngine.$anonfun$runTestsInBranch$1(Engine.scala:413)
   [info]   at scala.collection.immutable.List.foreach(List.scala:431)
   [info]   at org.scalatest.SuperEngine.traverseSubNodes$1(Engine.scala:401)
   [info]   at org.scalatest.SuperEngine.runTestsInBranch(Engine.scala:396)
   [info]   at org.scalatest.SuperEngine.runTestsImpl(Engine.scala:475)
   [info]   at org.scalatest.funsuite.AnyFunSuiteLike.runTests(AnyFunSuiteLike.scala:269)
   [info]   at org.scalatest.funsuite.AnyFunSuiteLike.runTests$(AnyFunSuiteLike.scala:268)
   [info]   at org.scalatest.funsuite.AnyFunSuite.runTests(AnyFunSuite.scala:1563)
   [info]   at org.scalatest.Suite.run(Suite.scala:1112)
   [info]   at org.scalatest.Suite.run$(Suite.scala:1094)
   [info]   at org.scalatest.funsuite.AnyFunSuite.org$scalatest$funsuite$AnyFunSuiteLike$$super$run(AnyFunSuite.scala:1563)
   [info]   at org.scalatest.funsuite.AnyFunSuiteLike.$anonfun$run$1(AnyFunSuiteLike.scala:273)
   [info]   at org.scalatest.SuperEngine.runImpl(Engine.scala:535)
   [info]   at org.scalatest.funsuite.AnyFunSuiteLike.run(AnyFunSuiteLike.scala:273)
   [info]   at org.scalatest.funsuite.AnyFunSuiteLike.run$(AnyFunSuiteLike.scala:272)
   [info]   at org.apache.spark.SparkFunSuite.org$scalatest$BeforeAndAfterAll$$super$run(SparkFunSuite.scala:62)
   [info]   at org.scalatest.BeforeAndAfterAll.liftedTree1$1(BeforeAndAfterAll.scala:213)
   [info]   at org.scalatest.BeforeAndAfterAll.run(BeforeAndAfterAll.scala:210)
   [info]   at org.scalatest.BeforeAndAfterAll.run$(BeforeAndAfterAll.scala:208)
   [info]   at org.apache.spark.SparkFunSuite.run(SparkFunSuite.scala:62)
   [info]   at org.scalatest.tools.Framework.org$scalatest$tools$Framework$$runSuite(Framework.scala:318)
   [info]   at org.scalatest.tools.Framework$ScalaTestTask.execute(Framework.scala:513)
   [info]   at sbt.ForkMain$Run.lambda$runTest$1(ForkMain.java:413)
   [info]   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
   [info]   at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
   [info]   at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
   [info]   at java.lang.Thread.run(Thread.java:748) 
   ```




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on pull request #34158: [SPARK-36705][FOLLOW-UP] Support the case when user's classes need to register for Kryo serialization

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #34158:
URL: https://github.com/apache/spark/pull/34158#issuecomment-931849961


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/143775/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on pull request #34158: [WIP][SPARK-36705][FOLLOW-UP] Support the case when user's classes need to register for Kryo serialization

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #34158:
URL: https://github.com/apache/spark/pull/34158#issuecomment-932494595


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/48301/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] rmcyang commented on a change in pull request #34158: [SPARK-36705][FOLLOW-UP] Support the case when user's classes need to register for Kryo serialization

Posted by GitBox <gi...@apache.org>.
rmcyang commented on a change in pull request #34158:
URL: https://github.com/apache/spark/pull/34158#discussion_r720589937



##########
File path: core/src/main/scala/org/apache/spark/util/Utils.scala
##########
@@ -2627,6 +2627,37 @@ private[spark] object Utils extends Logging {
     }
   }
 
+  // Create an instance of the class with the given name, possibly initializing it with our conf
+  def instantiateClass[T](className: String, conf: SparkConf, isDriver: Boolean): T = {

Review comment:
       > The instantiateClassFromConf could be made private
   
   @JoshRosen Can `instantiateClassFromConf` really made private? It's also used to initiate Serializer instance in SparkEnv:
   https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/SparkEnv.scala#L301
    




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #34158: [WIP][SPARK-36705][FOLLOW-UP] Support the case when user's classes need to register for Kryo serialization

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #34158:
URL: https://github.com/apache/spark/pull/34158#issuecomment-932820748


   **[Test build #143804 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/143804/testReport)** for PR 34158 at commit [`460b314`](https://github.com/apache/spark/commit/460b314da2312eec9fa635fc2089a91be727627f).
    * This patch passes all tests.
    * This patch merges cleanly.
    * This patch adds no public classes.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] mridulm commented on a change in pull request #34158: [SPARK-36705][FOLLOW-UP] Support the case when user's classes need to register for Kryo serialization

Posted by GitBox <gi...@apache.org>.
mridulm commented on a change in pull request #34158:
URL: https://github.com/apache/spark/pull/34158#discussion_r720562514



##########
File path: core/src/main/scala/org/apache/spark/storage/BlockManager.scala
##########
@@ -561,7 +562,7 @@ private[spark] class BlockManager(
   private def registerWithExternalShuffleServer(): Unit = {
     logInfo("Registering executor with local external shuffle service.")
     val shuffleManagerMeta =
-      if (Utils.isPushBasedShuffleEnabled(conf)) {
+      if (Utils.isPushBasedShuffleEnabled(conf, isDriver, false)) {

Review comment:
       nit: `false` -> `checkSerializer = false`

##########
File path: core/src/main/scala/org/apache/spark/storage/DiskBlockManager.scala
##########
@@ -208,7 +208,7 @@ private[spark] class DiskBlockManager(conf: SparkConf, var deleteFilesOnStop: Bo
    * permission to create directories under application local directories.
    */
   private def createLocalDirsForMergedShuffleBlocks(): Unit = {
-    if (Utils.isPushBasedShuffleEnabled(conf)) {
+    if (Utils.isPushBasedShuffleEnabled(conf, isDriver, false)) {

Review comment:
       nit: `false` -> `checkSerializer = false`

##########
File path: core/src/main/scala/org/apache/spark/SparkEnv.scala
##########
@@ -370,7 +344,8 @@ object SparkEnv extends Logging {
           } else {
             None
           }, blockManagerInfo,
-          mapOutputTracker.asInstanceOf[MapOutputTrackerMaster])),
+          mapOutputTracker.asInstanceOf[MapOutputTrackerMaster],
+          isDriver)),

Review comment:
       nit: move `isDriver` to prev line ?

##########
File path: core/src/main/scala/org/apache/spark/util/Utils.scala
##########
@@ -2627,6 +2627,37 @@ private[spark] object Utils extends Logging {
     }
   }
 
+  // Create an instance of the class with the given name, possibly initializing it with our conf
+  def instantiateClass[T](className: String, conf: SparkConf, isDriver: Boolean): T = {

Review comment:
       @rmcyang Can you rename the method as suggested by @srowen ? It is not a general instantiation of class; but for a class which exposes one of the supported constructor signature.
   Essentially it is specific to either `Serializer` or `ShuffleManager`.

##########
File path: core/src/main/scala/org/apache/spark/util/Utils.scala
##########
@@ -2627,6 +2629,34 @@ private[spark] object Utils extends Logging {
     }
   }
 
+  // Create an instance of the class with the given name, possibly initializing it with our conf
+  def instantiateClass[T](className: String, conf: SparkConf, isDriver: Boolean): T = {
+    val cls = Utils.classForName(className)
+    // Look for a constructor taking a SparkConf and a boolean isDriver, then one taking just
+    // SparkConf, then one taking no arguments
+    try {
+      cls.getConstructor(classOf[SparkConf], java.lang.Boolean.TYPE)
+        .newInstance(conf, java.lang.Boolean.valueOf(isDriver))
+        .asInstanceOf[T]
+    } catch {
+      case _: NoSuchMethodException =>
+        try {
+          cls.getConstructor(classOf[SparkConf]).newInstance(conf).asInstanceOf[T]
+        } catch {
+          case _: NoSuchMethodException =>
+            cls.getConstructor().newInstance().asInstanceOf[T]
+        }
+    }
+  }
+
+  // Create an instance of the class named by the given SparkConf property
+  // if the property is not set, possibly initializing it with our conf
+  def instantiateClassFromConf[T](propertyName: ConfigEntry[String],

Review comment:
       Can this be made private to this class ? And rename to `instantiateSerializerFromConf` ?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on pull request #34158: [SPARK-36705][FOLLOW-UP] Support the case when user's classes need to register for Kryo serialization

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #34158:
URL: https://github.com/apache/spark/pull/34158#issuecomment-931559050


   Can one of the admins verify this patch?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] gengliangwang commented on pull request #34158: [SPARK-36705][FOLLOW-UP] Support the case when user's classes need to register for Kryo serialization

Posted by GitBox <gi...@apache.org>.
gengliangwang commented on pull request #34158:
URL: https://github.com/apache/spark/pull/34158#issuecomment-934053936


   @rmcyang Thanks for the fix. The latest test failure looks relevant.
   @mridulm Sorry for the late reply. I was on vacation.  The fix overall LGTM.
   
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #34158: [SPARK-36705][FOLLOW-UP] Support the case when user's classes need to register for Kryo serialization

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #34158:
URL: https://github.com/apache/spark/pull/34158#issuecomment-931775182


   **[Test build #143775 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/143775/testReport)** for PR 34158 at commit [`054e631`](https://github.com/apache/spark/commit/054e631f95b2339a98db433c7730a98aa88afd38).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] mridulm commented on pull request #34158: [SPARK-36705][FOLLOW-UP] Support the case when user's classes need to register for Kryo serialization

Posted by GitBox <gi...@apache.org>.
mridulm commented on pull request #34158:
URL: https://github.com/apache/spark/pull/34158#issuecomment-931737776


   Thanks for reviewing @srowen.
   The context is that [Utils.isPushBasedShuffleEnabled](https://github.com/apache/spark/blob/119ddd7e9526ed899f88a944babb74af693297f5/core/src/main/scala/org/apache/spark/util/Utils.scala#L2609) is invoked within the initialization path of driver and executor (as part of SparkEnv creation) as well as in other places (after SparkEnv.get is available).
   
   As part of `isPushBasedShuffleEnabled`, we need to check if the serializer supports relocation or not : which requires creation/initialization of the configured Serializer.
   With the existing version, this fails as there are multiple creation paths for `Serializer`.
   This PR should help address this issue.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] JoshRosen commented on a change in pull request #34158: [SPARK-36705][FOLLOW-UP] Support the case when user's classes need to register for Kryo serialization

Posted by GitBox <gi...@apache.org>.
JoshRosen commented on a change in pull request #34158:
URL: https://github.com/apache/spark/pull/34158#discussion_r721889128



##########
File path: core/src/main/scala/org/apache/spark/util/Utils.scala
##########
@@ -2603,17 +2603,19 @@ private[spark] object Utils extends Logging {
    *   - IO encryption disabled
    *   - serializer(such as KryoSerializer) supports relocation of serialized objects
    */
-  def isPushBasedShuffleEnabled(conf: SparkConf): Boolean = {
+  def isPushBasedShuffleEnabled(conf: SparkConf,
+      isDriver: Boolean,
+      checkSerializer: Boolean = true): Boolean = {
     val pushBasedShuffleEnabled = conf.get(PUSH_BASED_SHUFFLE_ENABLED)
     if (pushBasedShuffleEnabled) {
-      val serializer = Utils.classForName(conf.get(SERIALIZER)).getConstructor(classOf[SparkConf])
-        .newInstance(conf).asInstanceOf[Serializer]
+      lazy val serializer = Option(SparkEnv.get).map(_.serializer)
+        .getOrElse(instantiateClassFromConf[Serializer](SERIALIZER, conf, isDriver))
       val canDoPushBasedShuffle = conf.get(IS_TESTING).getOrElse(false) ||

Review comment:
       @rmcyang, it looks like a few other tests might need similar changes, too. For now I'd recommend reverting 585a7ff2384b0dfe9f69e96387229e6fe87a564e to unblock this PR and then we can tackle the test-only changes in a followup (since it looks like it'll be tricker to fix than I had initially imagined).




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] rmcyang commented on pull request #34158: [SPARK-36705][FOLLOW-UP] Support the case when user's classes need to register for Kryo serialization

Posted by GitBox <gi...@apache.org>.
rmcyang commented on pull request #34158:
URL: https://github.com/apache/spark/pull/34158#issuecomment-932612606


   Thanks for thorough review @JoshRosen @mridulm. The original change indeed was not really to address the actual problem after I did more tests on our internal branch. To better fix the issue, I further made the val lazy wherever `isPushBasedShuffleEnabled` is invoked when it is a class instance variable so it will happen after the jars/classes are available/downloaded. For the case when `isPushBasedShuffleEnabled` is called during executor startup, the idea is we don't check if serializer if relocatable of not by using a flag `checkSerializer`.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA removed a comment on pull request #34158: [SPARK-36705][FOLLOW-UP] Support the case when user's classes need to register for Kryo serialization

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #34158:
URL: https://github.com/apache/spark/pull/34158#issuecomment-931775182


   **[Test build #143775 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/143775/testReport)** for PR 34158 at commit [`054e631`](https://github.com/apache/spark/commit/054e631f95b2339a98db433c7730a98aa88afd38).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] rmcyang commented on pull request #34158: [SPARK-33781][FOLLOW-UP] Support the case when user's classes need to register for Kryo serialization

Posted by GitBox <gi...@apache.org>.
rmcyang commented on pull request #34158:
URL: https://github.com/apache/spark/pull/34158#issuecomment-931552874


   cc @mridulm @Ngone51 @gengliangwang @dongjoon-hyun @venkata91 @zhouyejoe
   Please take a look。


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #34158: [SPARK-36705][FOLLOW-UP] Support the case when user's classes need to register for Kryo serialization

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #34158:
URL: https://github.com/apache/spark/pull/34158#issuecomment-932610006


   Kubernetes integration test starting
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/48305/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on pull request #34158: [WIP][SPARK-36705][FOLLOW-UP] Support the case when user's classes need to register for Kryo serialization

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #34158:
URL: https://github.com/apache/spark/pull/34158#issuecomment-932851109


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/143806/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on pull request #34158: [WIP][SPARK-36705][FOLLOW-UP] Support the case when user's classes need to register for Kryo serialization

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #34158:
URL: https://github.com/apache/spark/pull/34158#issuecomment-932823060


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/143804/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] JoshRosen commented on pull request #34158: [SPARK-36705][FOLLOW-UP] Support the case when user's classes need to register for Kryo serialization

Posted by GitBox <gi...@apache.org>.
JoshRosen commented on pull request #34158:
URL: https://github.com/apache/spark/pull/34158#issuecomment-934039150


   Stepping back a bit, I have a more fundamental conceptual question about the push-based shuffle configurations: could the decision of whether to use push-based shuffle be performed on a per-ShuffleDependency basis rather than a per-app basis?
   
   The `spark.serializer` configuration controls the default serializer, but under the hood `ShuffleDependency` instances are sometimes automatically configured with non-default serializers:
   
   - Pure SQL / DataFrame / Dataset code [uses `UnsafeRowSerializer`](https://github.com/apache/spark/blob/41a16ebf1196bec86aec104e72fd7fb1597c0073/sql/core/src/main/scala/org/apache/spark/sql/execution/exchange/ShuffleExchangeExec.scala#L132-L133).
   - The RDD shuffle path [will automatically use Kryo](https://github.com/apache/spark/blame/41a16ebf1196bec86aec104e72fd7fb1597c0073/core/src/main/scala/org/apache/spark/serializer/SerializerManager.scala#L99-L102) when shuffling RDDs containing only primitive types.
   
   Spark's shuffle code contains a [serialized sorting mode](https://github.com/apache/spark/blame/fa1805db48ca53ece4cbbe42ebb2a9811a142ed2/core/src/main/scala/org/apache/spark/shuffle/sort/SortShuffleManager.scala#L37-L71) which can only be used when  `serializer. supportsRelocationOfSerializedObjects == true`. The decision of whether to use serialized sorting mode is performed on a per-ShuffleDependency (not per-app) basis (by [using a different ShuffleHandle](https://github.com/apache/spark/blame/fa1805db48ca53ece4cbbe42ebb2a9811a142ed2/core/src/main/scala/org/apache/spark/shuffle/sort/SortShuffleManager.scala#L106-L109)).
   
   If we did something similar for push-based shuffle then users could enable PBS and still get some benefit even if they couldn't / didn't want to use Kryo as the default serializer. With that approach we wouldn't need to construct a default serializer instance: during `SparkEnv` creation time or executor startup time we'd only need to know whether the other prerequisites of PBS were met (configuration enabled, shuffle service enabled, IO encryption disabled, etc). We'd still need to check the serializer properties, but that check would happen on the driver and its result would somehow be encoded in the `ShuffleDependency`.
   
   Is that possible given the architecture of push-based shuffle?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on pull request #34158: [WIP][SPARK-36705][FOLLOW-UP] Support the case when user's classes need to register for Kryo serialization

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #34158:
URL: https://github.com/apache/spark/pull/34158#issuecomment-932845819


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/48318/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on pull request #34158: [SPARK-36705][FOLLOW-UP] Support the case when user's classes need to register for Kryo serialization

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #34158:
URL: https://github.com/apache/spark/pull/34158#issuecomment-932633159


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/143796/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #34158: [SPARK-36705][FOLLOW-UP] Support the case when user's classes need to register for Kryo serialization

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #34158:
URL: https://github.com/apache/spark/pull/34158#issuecomment-932642545


   Kubernetes integration test starting
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/48308/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on pull request #34158: [SPARK-36705][FOLLOW-UP] Support the case when user's classes need to register for Kryo serialization

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #34158:
URL: https://github.com/apache/spark/pull/34158#issuecomment-934531929


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/143856/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA removed a comment on pull request #34158: [SPARK-36705][FOLLOW-UP] Support the case when user's classes need to register for Kryo serialization

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #34158:
URL: https://github.com/apache/spark/pull/34158#issuecomment-932632803


   **[Test build #143796 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/143796/testReport)** for PR 34158 at commit [`af5c6cb`](https://github.com/apache/spark/commit/af5c6cbffea5e9f805379401b8fd806f45b9aa0b).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on pull request #34158: [SPARK-36705][FOLLOW-UP] Support the case when user's classes need to register for Kryo serialization

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #34158:
URL: https://github.com/apache/spark/pull/34158#issuecomment-933876332


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/143818/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on pull request #34158: [SPARK-36705][FOLLOW-UP] Support the case when user's classes need to register for Kryo serialization

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #34158:
URL: https://github.com/apache/spark/pull/34158#issuecomment-934031886






-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] mridulm commented on pull request #34158: [SPARK-33781][FOLLOW-UP] Support the case when user's classes need to register for Kryo serialization

Posted by GitBox <gi...@apache.org>.
mridulm commented on pull request #34158:
URL: https://github.com/apache/spark/pull/34158#issuecomment-931595772


   Also, +CC @srowen 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] JoshRosen commented on a change in pull request #34158: [SPARK-36705][FOLLOW-UP] Support the case when user's classes need to register for Kryo serialization

Posted by GitBox <gi...@apache.org>.
JoshRosen commented on a change in pull request #34158:
URL: https://github.com/apache/spark/pull/34158#discussion_r720558670



##########
File path: core/src/main/scala/org/apache/spark/util/Utils.scala
##########
@@ -2603,17 +2603,19 @@ private[spark] object Utils extends Logging {
    *   - IO encryption disabled
    *   - serializer(such as KryoSerializer) supports relocation of serialized objects
    */
-  def isPushBasedShuffleEnabled(conf: SparkConf): Boolean = {
+  def isPushBasedShuffleEnabled(conf: SparkConf,
+      isDriver: Boolean,
+      checkSerializer: Boolean = true): Boolean = {
     val pushBasedShuffleEnabled = conf.get(PUSH_BASED_SHUFFLE_ENABLED)
     if (pushBasedShuffleEnabled) {
-      val serializer = Utils.classForName(conf.get(SERIALIZER)).getConstructor(classOf[SparkConf])
-        .newInstance(conf).asInstanceOf[Serializer]
+      lazy val serializer = Option(SparkEnv.get).map(_.serializer)
+        .getOrElse(instantiateClassFromConf[Serializer](SERIALIZER, conf, isDriver))
       val canDoPushBasedShuffle = conf.get(IS_TESTING).getOrElse(false) ||

Review comment:
       I think there may have been a pre-existing logic bug in the existing code here:
   
   When `conf.get(IS_TESTING)` is true then we don't even check whether the serializer supports relocation.
   
   Looking a few revisions back, it looks like the `IS_TESTING` check was added to allow the push-based testing code path to be used in tests when Spark isn't running on YARN:
   
   https://github.com/apache/spark/blame/1a62e6a2c119df707f15101b03ecff0c3dee62f5/core/src/main/scala/org/apache/spark/util/Utils.scala#L2602-L2607
   
   It looks like #33976 changed the precedence when adding the additional checks involving IO encryption and the serializer.
   
   I find the deeply paren-nested conditionals to be a bit tricky to understand and suspect that's why this bug slipped through.
   
   As long as we're changing this code, what do you think about defining a few helper variables so that the final condition is simpler? Maybe something like this:
   
   ```scala
   val canDoPushBasedShuffle = {
     val isTesting = conf.get(IS_TESTING).getOrElse(false)
     val isShuffleServiceAndYarn = (conf.get(SHUFFLE_SERVICE_ENABLED) &&
          conf.get(SparkLauncher.SPARK_MASTER, null) == "yarn")
     val serializerIsSupported = {
       if (checkSerializer) {
         Option(SparkEnv.get)
           .map(_.serializer)
           .getOrElse(instantiateClassFromConf[Serializer](SERIALIZER, conf, isDriver))
           .supportsRelocationOfSerializedObjects
       } else {
         false
       }
     }
     val ioEncryptionDisabled = !conf.get(IO_ENCRYPTION_ENABLED)
     (shuffleServiceOnYarn || isTesting) && ioEncryptionDisabled && serializerisSupported
   }
   ```
   
   




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on pull request #34158: [SPARK-36705][FOLLOW-UP] Support the case when user's classes need to register for Kryo serialization

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #34158:
URL: https://github.com/apache/spark/pull/34158#issuecomment-933876332


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/143818/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on pull request #34158: [SPARK-36705][FOLLOW-UP] Support the case when user's classes need to register for Kryo serialization

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #34158:
URL: https://github.com/apache/spark/pull/34158#issuecomment-932631983


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/48305/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] JoshRosen edited a comment on pull request #34158: [SPARK-36705][FOLLOW-UP] Support the case when user's classes need to register for Kryo serialization

Posted by GitBox <gi...@apache.org>.
JoshRosen edited a comment on pull request #34158:
URL: https://github.com/apache/spark/pull/34158#issuecomment-934039150


   Stepping back a bit, I have a more fundamental conceptual question about the push-based shuffle configurations: could the decision of whether to use push-based shuffle be performed on a per-ShuffleDependency basis rather than a per-app basis?
   
   The `spark.serializer` configuration controls the default serializer, but under the hood `ShuffleDependency` instances are sometimes automatically configured with non-default serializers:
   
   - Pure SQL / DataFrame / Dataset code [uses `UnsafeRowSerializer`](https://github.com/apache/spark/blob/41a16ebf1196bec86aec104e72fd7fb1597c0073/sql/core/src/main/scala/org/apache/spark/sql/execution/exchange/ShuffleExchangeExec.scala#L132-L133).
   - The RDD shuffle path [will automatically use Kryo](https://github.com/apache/spark/blame/41a16ebf1196bec86aec104e72fd7fb1597c0073/core/src/main/scala/org/apache/spark/serializer/SerializerManager.scala#L99-L102) when shuffling RDDs containing only primitive types.
   
   Spark's shuffle code contains a [serialized sorting mode](https://github.com/apache/spark/blame/fa1805db48ca53ece4cbbe42ebb2a9811a142ed2/core/src/main/scala/org/apache/spark/shuffle/sort/SortShuffleManager.scala#L37-L71) which can only be used when  `serializer. supportsRelocationOfSerializedObjects == true`. The decision of whether to use serialized sorting mode is performed on a per-ShuffleDependency (not per-app) basis (by [using a different ShuffleHandle](https://github.com/apache/spark/blame/fa1805db48ca53ece4cbbe42ebb2a9811a142ed2/core/src/main/scala/org/apache/spark/shuffle/sort/SortShuffleManager.scala#L106-L109)).
   
   If we did something similar for push-based shuffle then users could enable PBS and still get some benefit even if they couldn't / didn't want to use Kryo as the default serializer. With that approach we wouldn't need to construct a default serializer instance: during `SparkEnv` creation time or executor startup time we'd only need to know whether the other prerequisites of PBS were met (configuration enabled, shuffle service enabled, IO encryption disabled, etc). We'd still need to check the serializer properties, but that check would happen on the driver and its result would somehow be encoded in the `ShuffleHandle`.
   
   Is that possible given the architecture of push-based shuffle?
   
   **Edit**: to clarify, I'm posing this as a longer-term question (not for Spark 3.2).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #34158: [SPARK-36705][FOLLOW-UP] Support the case when user's classes need to register for Kryo serialization

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #34158:
URL: https://github.com/apache/spark/pull/34158#issuecomment-933989744


   **[Test build #143827 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/143827/testReport)** for PR 34158 at commit [`585a7ff`](https://github.com/apache/spark/commit/585a7ff2384b0dfe9f69e96387229e6fe87a564e).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #34158: [SPARK-36705][FOLLOW-UP] Support the case when user's classes need to register for Kryo serialization

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #34158:
URL: https://github.com/apache/spark/pull/34158#issuecomment-934008439


   Kubernetes integration test status failure
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/48338/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on pull request #34158: [WIP][SPARK-36705][FOLLOW-UP] Support the case when user's classes need to register for Kryo serialization

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #34158:
URL: https://github.com/apache/spark/pull/34158#issuecomment-932529164


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/143789/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] rmcyang removed a comment on pull request #34158: [SPARK-36705][FOLLOW-UP] Support the case when user's classes need to register for Kryo serialization

Posted by GitBox <gi...@apache.org>.
rmcyang removed a comment on pull request #34158:
URL: https://github.com/apache/spark/pull/34158#issuecomment-934084016


   > @rmcyang Thanks for the fix. The latest test failure looks relevant. @mridulm Sorry for the late reply. I was on vacation. The fix overall LGTM.
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on pull request #34158: [SPARK-36705][FOLLOW-UP] Support the case when user's classes need to register for Kryo serialization

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #34158:
URL: https://github.com/apache/spark/pull/34158#issuecomment-934194434


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/143834/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #34158: [SPARK-36705][FOLLOW-UP] Support the case when user's classes need to register for Kryo serialization

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #34158:
URL: https://github.com/apache/spark/pull/34158#issuecomment-934464284


   Kubernetes integration test starting
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/48369/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] mridulm commented on pull request #34158: [SPARK-36705][FOLLOW-UP] Support the case when user's classes need to register for Kryo serialization

Posted by GitBox <gi...@apache.org>.
mridulm commented on pull request #34158:
URL: https://github.com/apache/spark/pull/34158#issuecomment-934472550


   > Stepping back a bit, I have a more fundamental conceptual question about the push-based shuffle configurations: could the decision of whether to use push-based shuffle be performed on a per-ShuffleDependency basis rather than a per-app basis?
   
   Agree, we should relook at this. Probably mirror more closely what batch fetch does.
   This will require a closer look to make sure we make decisions such that, for a shuffle dependency:
   a) We dont push at mapper side if reducers cant read it.
   b) Dont wait for push finalization at driver.
   c) Enable push shuffle reads only if shuffleDep.serializer supports relocation.
   d) Relook at how system is initialized - as currently it is based on a global state.
   
   Current state is a more strict enforcement, which can be relaxed as we evolve the functionality.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on pull request #34158: [WIP][SPARK-36705][FOLLOW-UP] Support the case when user's classes need to register for Kryo serialization

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #34158:
URL: https://github.com/apache/spark/pull/34158#issuecomment-932730964


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/143800/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #34158: [SPARK-36705][FOLLOW-UP] Support the case when user's classes need to register for Kryo serialization

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #34158:
URL: https://github.com/apache/spark/pull/34158#issuecomment-934174201


   **[Test build #143834 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/143834/testReport)** for PR 34158 at commit [`82e2e07`](https://github.com/apache/spark/commit/82e2e07bd46d13390d1d6c571f001cee41ce3224).
    * This patch **fails SparkR unit tests**.
    * This patch merges cleanly.
    * This patch adds no public classes.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #34158: [SPARK-36705][FOLLOW-UP] Support the case when user's classes need to register for Kryo serialization

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #34158:
URL: https://github.com/apache/spark/pull/34158#issuecomment-932727552


   Kubernetes integration test status failure
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/48312/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #34158: [SPARK-36705][FOLLOW-UP] Support the case when user's classes need to register for Kryo serialization

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #34158:
URL: https://github.com/apache/spark/pull/34158#issuecomment-933752051


   **[Test build #143818 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/143818/testReport)** for PR 34158 at commit [`6f936b1`](https://github.com/apache/spark/commit/6f936b18e520031a152db3ab6bfccefb4bd52e5d).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on pull request #34158: [WIP][SPARK-36705][FOLLOW-UP] Support the case when user's classes need to register for Kryo serialization

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #34158:
URL: https://github.com/apache/spark/pull/34158#issuecomment-932851109


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/143806/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #34158: [WIP][SPARK-36705][FOLLOW-UP] Support the case when user's classes need to register for Kryo serialization

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #34158:
URL: https://github.com/apache/spark/pull/34158#issuecomment-932807531


   Kubernetes integration test starting
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/48316/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] mridulm commented on a change in pull request #34158: [SPARK-36705][FOLLOW-UP] Support the case when user's classes need to register for Kryo serialization

Posted by GitBox <gi...@apache.org>.
mridulm commented on a change in pull request #34158:
URL: https://github.com/apache/spark/pull/34158#discussion_r720450006



##########
File path: core/src/main/scala/org/apache/spark/util/Utils.scala
##########
@@ -2603,17 +2603,18 @@ private[spark] object Utils extends Logging {
    *   - IO encryption disabled
    *   - serializer(such as KryoSerializer) supports relocation of serialized objects
    */
-  def isPushBasedShuffleEnabled(conf: SparkConf): Boolean = {
+  def isPushBasedShuffleEnabled(conf: SparkConf, checkSerializer: Boolean = true): Boolean = {
     val pushBasedShuffleEnabled = conf.get(PUSH_BASED_SHUFFLE_ENABLED)
     if (pushBasedShuffleEnabled) {
-      val serializer = Utils.classForName(conf.get(SERIALIZER)).getConstructor(classOf[SparkConf])
-        .newInstance(conf).asInstanceOf[Serializer]
+      lazy val serializer = Option(SparkEnv.get).map(_.serializer)
+        .getOrElse(Utils.classForName(conf.get(SERIALIZER)).getConstructor(classOf[SparkConf])

Review comment:
       We would need the earlier change for serializer creation @rmcyang. Else this would fail for custom serializers.
   
   Rest of the changes in the PR looks good.
   Essentially the PR should be earlier version + what is in this version together.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] rmcyang commented on a change in pull request #34158: [SPARK-36705][FOLLOW-UP] Support the case when user's classes need to register for Kryo serialization

Posted by GitBox <gi...@apache.org>.
rmcyang commented on a change in pull request #34158:
URL: https://github.com/apache/spark/pull/34158#discussion_r720650591



##########
File path: core/src/main/scala/org/apache/spark/storage/PushBasedFetchHelper.scala
##########
@@ -142,7 +142,7 @@ private class PushBasedFetchHelper(
     val mergedBlocksMetaListener = new MergedBlocksMetaListener {
       override def onSuccess(shuffleId: Int, shuffleMergeId: Int, reduceId: Int,
           meta: MergedBlockMeta): Unit = {
-        logInfo(s"Received the meta of push-merged block for ($shuffleId, $shuffleMergeId," +
+        logDebug(s"Received the meta of push-merged block for ($shuffleId, $shuffleMergeId," +

Review comment:
       This change is unrelated to this PR. But given we found that this line could pollute logs in OSS version, and since this is just one single log level change, thus wanted to add it alone with this PR.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #34158: [SPARK-36705][FOLLOW-UP] Support the case when user's classes need to register for Kryo serialization

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #34158:
URL: https://github.com/apache/spark/pull/34158#issuecomment-932721845


   Kubernetes integration test starting
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/48312/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on pull request #34158: [SPARK-36705][FOLLOW-UP] Support the case when user's classes need to register for Kryo serialization

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #34158:
URL: https://github.com/apache/spark/pull/34158#issuecomment-932631983


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/48305/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #34158: [WIP][SPARK-36705][FOLLOW-UP] Support the case when user's classes need to register for Kryo serialization

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #34158:
URL: https://github.com/apache/spark/pull/34158#issuecomment-932730754


   **[Test build #143800 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/143800/testReport)** for PR 34158 at commit [`40efd99`](https://github.com/apache/spark/commit/40efd9939a933c1bcaa90592f919aa9d0c43147a).
    * This patch **fails Spark unit tests**.
    * This patch merges cleanly.
    * This patch adds no public classes.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #34158: [WIP][SPARK-36705][FOLLOW-UP] Support the case when user's classes need to register for Kryo serialization

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #34158:
URL: https://github.com/apache/spark/pull/34158#issuecomment-932518766


   **[Test build #143789 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/143789/testReport)** for PR 34158 at commit [`ac35545`](https://github.com/apache/spark/commit/ac355451961372cc251706fb783ae2740b9945b8).
    * This patch passes all tests.
    * This patch merges cleanly.
    * This patch adds no public classes.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on pull request #34158: [WIP][SPARK-36705][FOLLOW-UP] Support the case when user's classes need to register for Kryo serialization

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #34158:
URL: https://github.com/apache/spark/pull/34158#issuecomment-932816487


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/48316/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] JoshRosen commented on a change in pull request #34158: [SPARK-36705][FOLLOW-UP] Support the case when user's classes need to register for Kryo serialization

Posted by GitBox <gi...@apache.org>.
JoshRosen commented on a change in pull request #34158:
URL: https://github.com/apache/spark/pull/34158#discussion_r720558670



##########
File path: core/src/main/scala/org/apache/spark/util/Utils.scala
##########
@@ -2603,17 +2603,19 @@ private[spark] object Utils extends Logging {
    *   - IO encryption disabled
    *   - serializer(such as KryoSerializer) supports relocation of serialized objects
    */
-  def isPushBasedShuffleEnabled(conf: SparkConf): Boolean = {
+  def isPushBasedShuffleEnabled(conf: SparkConf,
+      isDriver: Boolean,
+      checkSerializer: Boolean = true): Boolean = {
     val pushBasedShuffleEnabled = conf.get(PUSH_BASED_SHUFFLE_ENABLED)
     if (pushBasedShuffleEnabled) {
-      val serializer = Utils.classForName(conf.get(SERIALIZER)).getConstructor(classOf[SparkConf])
-        .newInstance(conf).asInstanceOf[Serializer]
+      lazy val serializer = Option(SparkEnv.get).map(_.serializer)
+        .getOrElse(instantiateClassFromConf[Serializer](SERIALIZER, conf, isDriver))
       val canDoPushBasedShuffle = conf.get(IS_TESTING).getOrElse(false) ||

Review comment:
       I think there may have been a pre-existing logic bug in the existing code here:
   
   When `conf.get(IS_TESTING)` is true then we don't even check whether the serializer supports relocation.
   
   Looking a few revisions back, it looks like the `IS_TESTING` check was added to allow the push-based testing code path to be used in tests when Spark isn't running on YARN:
   
   https://github.com/apache/spark/blame/1a62e6a2c119df707f15101b03ecff0c3dee62f5/core/src/main/scala/org/apache/spark/util/Utils.scala#L2602-L2607
   
   It looks like #33976 changed the precedence when adding the additional checks involving IO encryption and the serializer.
   
   I find the deeply paren-nested conditionals to be a bit tricky to understand and suspect that's why this bug slipped through.
   
   As long as we're changing this code, what do you think about defining a few helper variables so that the final condition is simpler? Maybe something like this:
   
   ```scala
   val canDoPushBasedShuffle = {
     val isTesting = conf.get(IS_TESTING).getOrElse(false)
     val isShuffleServiceAndYarn = (conf.get(SHUFFLE_SERVICE_ENABLED) &&
          conf.get(SparkLauncher.SPARK_MASTER, null) == "yarn")
     val serializerIsSupported = {
       if (checkSerializer) {
         Option(SparkEnv.get)
           .map(_.serializer)
           .getOrElse(instantiateClassFromConf[Serializer](SERIALIZER, conf, isDriver))
           .supportsRelocationOfSerializedObjects
       } else {
         false
       }
     }
     // TODO: [SPARK-36744] needs to support IO encryption for push-based shuffle
     val ioEncryptionDisabled = !conf.get(IO_ENCRYPTION_ENABLED)
     (shuffleServiceOnYarn || isTesting) && ioEncryptionDisabled && serializerisSupported
   }
   ```
   
   




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on pull request #34158: [WIP][SPARK-36705][FOLLOW-UP] Support the case when user's classes need to register for Kryo serialization

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #34158:
URL: https://github.com/apache/spark/pull/34158#issuecomment-932823060


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/143804/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA removed a comment on pull request #34158: [WIP][SPARK-36705][FOLLOW-UP] Support the case when user's classes need to register for Kryo serialization

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #34158:
URL: https://github.com/apache/spark/pull/34158#issuecomment-932801157


   **[Test build #143804 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/143804/testReport)** for PR 34158 at commit [`460b314`](https://github.com/apache/spark/commit/460b314da2312eec9fa635fc2089a91be727627f).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on pull request #34158: [SPARK-36705][FOLLOW-UP] Support the case when user's classes need to register for Kryo serialization

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #34158:
URL: https://github.com/apache/spark/pull/34158#issuecomment-932728893


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/48312/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on pull request #34158: [WIP][SPARK-36705][FOLLOW-UP] Support the case when user's classes need to register for Kryo serialization

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #34158:
URL: https://github.com/apache/spark/pull/34158#issuecomment-932736297


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/143801/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] rmcyang commented on a change in pull request #34158: [SPARK-36705][FOLLOW-UP] Support the case when user's classes need to register for Kryo serialization

Posted by GitBox <gi...@apache.org>.
rmcyang commented on a change in pull request #34158:
URL: https://github.com/apache/spark/pull/34158#discussion_r720629743



##########
File path: core/src/main/scala/org/apache/spark/util/Utils.scala
##########
@@ -2603,17 +2603,19 @@ private[spark] object Utils extends Logging {
    *   - IO encryption disabled
    *   - serializer(such as KryoSerializer) supports relocation of serialized objects
    */
-  def isPushBasedShuffleEnabled(conf: SparkConf): Boolean = {
+  def isPushBasedShuffleEnabled(conf: SparkConf,
+      isDriver: Boolean,
+      checkSerializer: Boolean = true): Boolean = {
     val pushBasedShuffleEnabled = conf.get(PUSH_BASED_SHUFFLE_ENABLED)
     if (pushBasedShuffleEnabled) {
-      val serializer = Utils.classForName(conf.get(SERIALIZER)).getConstructor(classOf[SparkConf])
-        .newInstance(conf).asInstanceOf[Serializer]
+      lazy val serializer = Option(SparkEnv.get).map(_.serializer)
+        .getOrElse(instantiateClassFromConf[Serializer](SERIALIZER, conf, isDriver))
       val canDoPushBasedShuffle = conf.get(IS_TESTING).getOrElse(false) ||

Review comment:
       Thanks @JoshRosen.
   Agree that defining a few helper variables makes it's easy to understand and makes the final condition simpler. However, `(shuffleServiceOnYarn || isTesting) && ioEncryptionDisabled && serializerisSupported` seems not right -- it breaks below UTs in `DAGSchedulerSuite`:
   ```
   - SPARK-32920: shuffle merge finalization *** FAILED ***
     0 did not equal 2 (DAGSchedulerSuite.scala:3448)
   - SPARK-32920: merger locations not empty *** FAILED ***
     List() was empty (DAGSchedulerSuite.scala:3470)
   - SPARK-32920: merger locations reuse from shuffle dependency *** FAILED ***
     List() was empty (DAGSchedulerSuite.scala:3493)
   - SPARK-32920: Ensure child stage should not start before all the parent stages are completed with shuffle merge finalized for all the parent stages *** FAILED ***
     List() was empty (DAGSchedulerSuite.scala:3563)
   - SPARK-32920: Merge results should be unregistered if the running stage is cancelled before shuffle merge is finalized *** FAILED ***
     0 did not equal 2 (DAGSchedulerSuite.scala:3693)
   - SPARK-32920: SPARK-35549: Merge results should not get registered after shuffle merge finalization *** FAILED ***
     0 did not equal 1 (DAGSchedulerSuite.scala:3735)
   - SPARK-32923: handle stage failure for indeterminate map stage with push-based shuffle *** FAILED ***
     0 did not equal 2 (DAGSchedulerSuite.scala:3833)
   ```
   Before the [first PR](https://github.com/apache/spark/pull/33976/files#diff-2ecc6aef4b0c50bbf146e6c0b3b8b2249375f06a83e2a224c7718cfc850c3af7L2605) for [SPARK-36705](https://issues.apache.org/jira/browse/SPARK-36705), it looks like the condition logic is 
   ```isTesting || (isShuffleService && isYarn)```
    and the proposed change in this PR is actually equivalent to 
   ```isTesting || (isShuffleService && isYarn && ioEncryptionDisabled && serializerIsSupported)```.
   So combine with your idea, I still think it should be like ```isTesting || (shuffleServiceOnYarn && ioEncryptionDisabled && serializerIsSupported)```.
   
   Another thing is when `checkSerializer` is `false`, the if-else should return true instead of false. The purpose here is to skip `supportsRelocationOfSerializedObjects` when we don't want check serializer while keep the push-based shuffle enabled. So overall I think the logic here should be more like below (this also won't fail above UTs):
   ```
         val canDoPushBasedShuffle = {
           val isTesting = conf.get(IS_TESTING).getOrElse(false)
           val isShuffleServiceAndYarn = conf.get(SHUFFLE_SERVICE_ENABLED) &&
               conf.get(SparkLauncher.SPARK_MASTER, null) == "yarn"
           val serializerIsSupported = {
             if (checkSerializer) {
               Option(SparkEnv.get)
                 .map(_.serializer)
                 .getOrElse(instantiateSerializerFromConf[Serializer](SERIALIZER, conf, isDriver))
                 .supportsRelocationOfSerializedObjects
             } else {
               // if no need to check Serializer, always set serializerIsSupported as true
               true
             }
           }
           // TODO: [SPARK-36744] needs to support IO encryption for push-based shuffle
           val ioEncryptionDisabled = !conf.get(IO_ENCRYPTION_ENABLED)
           isTesting || (isShuffleServiceAndYarn && ioEncryptionDisabled && serializerIsSupported)
         }
   ``` 




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on pull request #34158: [SPARK-36705][FOLLOW-UP] Support the case when user's classes need to register for Kryo serialization

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #34158:
URL: https://github.com/apache/spark/pull/34158#issuecomment-934149538


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/48347/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] JoshRosen commented on a change in pull request #34158: [SPARK-36705][FOLLOW-UP] Support the case when user's classes need to register for Kryo serialization

Posted by GitBox <gi...@apache.org>.
JoshRosen commented on a change in pull request #34158:
URL: https://github.com/apache/spark/pull/34158#discussion_r721804617



##########
File path: core/src/main/scala/org/apache/spark/util/Utils.scala
##########
@@ -2603,17 +2603,19 @@ private[spark] object Utils extends Logging {
    *   - IO encryption disabled
    *   - serializer(such as KryoSerializer) supports relocation of serialized objects
    */
-  def isPushBasedShuffleEnabled(conf: SparkConf): Boolean = {
+  def isPushBasedShuffleEnabled(conf: SparkConf,
+      isDriver: Boolean,
+      checkSerializer: Boolean = true): Boolean = {
     val pushBasedShuffleEnabled = conf.get(PUSH_BASED_SHUFFLE_ENABLED)
     if (pushBasedShuffleEnabled) {
-      val serializer = Utils.classForName(conf.get(SERIALIZER)).getConstructor(classOf[SparkConf])
-        .newInstance(conf).asInstanceOf[Serializer]
+      lazy val serializer = Option(SparkEnv.get).map(_.serializer)
+        .getOrElse(instantiateClassFromConf[Serializer](SERIALIZER, conf, isDriver))
       val canDoPushBasedShuffle = conf.get(IS_TESTING).getOrElse(false) ||

Review comment:
       On the other hand, the logic change I'm proposing is to fix a nit that existed outside the scope of this PR, so I don't feel strongly about fixing it here/now.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] mridulm commented on pull request #34158: [SPARK-36705][FOLLOW-UP] Support the case when user's classes need to register for Kryo serialization

Posted by GitBox <gi...@apache.org>.
mridulm commented on pull request #34158:
URL: https://github.com/apache/spark/pull/34158#issuecomment-934019497


   +CC @Ngone51, @gengliangwang to also take a look. Thx


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #34158: [SPARK-36705][FOLLOW-UP] Support the case when user's classes need to register for Kryo serialization

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #34158:
URL: https://github.com/apache/spark/pull/34158#issuecomment-934083955


   **[Test build #143834 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/143834/testReport)** for PR 34158 at commit [`82e2e07`](https://github.com/apache/spark/commit/82e2e07bd46d13390d1d6c571f001cee41ce3224).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] rmcyang commented on a change in pull request #34158: [SPARK-36705][FOLLOW-UP] Support the case when user's classes need to register for Kryo serialization

Posted by GitBox <gi...@apache.org>.
rmcyang commented on a change in pull request #34158:
URL: https://github.com/apache/spark/pull/34158#discussion_r720561451



##########
File path: core/src/main/scala/org/apache/spark/util/Utils.scala
##########
@@ -2627,6 +2627,37 @@ private[spark] object Utils extends Logging {
     }
   }
 
+  // Create an instance of the class with the given name, possibly initializing it with our conf
+  def instantiateClass[T](className: String, conf: SparkConf, isDriver: Boolean): T = {

Review comment:
       Yes. The new code moves `instantiateClassFromConf` and `instantiateClass` from SparkEnv into Utils, in order to let `isPushBasedShuffleEnabled` to leverage them for instantiating serializer instances.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #34158: [SPARK-36705][FOLLOW-UP] Support the case when user's classes need to register for Kryo serialization

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #34158:
URL: https://github.com/apache/spark/pull/34158#issuecomment-931797189


   Kubernetes integration test status failure
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/48286/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #34158: [SPARK-36705][FOLLOW-UP] Support the case when user's classes need to register for Kryo serialization

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #34158:
URL: https://github.com/apache/spark/pull/34158#issuecomment-934109923


   Kubernetes integration test starting
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/48347/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on pull request #34158: [SPARK-36705][FOLLOW-UP] Support the case when user's classes need to register for Kryo serialization

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #34158:
URL: https://github.com/apache/spark/pull/34158#issuecomment-932667287


   Can one of the admins verify this patch?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] JoshRosen commented on a change in pull request #34158: [SPARK-36705][FOLLOW-UP] Support the case when user's classes need to register for Kryo serialization

Posted by GitBox <gi...@apache.org>.
JoshRosen commented on a change in pull request #34158:
URL: https://github.com/apache/spark/pull/34158#discussion_r720571878



##########
File path: core/src/main/scala/org/apache/spark/util/Utils.scala
##########
@@ -2627,6 +2627,37 @@ private[spark] object Utils extends Logging {
     }
   }
 
+  // Create an instance of the class with the given name, possibly initializing it with our conf
+  def instantiateClass[T](className: String, conf: SparkConf, isDriver: Boolean): T = {

Review comment:
       The `instantiateClassFromConf` could be made private, but I don't think this `instantiateClass` method can:
   
   it's also used to instantiate user-configurable ShuffleManagers: https://github.com/apache/spark/pull/34158/files#diff-7f3f6f8b1c65a0a1f984854d7cb9d97e848c07b96278513a519bde658331a37eL340-R314




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on pull request #34158: [SPARK-36705][FOLLOW-UP] Support the case when user's classes need to register for Kryo serialization

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #34158:
URL: https://github.com/apache/spark/pull/34158#issuecomment-932728893


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/48312/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] JoshRosen commented on a change in pull request #34158: [SPARK-36705][FOLLOW-UP] Support the case when user's classes need to register for Kryo serialization

Posted by GitBox <gi...@apache.org>.
JoshRosen commented on a change in pull request #34158:
URL: https://github.com/apache/spark/pull/34158#discussion_r720558670



##########
File path: core/src/main/scala/org/apache/spark/util/Utils.scala
##########
@@ -2603,17 +2603,19 @@ private[spark] object Utils extends Logging {
    *   - IO encryption disabled
    *   - serializer(such as KryoSerializer) supports relocation of serialized objects
    */
-  def isPushBasedShuffleEnabled(conf: SparkConf): Boolean = {
+  def isPushBasedShuffleEnabled(conf: SparkConf,
+      isDriver: Boolean,
+      checkSerializer: Boolean = true): Boolean = {
     val pushBasedShuffleEnabled = conf.get(PUSH_BASED_SHUFFLE_ENABLED)
     if (pushBasedShuffleEnabled) {
-      val serializer = Utils.classForName(conf.get(SERIALIZER)).getConstructor(classOf[SparkConf])
-        .newInstance(conf).asInstanceOf[Serializer]
+      lazy val serializer = Option(SparkEnv.get).map(_.serializer)
+        .getOrElse(instantiateClassFromConf[Serializer](SERIALIZER, conf, isDriver))
       val canDoPushBasedShuffle = conf.get(IS_TESTING).getOrElse(false) ||

Review comment:
       I think there may have been a pre-existing bug in the existing code here:
   
   When `conf.get(IS_TESTING)` is true then we don't even check whether the serializer supports relocation.
   
   Looking a few revisions back, it looks like the `IS_TESTING` check was added to allow the push-based testing code path to be used in tests when Spark isn't running on YARN:
   
   https://github.com/apache/spark/blame/1a62e6a2c119df707f15101b03ecff0c3dee62f5/core/src/main/scala/org/apache/spark/util/Utils.scala#L2602-L2607
   
   It looks like #33976 changed the precedence when adding the additional checks involving IO encryption and the serializer.
   
   I find the deeply paren-nested conditionals to be a bit tricky to understand and suspect that's why this bug slipped through.
   
   As long as we're changing this code, what do you think about defining a few helper variables so that the final condition is simpler? Maybe something like this:
   
   ```scala
   val canDoPushBasedShuffle = {
     val isTesting = conf.get(IS_TESTING).getOrElse(false)
     val isShuffleServiceAndYarn = (conf.get(SHUFFLE_SERVICE_ENABLED) &&
          conf.get(SparkLauncher.SPARK_MASTER, null) == "yarn")
     val serializerIsSupported = {
       if (checkSerializer) {
         Option(SparkEnv.get)
           .map(_.serializer)
           .getOrElse(instantiateClassFromConf[Serializer](SERIALIZER, conf, isDriver))
           .supportsRelocationOfSerializedObjects
       } else {
         false
       }
     }
     val ioEncryptionDisabled = !conf.get(IO_ENCRYPTION_ENABLED)
     (shuffleServiceOnYarn || isTesting) && ioEncryptionDisabled && serializerisSupported
   }
   ```
   
   




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] mridulm commented on pull request #34158: [SPARK-36705][FOLLOW-UP] Support the case when user's classes need to register for Kryo serialization

Posted by GitBox <gi...@apache.org>.
mridulm commented on pull request #34158:
URL: https://github.com/apache/spark/pull/34158#issuecomment-934599051


   Merged to master/branch-3.2. +CC @gengliangwang 
   Thanks for fixing this @rmcyang !
   Thanks for the reviews @srowen, @JoshRosen, @gengliangwang for the reviews :-)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #34158: [WIP][SPARK-36705][FOLLOW-UP] Support the case when user's classes need to register for Kryo serialization

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #34158:
URL: https://github.com/apache/spark/pull/34158#issuecomment-932732414


   Kubernetes integration test status failure
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/48313/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #34158: [SPARK-36705][FOLLOW-UP] Support the case when user's classes need to register for Kryo serialization

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #34158:
URL: https://github.com/apache/spark/pull/34158#issuecomment-933750439


   Kubernetes integration test starting
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/48331/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on pull request #34158: [SPARK-36705][FOLLOW-UP] Support the case when user's classes need to register for Kryo serialization

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #34158:
URL: https://github.com/apache/spark/pull/34158#issuecomment-934031885






-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #34158: [WIP][SPARK-36705][FOLLOW-UP] Support the case when user's classes need to register for Kryo serialization

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #34158:
URL: https://github.com/apache/spark/pull/34158#issuecomment-932838378


   Kubernetes integration test starting
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/48318/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #34158: [SPARK-36705][FOLLOW-UP] Support the case when user's classes need to register for Kryo serialization

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #34158:
URL: https://github.com/apache/spark/pull/34158#issuecomment-932431436


   **[Test build #143789 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/143789/testReport)** for PR 34158 at commit [`ac35545`](https://github.com/apache/spark/commit/ac355451961372cc251706fb783ae2740b9945b8).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on pull request #34158: [SPARK-36705][FOLLOW-UP] Support the case when user's classes need to register for Kryo serialization

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #34158:
URL: https://github.com/apache/spark/pull/34158#issuecomment-934010025


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/48338/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on pull request #34158: [SPARK-36705][FOLLOW-UP] Support the case when user's classes need to register for Kryo serialization

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #34158:
URL: https://github.com/apache/spark/pull/34158#issuecomment-932027009


   Can one of the admins verify this patch?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on pull request #34158: [SPARK-36705][FOLLOW-UP] Support the case when user's classes need to register for Kryo serialization

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #34158:
URL: https://github.com/apache/spark/pull/34158#issuecomment-932027009


   Can one of the admins verify this patch?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on pull request #34158: [SPARK-36705][FOLLOW-UP] Support the case when user's classes need to register for Kryo serialization

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #34158:
URL: https://github.com/apache/spark/pull/34158#issuecomment-932645924






-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] JoshRosen commented on a change in pull request #34158: [SPARK-36705][FOLLOW-UP] Support the case when user's classes need to register for Kryo serialization

Posted by GitBox <gi...@apache.org>.
JoshRosen commented on a change in pull request #34158:
URL: https://github.com/apache/spark/pull/34158#discussion_r721804140



##########
File path: core/src/main/scala/org/apache/spark/util/Utils.scala
##########
@@ -2603,17 +2603,19 @@ private[spark] object Utils extends Logging {
    *   - IO encryption disabled
    *   - serializer(such as KryoSerializer) supports relocation of serialized objects
    */
-  def isPushBasedShuffleEnabled(conf: SparkConf): Boolean = {
+  def isPushBasedShuffleEnabled(conf: SparkConf,
+      isDriver: Boolean,
+      checkSerializer: Boolean = true): Boolean = {
     val pushBasedShuffleEnabled = conf.get(PUSH_BASED_SHUFFLE_ENABLED)
     if (pushBasedShuffleEnabled) {
-      val serializer = Utils.classForName(conf.get(SERIALIZER)).getConstructor(classOf[SparkConf])
-        .newInstance(conf).asInstanceOf[Serializer]
+      lazy val serializer = Option(SparkEnv.get).map(_.serializer)
+        .getOrElse(instantiateClassFromConf[Serializer](SERIALIZER, conf, isDriver))
       val canDoPushBasedShuffle = conf.get(IS_TESTING).getOrElse(false) ||

Review comment:
       Regarding `IS_TESTING`: 
   
   It looks like the `DAGSchedulerSuite` test cases are testing push-based shuffle's impact on task scheduling (e.g. to ensure that locality is properly taken into account). Spark's default `JavaSerializer` doesn't support object relocation, so my suggested logic would result in PBS being disabled for these test cases. PBS doesn't work correctly with `JavaSerializer` but I guess that's not a problem for this test case because we're not actually checking the correctness of the results here (just the scheduling logic).
   
   Part of me wonders if we should set `spark.serializer` to Kryo in that suite instead of modifying the `canDoPushBasedShuffle` logic to accommodate that suite.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on pull request #34158: [SPARK-36705][FOLLOW-UP] Support the case when user's classes need to register for Kryo serialization

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #34158:
URL: https://github.com/apache/spark/pull/34158#issuecomment-934531929


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/143856/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] rmcyang commented on a change in pull request #34158: [SPARK-36705][FOLLOW-UP] Support the case when user's classes need to register for Kryo serialization

Posted by GitBox <gi...@apache.org>.
rmcyang commented on a change in pull request #34158:
URL: https://github.com/apache/spark/pull/34158#discussion_r721824997



##########
File path: core/src/main/scala/org/apache/spark/util/Utils.scala
##########
@@ -2603,17 +2603,19 @@ private[spark] object Utils extends Logging {
    *   - IO encryption disabled
    *   - serializer(such as KryoSerializer) supports relocation of serialized objects
    */
-  def isPushBasedShuffleEnabled(conf: SparkConf): Boolean = {
+  def isPushBasedShuffleEnabled(conf: SparkConf,
+      isDriver: Boolean,
+      checkSerializer: Boolean = true): Boolean = {
     val pushBasedShuffleEnabled = conf.get(PUSH_BASED_SHUFFLE_ENABLED)
     if (pushBasedShuffleEnabled) {
-      val serializer = Utils.classForName(conf.get(SERIALIZER)).getConstructor(classOf[SparkConf])
-        .newInstance(conf).asInstanceOf[Serializer]
+      lazy val serializer = Option(SparkEnv.get).map(_.serializer)
+        .getOrElse(instantiateClassFromConf[Serializer](SERIALIZER, conf, isDriver))
       val canDoPushBasedShuffle = conf.get(IS_TESTING).getOrElse(false) ||

Review comment:
       Good catch of PBS being disabled for these test cases! Have set `KryoSerializer` in `initPushBasedShuffleConfs` for push-based shuffle related test cases in `DAGSchedulerSuite` and adopted the logic of `(isShuffleServiceAndYarn || isTesting) && ioEncryptionDisabled && serializerIsSupported`. Thanks @JoshRosen




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] rmcyang commented on a change in pull request #34158: [SPARK-36705][FOLLOW-UP] Support the case when user's classes need to register for Kryo serialization

Posted by GitBox <gi...@apache.org>.
rmcyang commented on a change in pull request #34158:
URL: https://github.com/apache/spark/pull/34158#discussion_r719712248



##########
File path: core/src/main/scala/org/apache/spark/shuffle/ShuffleBlockPusher.scala
##########
@@ -463,7 +463,7 @@ private[spark] object ShuffleBlockPusher {
 
   private val BLOCK_PUSHER_POOL: ExecutorService = {
     val conf = SparkEnv.get.conf
-    if (Utils.isPushBasedShuffleEnabled(conf)) {
+    if (Utils.isPushBasedShuffleEnabled(conf, isDriver = false)) {

Review comment:
       Nice call, fixed.

##########
File path: core/src/main/scala/org/apache/spark/util/Utils.scala
##########
@@ -2627,6 +2627,37 @@ private[spark] object Utils extends Logging {
     }
   }
 
+  // Create an instance of the class with the given name, possibly initializing it with our conf
+  def instantiateClass[T](className: String, conf: SparkConf, isDriver: Boolean): T = {
+    val cls = classForName(className)
+    // Look for a constructor taking a SparkConf and a boolean isDriver, then one taking just
+    // SparkConf, then one taking no arguments
+    try {
+      cls.getConstructor(classOf[SparkConf], java.lang.Boolean.TYPE)
+        .newInstance(conf, java.lang.Boolean.valueOf(isDriver))
+        .asInstanceOf[T]
+    } catch {
+      case _: NoSuchMethodException =>
+        try {
+          cls.getConstructor(classOf[SparkConf]).newInstance(conf).asInstanceOf[T]
+        } catch {
+          case _: NoSuchMethodException =>
+            cls.getConstructor().newInstance().asInstanceOf[T]
+        }
+    }
+  }
+
+  // Create an instance of the class named by the given SparkConf property
+  // if the property is not set, possibly initializing it with our conf
+  def instantiateClassFromConf[T](propertyName: ConfigEntry[String],
+      conf: SparkConf, isDriver: Boolean): T = {
+    instantiateClass[T](conf.get(propertyName), conf, isDriver)
+  }
+
+  def instantiateSerializer(conf: SparkConf, isDriver: Boolean): Serializer = {
+    instantiateClassFromConf[Serializer](SERIALIZER, conf, isDriver)
+  }

Review comment:
       Fixed.

##########
File path: core/src/main/scala/org/apache/spark/util/Utils.scala
##########
@@ -2627,6 +2627,37 @@ private[spark] object Utils extends Logging {
     }
   }
 
+  // Create an instance of the class with the given name, possibly initializing it with our conf
+  def instantiateClass[T](className: String, conf: SparkConf, isDriver: Boolean): T = {

Review comment:
       Fixed.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA removed a comment on pull request #34158: [SPARK-36705][FOLLOW-UP] Support the case when user's classes need to register for Kryo serialization

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #34158:
URL: https://github.com/apache/spark/pull/34158#issuecomment-932586716


   **[Test build #143793 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/143793/testReport)** for PR 34158 at commit [`ca4de24`](https://github.com/apache/spark/commit/ca4de242e88c74134d5ef740ed9ab30a8c68125b).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #34158: [SPARK-36705][FOLLOW-UP] Support the case when user's classes need to register for Kryo serialization

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #34158:
URL: https://github.com/apache/spark/pull/34158#issuecomment-932721981


   **[Test build #143801 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/143801/testReport)** for PR 34158 at commit [`4f3a712`](https://github.com/apache/spark/commit/4f3a7126688907b3261041fae002d659ceba5e7d).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #34158: [SPARK-36705][FOLLOW-UP] Support the case when user's classes need to register for Kryo serialization

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #34158:
URL: https://github.com/apache/spark/pull/34158#issuecomment-932714634


   **[Test build #143800 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/143800/testReport)** for PR 34158 at commit [`40efd99`](https://github.com/apache/spark/commit/40efd9939a933c1bcaa90592f919aa9d0c43147a).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA removed a comment on pull request #34158: [WIP][SPARK-36705][FOLLOW-UP] Support the case when user's classes need to register for Kryo serialization

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #34158:
URL: https://github.com/apache/spark/pull/34158#issuecomment-932721981


   **[Test build #143801 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/143801/testReport)** for PR 34158 at commit [`4f3a712`](https://github.com/apache/spark/commit/4f3a7126688907b3261041fae002d659ceba5e7d).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #34158: [WIP][SPARK-36705][FOLLOW-UP] Support the case when user's classes need to register for Kryo serialization

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #34158:
URL: https://github.com/apache/spark/pull/34158#issuecomment-932813203


   Kubernetes integration test status failure
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/48316/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on pull request #34158: [SPARK-36705][FOLLOW-UP] Support the case when user's classes need to register for Kryo serialization

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #34158:
URL: https://github.com/apache/spark/pull/34158#issuecomment-934036295


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/143827/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] JoshRosen commented on a change in pull request #34158: [SPARK-36705][FOLLOW-UP] Support the case when user's classes need to register for Kryo serialization

Posted by GitBox <gi...@apache.org>.
JoshRosen commented on a change in pull request #34158:
URL: https://github.com/apache/spark/pull/34158#discussion_r720558670



##########
File path: core/src/main/scala/org/apache/spark/util/Utils.scala
##########
@@ -2603,17 +2603,19 @@ private[spark] object Utils extends Logging {
    *   - IO encryption disabled
    *   - serializer(such as KryoSerializer) supports relocation of serialized objects
    */
-  def isPushBasedShuffleEnabled(conf: SparkConf): Boolean = {
+  def isPushBasedShuffleEnabled(conf: SparkConf,
+      isDriver: Boolean,
+      checkSerializer: Boolean = true): Boolean = {
     val pushBasedShuffleEnabled = conf.get(PUSH_BASED_SHUFFLE_ENABLED)
     if (pushBasedShuffleEnabled) {
-      val serializer = Utils.classForName(conf.get(SERIALIZER)).getConstructor(classOf[SparkConf])
-        .newInstance(conf).asInstanceOf[Serializer]
+      lazy val serializer = Option(SparkEnv.get).map(_.serializer)
+        .getOrElse(instantiateClassFromConf[Serializer](SERIALIZER, conf, isDriver))
       val canDoPushBasedShuffle = conf.get(IS_TESTING).getOrElse(false) ||

Review comment:
       I think there may have been a pre-existing bug in the existing code here:
   
   When `conf.get(IS_TESTING)` is true then we don't even check whether the serializer supports relocation.
   
   Looking a few revisions back, it looks like the `IS_TESTING` check was added to allow the push-based testing code path to be used in tests when Spark isn't running on YARN:
   
   https://github.com/apache/spark/blame/1a62e6a2c119df707f15101b03ecff0c3dee62f5/core/src/main/scala/org/apache/spark/util/Utils.scala#L2602-L2607
   
   It looks like #33976 changed the precedence when adding the additional checks involving IO encryption and the serializer.
   
   I find the deeply paren-nested conditionals to be a bit tricky to understand and suspect that's why this bug slipped through.
   
   As long as we're changing this code, what do you think about defining a few helper variables so that the final condition is simpler? Maybe something like this:
   
   ```scala
   val canDoPushBasedShuffle = {
     val isTesting = conf.get(IS_TESTING).getOrElse(false)
     val isShuffleServiceAndYarn = (conf.get(SHUFFLE_SERVICE_ENABLED) &&
          conf.get(SparkLauncher.SPARK_MASTER, null) == "yarn")
     val serializerIsSupported = {
       if (checkSerializer) {
         Option(SparkEnv.get)
            .map(_.serializer)
           .getOrElse(instantiateClassFromConf[Serializer](SERIALIZER, conf, isDriver))
           .supportsRelocationOfSerializedObjects
       } else {
         false
       }
     }
     val ioEncryptionDisabled = !conf.get(IO_ENCRYPTION_ENABLED)
     (shuffleServiceOnYarn || isTesting) && ioEncryptionDisabled && serializerisSupported
   }
   ```
   
   

##########
File path: core/src/main/scala/org/apache/spark/storage/BlockManager.scala
##########
@@ -194,8 +195,8 @@ private[spark] class BlockManager(
   val diskBlockManager = {
     // Only perform cleanup if an external service is not serving our shuffle files.
     val deleteFilesOnStop =
-      !externalShuffleServiceEnabled || executorId == SparkContext.DRIVER_IDENTIFIER
-    new DiskBlockManager(conf, deleteFilesOnStop)
+      !externalShuffleServiceEnabled || isDriver
+    new DiskBlockManager(conf, deleteFilesOnStop, isDriver)

Review comment:
       Nit: can you name the adjacent boolean parameters being passed here? e.g.:
   
   ```suggestion
       new DiskBlockManager(conf, deleteFilesOnStop = deleteFilesOnStop, isDriver = isDriver)
   ```
   
   I think this suggestion applies to a couple of other call-sites, too.
   
   This practice helps to avoid mixups of positional parameters, especially when resolving potential merge conflicts in the future.

##########
File path: core/src/main/scala/org/apache/spark/MapOutputTracker.scala
##########
@@ -1126,7 +1126,7 @@ private[spark] class MapOutputTrackerWorker(conf: SparkConf) extends MapOutputTr
   val mergeStatuses: Map[Int, Array[MergeStatus]] =
     new ConcurrentHashMap[Int, Array[MergeStatus]]().asScala
 
-  private val fetchMergeResult = Utils.isPushBasedShuffleEnabled(conf)
+  private lazy val fetchMergeResult = Utils.isPushBasedShuffleEnabled(conf, isDriver = false)

Review comment:
       Maybe add a comment to explain why this is lazy?
   
   ```suggestion
     // This must be lazy to ensure that it is initialized when the first task is run and not at
     // executor startup time. At startup time, user-added libraries may not have been 
     // downloaded to the executor, causing `isPushBasedShuffleEnabled` to fail when it tries to
     // instantiate a serializer. See the followup to SPARK-36705 for more details.
     private lazy val fetchMergeResult = Utils.isPushBasedShuffleEnabled(conf, isDriver = false)
   ```




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on pull request #34158: [SPARK-36705][FOLLOW-UP] Support the case when user's classes need to register for Kryo serialization

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #34158:
URL: https://github.com/apache/spark/pull/34158#issuecomment-931797688


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/48286/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] mridulm commented on a change in pull request #34158: [WIP][SPARK-36705][FOLLOW-UP] Support the case when user's classes need to register for Kryo serialization

Posted by GitBox <gi...@apache.org>.
mridulm commented on a change in pull request #34158:
URL: https://github.com/apache/spark/pull/34158#discussion_r720721989



##########
File path: core/src/main/scala/org/apache/spark/util/Utils.scala
##########
@@ -2603,18 +2603,28 @@ private[spark] object Utils extends Logging {
    *   - IO encryption disabled
    *   - serializer(such as KryoSerializer) supports relocation of serialized objects
    */
-  def isPushBasedShuffleEnabled(conf: SparkConf): Boolean = {
+  def isPushBasedShuffleEnabled(conf: SparkConf,
+      isDriver: Boolean,
+      checkSerializer: Boolean = true): Boolean = {
     val pushBasedShuffleEnabled = conf.get(PUSH_BASED_SHUFFLE_ENABLED)
     if (pushBasedShuffleEnabled) {
-      val serializer = Utils.classForName(conf.get(SERIALIZER)).getConstructor(classOf[SparkConf])
-        .newInstance(conf).asInstanceOf[Serializer]
-      val canDoPushBasedShuffle = conf.get(IS_TESTING).getOrElse(false) ||
-        (conf.get(SHUFFLE_SERVICE_ENABLED) &&
-          conf.get(SparkLauncher.SPARK_MASTER, null) == "yarn" &&
-          // TODO: [SPARK-36744] needs to support IO encryption for push-based shuffle
-          !conf.get(IO_ENCRYPTION_ENABLED) &&
-          serializer.supportsRelocationOfSerializedObjects)
-
+      val canDoPushBasedShuffle = {
+        val isTesting = conf.get(IS_TESTING).getOrElse(false)
+        val isShuffleServiceAndYarn = conf.get(SHUFFLE_SERVICE_ENABLED) &&
+            conf.get(SparkLauncher.SPARK_MASTER, null) == "yarn"
+        lazy val serializerIsSupported = {
+          if (checkSerializer) {
+            instantiateSerializerFromConf[Serializer](SERIALIZER, conf, isDriver)
+              .supportsRelocationOfSerializedObjects

Review comment:
       We should not be creating a new instance each time - if `SparkEnv.get` is available, we should its serializer. Only during initialization of SparkEnv itself should we need a new serializer to be created.
   
   Looks like some badly configured tests which need update.
   

##########
File path: core/src/main/scala/org/apache/spark/util/Utils.scala
##########
@@ -2603,18 +2603,28 @@ private[spark] object Utils extends Logging {
    *   - IO encryption disabled
    *   - serializer(such as KryoSerializer) supports relocation of serialized objects
    */
-  def isPushBasedShuffleEnabled(conf: SparkConf): Boolean = {
+  def isPushBasedShuffleEnabled(conf: SparkConf,
+      isDriver: Boolean,
+      checkSerializer: Boolean = true): Boolean = {
     val pushBasedShuffleEnabled = conf.get(PUSH_BASED_SHUFFLE_ENABLED)
     if (pushBasedShuffleEnabled) {
-      val serializer = Utils.classForName(conf.get(SERIALIZER)).getConstructor(classOf[SparkConf])
-        .newInstance(conf).asInstanceOf[Serializer]
-      val canDoPushBasedShuffle = conf.get(IS_TESTING).getOrElse(false) ||
-        (conf.get(SHUFFLE_SERVICE_ENABLED) &&
-          conf.get(SparkLauncher.SPARK_MASTER, null) == "yarn" &&
-          // TODO: [SPARK-36744] needs to support IO encryption for push-based shuffle
-          !conf.get(IO_ENCRYPTION_ENABLED) &&
-          serializer.supportsRelocationOfSerializedObjects)
-
+      val canDoPushBasedShuffle = {
+        val isTesting = conf.get(IS_TESTING).getOrElse(false)
+        val isShuffleServiceAndYarn = conf.get(SHUFFLE_SERVICE_ENABLED) &&
+            conf.get(SparkLauncher.SPARK_MASTER, null) == "yarn"
+        lazy val serializerIsSupported = {
+          if (checkSerializer) {
+            instantiateSerializerFromConf[Serializer](SERIALIZER, conf, isDriver)
+              .supportsRelocationOfSerializedObjects

Review comment:
       We should not be creating a new instance each time - if `SparkEnv.get` is available, we should use its serializer. Only during initialization of SparkEnv itself should we need a new serializer to be created.
   
   Looks like some badly configured tests which need update.
   

##########
File path: core/src/main/scala/org/apache/spark/util/Utils.scala
##########
@@ -2603,18 +2603,28 @@ private[spark] object Utils extends Logging {
    *   - IO encryption disabled
    *   - serializer(such as KryoSerializer) supports relocation of serialized objects
    */
-  def isPushBasedShuffleEnabled(conf: SparkConf): Boolean = {
+  def isPushBasedShuffleEnabled(conf: SparkConf,
+      isDriver: Boolean,
+      checkSerializer: Boolean = true): Boolean = {
     val pushBasedShuffleEnabled = conf.get(PUSH_BASED_SHUFFLE_ENABLED)
     if (pushBasedShuffleEnabled) {
-      val serializer = Utils.classForName(conf.get(SERIALIZER)).getConstructor(classOf[SparkConf])
-        .newInstance(conf).asInstanceOf[Serializer]
-      val canDoPushBasedShuffle = conf.get(IS_TESTING).getOrElse(false) ||
-        (conf.get(SHUFFLE_SERVICE_ENABLED) &&
-          conf.get(SparkLauncher.SPARK_MASTER, null) == "yarn" &&
-          // TODO: [SPARK-36744] needs to support IO encryption for push-based shuffle
-          !conf.get(IO_ENCRYPTION_ENABLED) &&
-          serializer.supportsRelocationOfSerializedObjects)
-
+      val canDoPushBasedShuffle = {
+        val isTesting = conf.get(IS_TESTING).getOrElse(false)
+        val isShuffleServiceAndYarn = conf.get(SHUFFLE_SERVICE_ENABLED) &&
+            conf.get(SparkLauncher.SPARK_MASTER, null) == "yarn"
+        lazy val serializerIsSupported = {
+          if (checkSerializer) {
+            instantiateSerializerFromConf[Serializer](SERIALIZER, conf, isDriver)
+              .supportsRelocationOfSerializedObjects

Review comment:
       We should not be creating a new instance each time - if `SparkEnv.get` is available, we should use its serializer. Only during initialization of `SparkEnv` itself should we need a new serializer to be created.
   
   Looks like some badly configured tests which need update.
   




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] rmcyang commented on a change in pull request #34158: [WIP][SPARK-36705][FOLLOW-UP] Support the case when user's classes need to register for Kryo serialization

Posted by GitBox <gi...@apache.org>.
rmcyang commented on a change in pull request #34158:
URL: https://github.com/apache/spark/pull/34158#discussion_r720742585



##########
File path: core/src/main/scala/org/apache/spark/util/Utils.scala
##########
@@ -2603,18 +2603,28 @@ private[spark] object Utils extends Logging {
    *   - IO encryption disabled
    *   - serializer(such as KryoSerializer) supports relocation of serialized objects
    */
-  def isPushBasedShuffleEnabled(conf: SparkConf): Boolean = {
+  def isPushBasedShuffleEnabled(conf: SparkConf,
+      isDriver: Boolean,
+      checkSerializer: Boolean = true): Boolean = {
     val pushBasedShuffleEnabled = conf.get(PUSH_BASED_SHUFFLE_ENABLED)
     if (pushBasedShuffleEnabled) {
-      val serializer = Utils.classForName(conf.get(SERIALIZER)).getConstructor(classOf[SparkConf])
-        .newInstance(conf).asInstanceOf[Serializer]
-      val canDoPushBasedShuffle = conf.get(IS_TESTING).getOrElse(false) ||
-        (conf.get(SHUFFLE_SERVICE_ENABLED) &&
-          conf.get(SparkLauncher.SPARK_MASTER, null) == "yarn" &&
-          // TODO: [SPARK-36744] needs to support IO encryption for push-based shuffle
-          !conf.get(IO_ENCRYPTION_ENABLED) &&
-          serializer.supportsRelocationOfSerializedObjects)
-
+      val canDoPushBasedShuffle = {
+        val isTesting = conf.get(IS_TESTING).getOrElse(false)
+        val isShuffleServiceAndYarn = conf.get(SHUFFLE_SERVICE_ENABLED) &&
+            conf.get(SparkLauncher.SPARK_MASTER, null) == "yarn"
+        lazy val serializerIsSupported = {
+          if (checkSerializer) {
+            instantiateSerializerFromConf[Serializer](SERIALIZER, conf, isDriver)
+              .supportsRelocationOfSerializedObjects

Review comment:
       ```
   Option(SparkEnv.get)
     .map(_.serializer)
      .getOrElse(instantiateSerializerFromConf[Serializer](SERIALIZER, conf, isDriver))
   ```
   thrown NPE after the test `test migration of shuffle blocks during decommissioning` succeeded in `BlockManagerSuite.beforeEach(BlockManagerSuite.scala:188)`. Seems SparkEnv wasn't configured properly. Adding `SparkEnv.set(null)` in `afterEach()` seems fixed the issue:
   ```
   org.apache.spark.storage.BlockManagerSuite *** ABORTED *** (41 seconds, 221 milliseconds)
   [info]   java.lang.NullPointerException:
   [info]   at org.apache.spark.util.Utils$.isPushBasedShuffleEnabled(Utils.scala:2619)
   [info]   at org.apache.spark.storage.BlockManagerMasterEndpoint.<init>(BlockManagerMasterEndpoint.scala:104)
   [info]   at org.apache.spark.storage.BlockManagerSuite.beforeEach(BlockManagerSuite.scala:188)
   [info]   at org.scalatest.BeforeAndAfterEach.runTest(BeforeAndAfterEach.scala:233)
   [info]   at org.scalatest.BeforeAndAfterEach.runTest$(BeforeAndAfterEach.scala:227)
   [info]   at org.apache.spark.SparkFunSuite.runTest(SparkFunSuite.scala:62)
   [info]   at org.scalatest.funsuite.AnyFunSuiteLike.$anonfun$runTests$1(AnyFunSuiteLike.scala:269)
   [info]   at org.scalatest.SuperEngine.$anonfun$runTestsInBranch$1(Engine.scala:413)
   [info]   at scala.collection.immutable.List.foreach(List.scala:431)
   [info]   at org.scalatest.SuperEngine.traverseSubNodes$1(Engine.scala:401)
   [info]   at org.scalatest.SuperEngine.runTestsInBranch(Engine.scala:396)
   [info]   at org.scalatest.SuperEngine.runTestsImpl(Engine.scala:475)
   [info]   at org.scalatest.funsuite.AnyFunSuiteLike.runTests(AnyFunSuiteLike.scala:269)
   [info]   at org.scalatest.funsuite.AnyFunSuiteLike.runTests$(AnyFunSuiteLike.scala:268)
   [info]   at org.scalatest.funsuite.AnyFunSuite.runTests(AnyFunSuite.scala:1563)
   [info]   at org.scalatest.Suite.run(Suite.scala:1112)
   [info]   at org.scalatest.Suite.run$(Suite.scala:1094)
   [info]   at org.scalatest.funsuite.AnyFunSuite.org$scalatest$funsuite$AnyFunSuiteLike$$super$run(AnyFunSuite.scala:1563)
   [info]   at org.scalatest.funsuite.AnyFunSuiteLike.$anonfun$run$1(AnyFunSuiteLike.scala:273)
   [info]   at org.scalatest.SuperEngine.runImpl(Engine.scala:535)
   [info]   at org.scalatest.funsuite.AnyFunSuiteLike.run(AnyFunSuiteLike.scala:273)
   [info]   at org.scalatest.funsuite.AnyFunSuiteLike.run$(AnyFunSuiteLike.scala:272)
   [info]   at org.apache.spark.SparkFunSuite.org$scalatest$BeforeAndAfterAll$$super$run(SparkFunSuite.scala:62)
   [info]   at org.scalatest.BeforeAndAfterAll.liftedTree1$1(BeforeAndAfterAll.scala:213)
   [info]   at org.scalatest.BeforeAndAfterAll.run(BeforeAndAfterAll.scala:210)
   [info]   at org.scalatest.BeforeAndAfterAll.run$(BeforeAndAfterAll.scala:208)
   [info]   at org.apache.spark.SparkFunSuite.run(SparkFunSuite.scala:62)
   [info]   at org.scalatest.tools.Framework.org$scalatest$tools$Framework$$runSuite(Framework.scala:318)
   [info]   at org.scalatest.tools.Framework$ScalaTestTask.execute(Framework.scala:513)
   [info]   at sbt.ForkMain$Run.lambda$runTest$1(ForkMain.java:413)
   [info]   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
   [info]   at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
   [info]   at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
   [info]   at java.lang.Thread.run(Thread.java:748) 
   ```




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #34158: [SPARK-36705][FOLLOW-UP] Support the case when user's classes need to register for Kryo serialization

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #34158:
URL: https://github.com/apache/spark/pull/34158#issuecomment-934498282


   Kubernetes integration test status failure
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/48369/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #34158: [WIP][SPARK-36705][FOLLOW-UP] Support the case when user's classes need to register for Kryo serialization

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #34158:
URL: https://github.com/apache/spark/pull/34158#issuecomment-932801157


   **[Test build #143804 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/143804/testReport)** for PR 34158 at commit [`460b314`](https://github.com/apache/spark/commit/460b314da2312eec9fa635fc2089a91be727627f).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] mridulm commented on pull request #34158: [SPARK-33781][FOLLOW-UP] Support the case when user's classes need to register for Kryo serialization

Posted by GitBox <gi...@apache.org>.
mridulm commented on pull request #34158:
URL: https://github.com/apache/spark/pull/34158#issuecomment-931591683


   Thanks for working on this @rmcyang !
   Can you change the jira to follow up of SPARK-36705 ?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA removed a comment on pull request #34158: [SPARK-36705][FOLLOW-UP] Support the case when user's classes need to register for Kryo serialization

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #34158:
URL: https://github.com/apache/spark/pull/34158#issuecomment-934083955


   **[Test build #143834 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/143834/testReport)** for PR 34158 at commit [`82e2e07`](https://github.com/apache/spark/commit/82e2e07bd46d13390d1d6c571f001cee41ce3224).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #34158: [WIP][SPARK-36705][FOLLOW-UP] Support the case when user's classes need to register for Kryo serialization

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #34158:
URL: https://github.com/apache/spark/pull/34158#issuecomment-932842582


   Kubernetes integration test status failure
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/48318/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] rmcyang commented on a change in pull request #34158: [SPARK-36705][FOLLOW-UP] Support the case when user's classes need to register for Kryo serialization

Posted by GitBox <gi...@apache.org>.
rmcyang commented on a change in pull request #34158:
URL: https://github.com/apache/spark/pull/34158#discussion_r720587421



##########
File path: core/src/main/scala/org/apache/spark/util/Utils.scala
##########
@@ -2627,6 +2627,37 @@ private[spark] object Utils extends Logging {
     }
   }
 
+  // Create an instance of the class with the given name, possibly initializing it with our conf
+  def instantiateClass[T](className: String, conf: SparkConf, isDriver: Boolean): T = {

Review comment:
       Since `instantiateClass` is used for either `Serializer` or `ShuffleManager`, I'm thinking renaming it as `instantiateClassForSerializerOrShuffleManager`. Let me know for any better name in your minds. @srowen @mridulm 




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on pull request #34158: [SPARK-36705][FOLLOW-UP] Support the case when user's classes need to register for Kryo serialization

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #34158:
URL: https://github.com/apache/spark/pull/34158#issuecomment-934010025


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/48338/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on pull request #34158: [WIP][SPARK-36705][FOLLOW-UP] Support the case when user's classes need to register for Kryo serialization

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #34158:
URL: https://github.com/apache/spark/pull/34158#issuecomment-932736297


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/143801/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] srowen commented on a change in pull request #34158: [SPARK-36705][FOLLOW-UP] Support the case when user's classes need to register for Kryo serialization

Posted by GitBox <gi...@apache.org>.
srowen commented on a change in pull request #34158:
URL: https://github.com/apache/spark/pull/34158#discussion_r719791592



##########
File path: core/src/main/scala/org/apache/spark/util/Utils.scala
##########
@@ -2627,6 +2627,37 @@ private[spark] object Utils extends Logging {
     }
   }
 
+  // Create an instance of the class with the given name, possibly initializing it with our conf
+  def instantiateClass[T](className: String, conf: SparkConf, isDriver: Boolean): T = {

Review comment:
       This method is highly specific to a Serializer instance, right? then that can be reflected in the name and types




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] rmcyang commented on pull request #34158: [SPARK-36705][FOLLOW-UP] Support the case when user's classes need to register for Kryo serialization

Posted by GitBox <gi...@apache.org>.
rmcyang commented on pull request #34158:
URL: https://github.com/apache/spark/pull/34158#issuecomment-934084016


   > @rmcyang Thanks for the fix. The latest test failure looks relevant. @mridulm Sorry for the late reply. I was on vacation. The fix overall LGTM.
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] mridulm commented on pull request #34158: [SPARK-36705][FOLLOW-UP] Support the case when user's classes need to register for Kryo serialization

Posted by GitBox <gi...@apache.org>.
mridulm commented on pull request #34158:
URL: https://github.com/apache/spark/pull/34158#issuecomment-931752419


   Ok to test


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on pull request #34158: [WIP][SPARK-36705][FOLLOW-UP] Support the case when user's classes need to register for Kryo serialization

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #34158:
URL: https://github.com/apache/spark/pull/34158#issuecomment-932736037


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/48313/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] rmcyang edited a comment on pull request #34158: [SPARK-33781][FOLLOW-UP] Support the case when user's classes need to register for Kryo serialization

Posted by GitBox <gi...@apache.org>.
rmcyang edited a comment on pull request #34158:
URL: https://github.com/apache/spark/pull/34158#issuecomment-931552874


   cc @mridulm @Ngone51 @gengliangwang @dongjoon-hyun @venkata91 @zhouyejoe
   Please take a look


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org