Posted to issues@spark.apache.org by "Alessandro Bellina (Jira)" <ji...@apache.org> on 2023/11/01 16:45:00 UTC

[jira] [Created] (SPARK-45762) Shuffle managers defined in user jars are not available for some launch modes

Alessandro Bellina created SPARK-45762:
------------------------------------------

             Summary: Shuffle managers defined in user jars are not available for some launch modes
                 Key: SPARK-45762
                 URL: https://issues.apache.org/jira/browse/SPARK-45762
             Project: Spark
          Issue Type: Bug
          Components: Spark Core
    Affects Versions: 3.5.0
            Reporter: Alessandro Bellina
             Fix For: 4.0.0


Starting a Spark job in standalone mode with a custom `ShuffleManager` provided in a jar via `--jars` does not work. The same failure can be reproduced in local-cluster mode.

The approach that works consistently is to copy the jar containing the custom `ShuffleManager` to a fixed location on each node and then add it to `spark.driver.extraClassPath` and `spark.executor.extraClassPath`, but we would like to move away from requiring these extra configurations.

Example:
{code:java}
$SPARK_HOME/bin/spark-shell \
  --master spark://127.0.0.1:7077 \
  --conf spark.shuffle.manager=org.apache.spark.examples.TestShuffleManager \
  --jars user-code.jar
{code}
This yields a `java.lang.ClassNotFoundException` in the executors:
{code:java}
Exception in thread "main" java.lang.reflect.UndeclaredThrowableException
  at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1915)
  at org.apache.spark.deploy.SparkHadoopUtil.runAsSparkUser(SparkHadoopUtil.scala:61)
  at org.apache.spark.executor.CoarseGrainedExecutorBackend$.run(CoarseGrainedExecutorBackend.scala:436)
  at org.apache.spark.executor.CoarseGrainedExecutorBackend$.main(CoarseGrainedExecutorBackend.scala:425)
  at org.apache.spark.executor.CoarseGrainedExecutorBackend.main(CoarseGrainedExecutorBackend.scala)
Caused by: java.lang.ClassNotFoundException: org.apache.spark.examples.TestShuffleManager
  at java.base/jdk.internal.loader.BuiltinClassLoader.loadClass(BuiltinClassLoader.java:641)
  at java.base/jdk.internal.loader.ClassLoaders$AppClassLoader.loadClass(ClassLoaders.java:188)
  at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:520)
  at java.base/java.lang.Class.forName0(Native Method)
  at java.base/java.lang.Class.forName(Class.java:467)
  at org.apache.spark.util.SparkClassUtils.classForName(SparkClassUtils.scala:41)
  at org.apache.spark.util.SparkClassUtils.classForName$(SparkClassUtils.scala:36)
  at org.apache.spark.util.Utils$.classForName(Utils.scala:95)
  at org.apache.spark.util.Utils$.instantiateSerializerOrShuffleManager(Utils.scala:2574)
  at org.apache.spark.SparkEnv$.create(SparkEnv.scala:366)
  at org.apache.spark.SparkEnv$.createExecutorEnv(SparkEnv.scala:255)
  at org.apache.spark.executor.CoarseGrainedExecutorBackend$.$anonfun$run$7(CoarseGrainedExecutorBackend.scala:487)
  at org.apache.spark.deploy.SparkHadoopUtil$$anon$1.run(SparkHadoopUtil.scala:62)
  at org.apache.spark.deploy.SparkHadoopUtil$$anon$1.run(SparkHadoopUtil.scala:61)
  at java.base/java.security.AccessController.doPrivileged(AccessController.java:712)
  at java.base/javax.security.auth.Subject.doAs(Subject.java:439)
  at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1899)
  ... 4 more
{code}
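The mechanism behind the failure can be sketched outside of Spark: `Class.forName` resolved against the application classloader cannot see a class that only a child classloader (the one that knows about the localized `--jars`) can reach. The following is a minimal, self-contained illustration, not Spark code; the runtime-compiled `TestShuffleManager` is a stand-in for the class in `user-code.jar`, and it assumes a JDK (not a bare JRE) so the system compiler is available.
{code:java}
import javax.tools.ToolProvider;
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.file.Files;
import java.nio.file.Path;

public class ClassLoaderDemo {
    public static void main(String[] args) throws Exception {
        // Compile a tiny class into a temp directory, simulating a user jar
        // that is not on the JVM's original classpath.
        Path dir = Files.createTempDirectory("user-code");
        Path src = dir.resolve("TestShuffleManager.java");
        Files.writeString(src, "public class TestShuffleManager {}");
        ToolProvider.getSystemJavaCompiler()
                .run(null, null, null, src.toString());

        // SparkEnv-style lookup: resolve against the application
        // classloader, which does not know about the "jar" -> fails.
        try {
            Class.forName("TestShuffleManager");
            System.out.println("loaded via app classloader");
        } catch (ClassNotFoundException e) {
            System.out.println("app classloader: ClassNotFoundException");
        }

        // After "localizing" the jar: a child loader that knows the
        // directory (analogous to the executor classloader once the
        // --jars entries have been added) can load the class.
        try (URLClassLoader child =
                 new URLClassLoader(new URL[]{dir.toUri().toURL()})) {
            Class<?> c = Class.forName("TestShuffleManager", true, child);
            System.out.println("child classloader loaded: " + c.getName());
        }
    }
}
{code}
This is why `extraClassPath` works around the issue: it puts the jar on the application classpath before the JVM starts, so the early `Class.forName` in `SparkEnv` succeeds.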
We can change our command to use `extraClassPath`:
{code:java}
$SPARK_HOME/bin/spark-shell \
  --master spark://127.0.0.1:7077 \
  --conf spark.shuffle.manager=org.apache.spark.examples.TestShuffleManager \
  --conf spark.driver.extraClassPath=user-code.jar \
  --conf spark.executor.extraClassPath=user-code.jar
{code}
Success after adding the jar to `extraClassPath`:
{code:java}
23/10/26 12:58:26 INFO TransportClientFactory: Successfully created connection to localhost/127.0.0.1:33053 after 7 ms (0 ms spent in bootstraps)
23/10/26 12:58:26 WARN TestShuffleManager: Instantiated TestShuffleManager!!
23/10/26 12:58:26 INFO DiskBlockManager: Created local directory at /tmp/spark-cb101b05-c4b7-4ba9-8b3d-5b23baa7cb46/executor-5d5335dd-c116-4211-9691-87d8566017fd/blockmgr-2fcb1ab2-d886-4444-8c7f-9dca2c880c2c
{code}
We would like to change the startup order so that the original command succeeds without specifying `extraClassPath`:
{code:java}
$SPARK_HOME/bin/spark-shell \
  --master spark://127.0.0.1:7077 \
  --conf spark.shuffle.manager=org.apache.spark.examples.TestShuffleManager \
  --jars user-code.jar
{code}
Proposed changes:

Refactor the code so that the `ShuffleManager` is initialized later, after jars have been localized. This is especially necessary in the executor, where the initialization needs to move to after the `replClassLoader` has been updated with the jars passed via `--jars`.

Today, the `ShuffleManager` is instantiated at `SparkEnv` creation. Instantiating it this early does not work because user jars have not yet been localized in all launch modes, so loading the `ShuffleManager` fails. We propose moving the instantiation into `SparkContext` on the driver and into `Executor` on the executor side.
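The proposed ordering could look roughly like the following. This is a hedged, self-contained sketch, not actual Spark code: the names `Env` and `initShuffleManager` are illustrative, and `DefaultShuffleManager` stands in for whatever `spark.shuffle.manager` names. The point is that the env records only the configured class name at creation, and instantiation happens later, against a classloader that already includes the user jars.
{code:java}
import java.net.URL;
import java.net.URLClassLoader;

interface ShuffleManager {}

class DefaultShuffleManager implements ShuffleManager {}

class Env {
    private final String shuffleManagerClass;  // from spark.shuffle.manager
    private ShuffleManager shuffleManager;     // NOT created at env creation

    Env(String shuffleManagerClass) {
        this.shuffleManagerClass = shuffleManagerClass;
    }

    // Called only after jars have been localized and the loader updated.
    void initShuffleManager(ClassLoader loader) throws Exception {
        shuffleManager = (ShuffleManager) Class
            .forName(shuffleManagerClass, true, loader)
            .getDeclaredConstructor()
            .newInstance();
    }

    ShuffleManager shuffleManager() { return shuffleManager; }
}

public class DeferredInitSketch {
    public static void main(String[] args) throws Exception {
        Env env = new Env("DefaultShuffleManager");
        // ... executor startup: download --jars, extend the classloader ...
        ClassLoader loader = new URLClassLoader(
            new URL[0], DeferredInitSketch.class.getClassLoader());
        env.initShuffleManager(loader);  // user classes are now visible
        System.out.println(env.shuffleManager().getClass().getSimpleName());
    }
}
{code}
On the driver the analogous hook would be `SparkContext`, which runs after `--jars` entries have been registered; on the executor it would be `Executor`, after `updateDependencies`-style jar localization.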



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
