Posted to issues@spark.apache.org by "Sean R. Owen (Jira)" <ji...@apache.org> on 2022/12/04 14:26:00 UTC
[jira] [Resolved] (SPARK-41313) AM shutdown hook fails with IllegalStateException if AM crashes on startup (recurrence of SPARK-3900)
[ https://issues.apache.org/jira/browse/SPARK-41313?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Sean R. Owen resolved SPARK-41313.
----------------------------------
Fix Version/s: 3.4.0
Resolution: Fixed
Issue resolved by pull request 38832
[https://github.com/apache/spark/pull/38832]
> AM shutdown hook fails with IllegalStateException if AM crashes on startup (recurrence of SPARK-3900)
> -----------------------------------------------------------------------------------------------------
>
> Key: SPARK-41313
> URL: https://issues.apache.org/jira/browse/SPARK-41313
> Project: Spark
> Issue Type: Bug
> Components: Spark Core, YARN
> Affects Versions: 3.4.0
> Reporter: Xing Lin
> Assignee: Xing Lin
> Priority: Minor
> Fix For: 3.4.0
>
>
> SPARK-3900 fixed the {{IllegalStateException}} in cleanupStagingDir in ApplicationMaster's shutdown hook. However, SPARK-21138 accidentally reverted that change when fixing the "Wrong FS" bug. Our users at LinkedIn are now hitting the SPARK-3900 failure again, so the fix for SPARK-3900 needs to be restored.
> The {{IllegalStateException}} thrown when creating a new FileSystem object comes from a Hadoop limitation: a shutdown hook cannot be registered while shutdown is already in progress. When a Spark job fails during pre-launch, cleanupStagingDir() is called as part of shutdown. If it then creates a FileSystem object for the first time, HDFS tries to register a hook to shut down the KeyProviderCache while creating a ClientContext for DFSClient, and that registration throws the {{IllegalStateException}}. cleanupStagingDir() should therefore avoid creating a new FileSystem object when it is called from a shutdown hook. SPARK-3900 introduced exactly this avoidance, and bringing it back prevents the {{IllegalStateException}}.
>
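The failure mode described above can be illustrated with a small, self-contained Java model. Everything in it is hypothetical: `HookRegistry` stands in for Hadoop's shutdown-hook manager, `FakeFileSystem` stands in for an HDFS client whose construction registers a hook, and the staging path is made up. None of it is the real Hadoop or Spark API, and it does not reproduce the code of pull request 38832; it only shows why creating a filesystem inside a shutdown hook fails while reusing one created earlier does not.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified model of the failure described in SPARK-41313.
// None of these classes are Hadoop or Spark APIs; they only mimic the behavior.
final class StagingCleanupSketch {

    private static final String STAGING_DIR = "/user/example/.sparkStaging/application_0001"; // hypothetical path

    // Stand-in for a shutdown-hook manager that refuses new hooks once shutdown has started.
    static final class HookRegistry {
        private final List<Runnable> hooks = new ArrayList<>();
        private boolean shuttingDown = false;

        void addHook(Runnable hook) {
            if (shuttingDown) {
                throw new IllegalStateException("Shutdown in progress, cannot add a shutdown hook");
            }
            hooks.add(hook);
        }

        // Marks shutdown as in progress, then runs the registered hooks in registration order.
        void runShutdown() {
            shuttingDown = true;
            for (Runnable hook : hooks) {
                hook.run();
            }
        }
    }

    // Stand-in for a filesystem client: constructing one registers a shutdown hook,
    // loosely mirroring DFSClient registering a KeyProviderCache hook via its ClientContext.
    static final class FakeFileSystem {
        FakeFileSystem(HookRegistry registry) {
            registry.addHook(() -> { /* would close cached key providers */ });
        }

        boolean delete(String path) {
            return true; // pretend the staging directory was removed
        }
    }

    // Failing pattern: the filesystem is first created inside the shutdown hook.
    static String simulateBuggy() {
        HookRegistry registry = new HookRegistry();
        String[] outcome = {"not run"};
        registry.addHook(() -> {
            try {
                new FakeFileSystem(registry).delete(STAGING_DIR);
                outcome[0] = "deleted";
            } catch (IllegalStateException e) {
                outcome[0] = "failed: " + e.getMessage();
            }
        });
        registry.runShutdown();
        return outcome[0];
    }

    // Working pattern: the filesystem is created before shutdown and reused by the hook.
    static String simulateFixed() {
        HookRegistry registry = new HookRegistry();
        FakeFileSystem fs = new FakeFileSystem(registry);
        String[] outcome = {"not run"};
        registry.addHook(() -> outcome[0] = fs.delete(STAGING_DIR) ? "deleted" : "not deleted");
        registry.runShutdown();
        return outcome[0];
    }

    public static void main(String[] args) {
        System.out.println("create FS in hook: " + simulateBuggy());
        System.out.println("reuse earlier FS:  " + simulateFixed());
    }
}
```

In this model, simulateBuggy() returns "failed: Shutdown in progress, cannot add a shutdown hook" because the cleanup hook triggers a second hook registration mid-shutdown, while simulateFixed() returns "deleted" because no registration happens after shutdown begins.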
--
This message was sent by Atlassian Jira
(v8.20.10#820010)