You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@spark.apache.org by mr...@apache.org on 2020/12/09 07:24:36 UTC

[spark] branch branch-3.1 updated: [SPARK-33669] Wrong error message from YARN application state monitor when sc.stop in yarn client mode

This is an automated email from the ASF dual-hosted git repository.

mridulm80 pushed a commit to branch branch-3.1
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.1 by this push:
     new 57ed4df  [SPARK-33669] Wrong error message from YARN application state monitor when sc.stop in yarn client mode
57ed4df is described below

commit 57ed4df53e3dba01a20ab38ba8098b48594054fd
Author: suqilong <su...@qiyi.com>
AuthorDate: Wed Dec 9 01:21:13 2020 -0600

    [SPARK-33669] Wrong error message from YARN application state monitor when sc.stop in yarn client mode
    
    ### What changes were proposed in this pull request?
    This change make InterruptedIOException to be treated as InterruptedException when closing YarnClientSchedulerBackend, which doesn't log error with "YARN application has exited unexpectedly xxx"
    
    ### Why are the changes needed?
    For YarnClient mode, when stopping YarnClientSchedulerBackend, it first tries to interrupt Yarn application monitor thread. In MonitorThread.run() it catches InterruptedException to gracefully response to stopping request.
    
    But client.monitorApplication method also throws InterruptedIOException when the hadoop rpc call is calling. In this case, MonitorThread will not know it is interrupted, a Yarn App failed is returned with "Failed to contact YARN for application xxxxx;  YARN application has exited unexpectedly with state xxxxx" is logged with error level. which confuse user a lot.
    
    ### Does this PR introduce _any_ user-facing change?
    Yes
    
    ### How was this patch tested?
    very simple patch, seems no need?
    
    Closes #30617 from sqlwindspeaker/yarn-client-interrupt-monitor.
    
    Authored-by: suqilong <su...@qiyi.com>
    Signed-off-by: Mridul Muralidharan <mridul<at>gmail.com>
    (cherry picked from commit 48f93af9f3d40de5bf087eb1a06c1b9954b2ad76)
    Signed-off-by: Mridul Muralidharan <mridulatgmail.com>
---
 .../yarn/src/main/scala/org/apache/spark/deploy/yarn/Client.scala    | 2 +-
 .../apache/spark/scheduler/cluster/YarnClientSchedulerBackend.scala  | 5 ++++-
 2 files changed, 5 insertions(+), 2 deletions(-)

diff --git a/resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/Client.scala b/resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/Client.scala
index 7f791e0..618faef 100644
--- a/resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/Client.scala
+++ b/resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/Client.scala
@@ -1069,7 +1069,7 @@ private[spark] class Client(
             logError(s"Application $appId not found.")
             cleanupStagingDir()
             return YarnAppReport(YarnApplicationState.KILLED, FinalApplicationStatus.KILLED, None)
-          case NonFatal(e) =>
+          case NonFatal(e) if !e.isInstanceOf[InterruptedIOException] =>
             val msg = s"Failed to contact YARN for application $appId."
             logError(msg, e)
             // Don't necessarily clean up staging dir because status is unknown
diff --git a/resource-managers/yarn/src/main/scala/org/apache/spark/scheduler/cluster/YarnClientSchedulerBackend.scala b/resource-managers/yarn/src/main/scala/org/apache/spark/scheduler/cluster/YarnClientSchedulerBackend.scala
index cb0de5a..8a55e61 100644
--- a/resource-managers/yarn/src/main/scala/org/apache/spark/scheduler/cluster/YarnClientSchedulerBackend.scala
+++ b/resource-managers/yarn/src/main/scala/org/apache/spark/scheduler/cluster/YarnClientSchedulerBackend.scala
@@ -17,6 +17,8 @@
 
 package org.apache.spark.scheduler.cluster
 
+import java.io.InterruptedIOException
+
 import scala.collection.mutable.ArrayBuffer
 
 import org.apache.hadoop.yarn.api.records.YarnApplicationState
@@ -121,7 +123,8 @@ private[spark] class YarnClientSchedulerBackend(
         allowInterrupt = false
         sc.stop()
       } catch {
-        case e: InterruptedException => logInfo("Interrupting monitor thread")
+        case _: InterruptedException | _: InterruptedIOException =>
+          logInfo("Interrupting monitor thread")
       }
     }
 


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@spark.apache.org
For additional commands, e-mail: commits-help@spark.apache.org