Posted to issues@spark.apache.org by "Apache Spark (Jira)" <ji...@apache.org> on 2020/12/05 10:11:00 UTC
[jira] [Commented] (SPARK-33669) Wrong error message from YARN
application state monitor when sc.stop in yarn client mode
[ https://issues.apache.org/jira/browse/SPARK-33669?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17244455#comment-17244455 ]
Apache Spark commented on SPARK-33669:
--------------------------------------
User 'sqlwindspeaker' has created a pull request for this issue:
https://github.com/apache/spark/pull/30617
> Wrong error message from YARN application state monitor when sc.stop in yarn client mode
> ----------------------------------------------------------------------------------------
>
> Key: SPARK-33669
> URL: https://issues.apache.org/jira/browse/SPARK-33669
> Project: Spark
> Issue Type: Bug
> Components: YARN
> Affects Versions: 2.4.3, 3.0.1
> Reporter: Su Qilong
> Priority: Minor
>
> In yarn-client mode, stopping YarnClientSchedulerBackend first interrupts the YARN application monitor thread. MonitorThread.run() catches InterruptedException so that it can respond gracefully to the stop request.
> However, client.monitorApplication can also throw InterruptedIOException when the interrupt arrives while a Hadoop RPC call is in progress. In that case MonitorThread does not know it was interrupted: a failed YARN application report is returned, and messages such as "Failed to contact YARN for application xxxxx" and "YARN application has exited unexpectedly with state xxxxx" are logged at error level, which confuses users a lot.
> We should also handle InterruptedIOException here so that it gets the same behavior as InterruptedException.
> {code:java}
> private class MonitorThread extends Thread {
>   private var allowInterrupt = true
>   override def run() {
>     try {
>       val YarnAppReport(_, state, diags) =
>         client.monitorApplication(appId.get, logApplicationReport = false)
>       logError(s"YARN application has exited unexpectedly with state $state! " +
>         "Check the YARN application logs for more details.")
>       diags.foreach { err =>
>         logError(s"Diagnostics message: $err")
>       }
>       allowInterrupt = false
>       sc.stop()
>     } catch {
>       case e: InterruptedException => logInfo("Interrupting monitor thread")
>     }
>   }
>
> {code}
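The handling proposed in the description can be sketched in plain Java. This is only an illustration, not the change made in the pull request: MonitorSketch, Monitor and runMonitor are hypothetical names, the returned strings stand in for logInfo/logError calls, and the sketch assumes the InterruptedIOException actually reaches the monitor thread instead of being converted into a FAILED report earlier.

```java
import java.io.IOException;
import java.io.InterruptedIOException;

class MonitorSketch {
    /** Hypothetical stand-in for client.monitorApplication; returns the final state. */
    interface Monitor {
        String monitorApplication() throws IOException, InterruptedException;
    }

    /** Returns the line the monitor thread would log (a stand-in for logInfo/logError). */
    static String runMonitor(Monitor monitor) {
        try {
            String state = monitor.monitorApplication();
            return "ERROR: YARN application has exited unexpectedly with state " + state;
        } catch (InterruptedException | InterruptedIOException e) {
            // Both exception types mean the thread was asked to stop,
            // so they are reported quietly instead of as an application failure.
            return "INFO: Interrupting monitor thread";
        } catch (IOException e) {
            // Any other I/O problem is still reported as an error.
            return "ERROR: " + e.getMessage();
        }
    }
}
```

With this shape, an interrupt that surfaces as InterruptedIOException produces the same informational message as a plain InterruptedException, while an unexpected exit still produces the error message.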
> {code:java}
> // wrong error message
> 2020-12-05 03:06:58,000 ERROR [YARN application state monitor]: org.apache.spark.deploy.yarn.Client(91) - Failed to contact YARN for application application_1605868815011_1154961.
> java.io.InterruptedIOException: Call interrupted
> at org.apache.hadoop.ipc.Client.call(Client.java:1466)
> at org.apache.hadoop.ipc.Client.call(Client.java:1409)
> at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:230)
> at com.sun.proxy.$Proxy38.getApplicationReport(Unknown Source)
> at org.apache.hadoop.yarn.api.impl.pb.client.ApplicationClientProtocolPBClientImpl.getApplicationReport(ApplicationClientProtocolPBClientImpl.java:187)
> at sun.reflect.GeneratedMethodAccessor22.invoke(Unknown Source)
> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:498)
> at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:256)
> at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:104)
> at com.sun.proxy.$Proxy39.getApplicationReport(Unknown Source)
> at org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.getApplicationReport(YarnClientImpl.java:408)
> at org.apache.spark.deploy.yarn.Client.getApplicationReport(Client.scala:327)
> at org.apache.spark.deploy.yarn.Client.monitorApplication(Client.scala:1039)
> at org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend$MonitorThread.run(YarnClientSchedulerBackend.scala:116)
> 2020-12-05 03:06:58,000 ERROR [YARN application state monitor]: org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend(70) - YARN application has exited unexpectedly with state FAILED! Check the YARN application logs for more details.
> 2020-12-05 03:06:58,001 ERROR [YARN application state monitor]: org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend(70) - Diagnostics message: Failed to contact YARN for application application_1605868815011_1154961.
> {code}
>
> {code:java}
> // hadoop ipc code
> public Writable call(RPC.RpcKind rpcKind, Writable rpcRequest,
>     ConnectionId remoteId, int serviceClass,
>     AtomicBoolean fallbackToSimpleAuth) throws IOException {
>   final Call call = createCall(rpcKind, rpcRequest);
>   Connection connection = getConnection(remoteId, call, serviceClass,
>     fallbackToSimpleAuth);
>   try {
>     connection.sendRpcRequest(call); // send the rpc request
>   } catch (RejectedExecutionException e) {
>     throw new IOException("connection has been closed", e);
>   } catch (InterruptedException e) {
>     Thread.currentThread().interrupt();
>     LOG.warn("interrupted waiting to send rpc request to server", e);
>     throw new IOException(e);
>   }
>   synchronized (call) {
>     while (!call.done) {
>       try {
>         call.wait(); // wait for the result
>       } catch (InterruptedException ie) {
>         Thread.currentThread().interrupt();
>         throw new InterruptedIOException("Call interrupted");
>       }
>     }
> {code}
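The conversion shown in the excerpt above can be reproduced without Hadoop. The following is a minimal sketch using only the JDK; InterruptDemo, blockingCall and interruptAndObserve are hypothetical names, and the wait loop is a simplified imitation of the excerpt, not Hadoop's code.

```java
import java.io.InterruptedIOException;

class InterruptDemo {
    /** Imitates the wait loop: blocks until interrupted, then converts the exception. */
    static void blockingCall(Object lock) throws InterruptedIOException {
        synchronized (lock) {
            while (true) {
                try {
                    lock.wait(); // wait for a result that never arrives
                } catch (InterruptedException ie) {
                    Thread.currentThread().interrupt(); // restore the interrupt flag
                    throw new InterruptedIOException("Call interrupted");
                }
            }
        }
    }

    /** Interrupts a worker running blockingCall and reports what the worker observed. */
    static String interruptAndObserve() {
        final String[] result = new String[1];
        Thread worker = new Thread(() -> {
            try {
                blockingCall(new Object());
                result[0] = "returned normally";
            } catch (InterruptedIOException e) {
                result[0] = e.getClass().getSimpleName()
                    + ", interrupted flag = " + Thread.currentThread().isInterrupted();
            }
        });
        worker.start();
        worker.interrupt();
        try {
            worker.join(); // join also makes the worker's write to result visible here
        } catch (InterruptedException e) {
            return "join interrupted";
        }
        return result[0];
    }
}
```

The caller never sees an InterruptedException, only an InterruptedIOException with the thread's interrupt flag set again. A handler that matches only InterruptedException therefore misses this case, which is the situation the issue describes.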
>
--
This message was sent by Atlassian Jira
(v8.3.4#803005)