Posted to issues@spark.apache.org by "Danula Eranjith (JIRA)" <ji...@apache.org> on 2017/11/06 03:41:01 UTC

[jira] [Commented] (SPARK-14228) Lost executor of RPC disassociated, and occurs exception: Could not find CoarseGrainedScheduler or it has been stopped

    [ https://issues.apache.org/jira/browse/SPARK-14228?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16239854#comment-16239854 ] 

Danula Eranjith commented on SPARK-14228:
-----------------------------------------

I encountered the same issue in Spark 1.6.3:

{code}
17/11/03 22:38:33 WARN YarnSchedulerBackend$YarnSchedulerEndpoint: Executor for container container_e02_1509517131757_0001_01_000003 exited because of a YARN event (e.g., pre-emption) and not because of an error in the running job.
17/11/03 22:38:33 ERROR YarnClientSchedulerBackend: Could not find CoarseGrainedScheduler or it has been stopped.
org.apache.spark.SparkException: Could not find CoarseGrainedScheduler or it has been stopped.
        at org.apache.spark.rpc.netty.Dispatcher.postMessage(Dispatcher.scala:163)
        at org.apache.spark.rpc.netty.Dispatcher.postLocalMessage(Dispatcher.scala:128)
        at org.apache.spark.rpc.netty.NettyRpcEnv.ask(NettyRpcEnv.scala:231)
        at org.apache.spark.rpc.netty.NettyRpcEndpointRef.ask(NettyRpcEnv.scala:515)
        at org.apache.spark.rpc.RpcEndpointRef.ask(RpcEndpointRef.scala:62)
        at org.apache.spark.scheduler.cluster.CoarseGrainedSchedulerBackend.removeExecutor(CoarseGrainedSchedulerBackend.scala:392)
        at org.apache.spark.scheduler.cluster.YarnSchedulerBackend$YarnSchedulerEndpoint$$anonfun$receive$1.applyOrElse(YarnSchedulerBackend.scala:259)
        at org.apache.spark.rpc.netty.Inbox$$anonfun$process$1.apply$mcV$sp(Inbox.scala:116)
        at org.apache.spark.rpc.netty.Inbox.safelyCall(Inbox.scala:204)
        at org.apache.spark.rpc.netty.Inbox.process(Inbox.scala:100)
        at org.apache.spark.rpc.netty.Dispatcher$MessageLoop.run(Dispatcher.scala:217)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)
17/11/03 22:38:33 WARN YarnSchedulerBackend$YarnSchedulerEndpoint: Executor for container container_e02_1509517131757_0001_01_000002 exited because of a YARN event (e.g., pre-emption) and not because of an error in the running job.
17/11/03 22:38:33 ERROR YarnClientSchedulerBackend: Could not find CoarseGrainedScheduler or it has been stopped.
org.apache.spark.SparkException: Could not find CoarseGrainedScheduler or it has been stopped.
        at org.apache.spark.rpc.netty.Dispatcher.postMessage(Dispatcher.scala:163)
        at org.apache.spark.rpc.netty.Dispatcher.postLocalMessage(Dispatcher.scala:128)
        at org.apache.spark.rpc.netty.NettyRpcEnv.ask(NettyRpcEnv.scala:231)
        at org.apache.spark.rpc.netty.NettyRpcEndpointRef.ask(NettyRpcEnv.scala:515)
        at org.apache.spark.rpc.RpcEndpointRef.ask(RpcEndpointRef.scala:62)
        at org.apache.spark.scheduler.cluster.CoarseGrainedSchedulerBackend.removeExecutor(CoarseGrainedSchedulerBackend.scala:392)
        at org.apache.spark.scheduler.cluster.YarnSchedulerBackend$YarnSchedulerEndpoint$$anonfun$receive$1.applyOrElse(YarnSchedulerBackend.scala:259)
        at org.apache.spark.rpc.netty.Inbox$$anonfun$process$1.apply$mcV$sp(Inbox.scala:116)
        at org.apache.spark.rpc.netty.Inbox.safelyCall(Inbox.scala:204)
        at org.apache.spark.rpc.netty.Inbox.process(Inbox.scala:100)
        at org.apache.spark.rpc.netty.Dispatcher$MessageLoop.run(Dispatcher.scala:217)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)
{code}


> Lost executor of RPC disassociated, and occurs exception: Could not find CoarseGrainedScheduler or it has been stopped
> ----------------------------------------------------------------------------------------------------------------------
>
>                 Key: SPARK-14228
>                 URL: https://issues.apache.org/jira/browse/SPARK-14228
>             Project: Spark
>          Issue Type: Bug
>            Reporter: meiyoula
>
> When I start 1000 executors and then stop the process, SparkContext.stop is called to stop all executors. During this shutdown, the executors that have already been killed lose their RPC connection to the driver. The driver then handles each lost executor and tries to call reviveOffers, which fails with "Could not find CoarseGrainedScheduler or it has been stopped."
> {quote}
> 16/03/29 01:45:45 ERROR YarnScheduler: Lost executor 610 on 51-196-152-8: remote Rpc client disassociated
> 16/03/29 01:45:45 ERROR Inbox: Ignoring error
> org.apache.spark.SparkException: Could not find CoarseGrainedScheduler or it has been stopped.
> 	at org.apache.spark.rpc.netty.Dispatcher.postMessage(Dispatcher.scala:161)
> 	at org.apache.spark.rpc.netty.Dispatcher.postOneWayMessage(Dispatcher.scala:131)
> 	at org.apache.spark.rpc.netty.NettyRpcEnv.send(NettyRpcEnv.scala:173)
> 	at org.apache.spark.rpc.netty.NettyRpcEndpointRef.send(NettyRpcEnv.scala:398)
> 	at org.apache.spark.scheduler.cluster.CoarseGrainedSchedulerBackend.reviveOffers(CoarseGrainedSchedulerBackend.scala:314)
> 	at org.apache.spark.scheduler.TaskSchedulerImpl.executorLost(TaskSchedulerImpl.scala:482)
> 	at org.apache.spark.scheduler.cluster.CoarseGrainedSchedulerBackend$DriverEndpoint.removeExecutor(CoarseGrainedSchedulerBackend.scala:261)
> 	at org.apache.spark.scheduler.cluster.CoarseGrainedSchedulerBackend$DriverEndpoint$$anonfun$onDisconnected$1.apply(CoarseGrainedSchedulerBackend.scala:207)
> 	at org.apache.spark.scheduler.cluster.CoarseGrainedSchedulerBackend$DriverEndpoint$$anonfun$onDisconnected$1.apply(CoarseGrainedSchedulerBackend.scala:207)
> 	at scala.Option.foreach(Option.scala:236)
> 	at org.apache.spark.scheduler.cluster.CoarseGrainedSchedulerBackend$DriverEndpoint.onDisconnected(CoarseGrainedSchedulerBackend.scala:207)
> 	at org.apache.spark.rpc.netty.Inbox$$anonfun$process$1.apply$mcV$sp(Inbox.scala:144)
> 	at org.apache.spark.rpc.netty.Inbox.safelyCall(Inbox.scala:204)
> 	at org.apache.spark.rpc.netty.Inbox.process(Inbox.scala:102)
> 	at org.apache.spark.rpc.netty.Dispatcher$MessageLoop.run(Dispatcher.scala:215)
> 	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> 	at java.lang.Thread.run(Thread.java:745)
> {quote}
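
The shutdown race described in the quoted report can be illustrated with a toy model. This is only a sketch under stated assumptions: `ToyDispatcher`, `EndpointStopped`, and `simulate_shutdown` are hypothetical names invented for illustration and are not Spark classes or APIs. The model assumes that stopping the scheduler unregisters its endpoint, and that executor-lost notifications arriving afterwards still try to post to that endpoint by name.

```python
class EndpointStopped(Exception):
    """Raised when a message targets an endpoint that is no longer registered."""


class ToyDispatcher:
    """Hypothetical stand-in for an RPC dispatcher; not Spark's actual class."""

    def __init__(self):
        self._endpoints = {}

    def register(self, name, handler):
        self._endpoints[name] = handler

    def unregister(self, name):
        # Unregistering an unknown name is a no-op.
        self._endpoints.pop(name, None)

    def post(self, name, message):
        handler = self._endpoints.get(name)
        if handler is None:
            # Mirrors the wording of the error seen in the logs.
            raise EndpointStopped(f"Could not find {name} or it has been stopped.")
        return handler(message)


def simulate_shutdown(num_executors):
    """Deliver one message before stop, then one late message per executor."""
    dispatcher = ToyDispatcher()
    received = []
    dispatcher.register("CoarseGrainedScheduler", received.append)

    # While the endpoint is registered, messages are delivered normally.
    dispatcher.post("CoarseGrainedScheduler", "ReviveOffers")

    # Assumed ordering: the endpoint is stopped first ...
    dispatcher.unregister("CoarseGrainedScheduler")

    # ... and the executor-lost notifications are handled afterwards.
    errors = []
    for executor_id in range(num_executors):
        try:
            dispatcher.post("CoarseGrainedScheduler", ("RemoveExecutor", executor_id))
        except EndpointStopped as exc:
            errors.append(str(exc))
    return received, errors


received, errors = simulate_shutdown(3)
print(len(received), "delivered;", len(errors), "rejected after stop")
```

In this model every notification that arrives after the endpoint is unregistered produces one error, which matches the pattern in both logs of one exception per lost executor or container. Whether Spark's real ordering matches this assumption is not established by the logs alone.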


