Posted to issues@spark.apache.org by "Marcelo Vanzin (JIRA)" <ji...@apache.org> on 2019/02/12 21:21:00 UTC

[jira] [Resolved] (SPARK-22760) where driver is stopping, and some executors lost because of YarnSchedulerBackend.stop, then there is a problem.

     [ https://issues.apache.org/jira/browse/SPARK-22760?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Marcelo Vanzin resolved SPARK-22760.
------------------------------------
    Resolution: Won't Fix

See discussion in PR. It's just a misleading exception. Not worth the cost of fixing.

> where driver is stopping, and some executors lost because of YarnSchedulerBackend.stop, then there is a problem. 
> -----------------------------------------------------------------------------------------------------------------
>
>                 Key: SPARK-22760
>                 URL: https://issues.apache.org/jira/browse/SPARK-22760
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core, YARN
>    Affects Versions: 2.2.1
>            Reporter: KaiXinXIaoLei
>            Priority: Major
>         Attachments: 微信图片_20171212094100.jpg (WeChat image)
>
>
> Even with SPARK-14228 applied, I still see a problem:
> {noformat}
> 17/12/12 15:34:45 INFO YarnClientSchedulerBackend: Asking each executor to shut down
> 17/12/12 15:34:45 INFO YarnClientSchedulerBackend: Disabling executor 63.
> 17/12/12 15:34:45 ERROR Inbox: Ignoring error
> org.apache.spark.SparkException: Could not find CoarseGrainedScheduler or it has been stopped.
> 	at org.apache.spark.rpc.netty.Dispatcher.postMessage(Dispatcher.scala:163)
> 	at org.apache.spark.rpc.netty.Dispatcher.postOneWayMessage(Dispatcher.scala:133)
> 	at org.apache.spark.rpc.netty.NettyRpcEnv.send(NettyRpcEnv.scala:192)
> 	at org.apache.spark.rpc.netty.NettyRpcEndpointRef.send(NettyRpcEnv.scala:516)
> 	at org.apache.spark.scheduler.cluster.CoarseGrainedSchedulerBackend.reviveOffers(CoarseGrainedSchedulerBackend.scala:356)
> 	at org.apache.spark.scheduler.TaskSchedulerImpl.executorLost(TaskSchedulerImpl.scala:497)
> 	at org.apache.spark.scheduler.cluster.CoarseGrainedSchedulerBackend$DriverEndpoint.disableExecutor(CoarseGrainedSchedulerBackend.scala:301)
> 	at org.apache.spark.scheduler.cluster.YarnSchedulerBackend$YarnDriverEndpoint$$anonfun$onDisconnected$1.apply(YarnSchedulerBackend.scala:121)
> 	at org.apache.spark.scheduler.cluster.YarnSchedulerBackend$YarnDriverEndpoint$$anonfun$onDisconnected$1.apply(YarnSchedulerBackend.scala:120)
> 	at scala.Option.foreach(Option.scala:236)
> 	at org.apache.spark.scheduler.cluster.YarnSchedulerBackend$YarnDriverEndpoint.onDisconnected(YarnSchedulerBackend.scala:120)
> 	at org.apache.spark.rpc.netty.Inbox$$anonfun$process$1.apply$mcV$sp(Inbox.scala:142)
> 	at org.apache.spark.rpc.netty.Inbox.safelyCall(Inbox.scala:204)
> 	at org.apache.spark.rpc.netty.Inbox.process(Inbox.scala:100)
> 	at org.apache.spark.rpc.netty.Dispatcher$MessageLoop.run(Dispatcher.scala:217)
> 	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> 	at java.lang.Thread.run(Thread.java:745)
> {noformat}
> Sometimes the following error also appears:
> {noformat}
> 17/12/11 15:50:53 INFO YarnClientSchedulerBackend: Stopped
> 17/12/11 15:50:53 INFO MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
> 17/12/11 15:50:53 ERROR Inbox: Ignoring error
> org.apache.spark.SparkException: Unsupported message OneWayMessage(101.8.73.53:42930,RemoveExecutor(68,Executor for container container_e05_1512975871311_0007_01_000069 exited because of a YARN event (e.g., pre-emption) and not because of an error in the running job.)) from 101.8.73.53:42930
>         at org.apache.spark.rpc.netty.Inbox$$anonfun$process$1$$anonfun$apply$mcV$sp$2.apply(Inbox.scala:118)
>         at org.apache.spark.rpc.netty.Inbox$$anonfun$process$1$$anonfun$apply$mcV$sp$2.apply(Inbox.scala:117)
>         at org.apache.spark.scheduler.cluster.CoarseGrainedSchedulerBackend$DriverEndpoint$$anonfun$receive$1.applyOrElse(CoarseGrainedSchedulerBackend.scala:126)
>         at org.apache.spark.rpc.netty.Inbox$$anonfun$process$1.apply$mcV$sp(Inbox.scala:117)
>         at org.apache.spark.rpc.netty.Inbox.safelyCall(Inbox.scala:205)
>         at org.apache.spark.rpc.netty.Inbox.process(Inbox.scala:101)
>         at org.apache.spark.rpc.netty.Dispatcher$MessageLoop.run(Dispatcher.scala:213)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>         at java.lang.Thread.run(Thread.java:745)
> org.apache.spark.SparkException: Could not find CoarseGrainedScheduler.
>         at org.apache.spark.rpc.netty.Dispatcher.postMessage(Dispatcher.scala:154)
>         at org.apache.spark.rpc.netty.Dispatcher.postOneWayMessage(Dispatcher.scala:134)
>         at org.apache.spark.rpc.netty.NettyRpcEnv.send(NettyRpcEnv.scala:186)
>         at org.apache.spark.rpc.netty.NettyRpcEndpointRef.send(NettyRpcEnv.scala:512)
>         at org.apache.spark.scheduler.cluster.YarnSchedulerBackend$YarnSchedulerEndpoint$$anonfun$org$apache$spark$scheduler$cluster$YarnSchedulerBackend$$handleExecutorDisconnectedFromDriver$1.apply(YarnSchedulerBackend.scala:255)
>         at org.apache.spark.scheduler.cluster.YarnSchedulerBackend$YarnSchedulerEndpoint$$anonfun$org$apache$spark$scheduler$cluster$YarnSchedulerBackend$$handleExecutorDisconnectedFromDriver$1.apply(YarnSchedulerBackend.scala:255)
>         at scala.util.Success.foreach(Try.scala:236)
>         at scala.concurrent.Future$$anonfun$foreach$1.apply(Future.scala:206)
>         at scala.concurrent.Future$$anonfun$foreach$1.apply(Future.scala:206)
> {noformat}
> I analyzed the cause. When there are many executors, YarnSchedulerBackend.stopped is still false while YarnSchedulerBackend.stop() is running. Executors that shut down during that window trigger YarnSchedulerBackend.onDisconnected(), which then produces the errors above.
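
The ordering problem described in the report can be sketched as a small, deterministic model. This is a hypothetical simplification, not Spark code: `Dispatcher`, `Backend`, `postOneWayMessage`, and the `guardDisconnects` switch are illustrative names, and the disconnect events are replayed in a single thread instead of arriving concurrently. The unguarded variant unregisters the driver endpoint while its `stopped` flag is still false, so each late disconnect fails to post a message, mirroring the "Could not find CoarseGrainedScheduler or it has been stopped" error. The guarded variant publishes the flag first and drops those events. Note that the issue was resolved as Won't Fix, so this guard is only an illustration of the reporter's analysis, not an adopted change.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical, simplified model of the shutdown race; names mirror but are not Spark's classes.
public class ShutdownRaceSketch {

    /** Stand-in for the RPC dispatcher: routes messages to registered endpoints by name. */
    static final class Dispatcher {
        private final Map<String, List<String>> inboxes = new ConcurrentHashMap<>();

        void register(String name) { inboxes.put(name, new ArrayList<>()); }

        void unregister(String name) { inboxes.remove(name); }

        void postOneWayMessage(String name, String msg) {
            List<String> inbox = inboxes.get(name);
            if (inbox == null) {
                // Analogous to the error in the first trace above.
                throw new IllegalStateException("Could not find " + name + " or it has been stopped.");
            }
            inbox.add(msg);
        }
    }

    /** Stand-in for the scheduler backend. */
    static final class Backend {
        static final String DRIVER = "CoarseGrainedScheduler";
        final Dispatcher dispatcher = new Dispatcher();
        final AtomicBoolean stopped = new AtomicBoolean(false);
        final boolean guardDisconnects; // hypothetical switch: check the flag in onDisconnected
        final List<String> errors = new ArrayList<>();

        Backend(boolean guardDisconnects) {
            this.guardDisconnects = guardDisconnects;
            dispatcher.register(DRIVER);
        }

        /** Executor-disconnect callback; in the reported trace this path ends in reviveOffers(). */
        void onDisconnected(int executorId) {
            if (guardDisconnects && stopped.get()) {
                return; // already stopping: drop the event quietly
            }
            try {
                dispatcher.postOneWayMessage(DRIVER, "ReviveOffers after losing executor " + executorId);
            } catch (IllegalStateException e) {
                errors.add(e.getMessage()); // corresponds to "ERROR Inbox: Ignoring error"
            }
        }

        /** Tears down the driver endpoint; executors disconnect while this is in progress. */
        void stop(int executorCount) {
            if (guardDisconnects) {
                stopped.set(true); // publish "stopping" before unregistering anything
            }
            dispatcher.unregister(DRIVER);
            // Replay, in order, the disconnects that arrive mid-shutdown.
            for (int id = 1; id <= executorCount; id++) {
                onDisconnected(id);
            }
            stopped.set(true);
        }
    }

    public static void main(String[] args) {
        Backend racy = new Backend(false);
        racy.stop(3);
        System.out.println("unguarded: " + racy.errors.size() + " errors");

        Backend guarded = new Backend(true);
        guarded.stop(3);
        System.out.println("guarded: " + guarded.errors.size() + " errors");
    }
}
```

Running `main` prints `unguarded: 3 errors` and `guarded: 0 errors`. The flag is an `AtomicBoolean` because in a real backend the disconnect callbacks run on RPC threads, so the write in `stop()` must be visible to them. The model does not cover the second trace (the unsupported `RemoveExecutor` message), and, as the resolution notes, the errors it reproduces are only misleading log noise during shutdown.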



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
