Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2019/03/11 17:14:35 UTC

[GitHub] [spark] pgandhi999 edited a comment on issue #24035: [SPARK-27112] : Spark Scheduler encounters two independent Deadlocks …

URL: https://github.com/apache/spark/pull/24035#issuecomment-471633756
 
 
   @attilapiros I have attached the stack traces in text format here:
   
   Deadlock between the task-result-getter thread and the spark-dynamic-executor-allocation thread (a minimal sketch of the lock cycle follows the trace):
   
   ```
   Found one Java-level deadlock:
   =============================
   "task-result-getter-0":
     waiting to lock monitor 0x00007f35dcf25cb8 (object 0x00000004404f2518, a org.apache.spark.scheduler.cluster.YarnClusterSchedulerBackend),
     which is held by "spark-dynamic-executor-allocation"
   "spark-dynamic-executor-allocation":
     waiting to lock monitor 0x00007f35dc20f1f8 (object 0x00000004404f25c0, a org.apache.spark.scheduler.cluster.YarnClusterScheduler),
     which is held by "task-result-getter-0"
   
   
   Java stack information for the threads listed above:
   ===================================================
   "task-result-getter-0":
           at org.apache.spark.scheduler.cluster.CoarseGrainedSchedulerBackend.killExecutors(CoarseGrainedSchedulerBackend.scala:603)
           - waiting to lock <0x00000004404f2518> (a org.apache.spark.scheduler.cluster.YarnClusterSchedulerBackend)
           at org.apache.spark.scheduler.BlacklistTracker.org$apache$spark$scheduler$BlacklistTracker$$killBlacklistedExecutor(BlacklistTracker.scala:155)
           at org.apache.spark.scheduler.BlacklistTracker$$anonfun$updateBlacklistForSuccessfulTaskSet$1.apply(BlacklistTracker.scala:247)
           at org.apache.spark.scheduler.BlacklistTracker$$anonfun$updateBlacklistForSuccessfulTaskSet$1.apply(BlacklistTracker.scala:226)
           at scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:99)
           at scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:99)
           at scala.collection.mutable.HashTable$class.foreachEntry(HashTable.scala:230)
           at scala.collection.mutable.HashMap.foreachEntry(HashMap.scala:40)
           at scala.collection.mutable.HashMap.foreach(HashMap.scala:99)
           at org.apache.spark.scheduler.BlacklistTracker.updateBlacklistForSuccessfulTaskSet(BlacklistTracker.scala:226)
           at org.apache.spark.scheduler.TaskSetManager$$anonfun$org$apache$spark$scheduler$TaskSetManager$$maybeFinishTaskSet$1.apply(TaskSetManager.scala:530)
           at org.apache.spark.scheduler.TaskSetManager$$anonfun$org$apache$spark$scheduler$TaskSetManager$$maybeFinishTaskSet$1.apply(TaskSetManager.scala:530)
           at scala.Option.foreach(Option.scala:257)
           at org.apache.spark.scheduler.TaskSetManager.org$apache$spark$scheduler$TaskSetManager$$maybeFinishTaskSet(TaskSetManager.scala:530)
           at org.apache.spark.scheduler.TaskSetManager.handleSuccessfulTask(TaskSetManager.scala:787)
           at org.apache.spark.scheduler.TaskSchedulerImpl.handleSuccessfulTask(TaskSchedulerImpl.scala:466)
           - locked <0x00000004404f25c0> (a org.apache.spark.scheduler.cluster.YarnClusterScheduler)
           at org.apache.spark.scheduler.TaskResultGetter$$anon$3$$anonfun$run$1.apply$mcV$sp(TaskResultGetter.scala:113)
           at org.apache.spark.scheduler.TaskResultGetter$$anon$3$$anonfun$run$1.apply(TaskResultGetter.scala:63)
           at org.apache.spark.scheduler.TaskResultGetter$$anon$3$$anonfun$run$1.apply(TaskResultGetter.scala:63)
           at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:2004)
           at org.apache.spark.scheduler.TaskResultGetter$$anon$3.run(TaskResultGetter.scala:62)
           at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
           at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
           at java.lang.Thread.run(Thread.java:748)
   
   
   "spark-dynamic-executor-allocation":
           at org.apache.spark.scheduler.TaskSchedulerImpl.isExecutorBusy(TaskSchedulerImpl.scala:647)
           - waiting to lock <0x00000004404f25c0> (a org.apache.spark.scheduler.cluster.YarnClusterScheduler)
           at org.apache.spark.scheduler.cluster.CoarseGrainedSchedulerBackend$$anonfun$9.apply(CoarseGrainedSchedulerBackend.scala:613)
           at org.apache.spark.scheduler.cluster.CoarseGrainedSchedulerBackend$$anonfun$9.apply(CoarseGrainedSchedulerBackend.scala:613)
           at scala.collection.TraversableLike$$anonfun$filterImpl$1.apply(TraversableLike.scala:248)
           at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
           at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
           at scala.collection.TraversableLike$class.filterImpl(TraversableLike.scala:247)
           at scala.collection.TraversableLike$class.filter(TraversableLike.scala:259)
           at scala.collection.AbstractTraversable.filter(Traversable.scala:104)
           at org.apache.spark.scheduler.cluster.CoarseGrainedSchedulerBackend.killExecutors(CoarseGrainedSchedulerBackend.scala:613)
           - locked <0x00000004404f2518> (a org.apache.spark.scheduler.cluster.YarnClusterSchedulerBackend)
           at org.apache.spark.ExecutorAllocationManager.removeExecutors(ExecutorAllocationManager.scala:481)
           - locked <0x00000004442fb590> (a org.apache.spark.ExecutorAllocationManager)
           at org.apache.spark.ExecutorAllocationManager.org$apache$spark$ExecutorAllocationManager$$schedule(ExecutorAllocationManager.scala:321)
           - locked <0x00000004442fb590> (a org.apache.spark.ExecutorAllocationManager)
           at org.apache.spark.ExecutorAllocationManager$$anon$2.run(ExecutorAllocationManager.scala:246)
           at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
           at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
           at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
           at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
           at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
           at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
           at java.lang.Thread.run(Thread.java:748)
   ```
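   
   The cycle above is a lock-ordering inversion: `task-result-getter-0` enters `TaskSchedulerImpl.handleSuccessfulTask` (synchronized on the scheduler) and then calls `CoarseGrainedSchedulerBackend.killExecutors` (synchronized on the backend), while `spark-dynamic-executor-allocation` enters `killExecutors` first and only then calls `TaskSchedulerImpl.isExecutorBusy`. A minimal, self-contained sketch of that ordering (plain Scala, not Spark code; the lock names are stand-ins):
   
   ```scala
   object LockInversionSketch {
     // Stand-ins for the two monitors in the trace above:
     // schedulerLock ~ TaskSchedulerImpl, backendLock ~ CoarseGrainedSchedulerBackend.
     private val schedulerLock = new Object
     private val backendLock   = new Object
   
     def main(args: Array[String]): Unit = {
       // Mimics task-result-getter: scheduler monitor first, then backend monitor.
       val resultGetter = new Thread("task-result-getter") {
         override def run(): Unit = schedulerLock.synchronized {
           Thread.sleep(100)                       // widen the race window
           backendLock.synchronized { println("result getter done") }
         }
       }
       // Mimics spark-dynamic-executor-allocation: backend monitor first, then scheduler monitor.
       val allocation = new Thread("spark-dynamic-executor-allocation") {
         override def run(): Unit = backendLock.synchronized {
           Thread.sleep(100)
           schedulerLock.synchronized { println("allocation done") }
         }
       }
       resultGetter.start()
       allocation.start()
       // Each thread now sleeps inside its first monitor and then blocks on the
       // other's monitor, reproducing the cycle jstack reports above.
     }
   }
   ```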
   
   Deadlock between the task-result-getter thread and the dispatcher-event-loop thread (a note on avoiding the inversion follows the trace):
   
   ```
   Found one Java-level deadlock:
   =============================
   "task-result-getter-2":
     waiting to lock monitor 0x00007f9be88b2678 (object 0x00000003c0720ed0, a org.apache.spark.scheduler.cluster.YarnClusterSchedulerBackend),
     which is held by "dispatcher-event-loop-23"
   "dispatcher-event-loop-23":
     waiting to lock monitor 0x00007f9bf077abb8 (object 0x00000003c0720f78, a org.apache.spark.scheduler.cluster.YarnClusterScheduler),
     which is held by "task-result-getter-2"
   
   Java stack information for the threads listed above:
   ===================================================
   "task-result-getter-2":
   	at org.apache.spark.scheduler.cluster.CoarseGrainedSchedulerBackend.killExecutors(CoarseGrainedSchedulerBackend.scala:604)
   	- waiting to lock <0x00000003c0720ed0> (a org.apache.spark.scheduler.cluster.YarnClusterSchedulerBackend)
   	at org.apache.spark.scheduler.BlacklistTracker.killExecutor(BlacklistTracker.scala:153)
   	at org.apache.spark.scheduler.BlacklistTracker.org$apache$spark$scheduler$BlacklistTracker$$killBlacklistedExecutor(BlacklistTracker.scala:163)
   	at org.apache.spark.scheduler.BlacklistTracker$$anonfun$updateBlacklistForSuccessfulTaskSet$1.apply(BlacklistTracker.scala:257)
   	at org.apache.spark.scheduler.BlacklistTracker$$anonfun$updateBlacklistForSuccessfulTaskSet$1.apply(BlacklistTracker.scala:236)
   	at scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:99)
   	at scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:99)
   	at scala.collection.mutable.HashTable$class.foreachEntry(HashTable.scala:230)
   	at scala.collection.mutable.HashMap.foreachEntry(HashMap.scala:40)
   	at scala.collection.mutable.HashMap.foreach(HashMap.scala:99)
   	at org.apache.spark.scheduler.BlacklistTracker.updateBlacklistForSuccessfulTaskSet(BlacklistTracker.scala:236)
   	at org.apache.spark.scheduler.TaskSetManager$$anonfun$org$apache$spark$scheduler$TaskSetManager$$maybeFinishTaskSet$1.apply(TaskSetManager.scala:530)
   	at org.apache.spark.scheduler.TaskSetManager$$anonfun$org$apache$spark$scheduler$TaskSetManager$$maybeFinishTaskSet$1.apply(TaskSetManager.scala:530)
   	at scala.Option.foreach(Option.scala:257)
   	at org.apache.spark.scheduler.TaskSetManager.org$apache$spark$scheduler$TaskSetManager$$maybeFinishTaskSet(TaskSetManager.scala:530)
   	at org.apache.spark.scheduler.TaskSetManager.handleFailedTask(TaskSetManager.scala:916)
   	at org.apache.spark.scheduler.TaskSchedulerImpl.handleFailedTask(TaskSchedulerImpl.scala:539)
   	- locked <0x00000003c0720f78> (a org.apache.spark.scheduler.cluster.YarnClusterScheduler)
   	at org.apache.spark.scheduler.TaskResultGetter$$anon$4$$anonfun$run$2.apply$mcV$sp(TaskResultGetter.scala:150)
   	at org.apache.spark.scheduler.TaskResultGetter$$anon$4$$anonfun$run$2.apply(TaskResultGetter.scala:132)
   	at org.apache.spark.scheduler.TaskResultGetter$$anon$4$$anonfun$run$2.apply(TaskResultGetter.scala:132)
   	at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:2005)
   	at org.apache.spark.scheduler.TaskResultGetter$$anon$4.run(TaskResultGetter.scala:132)
   	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
   	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
   	at java.lang.Thread.run(Thread.java:748)
   "dispatcher-event-loop-23":
   	at org.apache.spark.scheduler.TaskSchedulerImpl.resourceOffers(TaskSchedulerImpl.scala:321)
   	- waiting to lock <0x00000003c0720f78> (a org.apache.spark.scheduler.cluster.YarnClusterScheduler)
   	at org.apache.spark.scheduler.cluster.CoarseGrainedSchedulerBackend$DriverEndpoint.org$apache$spark$scheduler$cluster$CoarseGrainedSchedulerBackend$DriverEndpoint$$makeOffers(CoarseGrainedSchedulerBackend.scala:248)
   	- locked <0x00000003c0720ed0> (a org.apache.spark.scheduler.cluster.YarnClusterSchedulerBackend)
   	at org.apache.spark.scheduler.cluster.CoarseGrainedSchedulerBackend$DriverEndpoint$$anonfun$receive$1.applyOrElse(CoarseGrainedSchedulerBackend.scala:136)
   	at org.apache.spark.rpc.netty.Inbox$$anonfun$process$1.apply$mcV$sp(Inbox.scala:117)
   	at org.apache.spark.rpc.netty.Inbox.safelyCall(Inbox.scala:205)
   	at org.apache.spark.rpc.netty.Inbox.process(Inbox.scala:101)
   	at org.apache.spark.rpc.netty.Dispatcher$MessageLoop.run(Dispatcher.scala:221)
   	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
   	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
   	at java.lang.Thread.run(Thread.java:748)
   ```
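   
   The second trace is the same backend/scheduler inversion: here the backend monitor is taken by `DriverEndpoint.makeOffers` on the dispatcher thread before it calls into `TaskSchedulerImpl.resourceOffers`, while `task-result-getter-2` holds the scheduler monitor in `handleFailedTask` and waits for the backend monitor in `killExecutors`. For illustration only (this is not the change proposed in this PR), the generic way to break such a cycle is to never hold both monitors at once, e.g. snapshot what is needed under one lock, release it, and only then take the other:
   
   ```scala
   // Sketch only, with hypothetical names; not the actual Spark fix.
   object NoNestedLocksSketch {
     private val schedulerLock = new Object
     private val backendLock   = new Object
     private var busyExecutors = Set("exec-1")                     // guarded by schedulerLock
     private var liveExecutors = Set("exec-1", "exec-2", "exec-3") // guarded by backendLock
   
     def killIdleExecutors(): Unit = {
       // 1. Copy the scheduler-side view while holding only schedulerLock.
       val busySnapshot = schedulerLock.synchronized { busyExecutors }
       // 2. Only then take backendLock; no thread ever holds both monitors,
       //    so the cycle seen in the traces above cannot form.
       backendLock.synchronized {
         val idle = liveExecutors -- busySnapshot
         liveExecutors = liveExecutors -- idle
         idle.foreach(e => println(s"killing idle executor $e"))
       }
     }
   }
   ```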
