Posted to issues@spark.apache.org by "Weizhong (JIRA)" <ji...@apache.org> on 2017/01/06 02:42:58 UTC

[jira] [Commented] (SPARK-16180) Task hang on fetching blocks (cached RDD)

    [ https://issues.apache.org/jira/browse/SPARK-16180?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15803301#comment-15803301 ] 

Weizhong commented on SPARK-16180:
----------------------------------

Hi, we also hit this issue on Spark 1.6. The executor log shows the task threads hanging for about 3.5 hours after finding their blocks remotely, and then the tasks succeeded.
{noformat}
2017-01-04 21:07:21,675 | INFO  | [Executor task launch worker-0] | Running task 447.0 in stage 22.0 (TID 22335) | org.apache.spark.Logging$class.logInfo(Logging.scala:59)
2017-01-04 21:07:21,883 | INFO  | [Executor task launch worker-0] | Found block rdd_31_447 remotely | org.apache.spark.Logging$class.logInfo(Logging.scala:59)
2017-01-04 21:07:22,091 | INFO  | [Executor task launch worker-4] | Finished task 1866.0 in stage 18.0 (TID 21754). 106402 bytes result sent to driver executor run time: 27585 ms | org.apache.spark.Logging$class.logInfo(Logging.scala:59)
2017-01-04 21:07:22,197 | INFO  | [Executor task launch worker-1] | Found block rdd_31_424 remotely | org.apache.spark.Logging$class.logInfo(Logging.scala:59)
2017-01-04 21:07:22,201 | INFO  | [dispatcher-event-loop-18] | Got assigned task 22354 | org.apache.spark.Logging$class.logInfo(Logging.scala:59)
2017-01-04 21:07:22,202 | INFO  | [Executor task launch worker-4] | Running task 466.0 in stage 22.0 (TID 22354) | org.apache.spark.Logging$class.logInfo(Logging.scala:59)
2017-01-04 21:07:22,948 | INFO  | [Executor task launch worker-4] | Found block rdd_31_466 remotely | org.apache.spark.Logging$class.logInfo(Logging.scala:59)
2017-01-05 00:40:25,638 | INFO  | [Executor task launch worker-2] | Finished task 227.0 in stage 22.0 (TID 22115). 4961 bytes result sent to driver executor run time: 12787090 ms | org.apache.spark.Logging$class.logInfo(Logging.scala:59)
2017-01-05 00:40:26,948 | INFO  | [Executor task launch worker-1] | Finished task 424.0 in stage 22.0 (TID 22312). 4961 bytes result sent to driver executor run time: 12785601 ms | org.apache.spark.Logging$class.logInfo(Logging.scala:59)
2017-01-05 00:40:27,492 | INFO  | [Executor task launch worker-0] | Finished task 447.0 in stage 22.0 (TID 22335). 4961 bytes result sent to driver executor run time: 12785815 ms | org.apache.spark.Logging$class.logInfo(Logging.scala:59)
2017-01-05 00:40:27,561 | INFO  | [Executor task launch worker-4] | Finished task 466.0 in stage 22.0 (TID 22354). 4961 bytes result sent to driver executor run time: 12785356 ms | org.apache.spark.Logging$class.logInfo(Logging.scala:59)
{noformat}
Have you found the root cause?
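
For anyone who wants to measure the stall in their own executor logs, here is a minimal sketch that pairs the "Running task" and "Finished task" lines by TID and reports the elapsed time. The sample lines are copied from the log above with the trailing logger field trimmed; the `LINE` pattern and the `task_durations` helper are hypothetical names written for this log layout, not part of Spark.

```python
import re
from datetime import datetime

# Sample lines copied from the executor log above (trailing logger field trimmed).
LOG = """\
2017-01-04 21:07:21,675 | INFO  | [Executor task launch worker-0] | Running task 447.0 in stage 22.0 (TID 22335)
2017-01-04 21:07:22,202 | INFO  | [Executor task launch worker-4] | Running task 466.0 in stage 22.0 (TID 22354)
2017-01-05 00:40:27,492 | INFO  | [Executor task launch worker-0] | Finished task 447.0 in stage 22.0 (TID 22335). 4961 bytes result sent to driver executor run time: 12785815 ms
2017-01-05 00:40:27,561 | INFO  | [Executor task launch worker-4] | Finished task 466.0 in stage 22.0 (TID 22354). 4961 bytes result sent to driver executor run time: 12785356 ms
"""

# Assumed layout: "<timestamp> | <level> | [<thread>] | <message>".
LINE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) \|.*\| "
    r"(?P<event>Running|Finished) task \S+ in stage \S+ \(TID (?P<tid>\d+)\)"
)


def task_durations(log_text):
    """Return {tid: seconds between the 'Running' and 'Finished' lines}."""
    started, durations = {}, {}
    for line in log_text.splitlines():
        m = LINE.match(line)
        if not m:
            continue  # skip lines that are not task start/finish events
        ts = datetime.strptime(m.group("ts"), "%Y-%m-%d %H:%M:%S,%f")
        tid = int(m.group("tid"))
        if m.group("event") == "Running":
            started[tid] = ts
        elif tid in started:
            durations[tid] = (ts - started[tid]).total_seconds()
    return durations


if __name__ == "__main__":
    for tid, secs in sorted(task_durations(LOG).items()):
        print(f"TID {tid}: {secs / 3600:.2f} h")
```

For the two tasks sampled here, the gap between the timestamps agrees with the logged "run time" values to within a few milliseconds, so almost the whole run time was spent after the "Found block ... remotely" line.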

> Task hang on fetching blocks (cached RDD)
> -----------------------------------------
>
>                 Key: SPARK-16180
>                 URL: https://issues.apache.org/jira/browse/SPARK-16180
>             Project: Spark
>          Issue Type: Improvement
>    Affects Versions: 1.6.1
>            Reporter: Davies Liu
>
> Here is the stackdump of executor:
> {code}
> sun.misc.Unsafe.park(Native Method)
> java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
> java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
> java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:997)
> java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1304)
> scala.concurrent.impl.Promise$DefaultPromise.tryAwait(Promise.scala:202)
> scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:218)
> scala.concurrent.impl.Promise$DefaultPromise.result(Promise.scala:223)
> scala.concurrent.Await$$anonfun$result$1.apply(package.scala:107)
> scala.concurrent.BlockContext$DefaultBlockContext$.blockOn(BlockContext.scala:53)
> scala.concurrent.Await$.result(package.scala:107)
> org.apache.spark.network.BlockTransferService.fetchBlockSync(BlockTransferService.scala:102)
> org.apache.spark.storage.BlockManager$$anonfun$doGetRemote$2.apply(BlockManager.scala:588)
> org.apache.spark.storage.BlockManager$$anonfun$doGetRemote$2.apply(BlockManager.scala:585)
> scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
> scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
> org.apache.spark.storage.BlockManager.doGetRemote(BlockManager.scala:585)
> org.apache.spark.storage.BlockManager.getRemote(BlockManager.scala:570)
> org.apache.spark.storage.BlockManager.get(BlockManager.scala:630)
> org.apache.spark.CacheManager.getOrCompute(CacheManager.scala:44)
> org.apache.spark.rdd.RDD.iterator(RDD.scala:268)
> org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
> org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
> org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
> org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
> org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
> org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
> org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
> org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
> org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
> org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
> org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
> org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
> org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
> org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
> org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
> org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
> org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
> org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
> org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:79)
> org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:46)
> org.apache.spark.scheduler.Task.run(Task.scala:96)
> org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:222)
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> java.lang.Thread.run(Thread.java:745)
> {code}
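
A note on reading the stack dump above: the thread is parked under `Await$.result`, called from `BlockTransferService.fetchBlockSync`, and the frames are `acquireSharedInterruptibly` and `LockSupport.park` rather than their timed variants, which suggests the wait has no timeout. The sketch below is a generic Python illustration of that difference, not Spark code and not a claim about the root cause: a bounded wait on a result that never arrives returns control to the caller, where an unbounded one would block. `wait_for_block` is a hypothetical helper.

```python
from concurrent.futures import Future, TimeoutError


def wait_for_block(future, timeout_s):
    """Wait for a fetch result, but give up after timeout_s seconds.

    Hypothetical helper for illustration only; not part of Spark.
    """
    try:
        return ("ok", future.result(timeout=timeout_s))
    except TimeoutError:
        # The reply never arrived in time; the caller can retry or fail the task.
        return ("timed_out", None)


if __name__ == "__main__":
    # Stands in for a remote fetch whose reply never arrives.
    never_completed = Future()
    print(wait_for_block(never_completed, 0.1))

    # Stands in for a fetch that completed normally.
    completed = Future()
    completed.set_result(b"block bytes")
    print(wait_for_block(completed, 0.1))
```

Calling `future.result()` with no timeout on `never_completed` would block indefinitely, which is the analogue of a task thread parked on a promise that is never completed.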



