Posted to issues@spark.apache.org by "Shixiong Zhu (JIRA)" <ji...@apache.org> on 2015/05/15 20:27:02 UTC

[jira] [Commented] (SPARK-7655) Akka timeout exception

    [ https://issues.apache.org/jira/browse/SPARK-7655?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14545936#comment-14545936 ] 

Shixiong Zhu commented on SPARK-7655:
-------------------------------------

Found the following stack trace showing that a thread holding the TaskSchedulerImpl lock may block Akka threads:

{code}
"task-result-getter-1" daemon prio=10 tid=0x00007f460400b000 nid=0x3b73 runnable [0x00007f45a70ee000]
   java.lang.Thread.State: RUNNABLE
        at org.apache.spark.util.ByteBufferInputStream.read(ByteBufferInputStream.scala:38)
        at java.io.ObjectInputStream$PeekInputStream.peek(ObjectInputStream.java:2293)
        at java.io.ObjectInputStream$BlockDataInputStream.peek(ObjectInputStream.java:2586)
        at java.io.ObjectInputStream$BlockDataInputStream.peekByte(ObjectInputStream.java:2596)
        at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1505)
        at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1771)
        at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
        at java.io.ObjectInputStream.readArray(ObjectInputStream.java:1706)
        at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1344)
        at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
        at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
        at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
        at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
        at java.io.ObjectInputStream.readArray(ObjectInputStream.java:1706)
        at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1344)
        at java.io.ObjectInputStream.readObject(ObjectInputStream.java:370)
        at org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:69)
        at org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:89)
        at org.apache.spark.scheduler.DirectTaskResult.value(TaskResult.scala:79)
        at org.apache.spark.scheduler.TaskSetManager.handleSuccessfulTask(TaskSetManager.scala:624)
        at org.apache.spark.scheduler.TaskSchedulerImpl.handleSuccessfulTask(TaskSchedulerImpl.scala:378)
        - locked <0x0000000270cddd18> (a org.apache.spark.scheduler.TaskSchedulerImpl)
        at org.apache.spark.scheduler.TaskResultGetter$$anon$2$$anonfun$run$1.apply$mcV$sp(TaskResultGetter.scala:82)
        at org.apache.spark.scheduler.TaskResultGetter$$anon$2$$anonfun$run$1.apply(TaskResultGetter.scala:51)
        at org.apache.spark.scheduler.TaskResultGetter$$anon$2$$anonfun$run$1.apply(TaskResultGetter.scala:51)
        at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1769)
        at org.apache.spark.scheduler.TaskResultGetter$$anon$2.run(TaskResultGetter.scala:50)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:745)
{code}

Meanwhile, two Akka threads are blocked waiting for the TaskSchedulerImpl lock.

We should not call {{DirectTaskResult.value}} while holding the TaskSchedulerImpl lock, because deserializing a large object may take dozens of seconds.

I will move it out of the lock.
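For illustration only, here is a minimal Java sketch of the before/after locking pattern described above. The class and method names are hypothetical stand-ins, not the actual Spark code: the point is that the expensive deserialization moves outside the shared lock, so threads contending for the same monitor are no longer blocked behind it.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical scheduler illustrating the fix: perform expensive work
// (e.g. deserializing a large task result) OUTSIDE the shared lock.
public class Scheduler {
    private final List<String> completed = new ArrayList<>();

    // Problematic pattern: deserialization runs while the monitor is held,
    // so every other thread synchronizing on `this` waits for it to finish.
    public synchronized void handleSuccessfulTaskBad(byte[] serialized) {
        String value = slowDeserialize(serialized); // long pause under the lock
        completed.add(value);
    }

    // Fixed pattern: deserialize first, then take the lock only for the
    // short bookkeeping step.
    public void handleSuccessfulTaskGood(byte[] serialized) {
        String value = slowDeserialize(serialized); // no lock held here
        synchronized (this) {
            completed.add(value);
        }
    }

    public synchronized int completedCount() {
        return completed.size();
    }

    // Stand-in for an expensive deserialization step.
    private static String slowDeserialize(byte[] bytes) {
        return new String(bytes);
    }

    public static void main(String[] args) {
        Scheduler s = new Scheduler();
        s.handleSuccessfulTaskGood("result-1".getBytes());
        System.out.println(s.completedCount()); // prints 1
    }
}
```

Both variants are correct on their own; the second simply shrinks the critical section so a slow result cannot stall other threads (such as Akka message handlers) waiting on the same lock.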

Thanks to [~yhuai] for the stack trace.

> Akka timeout exception
> ----------------------
>
>                 Key: SPARK-7655
>                 URL: https://issues.apache.org/jira/browse/SPARK-7655
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 1.4.0
>            Reporter: Yin Huai
>            Assignee: Shixiong Zhu
>            Priority: Blocker
>
> I got the following exception when I was running a query with a broadcast join.
> {code}
> 15/05/15 01:15:49 [WARN] AkkaRpcEndpointRef: Error sending message [message = UpdateBlockInfo(BlockManagerId(driver, 10.0.171.162, 54870),broadcast_758_piece0,StorageLevel(false, false, false, false, 1),0,0,0)] in 1 attempts
> java.util.concurrent.TimeoutException: Futures timed out after [120 seconds]
> 	at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:219)
> 	at scala.concurrent.impl.Promise$DefaultPromise.result(Promise.scala:223)
> 	at scala.concurrent.Await$$anonfun$result$1.apply(package.scala:107)
> 	at scala.concurrent.BlockContext$DefaultBlockContext$.blockOn(BlockContext.scala:53)
> 	at scala.concurrent.Await$.result(package.scala:107)
> 	at org.apache.spark.rpc.RpcEndpointRef.askWithRetry(RpcEndpointRef.scala:102)
> 	at org.apache.spark.rpc.RpcEndpointRef.askWithRetry(RpcEndpointRef.scala:78)
> 	at org.apache.spark.storage.BlockManagerMaster.updateBlockInfo(BlockManagerMaster.scala:58)
> 	at org.apache.spark.storage.BlockManager.org$apache$spark$storage$BlockManager$$tryToReportBlockStatus(BlockManager.scala:374)
> 	at org.apache.spark.storage.BlockManager.reportBlockStatus(BlockManager.scala:350)
> 	at org.apache.spark.storage.BlockManager.removeBlock(BlockManager.scala:1107)
> 	at org.apache.spark.storage.BlockManager$$anonfun$removeBroadcast$2.apply(BlockManager.scala:1083)
> 	at org.apache.spark.storage.BlockManager$$anonfun$removeBroadcast$2.apply(BlockManager.scala:1083)
> 	at scala.collection.immutable.Set$Set2.foreach(Set.scala:94)
> 	at org.apache.spark.storage.BlockManager.removeBroadcast(BlockManager.scala:1083)
> 	at org.apache.spark.storage.BlockManagerSlaveEndpoint$$anonfun$receiveAndReply$1$$anonfun$applyOrElse$4.apply$mcI$sp(BlockManagerSlaveEndpoint.scala:65)
> 	at org.apache.spark.storage.BlockManagerSlaveEndpoint$$anonfun$receiveAndReply$1$$anonfun$applyOrElse$4.apply(BlockManagerSlaveEndpoint.scala:65)
> 	at org.apache.spark.storage.BlockManagerSlaveEndpoint$$anonfun$receiveAndReply$1$$anonfun$applyOrElse$4.apply(BlockManagerSlaveEndpoint.scala:65)
> 	at org.apache.spark.storage.BlockManagerSlaveEndpoint$$anonfun$1.apply(BlockManagerSlaveEndpoint.scala:78)
> 	at scala.concurrent.impl.Future$PromiseCompletingRunnable.liftedTree1$1(Future.scala:24)
> 	at scala.concurrent.impl.Future$PromiseCompletingRunnable.run(Future.scala:24)
> 	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> 	at java.lang.Thread.run(Thread.java:745)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
