Posted to issues@spark.apache.org by "Patrick Wendell (JIRA)" <ji...@apache.org> on 2014/08/27 09:18:58 UTC

[jira] [Resolved] (SPARK-3139) Akka timeouts from ContextCleaner when cleaning shuffles

     [ https://issues.apache.org/jira/browse/SPARK-3139?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Patrick Wendell resolved SPARK-3139.
------------------------------------

       Resolution: Fixed
    Fix Version/s: 1.1.0

Issue resolved by pull request 2143
[https://github.com/apache/spark/pull/2143]

> Akka timeouts from ContextCleaner when cleaning shuffles
> --------------------------------------------------------
>
>                 Key: SPARK-3139
>                 URL: https://issues.apache.org/jira/browse/SPARK-3139
>             Project: Spark
>          Issue Type: Bug
>    Affects Versions: 1.1.0
>         Environment: 10 r3.2xlarge tests on EC2, running the scala-agg-by-key-int spark-perf test against master commit d7e80c2597d4a9cae2e0cb35a86f7889323f4cbb.
>            Reporter: Josh Rosen
>            Assignee: Guoqiang Li
>            Priority: Blocker
>             Fix For: 1.1.0
>
>
> When running spark-perf tests on EC2, I have a job that's consistently logging the following Akka exceptions:
> {code}
> 14/08/19 22:07:12 ERROR spark.ContextCleaner: Error cleaning shuffle 0
> java.util.concurrent.TimeoutException: Futures timed out after [30 seconds]
>   at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:219)
>   at scala.concurrent.impl.Promise$DefaultPromise.result(Promise.scala:223)
>   at scala.concurrent.Await$$anonfun$result$1.apply(package.scala:107)
>   at scala.concurrent.BlockContext$DefaultBlockContext$.blockOn(BlockContext.scala:53)
>   at scala.concurrent.Await$.result(package.scala:107)
>   at org.apache.spark.storage.BlockManagerMaster.removeShuffle(BlockManagerMaster.scala:118)
>   at org.apache.spark.ContextCleaner.doCleanupShuffle(ContextCleaner.scala:159)
>   at org.apache.spark.ContextCleaner$$anonfun$org$apache$spark$ContextCleaner$$keepCleaning$1$$anonfun$apply$mcV$sp$2.apply(ContextCleaner.scala:131)
>   at org.apache.spark.ContextCleaner$$anonfun$org$apache$spark$ContextCleaner$$keepCleaning$1$$anonfun$apply$mcV$sp$2.apply(ContextCleaner.scala:124)
>   at scala.Option.foreach(Option.scala:236)
>   at org.apache.spark.ContextCleaner$$anonfun$org$apache$spark$ContextCleaner$$keepCleaning$1.apply$mcV$sp(ContextCleaner.scala:124)
>   at org.apache.spark.ContextCleaner$$anonfun$org$apache$spark$ContextCleaner$$keepCleaning$1.apply(ContextCleaner.scala:120)
>   at org.apache.spark.ContextCleaner$$anonfun$org$apache$spark$ContextCleaner$$keepCleaning$1.apply(ContextCleaner.scala:120)
>   at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1252)
>   at org.apache.spark.ContextCleaner.org$apache$spark$ContextCleaner$$keepCleaning(ContextCleaner.scala:119)
>   at org.apache.spark.ContextCleaner$$anon$3.run(ContextCleaner.scala:65)
> {code}
> and
> {code}
> 14/08/19 22:07:12 ERROR storage.BlockManagerMaster: Failed to remove shuffle 0
> akka.pattern.AskTimeoutException: Timed out
>   at akka.pattern.PromiseActorRef$$anonfun$1.apply$mcV$sp(AskSupport.scala:334)
>   at akka.actor.Scheduler$$anon$11.run(Scheduler.scala:118)
>   at scala.concurrent.Future$InternalCallbackExecutor$.scala$concurrent$Future$InternalCallbackExecutor$$unbatchedExecute(Future.scala:694)
>   at scala.concurrent.Future$InternalCallbackExecutor$.execute(Future.scala:691)
>   at akka.actor.LightArrayRevolverScheduler$TaskHolder.executeTask(Scheduler.scala:455)
>   at akka.actor.LightArrayRevolverScheduler$$anon$12.executeBucket$1(Scheduler.scala:407)
>   at akka.actor.LightArrayRevolverScheduler$$anon$12.nextTick(Scheduler.scala:411)
>   at akka.actor.LightArrayRevolverScheduler$$anon$12.run(Scheduler.scala:363)
>   at java.lang.Thread.run(Thread.java:745)
> {code}
> This doesn't seem to prevent the job from completing successfully, but it's a serious issue because it means that resources aren't being cleaned up.  The test script, ScalaAggByKeyInt, runs each test 10 times, and I see the same error after each test, so this seems deterministically reproducible.
> I'll look at the executor logs to see if I can find more info there.
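
The timeout in the first trace comes from a blocking Await.result on an Akka ask whose reply does not arrive within the configured window (30 seconds by default in Spark 1.x, controlled by spark.akka.askTimeout). A minimal stdlib-only Scala sketch of that failure mode, with no Spark or Akka required and a 1-second timeout standing in for the 30-second default:

```scala
import java.util.concurrent.TimeoutException
import scala.concurrent.{Await, Promise}
import scala.concurrent.duration._

object AskTimeoutSketch {
  def main(args: Array[String]): Unit = {
    // A Promise whose future is never completed stands in for an ask
    // whose reply never arrives (e.g. an overloaded block manager actor).
    val pendingReply = Promise[Boolean]().future

    try {
      // BlockManagerMaster.removeShuffle blocks the cleaning thread the
      // same way: Await.result(future, timeout) on the actor's reply.
      Await.result(pendingReply, 1.second)
    } catch {
      case e: TimeoutException =>
        // Surfaces as "Error cleaning shuffle N" in the ContextCleaner log.
        println(s"Error cleaning shuffle: ${e.getMessage}")
    }
  }
}
```

Because the wait is synchronous, one slow or unanswered reply stalls the single cleaning thread for the full timeout before the next cleanup task can run, which is why every test run in the report hits the same error.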



--
This message was sent by Atlassian JIRA
(v6.2#6252)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org