Posted to user@spark.apache.org by "Wang, Ningjun (LNG-NPV)" <ni...@lexisnexis.com> on 2015/03/25 15:58:28 UTC

Total size of serialized results is bigger than spark.driver.maxResultSize

Hi

I ran a Spark job and got the following error. Can anybody tell me how to work around this problem? For example, how can I increase spark.driver.maxResultSize? Thanks.
org.apache.spark.SparkException: Job aborted due to stage failure: Total size of serialized results of 128 tasks (1029.1 MB) is bigger than spark.driver.maxResultSize (1024.0 MB)
        at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1214)
        at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1203)
        at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1202)
        at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
        at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
        at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1202)
        at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:696)
        at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:696)
        at scala.Option.foreach(Option.scala:236)
        at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:696)
        at org.apache.spark.scheduler.DAGSchedulerEventProcessActor$$anonfun$receive$2.applyOrElse(DAGScheduler.scala:1420)
        at akka.actor.Actor$class.aroundReceive(Actor.scala:465)
        at org.apache.spark.scheduler.DAGSchedulerEventProcessActor.aroundReceive(DAGScheduler.scala:1375)
        at akka.actor.ActorCell.receiveMessage(ActorCell.scala:516)
        at akka.actor.ActorCell.invoke(ActorCell.scala:487)
        at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:238)
        at akka.dispatch.Mailbox.run(Mailbox.scala:220)
        at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:393)
        at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
        at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
        at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
        at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
15/03/25 10:48:38 WARN TaskSetManager: Lost task 128.0 in stage 199.0 (TID 6324, INT1-CAS01.pcc.lexisnexis.com): TaskKilled (killed intentionally)

Ningjun


Re: Total size of serialized results is bigger than spark.driver.maxResultSize

Posted by Denny Lee <de...@gmail.com>.
As you noted, you can change the spark.driver.maxResultSize value in your
Spark configuration (https://spark.apache.org/docs/1.2.0/configuration.html).
See the Spark Properties section there, which notes that you can set these
properties either in spark-defaults.conf or via SparkConf().
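
If it helps, here is a minimal sketch (untested; the app name and the 2g
limit below are just placeholders, so pick a limit your driver's memory
can actually hold).

In conf/spark-defaults.conf:

    spark.driver.maxResultSize  2g

Or programmatically, before the SparkContext is created:

    import org.apache.spark.{SparkConf, SparkContext}

    // "MyApp" is a placeholder app name and "2g" an example limit.
    // A value of 0 disables the check entirely, at the risk of
    // out-of-memory errors on the driver.
    val conf = new SparkConf()
      .setAppName("MyApp")
      .set("spark.driver.maxResultSize", "2g")
    val sc = new SparkContext(conf)

The property is read at driver startup, so set it before creating the
context rather than on an already-running SparkContext.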

HTH!


