Posted to issues@spark.apache.org by "Yong Zhang (JIRA)" <ji...@apache.org> on 2016/03/20 15:44:33 UTC

[jira] [Commented] (SPARK-10309) Some tasks failed with Unable to acquire memory

    [ https://issues.apache.org/jira/browse/SPARK-10309?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15203315#comment-15203315 ] 

Yong Zhang commented on SPARK-10309:
------------------------------------

On Spark 1.5.2, we also face this issue when a "broadcast" join is used in a DataFrame.

Why was this fix not merged into a Spark 1.5.x release? In our case, the job fails eventually, so I have to disable Tungsten with "spark.sql.tungsten.enabled=false".
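
For reference, this is how we apply the workaround; it is just an example of setting the flag quoted above, either at launch time or on a live SQLContext:

{code}
$ bin/spark-shell --conf spark.sql.tungsten.enabled=false

// or, from an already-running shell:
scala> sqlContext.setConf("spark.sql.tungsten.enabled", "false")
{code}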

> Some tasks failed with Unable to acquire memory
> -----------------------------------------------
>
>                 Key: SPARK-10309
>                 URL: https://issues.apache.org/jira/browse/SPARK-10309
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 1.5.0
>            Reporter: Davies Liu
>            Assignee: Davies Liu
>
> *=== Update ===*
> This is caused by a mismatch between `Runtime.getRuntime.availableProcessors()` and the number of active tasks in `ShuffleMemoryManager`. A quick reproduction is the following:
> {code}
> // My machine only has 8 cores
> $ bin/spark-shell --master local[32]
> scala> val df = sc.parallelize(Seq((1, 1), (2, 2))).toDF("a", "b")
> scala> df.as("x").join(df.as("y"), $"x.a" === $"y.a").count()
> Caused by: java.io.IOException: Unable to acquire 2097152 bytes of memory
> 	at org.apache.spark.util.collection.unsafe.sort.UnsafeExternalSorter.acquireNewPage(UnsafeExternalSorter.java:351)
> 	at org.apache.spark.util.collection.unsafe.sort.UnsafeExternalSorter.<init>(UnsafeExternalSorter.java:138)
> 	at org.apache.spark.util.collection.unsafe.sort.UnsafeExternalSorter.create(UnsafeExternalSorter.java:106)
> 	at org.apache.spark.sql.execution.UnsafeExternalRowSorter.<init>(UnsafeExternalRowSorter.java:68)
> 	at org.apache.spark.sql.execution.TungstenSort.org$apache$spark$sql$execution$TungstenSort$$preparePartition$1(sort.scala:120)
> 	at org.apache.spark.sql.execution.TungstenSort$$anonfun$doExecute$2.apply(sort.scala:143)
> 	at org.apache.spark.sql.execution.TungstenSort$$anonfun$doExecute$2.apply(sort.scala:143)
> 	at org.apache.spark.rdd.MapPartitionsWithPreparationRDD.prepare(MapPartitionsWithPreparationRDD.scala:50)
> {code}
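> A back-of-the-envelope sketch of the mismatch (all numbers and the sizing formula below are assumptions chosen for illustration, not the exact code in SparkEnv/ShuffleMemoryManager):
> {code}
> // Hypothetical figures, picked only to illustrate the mismatch.
> val shuffleMemory = 160L << 20             // suppose ~160MB of shuffle memory
> val cores         = 8                      // Runtime.getRuntime.availableProcessors()
> val activeTasks   = 32                     // local[32]: tasks actually sharing the pool
>
> // The default page size is derived from the core count (illustrative formula):
> val pageSize = shuffleMemory / cores / 16  // ~1.25MB if sized for 8 tasks
>
> // The sizing budgets for roughly `cores` concurrent consumers, but all 32
> // tasks want pages from the same pool at the same time:
> val budgeted = cores * 16 * pageSize       // == shuffleMemory, as intended
> val demanded = activeTasks * 16 * pageSize // 4x the pool: later tasks starve
> println(s"pool=${shuffleMemory >> 20}MB budgeted=${budgeted >> 20}MB demanded=${demanded >> 20}MB")
> {code}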
> *=== Original ===*
> While running Q53 of TPCDS (scale = 1500) on 24 nodes cluster (12G memory on executor):
> {code}
> java.io.IOException: Unable to acquire 33554432 bytes of memory
>         at org.apache.spark.util.collection.unsafe.sort.UnsafeExternalSorter.acquireNewPage(UnsafeExternalSorter.java:368)
>         at org.apache.spark.util.collection.unsafe.sort.UnsafeExternalSorter.<init>(UnsafeExternalSorter.java:138)
>         at org.apache.spark.util.collection.unsafe.sort.UnsafeExternalSorter.create(UnsafeExternalSorter.java:106)
>         at org.apache.spark.sql.execution.UnsafeExternalRowSorter.<init>(UnsafeExternalRowSorter.java:68)
>         at org.apache.spark.sql.execution.TungstenSort.org$apache$spark$sql$execution$TungstenSort$$preparePartition$1(sort.scala:146)
>         at org.apache.spark.sql.execution.TungstenSort$$anonfun$doExecute$3.apply(sort.scala:169)
>         at org.apache.spark.sql.execution.TungstenSort$$anonfun$doExecute$3.apply(sort.scala:169)
>         at org.apache.spark.rdd.MapPartitionsWithPreparationRDD.compute(MapPartitionsWithPreparationRDD.scala:45)
>         at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:297)
>         at org.apache.spark.rdd.RDD.iterator(RDD.scala:264)
>         at org.apache.spark.rdd.ZippedPartitionsRDD2.compute(ZippedPartitionsRDD.scala:88)
>         at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:297)
>         at org.apache.spark.rdd.RDD.iterator(RDD.scala:264)
>         at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
>         at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:297)
>         at org.apache.spark.rdd.RDD.iterator(RDD.scala:264)
>         at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
>         at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:297)
>         at org.apache.spark.rdd.RDD.iterator(RDD.scala:264)
>         at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:73)
>         at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
>         at org.apache.spark.scheduler.Task.run(Task.scala:88)
>         at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>         at java.lang.Thread.run(Thread.java:745)
> {code}
> The task could finish after a retry.


