Posted to issues@spark.apache.org by "Sital Kedia (JIRA)" <ji...@apache.org> on 2016/08/17 19:37:20 UTC

[jira] [Commented] (SPARK-17113) Job failure due to Executor OOM

    [ https://issues.apache.org/jira/browse/SPARK-17113?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15425216#comment-15425216 ] 

Sital Kedia commented on SPARK-17113:
-------------------------------------

cc - [~davies] 

> Job failure due to Executor OOM
> -------------------------------
>
>                 Key: SPARK-17113
>                 URL: https://issues.apache.org/jira/browse/SPARK-17113
>             Project: Spark
>          Issue Type: Bug
>    Affects Versions: 1.6.2, 2.0.0
>            Reporter: Sital Kedia
>
> We have been seeing many job failures due to executor OOM, with the following stack trace:
> {code}
> java.lang.OutOfMemoryError: Unable to acquire 1220 bytes of memory, got 0
> 	at org.apache.spark.memory.MemoryConsumer.allocatePage(MemoryConsumer.java:120)
> 	at org.apache.spark.util.collection.unsafe.sort.UnsafeExternalSorter.acquireNewPageIfNecessary(UnsafeExternalSorter.java:341)
> 	at org.apache.spark.util.collection.unsafe.sort.UnsafeExternalSorter.insertRecord(UnsafeExternalSorter.java:362)
> 	at org.apache.spark.sql.execution.UnsafeExternalRowSorter.insertRow(UnsafeExternalRowSorter.java:93)
> 	at org.apache.spark.sql.execution.UnsafeExternalRowSorter.sort(UnsafeExternalRowSorter.java:170)
> 	at org.apache.spark.sql.execution.Sort$$anonfun$1.apply(Sort.scala:90)
> 	at org.apache.spark.sql.execution.Sort$$anonfun$1.apply(Sort.scala:64)
> 	at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$21.apply(RDD.scala:736)
> 	at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$21.apply(RDD.scala:736)
> 	at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
> 	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:307)
> 	at org.apache.spark.rdd.RDD.iterator(RDD.scala:271)
> 	at org.apache.spark.rdd.ZippedPartitionsRDD2.compute(ZippedPartitionsRDD.scala:88)
> 	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:307)
> 	at org.apache.spark.rdd.RDD.iterator(RDD.scala:271)
> 	at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
> 	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:307)
> 	at org.apache.spark.rdd.RDD.iterator(RDD.scala:271)
> 	at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
> 	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:307)
> 	at org.apache.spark.rdd.RDD.iterator(RDD.scala:271)
> 	at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
> 	at org.apache.spark.scheduler.Task.run(Task.scala:89)
> 	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:227)
> 	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> 	at java.lang.Thread.run(Thread.java:745)
> {code}
> Digging into the code, we found that this is an issue with cooperative memory management for off-heap memory allocation.
> In the code at https://github.com/sitalkedia/spark/blob/master/core/src/main/java/org/apache/spark/util/collection/unsafe/sort/UnsafeExternalSorter.java#L463, when the UnsafeExternalSorter checks whether a memory page is being used by upstream, the base object is always null for off-heap memory, so the UnsafeExternalSorter does not spill the memory pages.
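
The behaviour described above can be illustrated with a small, self-contained Java sketch. It is a simplified model and not Spark's actual code: the names `OffHeapSpillSketch`, `Page`, `releasableBytes` and `makePages` are hypothetical, and the sketch assumes that the "in use by upstream" check compares base object references, with on-heap pages backed by distinct arrays and off-heap pages having a null base object.

```java
import java.util.ArrayList;
import java.util.List;

public class OffHeapSpillSketch {

    /** Hypothetical stand-in for a memory page (not a Spark class). */
    static final class Page {
        final Object baseObject; // backing array on-heap; null models an off-heap page
        final long size;

        Page(Object baseObject, long size) {
            this.baseObject = baseObject;
            this.size = size;
        }
    }

    /**
     * Returns the number of bytes a spill would release if a page counts as
     * "in use by upstream" whenever its base object is the same reference as
     * the base object the upstream reader currently points at.
     */
    static long releasableBytes(List<Page> pages, Object upstreamBaseObject) {
        long released = 0;
        for (Page page : pages) {
            if (page.baseObject != upstreamBaseObject) {
                released += page.size; // page looks unused, so it could be freed
            }
        }
        return released;
    }

    /** Builds `count` pages of `size` bytes, either on-heap or off-heap (assumed model). */
    static List<Page> makePages(boolean offHeap, int count, long size) {
        List<Page> pages = new ArrayList<>();
        for (int i = 0; i < count; i++) {
            Object base = offHeap ? null : new long[(int) (size / 8)];
            pages.add(new Page(base, size));
        }
        return pages;
    }

    public static void main(String[] args) {
        List<Page> onHeap = makePages(false, 3, 1024L);
        List<Page> offHeap = makePages(true, 3, 1024L);

        // Upstream is reading from the first page in both cases.
        System.out.println(releasableBytes(onHeap, onHeap.get(0).baseObject));   // prints 2048
        System.out.println(releasableBytes(offHeap, offHeap.get(0).baseObject)); // prints 0
    }
}
```

In this model, the on-heap case releases every page except the one upstream is reading, while the off-heap case releases nothing because every comparison is null against null. Under the same assumptions, a check keyed on something that differs per page (for example a page number) would not have this problem.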



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
