Posted to issues@spark.apache.org by "Apache Spark (JIRA)" <ji...@apache.org> on 2017/05/24 15:19:04 UTC

[jira] [Assigned] (SPARK-20250) Improper OOM error when a task is killed while spilling data

     [ https://issues.apache.org/jira/browse/SPARK-20250?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-20250:
------------------------------------

    Assignee:     (was: Apache Spark)

> Improper OOM error when a task is killed while spilling data
> ------------------------------------------------------------
>
>                 Key: SPARK-20250
>                 URL: https://issues.apache.org/jira/browse/SPARK-20250
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 1.6.1, 1.6.2, 1.6.3, 2.0.0, 2.0.1, 2.0.2, 2.1.0
>            Reporter: Feng Zhu
>
>     When a task is in the middle of spill() and receives a kill request from the driver (e.g., because it is a speculative task), the TaskMemoryManager throws an OutOfMemoryError.
>     The executor then treats it as an uncaught exception, which is handled by SparkUncaughtExceptionHandler, and the executor is consequently shut down.
>     This error can bring down the whole application once the "max number of executor failures (30) reached" limit is hit.
>     In our production environment, we have encountered many such cases.
> \\
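>     The OOM is not a real memory shortage: TaskMemoryManager converts any IOException thrown by spill() into an OutOfMemoryError. For illustration, here is a minimal, self-contained Java sketch of that wrapping behavior (a simplification with invented names, not the actual Spark source). Note that ClosedByInterruptException.getMessage() returns null, which is where the ": null" suffix in the log below comes from.
> {code:java}
> import java.io.IOException;
> import java.nio.channels.ClosedByInterruptException;
>
> // Sketch only: names are invented; the real logic lives in
> // org.apache.spark.memory.TaskMemoryManager.acquireExecutionMemory().
> public class SpillOomSketch {
>
>   // Stand-in for MemoryConsumer.spill(). DiskBlockObjectWriter writes
>   // through an interruptible FileChannel, so interrupting the task
>   // thread surfaces here as ClosedByInterruptException (an IOException).
>   static long spill() throws IOException {
>     throw new ClosedByInterruptException();
>   }
>
>   public static void main(String[] args) {
>     try {
>       spill();
>     } catch (IOException e) {
>       // Any IOException from spill() is converted to an OutOfMemoryError,
>       // even when the root cause is a task kill rather than memory
>       // pressure. getMessage() of ClosedByInterruptException is null.
>       throw new OutOfMemoryError("error while calling spill() : " + e.getMessage());
>     }
>   }
> }
> {code}
> \\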
> {noformat}
> 17/04/05 06:41:27 INFO sort.UnsafeExternalSorter: Thread 115 spilling sort data of 928.0 MB to disk (1 time so far)
> 17/04/05 06:41:27 INFO sort.UnsafeSorterSpillWriter: Spill file:/data/usercache/application_1482394966158_87487271/blockmgr-85c25fa8-06b4/32/temp_local_b731
> 17/04/05 06:41:27 INFO sort.UnsafeSorterSpillWriter: Write numRecords:2097152
> 17/04/05 06:41:30 INFO executor.Executor: Executor is trying to kill task 16.0 in stage 3.0 (TID 857)
> 17/04/05 06:41:30 ERROR memory.TaskMemoryManager: error while calling spill() on org.apache.spark.util.collection.unsafe.sort.UnsafeExternalSorter@43a122ed
> java.nio.channels.ClosedByInterruptException
> 	at java.nio.channels.spi.AbstractInterruptibleChannel.end(AbstractInterruptibleChannel.java:202)
> 	at sun.nio.ch.FileChannelImpl.position(FileChannelImpl.java:269)
> 	at org.apache.spark.storage.DiskBlockObjectWriter.updateBytesWritten(DiskBlockObjectWriter.scala:228)
> 	at org.apache.spark.storage.DiskBlockObjectWriter.recordWritten(DiskBlockObjectWriter.scala:207)
> 	at org.apache.spark.util.collection.unsafe.sort.UnsafeSorterSpillWriter.write(UnsafeSorterSpillWriter.java:139)
> 	at org.apache.spark.util.collection.unsafe.sort.UnsafeExternalSorter.spill(UnsafeExternalSorter.java:196)
> 	at org.apache.spark.memory.TaskMemoryManager.acquireExecutionMemory(TaskMemoryManager.java:170)
> 	at org.apache.spark.memory.TaskMemoryManager.allocatePage(TaskMemoryManager.java:244)
> 	at org.apache.spark.memory.MemoryConsumer.allocateArray(MemoryConsumer.java:83)
> 	at org.apache.spark.util.collection.unsafe.sort.UnsafeExternalSorter.growPointerArrayIfNecessary(UnsafeExternalSorter.java:302)
> 	at org.apache.spark.util.collection.unsafe.sort.UnsafeExternalSorter.insertRecord(UnsafeExternalSorter.java:346)
>         ................................................................
> 17/04/05 06:41:30 INFO sort.UnsafeExternalSorter: Thread 115 spilling sort data of 928.0 MB to disk (2 times so far)
> 17/04/05 06:41:30 INFO sort.UnsafeSorterSpillWriter: Spill file:/data/usercache/appcache/application_1482394966158_87487271/blockmgr-573312a3-bd46-4c5c-9293-1021cc34c77
> 17/04/05 06:41:30 INFO sort.UnsafeSorterSpillWriter: Write numRecords:2097152
> 17/04/05 06:41:31 ERROR memory.TaskMemoryManager: error while calling spill() on org.apache.spark.util.collection.unsafe.sort.UnsafeExternalSorter@43a122ed
> java.nio.channels.ClosedByInterruptException
> 	at java.nio.channels.spi.AbstractInterruptibleChannel.end(AbstractInterruptibleChannel.java:202)
> 	at sun.nio.ch.FileChannelImpl.position(FileChannelImpl.java:269)
> 	at org.apache.spark.storage.DiskBlockObjectWriter.updateBytesWritten(DiskBlockObjectWriter.scala:228)
> 	at org.apache.spark.storage.DiskBlockObjectWriter.recordWritten(DiskBlockObjectWriter.scala:207)
> 	at org.apache.spark.util.collection.unsafe.sort.UnsafeSorterSpillWriter.write(UnsafeSorterSpillWriter.java:139)
> 	at org.apache.spark.util.collection.unsafe.sort.UnsafeExternalSorter.spill(UnsafeExternalSorter.java:196)
> 	at org.apache.spark.memory.TaskMemoryManager.acquireExecutionMemory(TaskMemoryManager.java:170)
> 	at org.apache.spark.memory.TaskMemoryManager.allocatePage(TaskMemoryManager.java:244)
> 	at org.apache.spark.memory.MemoryConsumer.allocateArray(MemoryConsumer.java:83)
>         .........................................................................
> 17/04/05 06:41:31 WARN memory.TaskMemoryManager: leak 32.0 KB memory from org.apache.spark.shuffle.sort.ShuffleExternalSorter@513661a6
> 17/04/05 06:41:31 ERROR executor.Executor: Managed memory leak detected; size = 26010016 bytes, TID = 857
> 17/04/05 06:41:31 ERROR executor.Executor: Exception in task 16.0 in stage 3.0 (TID 857)
> java.lang.OutOfMemoryError: error while calling spill() on org.apache.spark.util.collection.unsafe.sort.UnsafeExternalSorter@43a122ed : null
> 	at org.apache.spark.memory.TaskMemoryManager.acquireExecutionMemory(TaskMemoryManager.java:178)
> 	at org.apache.spark.memory.TaskMemoryManager.allocatePage(TaskMemoryManager.java:244)
> 	at org.apache.spark.memory.MemoryConsumer.allocateArray(MemoryConsumer.java:83)
>         ......................................................
> 17/04/05 06:41:31 ERROR util.SparkUncaughtExceptionHandler: Uncaught exception in thread Thread[Executor task launch worker-0,5,main]
> java.lang.OutOfMemoryError: error while calling spill() on org.apache.spark.util.collection.unsafe.sort.UnsafeExternalSorter@43a122ed : null
> 	at org.apache.spark.memory.TaskMemoryManager.acquireExecutionMemory(TaskMemoryManager.java:178)
> 	at org.apache.spark.memory.TaskMemoryManager.allocatePage(TaskMemoryManager.java:244)
> 	at org.apache.spark.memory.MemoryConsumer.allocateArray(MemoryConsumer.java:83)
> 	at org.apache.spark.util.collection.unsafe.sort.UnsafeInMemorySorter.insertRecord(UnsafeInMemorySorter.java:164)
> 	at org.apache.spark.util.collection.unsafe.sort.UnsafeExternalSorter.insertRecord(UnsafeExternalSorter.java:358)
> 	..........................................
> 17/04/05 06:41:31 INFO storage.DiskBlockManager: Shutdown hook called
> 17/04/05 06:41:31 INFO util.ShutdownHookManager: Shutdown hook called
> {noformat}
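>     The root cause in the traces above is ClosedByInterruptException: the kill request interrupts the task thread while it is writing spill data through an interruptible FileChannel. A plausible direction for a fix, sketched below with invented names (an illustration of the idea, not the patch that was ultimately merged), is to check whether the failure stems from the task being killed before converting it into an OutOfMemoryError:
> {code:java}
> import java.io.IOException;
> import java.nio.channels.ClosedByInterruptException;
>
> // Hypothetical sketch: distinguish an interrupt-caused spill failure
> // from a genuine allocation failure. Invented names, for illustration.
> public class SpillFailureHandling {
>
>   static void handleSpillFailure(IOException e) {
>     if (e instanceof ClosedByInterruptException
>         || Thread.currentThread().isInterrupted()) {
>       // Report an ordinary task failure so SparkUncaughtExceptionHandler
>       // is never involved and the executor survives the kill.
>       throw new RuntimeException("task was killed while spilling", e);
>     }
>     // Only genuine spill failures should escalate to OutOfMemoryError.
>     throw new OutOfMemoryError("error while calling spill() : " + e.getMessage());
>   }
>
>   public static void main(String[] args) {
>     // Simulate the scenario in this report: an interrupted spill.
>     handleSpillFailure(new ClosedByInterruptException());
>   }
> }
> {code}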



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org