Posted to issues@spark.apache.org by "Hyukjin Kwon (JIRA)" <ji...@apache.org> on 2019/05/21 05:35:33 UTC

[jira] [Updated] (SPARK-2723) Block Manager should catch exceptions in putValues

     [ https://issues.apache.org/jira/browse/SPARK-2723?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hyukjin Kwon updated SPARK-2723:
--------------------------------
    Labels: bulk-closed  (was: )

> Block Manager should catch exceptions in putValues
> --------------------------------------------------
>
>                 Key: SPARK-2723
>                 URL: https://issues.apache.org/jira/browse/SPARK-2723
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 1.0.0
>            Reporter: Shivaram Venkataraman
>            Priority: Major
>              Labels: bulk-closed
>
> The BlockManager should catch exceptions encountered while writing out files to disk. Right now these exceptions are counted as user-level task failures, and the job is aborted after four failed attempts. We should either fail the executor or handle the error in a way that prevents the job from dying.
> I ran into an issue where one disk on a large EC2 cluster failed, which caused a long-running job to terminate. Longer term, we should also look at blacklisting local directories when one of them becomes unusable?
> Exception pasted below:
> 14/07/29 00:55:39 WARN scheduler.TaskSetManager: Loss was due to java.io.FileNotFoundException
> java.io.FileNotFoundException: /mnt2/spark/spark-local-20140728175256-e7cb/28/broadcast_264_piece20 (Input/output error)
>         at java.io.FileOutputStream.open(Native Method)
>         at java.io.FileOutputStream.<init>(FileOutputStream.java:221)
>         at java.io.FileOutputStream.<init>(FileOutputStream.java:171)
>         at org.apache.spark.storage.DiskStore.putValues(DiskStore.scala:79)
>         at org.apache.spark.storage.DiskStore.putValues(DiskStore.scala:66)
>         at org.apache.spark.storage.BlockManager.dropFromMemory(BlockManager.scala:847)
>         at org.apache.spark.storage.MemoryStore$$anonfun$ensureFreeSpace$4.apply(MemoryStore.scala:267)
>         at org.apache.spark.storage.MemoryStore$$anonfun$ensureFreeSpace$4.apply(MemoryStore.scala:256)
>         at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
>         at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
>         at org.apache.spark.storage.MemoryStore.ensureFreeSpace(MemoryStore.scala:256)
>         at org.apache.spark.storage.MemoryStore.tryToPut(MemoryStore.scala:179)
>         at org.apache.spark.storage.MemoryStore.putValues(MemoryStore.scala:76)
>         at org.apache.spark.storage.BlockManager.doPut(BlockManager.scala:663)
>         at org.apache.spark.storage.BlockManager.put(BlockManager.scala:574)
>         at org.apache.spark.CacheManager.getOrCompute(CacheManager.scala:108)
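The fix the report asks for amounts to catching the IOException at the disk-write boundary and surfacing it as a storage-level (executor/disk) failure rather than a user-level task failure. A minimal, hypothetical sketch of that idea is below; it is not Spark's actual DiskStore code, and the names (BlockWriteException, DiskWriteSketch, putValues) are illustrative only:

```scala
import java.io.{File, FileOutputStream, IOException}

// Hypothetical wrapper exception: lets the caller (e.g. a block manager)
// distinguish a failed disk write from a failure in user task code.
final case class BlockWriteException(path: String, cause: IOException)
  extends RuntimeException(s"Failed to write block to $path", cause)

object DiskWriteSketch {
  // Illustrative stand-in for a putValues-style method: writes bytes to a
  // file, converting any IOException (including FileNotFoundException from
  // a bad disk, as in the trace above) into a BlockWriteException.
  def putValues(file: File, bytes: Array[Byte]): Unit = {
    val out =
      try new FileOutputStream(file)
      catch { case e: IOException => throw BlockWriteException(file.getPath, e) }
    try out.write(bytes)
    catch { case e: IOException => throw BlockWriteException(file.getPath, e) }
    finally out.close()
  }
}
```

With this shape, the caller can pattern-match on BlockWriteException and, say, mark the local directory bad or kill the executor, instead of letting the raw IOException count against the task's retry limit.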



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org