Posted to issues@spark.apache.org by "Imran Rashid (JIRA)" <ji...@apache.org> on 2015/04/10 02:57:15 UTC

[jira] [Created] (SPARK-6839) DeserializationStream.asIterator leaks resources on user exceptions

Imran Rashid created SPARK-6839:
-----------------------------------

             Summary: DeserializationStream.asIterator leaks resources on user exceptions
                 Key: SPARK-6839
                 URL: https://issues.apache.org/jira/browse/SPARK-6839
             Project: Spark
          Issue Type: Bug
          Components: Spark Core
            Reporter: Imran Rashid


From a discussion with [~vanzin] on {{ByteBufferInputStream}}, we realized that [{{BlockManager.dataDeserialize}}|https://github.com/apache/spark/blob/b5c51c8df480f1a82a82e4d597d8eea631bffb4e/core/src/main/scala/org/apache/spark/storage/BlockManager.scala#L1202] doesn't guarantee the underlying InputStream is properly closed. In particular, {{BlockManager.dispose(byteBuffer)}} will not get called any time there is an exception in user code.

The problem is that right now, we convert the input streams to iterators, and only close the input stream when the end of the iterator is reached. But we might never reach the end of the iterator -- the obvious case is a bug in the user code, which makes the task fail partway through the iterator.

I think the solution is to give {{BlockManager.dataDeserialize}} a {{TaskContext}} so it can call {{context.addTaskCompletionListener}} to do the cleanup (as is done in {{ShuffleBlockFetcherIterator}}).
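The listener-based cleanup can be sketched as follows. This is a simplified, self-contained illustration (in Java rather than Spark's Scala), where {{FakeTaskContext}} and {{TrackedStream}} are hypothetical stand-ins for Spark's {{TaskContext}} and the underlying input stream -- the point is only that closing the stream is tied to task completion, not to full consumption of the iterator:

```java
import java.io.Closeable;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

public class CompletionListenerDemo {

    // Minimal stand-in for Spark's TaskContext: collects completion
    // callbacks and runs them once the task ends, success or failure.
    static class FakeTaskContext {
        private final List<Runnable> listeners = new ArrayList<>();
        void addTaskCompletionListener(Runnable r) { listeners.add(r); }
        void markTaskCompleted() { listeners.forEach(Runnable::run); }
    }

    // A stream-like resource that must not leak.
    static class TrackedStream implements Closeable {
        boolean closed = false;
        @Override public void close() { closed = true; }
    }

    // Sketch of the proposed dataDeserialize contract: the caller passes a
    // task context, so the stream is closed even if the returned iterator
    // is abandoned partway through.
    static Iterator<Integer> dataDeserialize(FakeTaskContext ctx, TrackedStream in) {
        ctx.addTaskCompletionListener(in::close);
        return List.of(1, 2, 3).iterator();
    }

    public static void main(String[] args) {
        FakeTaskContext ctx = new FakeTaskContext();
        TrackedStream stream = new TrackedStream();
        Iterator<Integer> it = dataDeserialize(ctx, stream);
        try {
            while (it.hasNext()) {
                // Simulate a bug in user code partway through the iterator.
                if (it.next() == 2) throw new RuntimeException("user bug");
            }
        } catch (RuntimeException e) {
            // The task fails before the iterator is exhausted...
        } finally {
            ctx.markTaskCompleted(); // ...but the listeners still run.
        }
        System.out.println("stream closed: " + stream.closed);
    }
}
```

With an end-of-iterator-only close, the same failure would leave the stream open; routing the close through the completion listener makes cleanup unconditional, which is the same idea {{ShuffleBlockFetcherIterator}} already uses.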



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org