Posted to issues@spark.apache.org by "Imran Rashid (JIRA)" <ji...@apache.org> on 2015/04/10 02:57:15 UTC
[jira] [Created] (SPARK-6839) DeserializationStream.asIterator leaks resources on user exceptions
Imran Rashid created SPARK-6839:
-----------------------------------
Summary: DeserializationStream.asIterator leaks resources on user exceptions
Key: SPARK-6839
URL: https://issues.apache.org/jira/browse/SPARK-6839
Project: Spark
Issue Type: Bug
Components: Spark Core
Reporter: Imran Rashid
From a discussion with [~vanzin] on {{ByteBufferInputStream}}, we realized that [{{BlockManager.dataDeserialize}}|https://github.com/apache/spark/blob/b5c51c8df480f1a82a82e4d597d8eea631bffb4e/core/src/main/scala/org/apache/spark/storage/BlockManager.scala#L1202] doesn't guarantee the underlying InputStream is properly closed. In particular, {{BlockManager.dispose(byteBuffer)}} will not get called any time there is an exception in user code.
The problem is that right now, we convert the input streams to iterators and only close the input stream when the end of the iterator is reached. But we might never reach the end of the iterator: the obvious case is a bug in the user code, causing the task to fail partway through the iterator.
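To illustrate, here is a minimal self-contained sketch of the leaky pattern (all names here are hypothetical, not Spark's actual classes): the stream is closed only inside {{hasNext}} when the iterator is exhausted, so an exception thrown by user code mid-iteration leaves it open.

```scala
import java.io.{ByteArrayInputStream, InputStream}

// Hypothetical stand-in for the iterator returned by dataDeserialize:
// cleanup is tied to reaching end-of-stream, nothing else.
class StreamBackedIterator(in: InputStream) extends Iterator[Int] {
  private var nextByte = in.read()
  def hasNext: Boolean = {
    if (nextByte == -1) in.close() // cleanup happens ONLY here
    nextByte != -1
  }
  def next(): Int = {
    val b = nextByte
    nextByte = in.read()
    b
  }
}

object LeakDemo {
  def main(args: Array[String]): Unit = {
    var closed = false
    val in = new ByteArrayInputStream(Array[Byte](1, 2, 3)) {
      override def close(): Unit = { closed = true; super.close() }
    }
    val it = new StreamBackedIterator(in)
    try {
      // Simulated user bug: throws before the iterator is exhausted.
      it.foreach { b => if (b == 2) throw new RuntimeException("user bug") }
    } catch { case _: RuntimeException => }
    println(s"stream closed: $closed") // false -- the stream leaked
  }
}
```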
I think the solution is to give {{BlockManager.dataDeserialize}} a {{TaskContext}} so it can call {{context.addTaskCompletionListener}} to do the cleanup (as is done in {{ShuffleBlockFetcherIterator}}).
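A rough sketch of that approach, with a toy stand-in for {{TaskContext}} (the real API takes a listener object, not a plain function; everything here is illustrative only): cleanup is registered up front, so the stream is closed when the task completes, whether or not the iterator was fully consumed.

```scala
import java.io.{ByteArrayInputStream, InputStream}
import scala.collection.mutable.ArrayBuffer

// Toy substitute for TaskContext: runs registered listeners when the
// task finishes, successfully or not.
class FakeTaskContext {
  private val listeners = ArrayBuffer.empty[() => Unit]
  def addTaskCompletionListener(f: () => Unit): Unit = listeners += f
  def markTaskCompleted(): Unit = listeners.foreach(_())
}

object CleanupDemo {
  // dataDeserialize-style helper: register cleanup eagerly instead of
  // relying on the iterator reaching its end.
  def dataDeserialize(context: FakeTaskContext, in: InputStream): Iterator[Int] = {
    context.addTaskCompletionListener(() => in.close())
    Iterator.continually(in.read()).takeWhile(_ != -1)
  }

  def main(args: Array[String]): Unit = {
    var closed = false
    val in = new ByteArrayInputStream(Array[Byte](1, 2, 3)) {
      override def close(): Unit = { closed = true; super.close() }
    }
    val context = new FakeTaskContext
    val it = dataDeserialize(context, in)
    try {
      it.foreach { b => if (b == 2) throw new RuntimeException("user bug") }
    } catch { case _: RuntimeException => }
    context.markTaskCompleted() // the task runner invokes this on task end
    println(s"stream closed: $closed") // true -- cleanup ran despite the failure
  }
}
```

The key difference from the leaky version is that the close is owned by the task lifecycle rather than by iterator traversal, which is the same idea {{ShuffleBlockFetcherIterator}} already uses.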
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org