Posted to issues@spark.apache.org by "Apache Spark (JIRA)" <ji...@apache.org> on 2018/02/03 00:10:00 UTC

[jira] [Commented] (SPARK-21113) Support for read ahead input stream to amortize disk IO cost in the Spill reader

    [ https://issues.apache.org/jira/browse/SPARK-21113?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16351096#comment-16351096 ] 

Apache Spark commented on SPARK-21113:
--------------------------------------

User 'sitalkedia' has created a pull request for this issue:
https://github.com/apache/spark/pull/20492

> Support for read ahead input stream to amortize disk IO cost in the Spill reader
> --------------------------------------------------------------------------------
>
>                 Key: SPARK-21113
>                 URL: https://issues.apache.org/jira/browse/SPARK-21113
>             Project: Spark
>          Issue Type: Improvement
>          Components: Spark Core
>    Affects Versions: 2.0.2
>            Reporter: Sital Kedia
>            Assignee: Sital Kedia
>            Priority: Minor
>             Fix For: 2.3.0
>
>
> Profiling some of our big jobs, we see that around 30% of the time is being spent reading the spill files from disk. To amortize the disk IO cost, the idea is to implement a read-ahead input stream which asynchronously reads ahead from the underlying input stream once a specified amount of data has been read from the current buffer. It does this by maintaining two buffers: an active buffer and a read-ahead buffer. The active buffer contains the data returned when a read() call is issued. The read-ahead buffer is used to asynchronously read from the underlying input stream; once the active buffer is exhausted, we flip the two buffers so that reading can continue from the read-ahead buffer without blocking on disk I/O.
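For readers following along, below is a minimal, illustrative Java sketch of the double-buffer flip described in the issue. It is not the Spark implementation (the pull request above adds the real one to Spark Core); the class name SimpleReadAheadInputStream, the single-thread executor, and the eager "fill the whole buffer before flipping" policy are assumptions made for this example.

import java.io.IOException;
import java.io.InputStream;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Sketch only: two buffers, one background thread prefetching into the
// read-ahead buffer while the caller drains the active buffer.
class SimpleReadAheadInputStream extends InputStream {
    private final InputStream underlying;
    private final ExecutorService executor = Executors.newSingleThreadExecutor(r -> {
        Thread t = new Thread(r, "read-ahead");
        t.setDaemon(true);
        return t;
    });

    private byte[] activeBuffer;      // data handed out by read()
    private byte[] readAheadBuffer;   // data being prefetched in the background
    private int activeLen;            // valid bytes in activeBuffer
    private int activePos;            // next byte to return from activeBuffer
    private Future<Integer> pending;  // in-flight fill of readAheadBuffer

    SimpleReadAheadInputStream(InputStream underlying, int bufferSize) {
        this.underlying = underlying;
        this.activeBuffer = new byte[bufferSize];
        this.readAheadBuffer = new byte[bufferSize];
        // Kick off the first asynchronous fill of the read-ahead buffer.
        this.pending = scheduleReadAhead();
    }

    // Asynchronously fill readAheadBuffer from the underlying stream.
    private Future<Integer> scheduleReadAhead() {
        return executor.submit(() -> {
            int total = 0;
            while (total < readAheadBuffer.length) {
                int n = underlying.read(readAheadBuffer, total,
                                        readAheadBuffer.length - total);
                if (n < 0) break;  // EOF on the underlying stream
                total += n;
            }
            return total;
        });
    }

    @Override
    public int read() throws IOException {
        if (activePos >= activeLen) {
            // Active buffer exhausted: wait for the in-flight prefetch, then
            // flip the buffers so reads continue from already-fetched data.
            int filled;
            try {
                filled = pending.get();
            } catch (InterruptedException | ExecutionException e) {
                throw new IOException("read-ahead failed", e);
            }
            if (filled <= 0) return -1;  // underlying stream fully consumed
            byte[] tmp = activeBuffer;
            activeBuffer = readAheadBuffer;
            readAheadBuffer = tmp;
            activeLen = filled;
            activePos = 0;
            pending = scheduleReadAhead();  // start refilling the spare buffer
        }
        return activeBuffer[activePos++] & 0xFF;
    }

    @Override
    public void close() throws IOException {
        executor.shutdownNow();
        underlying.close();
    }
}

The flip itself is just two reference swaps, so the reading thread only blocks when it consumes data faster than the background thread can prefetch it; when the prefetch keeps up, disk latency is hidden entirely.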



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org