Posted to issues@spark.apache.org by "Patrick Wendell (JIRA)" <ji...@apache.org> on 2015/05/15 21:44:00 UTC

[jira] [Resolved] (SPARK-5920) Use a BufferedInputStream to read local shuffle data

     [ https://issues.apache.org/jira/browse/SPARK-5920?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Patrick Wendell resolved SPARK-5920.
------------------------------------
    Resolution: Won't Fix

Per the discussion on this PR, I am resolving this as Won't Fix.

https://github.com/apache/spark/pull/4878

[~kayousterhout] please feel free to re-open if I misinterpreted.

> Use a BufferedInputStream to read local shuffle data
> ----------------------------------------------------
>
>                 Key: SPARK-5920
>                 URL: https://issues.apache.org/jira/browse/SPARK-5920
>             Project: Spark
>          Issue Type: Improvement
>          Components: Shuffle
>    Affects Versions: 1.2.1, 1.3.0
>            Reporter: Kay Ousterhout
>            Assignee: Kay Ousterhout
>            Priority: Blocker
>
> When reading local shuffle data, Spark doesn't currently buffer the local reads into larger chunks, which can lead to terrible disk performance if many tasks are concurrently reading local data from the same disk.  We should use a BufferedInputStream to mitigate this problem; we can lazily create the input stream to avoid allocating a bunch of in-memory buffers at the same time for tasks that read shuffle data from a large number of local blocks.
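
The idea described above (wrap the raw file stream in a BufferedInputStream, but defer creating it until the block is actually read so that many pending blocks don't each hold a buffer in memory) can be sketched as follows. This is a minimal illustrative example, not Spark's actual shuffle-reader code; the names `lazyStream` and `LazyBufferedRead` are hypothetical.

```java
import java.io.*;
import java.util.function.Supplier;

// Illustrative sketch only: shows lazy creation of a BufferedInputStream
// so the in-memory buffer is not allocated until the data is first read.
// Class and method names are hypothetical, not Spark's API.
public class LazyBufferedRead {

    // Returns a supplier that defers opening the file (and allocating the
    // buffer) until get() is called. A task reading many local blocks could
    // hold these suppliers without holding many buffers at once.
    static Supplier<InputStream> lazyStream(File file, int bufferSize) {
        return () -> {
            try {
                // Buffer is only allocated here, on first use.
                return new BufferedInputStream(new FileInputStream(file), bufferSize);
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        };
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for a local shuffle block file.
        File tmp = File.createTempFile("shuffle-block", ".data");
        tmp.deleteOnExit();
        try (OutputStream out = new FileOutputStream(tmp)) {
            out.write("shuffle bytes".getBytes("UTF-8"));
        }

        // No file handle or buffer exists yet at this point.
        Supplier<InputStream> lazy = lazyStream(tmp, 64 * 1024);

        // The buffered stream is created only now, when the block is read.
        try (InputStream in = lazy.get()) {
            byte[] buf = new byte[64];
            int n = in.read(buf);
            System.out.println(new String(buf, 0, n, "UTF-8"));
        }
    }
}
```

The buffered wrapper means each underlying disk read fetches a large chunk (here 64 KB) instead of many small reads, which is what mitigates the concurrent-seek problem the issue describes.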



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org