Posted to issues@spark.apache.org by "Hyukjin Kwon (JIRA)" <ji...@apache.org> on 2019/05/21 04:12:45 UTC

[jira] [Resolved] (SPARK-21867) Support async spilling in UnsafeShuffleWriter

     [ https://issues.apache.org/jira/browse/SPARK-21867?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hyukjin Kwon resolved SPARK-21867.
----------------------------------
    Resolution: Incomplete

> Support async spilling in UnsafeShuffleWriter
> ---------------------------------------------
>
>                 Key: SPARK-21867
>                 URL: https://issues.apache.org/jira/browse/SPARK-21867
>             Project: Spark
>          Issue Type: Improvement
>          Components: Spark Core
>    Affects Versions: 2.2.0
>            Reporter: Sital Kedia
>            Priority: Minor
>              Labels: bulk-closed
>         Attachments: Async ShuffleExternalSorter.pdf
>
>
> Currently, Spark tasks are single-threaded, but we believe multi-threading parts of them could greatly improve job performance. For example, profiling our map tasks, which read large amounts of data from HDFS and spill to disk, we found they are blocked on HDFS reads and spilling for the majority of the time. Since both operations are I/O intensive, average CPU utilization during the map phase is significantly low. In theory, the HDFS read and the spill could proceed in parallel if we had additional memory to buffer data read from HDFS while the last batch is being spilled.
> Let's say we have 1 GB of shuffle memory available per task. Currently, a map task reads from HDFS and stores the records in the available memory buffer. Once we hit the memory limit and there is no more space for records, we sort and spill the contents to disk. While we are spilling to disk, since we have no memory available, we cannot read from HDFS concurrently.
> Here we propose supporting async spilling in UnsafeShuffleWriter, so that reading from HDFS can continue while the sort and spill happen asynchronously. Say the total 1 GB of shuffle memory is split into two regions - an active region and a spilling region - each 500 MB. We start by reading from HDFS into the active region. Once we hit the limit of the active region, we issue an asynchronous spill and flip the active and spilling regions. While the spill is running asynchronously, we still have 500 MB of memory available for reading data from HDFS. This way we can amortize the high disk/network I/O cost of spilling.
> We made a prototype hack to implement this feature and saw our map tasks run as much as 40% faster.
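The double-buffering scheme described above can be sketched in Java. This is a minimal, hypothetical illustration, not the proposal's actual implementation: the class name, the use of heap lists in place of ShuffleExternalSorter's memory pages, and the region size are all invented for clarity; only the flip-and-spill-asynchronously structure reflects the idea in the issue.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical sketch of the proposed scheme: shuffle memory is split into
// two regions. Records fill the active region; when it is full, the regions
// are flipped and the now-spilling region is sorted and "spilled" on a
// background thread while inserts continue into the freed region.
public class AsyncSpillSketch {
    private final int regionSize;                 // capacity of each region, in records
    private List<Integer> active = new ArrayList<>();
    private List<Integer> spilling = new ArrayList<>();
    private final List<List<Integer>> spilledBatches = new ArrayList<>();
    private final ExecutorService spillThread = Executors.newSingleThreadExecutor();
    private Future<?> pendingSpill = null;

    public AsyncSpillSketch(int regionSize) {
        this.regionSize = regionSize;
    }

    // Insert one record; triggers an asynchronous spill when the active region fills.
    public void insert(int record) throws Exception {
        if (active.size() == regionSize) {
            // Wait for any previous spill so the other region is free to reuse.
            if (pendingSpill != null) pendingSpill.get();
            // Flip the active and spilling regions.
            List<Integer> tmp = spilling;
            spilling = active;
            active = tmp;
            active.clear();
            // Sort and spill the full region in the background; inserts
            // continue into the other region in the meantime.
            final List<Integer> batch = spilling;
            pendingSpill = spillThread.submit(() -> {
                batch.sort(null);                 // sort before spilling
                synchronized (spilledBatches) {   // stand-in for a disk write
                    spilledBatches.add(new ArrayList<>(batch));
                }
            });
        }
        active.add(record);
    }

    // Finish any outstanding spill and return the spilled batches.
    public List<List<Integer>> close() throws Exception {
        if (pendingSpill != null) pendingSpill.get();
        spillThread.shutdown();
        return spilledBatches;
    }

    public List<Integer> activeRecords() {
        return active;
    }
}
```

With a region size of 4, inserting records 0..9 spills the batches [0,1,2,3] and [4,5,6,7], leaving [8,9] in the active region. The real sorter would write each batch to a spill file rather than a list, but the control flow - fill, wait on the prior spill, flip, spill asynchronously - is the part the proposal changes.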



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org