
Posted to issues@spark.apache.org by "jin xing (JIRA)" <ji...@apache.org> on 2017/03/02 10:21:45 UTC

[jira] [Comment Edited] (SPARK-19659) Fetch big blocks to disk when shuffle-read

    [ https://issues.apache.org/jira/browse/SPARK-19659?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15884789#comment-15884789 ] 

jin xing edited comment on SPARK-19659 at 3/2/17 10:21 AM:
-----------------------------------------------------------

[~irashid]

I've uploaded a design doc; please take a look and comment when you have time :)

Thanks a lot for your previous comments. They were very helpful. It would be great if you could comment further so that I can keep working on this :)


was (Author: jinxing6042@126.com):
[~irashid]

I've uploaded a design doc; please take a look and comment when you have time :)

> Fetch big blocks to disk when shuffle-read
> ------------------------------------------
>
>                 Key: SPARK-19659
>                 URL: https://issues.apache.org/jira/browse/SPARK-19659
>             Project: Spark
>          Issue Type: Improvement
>          Components: Shuffle
>    Affects Versions: 2.1.0
>            Reporter: jin xing
>         Attachments: SPARK-19659-design-v1.pdf
>
>
> Currently the whole block is fetched into memory (off-heap by default) during shuffle-read. A block is identified by (shuffleId, mapId, reduceId), so it can be large when the data is skewed. If an OOM happens during shuffle-read, the job is killed and users are advised to "Consider boosting spark.yarn.executor.memoryOverhead". Adjusting this parameter and allocating more memory can resolve the OOM. However, this approach is not well suited to production environments, especially data warehouses.
> When Spark SQL is used as the data engine in a warehouse, users want a unified setting (e.g. for memory) with less wasted resource (resource that is allocated but never used).
> It's not always easy to predict skew. When it does happen, it makes sense to fetch remote blocks to disk during shuffle-read rather than
> kill the job because of an OOM. This approach was mentioned by [~sandyr] and [~mridulm80] during the discussion in SPARK-3019.
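As a rough illustration of the proposal, the sketch below keys each shuffle block by (shuffleId, mapId, reduceId) and picks a destination by size: small blocks are buffered in memory, while blocks above a threshold are streamed chunk by chunk to a temporary file instead of being assembled into one large buffer. This is a minimal Python sketch under stated assumptions, not Spark's implementation. `FETCH_TO_DISK_THRESHOLD`, `BlockId`, `choose_destination` and `fetch_block` are hypothetical names, and the sketch assumes the reducer knows each block's size before the fetch starts.

```python
import os
import tempfile
from dataclasses import dataclass
from typing import Iterable, Tuple, Union

# Hypothetical cut-off (NOT a real Spark setting): blocks larger than this
# are streamed to disk instead of being buffered in memory.
FETCH_TO_DISK_THRESHOLD = 64 * 1024 * 1024  # 64 MiB


@dataclass(frozen=True)
class BlockId:
    """Hypothetical key for a shuffle block: (shuffleId, mapId, reduceId)."""
    shuffle_id: int
    map_id: int
    reduce_id: int


def choose_destination(block_size: int,
                       threshold: int = FETCH_TO_DISK_THRESHOLD) -> str:
    """Return 'disk' for blocks above the threshold (e.g. skewed ones), else 'memory'."""
    return "disk" if block_size > threshold else "memory"


def fetch_block(chunks: Iterable[bytes], block_size: int,
                threshold: int = FETCH_TO_DISK_THRESHOLD
                ) -> Tuple[str, Union[bytes, str]]:
    """Consume a remote block's chunks.

    Small blocks are joined into a single in-memory buffer. Large blocks are
    written chunk by chunk to a temporary file rather than joined in memory;
    the caller is responsible for reading and deleting that file.
    """
    if choose_destination(block_size, threshold) == "memory":
        return "memory", b"".join(chunks)
    fd, path = tempfile.mkstemp(suffix=".shuffle")
    with os.fdopen(fd, "wb") as f:
        for chunk in chunks:
            f.write(chunk)
    return "disk", path


if __name__ == "__main__":
    # One reducer (reduceId 0) reading three map outputs; map 2 is skewed.
    sizes = {
        BlockId(0, 0, 0): 2 * 1024 * 1024,
        BlockId(0, 1, 0): 3 * 1024 * 1024,
        BlockId(0, 2, 0): 900 * 1024 * 1024,
    }
    for block_id, size in sizes.items():
        print(block_id, choose_destination(size))
```

Because the decision is made per block, only the oversized (skewed) blocks pay for disk I/O; the rest of the shuffle-read still goes through memory, and no single block forces the executor's memory to be sized for the worst case.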



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
