Posted to issues@spark.apache.org by "Jem Tucker (JIRA)" <ji...@apache.org> on 2015/07/30 15:18:04 UTC

[jira] [Issue Comment Deleted] (SPARK-9377) Shuffle tuning should discuss task size optimisation

     [ https://issues.apache.org/jira/browse/SPARK-9377?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jem Tucker updated SPARK-9377:
------------------------------
    Comment: was deleted

(was: Yes I will do)

> Shuffle tuning should discuss task size optimisation
> ----------------------------------------------------
>
>                 Key: SPARK-9377
>                 URL: https://issues.apache.org/jira/browse/SPARK-9377
>             Project: Spark
>          Issue Type: Documentation
>          Components: Documentation, Shuffle
>            Reporter: Jem Tucker
>            Priority: Minor
>
> The recent issue SPARK-9310 highlighted the negative effects of excessive parallelism caused by per-task overhead. Although large task counts are unavoidable with high volumes of data, more detail in the documentation would be very beneficial to newcomers optimising the performance of their applications.
> Areas to discuss could be:
> - What are the overheads of a Spark task? 
> -- Does this overhead change with task size, etc.?
> - How to dynamically calculate a suitable parallelism for a Spark job
> - Examples of designing code to minimise shuffles
> -- How to minimise the data volumes when shuffles are required
> - Differences between sort-based and hash-based shuffles
> -- Benefits and weaknesses of each
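One of the topics listed above, dynamically calculating a suitable parallelism, often comes down to a rule of thumb: divide the estimated input size by a target partition size (128 MB, matching the default HDFS block size, is a common starting point). The helper below is a hypothetical illustration of that arithmetic, not an API that exists in Spark; the names and defaults are assumptions:

```python
import math

def suggested_partitions(input_bytes,
                         target_partition_bytes=128 * 1024 * 1024,
                         min_partitions=2):
    """Estimate a partition count for a Spark job (hypothetical helper).

    Divides the input size by a target partition size and rounds up,
    never going below a small floor so tiny inputs still parallelise.
    """
    return max(min_partitions, math.ceil(input_bytes / target_partition_bytes))

# Example: a 10 GiB input with 128 MiB target partitions -> 80 partitions,
# which could then be passed to e.g. repartition() or spark.default.parallelism.
print(suggested_partitions(10 * 1024 ** 3))
```

In practice the target size would be tuned against the task-overhead concerns raised above: too many small partitions and scheduling overhead dominates; too few large ones and executors spill or sit idle.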



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org