Posted to issues@spark.apache.org by "Davies Liu (JIRA)" <ji...@apache.org> on 2016/08/05 23:21:20 UTC

[jira] [Commented] (SPARK-15354) Topology aware block replication strategies

    [ https://issues.apache.org/jira/browse/SPARK-15354?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15410269#comment-15410269 ] 

Davies Liu commented on SPARK-15354:
------------------------------------

The strategy used in HDFS is meant to balance write traffic (for performance) against the durability (or availability) of blocks. But blocks in Spark are quite different: they can be lost and then recovered by recomputation, so we usually keep only one copy and rarely two. Even with two copies, the replicas are deliberately placed at random to get better load balance for computation.

Overall, implementing this strategy for Spark may not be that useful. Could you share more information about your use case?
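
For context, a minimal spark-shell sketch of the replication behavior described above (sc is the shell's SparkContext; as noted, the second copy of a "_2" storage level is placed on a randomly chosen peer):

    import org.apache.spark.storage.StorageLevel

    // Default: a single in-memory copy; if an executor is lost, the
    // missing partitions are recomputed from the RDD's lineage.
    val once = sc.parallelize(1 to 1000000).persist(StorageLevel.MEMORY_ONLY)

    // The "_2" levels keep two copies; the second copy goes to a
    // randomly chosen peer BlockManager, with no rack awareness.
    val twice = sc.parallelize(1 to 1000000).persist(StorageLevel.MEMORY_ONLY_2)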

> Topology aware block replication strategies
> -------------------------------------------
>
>                 Key: SPARK-15354
>                 URL: https://issues.apache.org/jira/browse/SPARK-15354
>             Project: Spark
>          Issue Type: Sub-task
>          Components: Mesos, Spark Core, YARN
>            Reporter: Shubham Chopra
>
> Implement strategies for resilient block replication for the different resource managers, mirroring the 3-replica strategy used by HDFS: the first replica is placed on an executor, the second replica within the same rack as that executor, and the third replica on a different rack. 
> The implementation involves two pluggable classes: one running in the driver that provides topology information for every host at cluster start, and a second that prioritizes a list of peer BlockManagerIds (sketched below). 
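
A rough sketch of what those two pluggable classes might look like, to make the proposal concrete (the trait names and signatures here are illustrative only, not a committed API; BlockManagerId and BlockId are existing Spark classes):

    import org.apache.spark.storage.{BlockId, BlockManagerId}

    // Driver-side hook: resolves each host to a topology string (e.g. a
    // rack identifier) as executors register at cluster start.
    trait TopologyMapper {
      def getTopologyForHost(hostname: String): Option[String]
    }

    // BlockManager-side hook: orders candidate peers so an HDFS-style
    // policy can be expressed: first replica local, second in the same
    // rack, third in a different rack.
    trait BlockReplicationPrioritizer {
      def prioritize(self: BlockManagerId,
                     peers: Seq[BlockManagerId],
                     blockId: BlockId,
                     numReplicas: Int): Seq[BlockManagerId]
    }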



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org