Posted to issues@tez.apache.org by "Ming Ma (JIRA)" <ji...@apache.org> on 2016/11/04 19:06:58 UTC

[jira] [Commented] (TEZ-2442) Support DFS based shuffle in addition to HTTP shuffle

    [ https://issues.apache.org/jira/browse/TEZ-2442?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15637351#comment-15637351 ] 

Ming Ma commented on TEZ-2442:
------------------------------

Nice work.

bq. The doc mentioned "For skewed intermediate output, reducers can start early instead of waiting for downloading the entire output"
Is it similar to slow start in the existing shuffle?
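For context, the slow start I'm comparing against is the ShuffleVertexManager behavior, where consumer tasks are launched once a fraction of producer tasks have finished, but each map output still has to be fetched in full before it can be consumed. A minimal sketch of those knobs (the values shown are the defaults):

{code}
import org.apache.tez.dag.api.TezConfiguration;

public class SlowStartSketch {
  public static void main(String[] args) {
    TezConfiguration conf = new TezConfiguration();
    // Start launching consumer tasks once 25% of producers finish...
    conf.setFloat("tez.shuffle-vertex-manager.min-src-fraction", 0.25f);
    // ...and have all of them launched by the time 75% finish.
    conf.setFloat("tez.shuffle-vertex-manager.max-src-fraction", 0.75f);
  }
}
{code}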

bq. In reality we can put intermediate data in an HDFS with a replication factor of one, and the error heuristic should be exactly the same as a "persisted" edge
This depends on how you set up the cluster: if the compute cluster is separate from the HDFS cluster, then a failed mapper node doesn’t necessarily require a rerun of the mappers.
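To make the replication-factor-of-one setup concrete, here is a minimal sketch (not from the patch; the path is illustrative) of writing an intermediate file through the plain FileSystem API with replication 1, so that losing the writer's node loses the output, just like node-local data:

{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ShuffleWriteSketch {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(conf);
    // Illustrative location for one task attempt's output.
    Path out = new Path("/tmp/tez-shuffle/attempt_0/file.out");
    short replication = 1; // single DFS copy behaves like node-local data
    FSDataOutputStream os = fs.create(out, true,
        conf.getInt("io.file.buffer.size", 4096),
        replication, fs.getDefaultBlockSize(out));
    os.close();
  }
}
{code}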

bq. Cleanup of intermediate shuffle files
What will happen if the app master is stopped via kill -9 without giving it a chance to clean up? In the current shuffle, the YARN node manager takes care of that, as it knows the app is gone and can clean up the app’s directory that stores the local shuffle data.
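A shutdown hook in the AM would not run on kill -9, so I'd expect we need some out-of-band sweep. A rough sketch of what I mean (the directory layout and isAppAlive() check are hypothetical; the liveness check would presumably ask the YARN ResourceManager, e.g. via YarnClient#getApplicationReport):

{code}
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ShuffleDirJanitor {
  // Delete per-application shuffle directories whose app is gone.
  public static void sweep(FileSystem fs, Path shuffleRoot) throws Exception {
    for (FileStatus appDir : fs.listStatus(shuffleRoot)) {
      if (!isAppAlive(appDir.getPath().getName())) {
        fs.delete(appDir.getPath(), true);
      }
    }
  }

  // Hypothetical liveness check against the ResourceManager.
  private static boolean isAppAlive(String appId) {
    return false; // placeholder
  }
}
{code}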

> Support DFS based shuffle in addition to HTTP shuffle
> -----------------------------------------------------
>
>                 Key: TEZ-2442
>                 URL: https://issues.apache.org/jira/browse/TEZ-2442
>             Project: Apache Tez
>          Issue Type: Improvement
>    Affects Versions: 0.5.3
>            Reporter: Kannan Rajah
>            Assignee: shanyu zhao
>         Attachments: FS_based_shuffle_v2.pdf, Tez Shuffle using DFS.pdf, hdfs_broadcast_hack.txt, tez-2442-trunk.2.patch, tez-2442-trunk.3.patch, tez-2442-trunk.4.patch, tez-2442-trunk.5.patch, tez-2442-trunk.patch, tez_hdfs_shuffle.patch
>
>
> In Tez, shuffle is the mechanism by which intermediate data is shared between stages. Shuffle data is written to local disk and fetched from any remote node using HTTP. A DFS like the MapR file system can support writing this shuffle data directly to the DFS using its notion of local volumes, and retrieving it from a remote node using the HDFS API. The current shuffle implementation assumes local data can only be managed by LocalFileSystem, so it uses RawLocalFileSystem and LocalDirAllocator. If we can remove this assumption and introduce an abstraction to manage local disks, then we can reuse most of the shuffle logic (store, sort) and inject an HDFS-API-based retrieval instead of HTTP.
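A rough sketch of the abstraction the description calls for (the interface and method names below are illustrative, not taken from the attached patches): hide RawLocalFileSystem/LocalDirAllocator behind an interface so that a DFS-backed implementation can be injected on both the store and fetch sides.

{code}
import java.io.IOException;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Illustrative only: one implementation would wrap RawLocalFileSystem +
// LocalDirAllocator (today's behavior); another would return a DFS so
// consumers read via the HDFS API instead of an HTTP fetch.
public interface IntermediateStore {
  // Filesystem the producer writes sorted output through.
  FileSystem getWriteFileSystem() throws IOException;
  // Where a task attempt's output of the given size should go.
  Path getOutputPath(String attemptId, long size) throws IOException;
  // Filesystem the consumer reads from (local vs. DFS).
  FileSystem getReadFileSystem() throws IOException;
}
{code}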


