Posted to issues@tez.apache.org by "TezQA (JIRA)" <ji...@apache.org> on 2016/04/30 03:39:12 UTC

[jira] [Commented] (TEZ-2442) Support DFS based shuffle in addition to HTTP shuffle

    [ https://issues.apache.org/jira/browse/TEZ-2442?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15265058#comment-15265058 ] 

TezQA commented on TEZ-2442:
----------------------------

{color:red}-1 overall{color}.  Here are the results of testing the latest attachment
  http://issues.apache.org/jira/secure/attachment/12800295/tez-2442-trunk.2.patch
  against master revision 727584f.

    {color:green}+1 @author{color}.  The patch does not contain any @author tags.

    {color:green}+1 tests included{color}.  The patch appears to include 5 new or modified test files.

    {color:green}+1 javac{color}.  The applied patch does not increase the total number of javac compiler warnings.

    {color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

    {color:red}-1 findbugs{color}.  The patch appears to introduce 1 new Findbugs (version 3.0.1) warnings.

    {color:red}-1 release audit{color}.  The applied patch generated 1 release audit warnings.

    {color:green}+1 core tests{color}.  The patch passed unit tests in .

Test results: https://builds.apache.org/job/PreCommit-TEZ-Build/1693//testReport/
Release audit warnings: https://builds.apache.org/job/PreCommit-TEZ-Build/1693//artifact/patchprocess/patchReleaseAuditProblems.txt
Findbugs warnings: https://builds.apache.org/job/PreCommit-TEZ-Build/1693//artifact/patchprocess/newPatchFindbugsWarningstez-dag.html
Console output: https://builds.apache.org/job/PreCommit-TEZ-Build/1693//console

This message is automatically generated.

> Support DFS based shuffle in addition to HTTP shuffle
> -----------------------------------------------------
>
>                 Key: TEZ-2442
>                 URL: https://issues.apache.org/jira/browse/TEZ-2442
>             Project: Apache Tez
>          Issue Type: Improvement
>    Affects Versions: 0.5.3
>            Reporter: Kannan Rajah
>            Assignee: shanyu zhao
>         Attachments: FS_based_shuffle_v2.pdf, Tez Shuffle using DFS.pdf, hdfs_broadcast_hack.txt, tez-2442-trunk.2.patch, tez-2442-trunk.patch, tez_hdfs_shuffle.patch
>
>
> In Tez, shuffle is the mechanism by which intermediate data is shared between stages. Shuffle data is written to local disk and fetched from remote nodes over HTTP. A DFS such as the MapR file system can instead write shuffle data directly to the DFS, using a notion of local volumes, and retrieve it from a remote node through the HDFS API. The current shuffle implementation assumes that local data can only be managed by LocalFileSystem, so it uses RawLocalFileSystem and LocalDirAllocator. If we remove this assumption and introduce an abstraction for managing local disks, we can reuse most of the shuffle logic (store, sort) and plug in HDFS API-based retrieval in place of HTTP.
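
For illustration only, the abstraction described above might be sketched roughly as follows. This is not taken from the attached patch or from the Tez code base: the names ShuffleStore, DirectoryShuffleStore and ShuffleSketch are hypothetical, and plain java.nio stands in for the Hadoop FileSystem API so that the sketch runs without any Hadoop dependency. The point is only that the producer and the consumer go through one storage interface, so a filesystem-backed read can take the place of an HTTP fetch.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical abstraction over where shuffle output lives.
// None of these names exist in Tez or Hadoop.
interface ShuffleStore {
    // Called by the producing task to persist one partition's output.
    void write(String partitionId, byte[] data) throws IOException;

    // Called by the consuming task; replaces the HTTP fetch in this sketch.
    byte[] read(String partitionId) throws IOException;
}

// Stand-in for a DFS-backed store. A real version would presumably go
// through the Hadoop FileSystem API; java.nio is used here as an assumption
// to keep the example self-contained.
final class DirectoryShuffleStore implements ShuffleStore {
    private final Path root;

    DirectoryShuffleStore(Path root) {
        this.root = root;
    }

    @Override
    public void write(String partitionId, byte[] data) throws IOException {
        Files.createDirectories(root);
        Files.write(root.resolve(partitionId), data);
    }

    @Override
    public byte[] read(String partitionId) throws IOException {
        return Files.readAllBytes(root.resolve(partitionId));
    }
}

public class ShuffleSketch {
    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("shuffle-sketch");
        ShuffleStore store = new DirectoryShuffleStore(dir);

        // Producer side: write intermediate data for one partition.
        store.write("partition-0", "k1=v1".getBytes(StandardCharsets.UTF_8));

        // Consumer side: retrieve it through the same interface.
        byte[] fetched = store.read("partition-0");
        System.out.println(new String(fetched, StandardCharsets.UTF_8));
    }
}
```

Under this assumption, the existing store and sort logic would only see ShuffleStore, and an HTTP-backed implementation could sit behind the same interface alongside the filesystem-backed one.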



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)