Posted to issues@tez.apache.org by "Bikas Saha (JIRA)" <ji...@apache.org> on 2014/03/01 20:04:19 UTC

[jira] [Updated] (TEZ-902) Fetch failure issues in shuffle Input

     [ https://issues.apache.org/jira/browse/TEZ-902?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Bikas Saha updated TEZ-902:
---------------------------

    Attachment: TEZ-902.1.patch

Two issues exist:
1) A failed input is not obsoleted, so the fetcher can hang retrying the same input.
2) If there are multiple versions of an input (say the first version was killed and then regenerated), the Fetcher tries to download all versions instead of only the latest version.

The patch fixes these issues by deduping the inputs and obsoleting failed inputs. It also adds a check on the mapids returned in the ShuffleService response header, which reports errors more clearly. Before the patch, the ShuffleHeader would read in garbage values and complain about an unexpected or invalid id (which was actually null).
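
To make the intent of the fix concrete, here is a minimal sketch of deduping to the latest version, obsoleting failed inputs, and checking the id from the response header. It is not the code in TEZ-902.1.patch. The class names InputAttempt and PendingInputs, the fields, and the assumption that a higher version number means a later regeneration are all hypothetical and chosen only for illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical types for illustration only; not the classes touched by the patch.

/** One version (attempt) of a logical shuffle input. */
final class InputAttempt {
    final int inputIndex;  // which logical input this attempt produces
    final int version;     // assumed to grow each time the input is regenerated
    final String mapId;    // id the shuffle service is assumed to echo in its header

    InputAttempt(int inputIndex, int version, String mapId) {
        this.inputIndex = inputIndex;
        this.version = version;
        this.mapId = mapId;
    }
}

/** Tracks which attempts a fetcher should still try to download. */
final class PendingInputs {
    private final Map<Integer, InputAttempt> latest = new HashMap<>();
    private final Set<String> obsolete = new HashSet<>();

    /** Dedupe: keep only the newest known version of each input. */
    void add(InputAttempt a) {
        if (obsolete.contains(a.mapId)) {
            return; // already failed or superseded, never queue it again
        }
        InputAttempt current = latest.get(a.inputIndex);
        if (current == null || a.version > current.version) {
            if (current != null) {
                obsolete.add(current.mapId); // older version is superseded
            }
            latest.put(a.inputIndex, a);
        } else if (a.version < current.version) {
            obsolete.add(a.mapId); // stale version arrived late
        }
        // Equal version: duplicate notification, nothing to do.
    }

    /** Obsolete a failed attempt so the fetcher does not keep retrying it. */
    void fetchFailed(InputAttempt a) {
        obsolete.add(a.mapId);
        InputAttempt current = latest.get(a.inputIndex);
        if (current != null && current.mapId.equals(a.mapId)) {
            latest.remove(a.inputIndex);
        }
    }

    /** Header check: accept only a non-null id that is currently pending. */
    boolean isExpected(String mapIdFromHeader) {
        if (mapIdFromHeader == null) {
            return false;
        }
        for (InputAttempt a : latest.values()) {
            if (a.mapId.equals(mapIdFromHeader)) {
                return true;
            }
        }
        return false;
    }

    /** Snapshot of the attempts that are still worth fetching. */
    List<InputAttempt> toFetch() {
        return new ArrayList<>(latest.values());
    }
}
```

In this sketch, adding version 0 and then version 1 of the same input leaves only version 1 in toFetch(), and calling fetchFailed on version 1 removes it until a newer version is added. Rejecting a null or unknown id up front is what lets the error be reported at the header check instead of surfacing later as a confusing parse failure.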

There is no isolated test for this code. Opened TEZ-907 for that, since it needs more effort. Tested this patch with successful jobs to verify straight-line behavior, and on read-error cases with help from [~tassapola].

[~sseth] please review

> Fetch failure issues in shuffle Input
> -------------------------------------
>
>                 Key: TEZ-902
>                 URL: https://issues.apache.org/jira/browse/TEZ-902
>             Project: Apache Tez
>          Issue Type: Bug
>            Reporter: Bikas Saha
>            Assignee: Bikas Saha
>         Attachments: TEZ-902.1.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.1.5#6160)