Posted to issues@tez.apache.org by "Kuhu Shukla (JIRA)" <ji...@apache.org> on 2018/09/12 20:23:00 UTC

[jira] [Commented] (TEZ-3990) The number of shuffle penalties for a host/inputAttemptIdentifier should be capped

    [ https://issues.apache.org/jira/browse/TEZ-3990?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16612684#comment-16612684 ] 

Kuhu Shukla commented on TEZ-3990:
----------------------------------

A simple change that stops adding entries to the penalties data structure once the number of failures reaches a limit. For now, I take that limit from the existing config used to report failures to the AM.

> The number of shuffle penalties for a host/inputAttemptIdentifier should be capped
> ----------------------------------------------------------------------------------
>
>                 Key: TEZ-3990
>                 URL: https://issues.apache.org/jira/browse/TEZ-3990
>             Project: Apache Tez
>          Issue Type: Bug
>    Affects Versions: 0.9.1, 0.10.0
>            Reporter: Kuhu Shukla
>            Assignee: Kuhu Shukla
>            Priority: Major
>         Attachments: TEZ-3990.001.patch
>
>
> In a scenario where fetches for the same mapId keep failing, the penalty code allows the same Host/InputAttemptIdentifier to be added over and over, each time with a revised penalty time that grows exponentially. At some point it should stop retrying and report the failure to the AM as soon as possible, so that the job can rectify the upstream output.
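
The capping behaviour described in this issue could be sketched roughly as below. This is only an illustration, not the code in TEZ-3990.001.patch: the class name CappedPenaltyTracker, the method recordFailure, the -1 return convention, and the parameters maxFailures and baseDelayMs are all hypothetical, and the doubling schedule is an assumed stand-in for the real penalty computation.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical sketch of a capped penalty tracker. Names and behaviour are
 * assumptions for illustration and do not mirror the actual Tez classes.
 */
final class CappedPenaltyTracker {
    /** Sentinel meaning: stop penalizing and report the failure upstream. */
    static final long REPORT_FAILURE = -1L;

    private final int maxFailures;   // assumed to come from an existing failure-report setting
    private final long baseDelayMs;  // assumed delay applied after the first failure
    private final Map<String, Integer> failureCounts = new HashMap<>();

    CappedPenaltyTracker(int maxFailures, long baseDelayMs) {
        if (maxFailures < 1 || baseDelayMs < 1) {
            throw new IllegalArgumentException("maxFailures and baseDelayMs must be positive");
        }
        this.maxFailures = maxFailures;
        this.baseDelayMs = baseDelayMs;
    }

    /**
     * Records one failed fetch for the given host and input attempt.
     *
     * @return the penalty delay in milliseconds, or REPORT_FAILURE once the
     *         failure count for this pair reaches maxFailures
     */
    long recordFailure(String host, String inputAttemptId) {
        String key = host + "/" + inputAttemptId;
        int count = failureCounts.merge(key, 1, Integer::sum);
        if (count >= maxFailures) {
            // Cap reached: forget the entry instead of penalizing it again.
            failureCounts.remove(key);
            return REPORT_FAILURE;
        }
        // Exponential growth: base, 2*base, 4*base, ...
        // The shift is clamped so ordinary base delays cannot overflow a long.
        int shift = Math.min(count - 1, 20);
        return baseDelayMs << shift;
    }
}
```

In this sketch a caller would re-queue the host with the returned delay, and treat REPORT_FAILURE as the signal to stop retrying and notify the AM.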



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)