Posted to issues@tez.apache.org by "Jonathan Eagles (JIRA)" <ji...@apache.org> on 2018/05/22 13:33:00 UTC

[jira] [Commented] (TEZ-3917) Speculative task attempt's DMEs can cause downstream fetcher to NPE or duplicate fetch

    [ https://issues.apache.org/jira/browse/TEZ-3917?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16483945#comment-16483945 ] 

Jonathan Eagles commented on TEZ-3917:
--------------------------------------

[~kshukla], have you confirmed whether or not this is a data corruption issue? Dropped or duplicated data would be a blocker for any release.

> Speculative task attempt's DMEs can cause downstream fetcher to NPE or duplicate fetch
> --------------------------------------------------------------------------------------
>
>                 Key: TEZ-3917
>                 URL: https://issues.apache.org/jira/browse/TEZ-3917
>             Project: Apache Tez
>          Issue Type: Bug
>    Affects Versions: 0.9.1
>            Reporter: Kuhu Shukla
>            Assignee: Kuhu Shukla
>            Priority: Major
>         Attachments: TEZ-3917.test-empty-partitions.patch
>
>
> STA0 , STA1
>          |
>          |
> DTA0 , DTA1
>  
> Take the above example, where DTA0 initially fetches from an upstream source task that has two attempts, one of them speculative (say STA1).
> There is a race wherein the DME from STA1 reaches DTA0 and is fetched, and the subsequent fetch from STA0 (the successful attempt) is then marked as a duplicate. The DME from STA1 is sent before the AM marks that attempt as killed.
> This additional event can also lead to an NPE: a fetcher thread is assigned the additional output to fetch while ShuffleScheduler believes it has already fetched all the map outputs, since it is not prepared to handle extra events coming in from speculative attempts.
> There are cases where DTA0 hits an NPE and DTA1 shows duplicate fetches.
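
To make the race described above concrete, here is a minimal sketch of one way a downstream consumer could guard against extra events from speculative attempts: keep a record per source task index and accept only the first attempt seen for it. This is a simplified model, not the actual Tez ShuffleScheduler code or the fix for this issue; the class SourceAttemptTracker and its methods are hypothetical names introduced only for illustration.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical, simplified model; NOT the actual Tez ShuffleScheduler.
class SourceAttemptTracker {
    private final int numSourceTasks;
    // Source task index -> attempt number whose output was accepted.
    private final Map<Integer, Integer> acceptedAttempt = new HashMap<>();

    SourceAttemptTracker(int numSourceTasks) {
        this.numSourceTasks = numSourceTasks;
    }

    /**
     * Called for each incoming data movement event (DME).
     * Returns true if the output should be scheduled for fetching,
     * false if another attempt of the same source task was already accepted.
     */
    synchronized boolean onEvent(int sourceTaskIndex, int attemptNumber) {
        if (sourceTaskIndex < 0 || sourceTaskIndex >= numSourceTasks) {
            throw new IllegalArgumentException("unknown source task " + sourceTaskIndex);
        }
        // putIfAbsent returns null only when no attempt was recorded yet.
        return acceptedAttempt.putIfAbsent(sourceTaskIndex, attemptNumber) == null;
    }

    /** Number of source tasks for which no output has been accepted yet. */
    synchronized int remaining() {
        return numSourceTasks - acceptedAttempt.size();
    }

    synchronized boolean isDone() {
        return remaining() == 0;
    }
}
```

With one source task, calling onEvent(0, 1) for STA1 followed by onEvent(0, 0) for STA0 schedules only the first fetch, and remaining() stays at zero rather than going negative. The sketch deliberately ignores the case where the accepted attempt is later killed and its output becomes unavailable, which a real scheduler would have to handle.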



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)