Posted to issues@tez.apache.org by "Siddharth Seth (JIRA)" <ji...@apache.org> on 2014/03/10 19:09:45 UTC

[jira] [Created] (TEZ-924) InputFailedEvent handling for Shuffle

Siddharth Seth created TEZ-924:
----------------------------------

             Summary: InputFailedEvent handling for Shuffle
                 Key: TEZ-924
                 URL: https://issues.apache.org/jira/browse/TEZ-924
             Project: Apache Tez
          Issue Type: Bug
            Reporter: Siddharth Seth


Shuffle receives batches of Events to process from the AM. Because of the way these events are sent over to the ShuffleHandlers and the way they're processed, it's possible that Shuffle will start fetching data for an Event that is subsequently marked as failed (via an InputFailedEvent).

1) The AM sends events in batches. An InputFailedEvent for a specific Input may not be part of the same batch that contained the original event being marked bad.

2) The ShuffleEventHandler processes the events in each batch one event at a time, so even if the InputFailedEvent follows in the same batch, it's possible for Shuffle to start fetching data from a failed Input before that InputFailedEvent is processed.

The AM needs to change to invalidate Inputs up front, so that related events don't span batches. Alternatively, it needs to apply the InputFailedEvent to the original event before that event is sent.
The Shuffle itself should process a batch update as a batch. That would prevent fetchers from starting early when there may be additional events for the same host later in the batch.
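
As a rough illustration of the second suggestion (processing a batch as a batch), here is a minimal Java sketch. It is not the Tez implementation: the class BatchEventSketch, the Event type, its fields, and the fetchableInputs method are all hypothetical names made up for this example. The idea is simply to scan the whole batch for failure events first, and only then decide which inputs may be handed to fetchers.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch; none of these names come from the Tez code base.
class BatchEventSketch {

    // Stand-in for one event in a batch sent by the AM.
    static final class Event {
        final int inputIndex;  // which source input the event refers to
        final int version;     // attempt number of that input
        final boolean failed;  // true models an InputFailedEvent

        Event(int inputIndex, int version, boolean failed) {
            this.inputIndex = inputIndex;
            this.version = version;
            this.failed = failed;
        }

        // Identifies one attempt of one input.
        String key() {
            return inputIndex + "_" + version;
        }
    }

    // Returns the keys of inputs that are safe to fetch, decided only
    // after the entire batch has been examined.
    static List<String> fetchableInputs(List<Event> batch) {
        // Pass 1: collect every input attempt marked failed anywhere in the batch.
        Set<String> failedKeys = new HashSet<>();
        for (Event e : batch) {
            if (e.failed) {
                failedKeys.add(e.key());
            }
        }
        // Pass 2: keep only data events whose input attempt was not failed.
        List<String> fetchable = new ArrayList<>();
        for (Event e : batch) {
            if (!e.failed && !failedKeys.contains(e.key())) {
                fetchable.add(e.key());
            }
        }
        return fetchable;
    }
}
```

Note that this only covers the case where the InputFailedEvent arrives in the same batch as the event it invalidates. The cross-batch case described in 1) would still need the AM-side change.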



--
This message was sent by Atlassian JIRA
(v6.2#6252)