You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@tez.apache.org by "Siddharth Seth (JIRA)" <ji...@apache.org> on 2014/03/10 19:09:45 UTC
[jira] [Created] (TEZ-924) InputFailedEvent handling for Shuffle
Siddharth Seth created TEZ-924:
----------------------------------
Summary: InputFailedEvent handling for Shuffle
Key: TEZ-924
URL: https://issues.apache.org/jira/browse/TEZ-924
Project: Apache Tez
Issue Type: Bug
Reporter: Siddharth Seth
Shuffle receives batches of Events to process from the AM. The way these events are sent over to the ShuffleHandlers and the way they're processed - it's possible that Shuffle will start fetching data from an Event, which is to be subsequently marked as failed (via an InputFailedEvent)
1) The AM sends events in batches. An InputFailedEvent for a specific Input may not be part of the same batch which contained the original event which is being marked bad.
2) The ShuffleEventHandler processes the events in each batch one event at a time - so even if the InputFailedEvent follows - it's possible for Shuffle to start fetching data from a Failed Input.
The AM needs to change to invalidate Inputs up front - so that related events don't span batches. Alternately, it needs to apply the InputFailedEvent to the original event being sent.
The Shuffle itself should process a batch update as a batch - that would prevent fetchers from starting early even though there may be additional events for the same host.
--
This message was sent by Atlassian JIRA
(v6.2#6252)