Posted to dev@tez.apache.org by "Panagiotis Garefalakis (Jira)" <ji...@apache.org> on 2020/05/15 17:38:00 UTC

[jira] [Created] (TEZ-4183) Time- and threshold-batched FetchFailure event propagation to AM

Panagiotis Garefalakis created TEZ-4183:
-------------------------------------------

             Summary: Time- and threshold-batched FetchFailure event propagation to AM
                 Key: TEZ-4183
                 URL: https://issues.apache.org/jira/browse/TEZ-4183
             Project: Apache Tez
          Issue Type: Improvement
            Reporter: Panagiotis Garefalakis


The Fetcher currently sends failure events to the AM (ApplicationMaster) as soon as they are discovered:
https://github.com/apache/tez/blob/354c2a4177fe8c3cf6b8a4c6009d4068a19d81f1/tez-runtime-library/src/main/java/org/apache/tez/runtime/library/common/shuffle/impl/ShuffleManager.java#L930

To reduce pressure on the AM, we can:
1) batch fetch-failure events and send them periodically (every BATCH_WAIT), and
2) if we see more than a threshold number of disk errors, send the message to the AM immediately instead of waiting.
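The two triggers proposed above (a periodic BATCH_WAIT flush plus an early flush on too many disk errors) can be sketched as a small buffering reporter. This is a minimal, hypothetical illustration, not Tez code: the class name `BatchedFailureReporter`, the generic event type, the `isDiskError` flag and the `sendToAm` callback (a stand-in for the real event path to the AM) are all assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

/**
 * Hypothetical sketch (not a Tez API): buffers fetch-failure events and forwards
 * them in batches, every batchWaitMs or as soon as buffered disk errors exceed
 * diskErrorThreshold.
 */
final class BatchedFailureReporter<E> implements AutoCloseable {
  private final Consumer<List<E>> sendToAm;   // assumed stand-in for the AM send path
  private final int diskErrorThreshold;
  private final ScheduledExecutorService timer = Executors.newSingleThreadScheduledExecutor();

  private final List<E> pending = new ArrayList<>(); // guarded by "this"
  private int pendingDiskErrors = 0;                 // guarded by "this"

  BatchedFailureReporter(Consumer<List<E>> sendToAm, long batchWaitMs, int diskErrorThreshold) {
    this.sendToAm = sendToAm;
    this.diskErrorThreshold = diskErrorThreshold;
    // Time-based trigger (BATCH_WAIT): flush whatever has accumulated.
    timer.scheduleAtFixedRate(this::flush, batchWaitMs, batchWaitMs, TimeUnit.MILLISECONDS);
  }

  /** Buffers one failure; flushes immediately once disk errors exceed the threshold. */
  void report(E event, boolean isDiskError) {
    boolean flushNow;
    synchronized (this) {
      pending.add(event);
      if (isDiskError) {
        pendingDiskErrors++;
      }
      flushNow = pendingDiskErrors > diskErrorThreshold;
    }
    if (flushNow) {
      flush();
    }
  }

  /** Sends all buffered events as one batch; does nothing if the buffer is empty. */
  void flush() {
    List<E> batch;
    synchronized (this) {
      if (pending.isEmpty()) {
        return;
      }
      batch = new ArrayList<>(pending);
      pending.clear();
      pendingDiskErrors = 0;
    }
    // Sent outside the lock so a slow send does not block report().
    sendToAm.accept(batch);
  }

  @Override
  public void close() {
    timer.shutdownNow();
    flush(); // do not drop events still sitting in the buffer
  }
}
```

Sending outside the lock keeps `report()` cheap for fetcher threads, at the cost that two concurrent flushes (timer and threshold) may deliver their batches in either order; a real implementation would also need to decide what happens if the send itself fails.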


