Posted to issues@tez.apache.org by "Jonathan Turner Eagles (Jira)" <ji...@apache.org> on 2020/05/15 18:45:00 UTC

[jira] [Commented] (TEZ-4183) Time- and threshold-batched FetchFailure event propagation to AM

    [ https://issues.apache.org/jira/browse/TEZ-4183?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17108544#comment-17108544 ] 

Jonathan Turner Eagles commented on TEZ-4183:
---------------------------------------------

Thanks for reporting this [~pgaref].

Unlike MapReduce, Tez actually has two fetchers: unordered and ordered. The reporter is referring to the unordered fetch failure use case. Ordered fetching already has failure thresholds, and much of that logic should be ported to the unordered case.

Ordered logic, for reference:
https://github.com/apache/tez/blob/354c2a4177fe8c3cf6b8a4c6009d4068a19d81f1/tez-runtime-library/src/main/java/org/apache/tez/runtime/library/common/shuffle/orderedgrouped/ShuffleScheduler.java#L785

This logic is based only on failure counts; it has no time-based handling of fetch failures.
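
For illustration only, here is a minimal sketch of how a count threshold could be combined with the time-based batching proposed in the issue. Everything in it is an assumption made for the example: the class FetchFailureBatcher, its fields, and the sender callback are hypothetical and do not exist in Tez, and failures are modelled as plain strings rather than real events. Time is passed in explicitly so the behaviour is easy to follow.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/** Hypothetical sketch; none of these names are part of Tez. */
class FetchFailureBatcher {
    private final long batchWaitMillis;           // assumed analogue of BATCH_WAIT
    private final int diskErrorThreshold;         // assumed analogue of Threshold
    private final Consumer<List<String>> sender;  // stands in for "send events to the AM"
    private final List<String> pending = new ArrayList<>();
    private int pendingDiskErrors = 0;
    private long lastFlushMillis;

    FetchFailureBatcher(long batchWaitMillis, int diskErrorThreshold,
                        long startMillis, Consumer<List<String>> sender) {
        this.batchWaitMillis = batchWaitMillis;
        this.diskErrorThreshold = diskErrorThreshold;
        this.lastFlushMillis = startMillis;
        this.sender = sender;
    }

    /** Queue a failure; send right away only if disk errors exceed the threshold. */
    synchronized void report(String failure, boolean diskError, long nowMillis) {
        pending.add(failure);
        if (diskError) {
            pendingDiskErrors++;
        }
        if (pendingDiskErrors > diskErrorThreshold) {
            flush(nowMillis);
        }
    }

    /** Called periodically; sends the pending batch once the wait has elapsed. */
    synchronized void tick(long nowMillis) {
        if (!pending.isEmpty() && nowMillis - lastFlushMillis >= batchWaitMillis) {
            flush(nowMillis);
        }
    }

    // Hands a copy of the batch to the sender and resets the counters.
    private void flush(long nowMillis) {
        sender.accept(new ArrayList<>(pending));
        pending.clear();
        pendingDiskErrors = 0;
        lastFlushMillis = nowMillis;
    }
}
```

In a real fetcher, tick would presumably be driven by a timer such as a ScheduledExecutorService, and the sender would wrap whatever call currently reports failures to the AM.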

> Time- and threshold-batched FetchFailure event propagation to AM
> ----------------------------------------------------------------
>
>                 Key: TEZ-4183
>                 URL: https://issues.apache.org/jira/browse/TEZ-4183
>             Project: Apache Tez
>          Issue Type: Improvement
>            Reporter: Panagiotis Garefalakis
>            Priority: Major
>
> Fetcher currently sends failure events to AM as soon as they are discovered:
> https://github.com/apache/tez/blob/354c2a4177fe8c3cf6b8a4c6009d4068a19d81f1/tez-runtime-library/src/main/java/org/apache/tez/runtime/library/common/shuffle/impl/ShuffleManager.java#L930
> To reduce AM pressure we can: 1) Batch fetch failure events to be sent periodically (every BATCH_WAIT) and 2) if we see disk errors more than a Threshold send the message immediately to AM (instead of waiting)


