Posted to issues@tez.apache.org by "Jason Lowe (JIRA)" <ji...@apache.org> on 2016/06/08 20:20:21 UTC
[jira] [Created] (TEZ-3293) Fetch failures can cause a shuffle hang waiting for memory merge that never starts
Jason Lowe created TEZ-3293:
-------------------------------
Summary: Fetch failures can cause a shuffle hang waiting for memory merge that never starts
Key: TEZ-3293
URL: https://issues.apache.org/jira/browse/TEZ-3293
Project: Apache Tez
Issue Type: Bug
Affects Versions: 0.8.3, 0.7.1
Reporter: Jason Lowe
Assignee: Jason Lowe
Tez jobs can hang in shuffle waiting for a memory merge that never starts. Reserving a MapOutput increments usedMemory, but unreserving it decrements usedMemory _and_ commitMemory. A fetch that fails before committing therefore subtracts from commitMemory an amount that was never added to it. If enough fetch failures of sufficient size occur, commitMemory may never reach the merge threshold even after all outstanding transfers have committed, and the shuffle hangs.
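
The accounting drift can be illustrated with a small standalone model. This is only a sketch under stated assumptions, not the actual Tez code: the class ShuffleMemoryModel, its methods, and the sizes and threshold are hypothetical. Only the counter names usedMemory and commitMemory and the reserve/unreserve behavior come from the description above.

```java
// Hypothetical, simplified model of the accounting described in this issue.
// It is NOT Tez's merge manager; only the counter names usedMemory and
// commitMemory are taken from the report. Everything else is invented.
class ShuffleMemoryModel {
    private final long mergeThreshold; // assumed: merge starts once commitMemory reaches this
    private long usedMemory;           // bytes reserved for map outputs
    private long commitMemory;         // bytes of map outputs that finished fetching

    ShuffleMemoryModel(long mergeThreshold) {
        this.mergeThreshold = mergeThreshold;
    }

    // Reserving space for a fetch only touches usedMemory.
    void reserve(long size) {
        usedMemory += size;
    }

    // A successful fetch commits its bytes.
    void commit(long size) {
        commitMemory += size;
    }

    // Behavior as reported: unreserve decrements BOTH counters, even though
    // a failed fetch never added anything to commitMemory.
    void unreserve(long size) {
        usedMemory -= size;
        commitMemory -= size;
    }

    boolean mergeWouldStart() {
        return commitMemory >= mergeThreshold;
    }

    long usedMemory() { return usedMemory; }

    long commitMemory() { return commitMemory; }

    public static void main(String[] args) {
        ShuffleMemoryModel m = new ShuffleMemoryModel(100);

        // One fetch fails after reserving 40 bytes.
        m.reserve(40);
        m.unreserve(40);

        // Two fetches succeed, 110 bytes in total.
        m.reserve(60);
        m.commit(60);
        m.reserve(50);
        m.commit(50);

        // 110 bytes are held in memory, but commitMemory is only 70,
        // so the merge condition is never met in this model.
        System.out.println("usedMemory=" + m.usedMemory()
            + " commitMemory=" + m.commitMemory()
            + " mergeWouldStart=" + m.mergeWouldStart());
    }
}
```

In this model the failed fetch leaves commitMemory at -40, so 110 committed bytes are counted as 70 and stay below the assumed threshold of 100. If unreserve left commitMemory alone for outputs that were never committed, the same sequence would end with commitMemory at 110.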
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)