Posted to issues@tez.apache.org by "Jason Lowe (JIRA)" <ji...@apache.org> on 2016/06/08 20:20:21 UTC

[jira] [Created] (TEZ-3293) Fetch failures can cause a shuffle hang waiting for memory merge that never starts

Jason Lowe created TEZ-3293:
-------------------------------

             Summary: Fetch failures can cause a shuffle hang waiting for memory merge that never starts
                 Key: TEZ-3293
                 URL: https://issues.apache.org/jira/browse/TEZ-3293
             Project: Apache Tez
          Issue Type: Bug
    Affects Versions: 0.8.3, 0.7.1
            Reporter: Jason Lowe
            Assignee: Jason Lowe


Tez jobs can hang in shuffle waiting for a memory merge that never starts.  When a MapOutput is reserved, usedMemory is incremented, but when it is unreserved, both usedMemory _and_ commitMemory are decremented.  If enough shuffle failures of sufficient size occur, commitMemory may never reach the merge threshold, even after all outstanding transfers have committed, and the shuffle hangs.
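
The accounting mismatch described above can be illustrated with a small standalone model.  This is only a sketch, not the Tez merge manager code: the class ShuffleAccountingSketch, its methods, the threshold of 100 and the transfer sizes are all hypothetical, and only the counter names usedMemory and commitMemory come from this report.  The "fixed" variant shows one possible correction (unreserve leaves commitMemory alone) and is an assumption, not necessarily the change that will be committed.

```java
// Simplified, hypothetical model of the counters named in this report.
// It does not use any Tez classes.
class ShuffleAccountingSketch {
    private final long mergeThreshold; // assumed threshold that triggers a memory merge
    private long usedMemory;           // memory reserved for in-flight and committed outputs
    private long commitMemory;         // memory held by outputs that finished fetching

    ShuffleAccountingSketch(long mergeThreshold) {
        this.mergeThreshold = mergeThreshold;
    }

    // Reserving space for a fetch touches only usedMemory.
    void reserve(long size) {
        usedMemory += size;
    }

    // A successful fetch commits its output.
    void commit(long size) {
        commitMemory += size;
    }

    // Behaviour described in the report: a failed fetch decrements both
    // counters, although commitMemory was never incremented for it.
    void unreserveBuggy(long size) {
        usedMemory -= size;
        commitMemory -= size;
    }

    // Assumed correction: undo exactly what reserve() did.
    void unreserveFixed(long size) {
        usedMemory -= size;
    }

    boolean mergeWouldStart() {
        return commitMemory >= mergeThreshold;
    }

    // One failed fetch of 40 followed by two successful fetches of 50 each.
    static boolean run(boolean buggy) {
        ShuffleAccountingSketch s = new ShuffleAccountingSketch(100);
        s.reserve(40);
        if (buggy) {
            s.unreserveBuggy(40);
        } else {
            s.unreserveFixed(40);
        }
        s.reserve(50);
        s.commit(50);
        s.reserve(50);
        s.commit(50);
        return s.mergeWouldStart();
    }

    public static void main(String[] args) {
        System.out.println("buggy accounting, merge starts: " + run(true));
        System.out.println("fixed accounting, merge starts: " + run(false));
    }
}
```

In this model the failed fetch leaves commitMemory at -40 under the buggy accounting, so the two committed outputs bring it only to 60 and the threshold of 100 is never reached, even though 100 units are actually committed.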



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)