Posted to issues@tez.apache.org by "Bikas Saha (JIRA)" <ji...@apache.org> on 2013/10/28 18:36:31 UTC

[jira] [Created] (TEZ-588) TaskScheduler may end up releasing containers and getting stuck without enough resources

Bikas Saha created TEZ-588:
------------------------------

             Summary: TaskScheduler may end up releasing containers and getting stuck without enough resources
                 Key: TEZ-588
                 URL: https://issues.apache.org/jira/browse/TEZ-588
             Project: Apache Tez
          Issue Type: Bug
            Reporter: Bikas Saha


{code}
if (hitFinalMatchLevel) {
  // Are there any pending requests at any priority?
  // release if there are tasks or this is not a session
  if (!taskRequests.isEmpty() || !appContext.isSession()) {
    LOG.info("Releasing held container as either there are pending but "
        + " unmatched requests or this is not a session"
        + ", containerId=" + heldContainer.container.getId()
        + ", pendingTasks=" + !taskRequests.isEmpty()
        + ", isSession=" + appContext.isSession()
        + ". isNew=" + isNew);
    releaseUnassignedContainers(
        Lists.newArrayList(heldContainer.container));
  }
{code}
The code above releases the held containers and expects the RM to supply better-matching ones. But when the RM first allocated these containers to the job, it had already reduced the job's ask accordingly. If the ask on the RM is currently 0, the released containers will not be replaced with new ones until the AM makes new container requests for the tasks that it has not yet been able to match to these containers. We don't seem to be making those new container requests, and thus risk getting stuck without enough resources.
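
To make the accounting concrete, here is a minimal toy model of the behaviour described above. It is only a sketch: ToyResourceManager and its methods are hypothetical names invented for illustration, not the YARN or Tez API, and the real RM's scheduling is far more involved. The only point it shows is that an allocation decrements the ask, a release does not restore it, and so nothing new arrives until the AM asks again.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only. These names are hypothetical and are NOT the YARN/Tez API.
class ToyResourceManager {
    private int outstandingAsk = 0;   // containers the AM has asked for and not yet received
    private int nextContainerId = 1;

    // The AM registers a request for one more container.
    void addContainerRequest() {
        outstandingAsk++;
    }

    // Hands out one container per outstanding ask; every allocation
    // reduces the ask by one.
    List<Integer> allocate() {
        List<Integer> allocated = new ArrayList<>();
        while (outstandingAsk > 0) {
            allocated.add(nextContainerId++);
            outstandingAsk--;
        }
        return allocated;
    }

    // Releasing a container gives it back, but does NOT restore the ask.
    void release(int containerId) {
        // intentionally leaves outstandingAsk untouched
    }

    int outstandingAsk() {
        return outstandingAsk;
    }
}

class AskAccountingSketch {
    public static void main(String[] args) {
        ToyResourceManager rm = new ToyResourceManager();
        rm.addContainerRequest();
        rm.addContainerRequest();

        // The RM satisfies both requests; the ask drops to 0.
        List<Integer> held = rm.allocate();

        // The AM cannot match the containers to its tasks and releases them.
        for (int id : held) {
            rm.release(id);
        }

        // With an ask of 0, no replacement containers are handed out.
        System.out.println("after release: " + rm.allocate().size());

        // Only after the AM re-requests do containers arrive again.
        rm.addContainerRequest();
        rm.addContainerRequest();
        System.out.println("after re-request: " + rm.allocate().size());
    }
}
```

In this model, the fix suggested by the description corresponds to calling addContainerRequest() again for every still-unmatched task whenever its held container is released.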


