Posted to issues@tez.apache.org by "Bikas Saha (JIRA)" <ji...@apache.org> on 2013/10/28 18:36:31 UTC
[jira] [Created] (TEZ-588) TaskScheduler may end up releasing
containers and getting stuck without enough resources
Bikas Saha created TEZ-588:
------------------------------
Summary: TaskScheduler may end up releasing containers and getting stuck without enough resources
Key: TEZ-588
URL: https://issues.apache.org/jira/browse/TEZ-588
Project: Apache Tez
Issue Type: Bug
Reporter: Bikas Saha
{code}
if (hitFinalMatchLevel) {
  // Are there any pending requests at any priority?
  // release if there are tasks or this is not a session
  if (!taskRequests.isEmpty() || !appContext.isSession()) {
    LOG.info("Releasing held container as either there are pending but "
        + " unmatched requests or this is not a session"
        + ", containerId=" + heldContainer.container.getId()
        + ", pendingTasks=" + !taskRequests.isEmpty()
        + ", isSession=" + appContext.isSession()
        + ". isNew=" + isNew);
    releaseUnassignedContainers(
        Lists.newArrayList(heldContainer.container));
  }
{code}
The code above releases these containers in the expectation that the RM will supply better-matching ones. However, when the RM first allocated these containers to the job, it had already reduced the job's ask accordingly. If the ask on the RM is currently 0, the RM will not replace the released containers until the AM makes new container requests for the tasks it has not yet been able to match to these containers. We don't seem to be making those new container requests, and thus risk getting stuck without enough resources.
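
To make the ask accounting concrete, here is a minimal toy model of the scenario described above. It is only a sketch: ToyResourceManager, containersAfterRelease, and the reRequestOnRelease flag are hypothetical names invented for illustration and are not part of the Tez or YARN APIs. Re-requesting on release is shown as one possible remedy, not necessarily the fix that should be adopted.

```java
/** Toy model of the RM-side "ask" accounting; NOT the Tez or YARN API. */
final class AskAccountingSketch {

    /** Hypothetical stand-in for the RM: it grants containers only while the ask is positive. */
    static final class ToyResourceManager {
        private int outstandingAsk = 0;
        private int freeContainers;

        ToyResourceManager(int freeContainers) {
            this.freeContainers = freeContainers;
        }

        /** The AM asks for more containers; this raises the outstanding ask. */
        void request(int count) {
            outstandingAsk += count;
        }

        /** The AM hands containers back; note that the ask is NOT restored. */
        void release(int count) {
            freeContainers += count;
        }

        /** One heartbeat: grant min(ask, free) containers and reduce the ask by that amount. */
        int allocate() {
            int granted = Math.min(outstandingAsk, freeContainers);
            outstandingAsk -= granted;
            freeContainers -= granted;
            return granted;
        }
    }

    /**
     * The AM requests one container per pending task, receives them, decides they
     * match badly, and releases them all. Returns how many containers the RM
     * grants on the next heartbeat.
     */
    static int containersAfterRelease(int pendingTasks, boolean reRequestOnRelease) {
        ToyResourceManager rm = new ToyResourceManager(pendingTasks);
        rm.request(pendingTasks);
        int held = rm.allocate();   // the ask drops to 0 here
        rm.release(held);           // containers go back, the ask stays at 0
        if (reRequestOnRelease) {
            rm.request(held);       // assumed remedy: renew the ask for unmatched tasks
        }
        return rm.allocate();
    }

    public static void main(String[] args) {
        System.out.println("without re-request: " + containersAfterRelease(3, false));
        System.out.println("with re-request:    " + containersAfterRelease(3, true));
    }
}
```

In this model the AM receives nothing after the release unless it renews its ask, which is the stuck state described in this issue.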
--
This message was sent by Atlassian JIRA
(v6.1#6144)