Posted to issues@tez.apache.org by "Hitesh Shah (JIRA)" <ji...@apache.org> on 2013/10/29 15:02:30 UTC

[jira] [Commented] (TEZ-588) TaskScheduler may end up releasing containers and getting stuck without enough resources

    [ https://issues.apache.org/jira/browse/TEZ-588?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13807998#comment-13807998 ] 

Hitesh Shah commented on TEZ-588:
---------------------------------

[~bikassaha] was this path hit for a new container? Earlier, this case was only encountered for a re-used container, hence there was no need to update the asks.

> TaskScheduler may end up releasing containers and getting stuck without enough resources
> ----------------------------------------------------------------------------------------
>
>                 Key: TEZ-588
>                 URL: https://issues.apache.org/jira/browse/TEZ-588
>             Project: Apache Tez
>          Issue Type: Bug
>            Reporter: Bikas Saha
>
> {code}
> if (hitFinalMatchLevel) {
>   // Are there any pending requests at any priority?
>   // release if there are tasks or this is not a session
>   if (!taskRequests.isEmpty() || !appContext.isSession()) {
>     LOG.info("Releasing held container as either there are pending but "
>       + " unmatched requests or this is not a session"
>       + ", containerId=" + heldContainer.container.getId()
>       + ", pendingTasks=" + !taskRequests.isEmpty()
>       + ", isSession=" + appContext.isSession()
>       + ". isNew=" + isNew);
>     releaseUnassignedContainers(
>       Lists.newArrayList(heldContainer.container));
>   }
> {code}
> The above code releases these containers and expects to get better-matching containers from the RM. But when the RM first allocated these containers to the job, it had already reduced the ask for this job. If the ask on the RM is currently 0, the RM will not replace these released containers with new ones until the AM makes new container requests for the tasks that it has not yet been able to match to these containers. We don't seem to be making those new container requests and thus risk getting stuck without enough resources.
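
The ask accounting described above can be illustrated with a small, self-contained simulation. This is only a sketch under stated assumptions: ToyResourceManager, releaseAndCountReplacements and the reRequestOnRelease flag are hypothetical names invented for this example, not Tez or YARN APIs, and re-requesting one container per release is just one possible remedy, not necessarily the fix adopted for this issue.

```java
import java.util.ArrayDeque;
import java.util.Deque;

public class AskReplenishSketch {

    /** Hypothetical stand-in for the RM: tracks the outstanding ask and hands out containers. */
    static class ToyResourceManager {
        private int outstandingAsk = 0;
        private int nextContainerId = 1;

        /** The AM asks for one more container. */
        void addContainerRequest() {
            outstandingAsk++;
        }

        /** Allocates one container per outstanding ask, reducing the ask each time. */
        Deque<Integer> allocate() {
            Deque<Integer> allocated = new ArrayDeque<>();
            while (outstandingAsk > 0) {
                allocated.add(nextContainerId++);
                outstandingAsk--;
            }
            return allocated;
        }
    }

    /**
     * Requests two containers, releases both as "unmatched", and returns how many
     * replacement containers the toy RM hands out on the next allocation round.
     */
    static int releaseAndCountReplacements(boolean reRequestOnRelease) {
        ToyResourceManager rm = new ToyResourceManager();
        int pendingTasks = 2;
        for (int i = 0; i < pendingTasks; i++) {
            rm.addContainerRequest();
        }

        // The RM satisfies the ask, so the ask on the RM side drops to 0.
        Deque<Integer> held = rm.allocate();

        // Assume none of the held containers matched a task: release them all.
        while (!held.isEmpty()) {
            held.poll();
            if (reRequestOnRelease) {
                // Assumed remedy: renew the ask for the still-unmatched task.
                rm.addContainerRequest();
            }
        }

        // Without a renewed ask, the RM has nothing left to allocate.
        return rm.allocate().size();
    }

    public static void main(String[] args) {
        System.out.println("without re-request: " + releaseAndCountReplacements(false));
        System.out.println("with re-request: " + releaseAndCountReplacements(true));
    }
}
```

In this toy model the run without re-requests gets 0 replacement containers while the tasks are still pending, which is the stuck state described above; renewing the ask on release yields 2.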



--
This message was sent by Atlassian JIRA
(v6.1#6144)