Posted to issues@tez.apache.org by "Jason Lowe (JIRA)" <ji...@apache.org> on 2016/10/25 15:42:58 UTC

[jira] [Commented] (TEZ-3491) Tez job can hang due to container priority inversion

    [ https://issues.apache.org/jira/browse/TEZ-3491?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15605652#comment-15605652 ] 

Jason Lowe commented on TEZ-3491:
---------------------------------

When the containers expire the Tez AM emits logs like this:
{noformat}
2016-10-23 01:36:49,207 [INFO] [AMRM Callback Handler Thread] |rm.YarnTaskSchedulerService|: Ignoring unknown container: container_e08_1475789370361_492567_01_000166
2016-10-23 01:36:49,292 [INFO] [DelayedContainerManager] |rm.YarnTaskSchedulerService|: Skipping delayed container as container is no longer running, containerId=container_e08_1475789370361_492567_01_000166
{noformat}

I can see a couple of approaches to fix this:
1) Release the lower priority container to make sure we free up enough space to allocate the necessary high-priority containers to satisfy the top priority requests.  These released containers need to be re-requested if there are still pending requests at the container's priority.

2) Allow the lower priority container to be used by a lower priority task.  We risk a similar priority inversion problem here if the lower priority task ends up waiting for the higher priority task to complete and needs to free up its resources for that to happen (e.g.: a reducer waiting for an upstream task while the queue is full).  However, the existing preemption logic should cover this scenario, since it can happen anyway (via fetch-failed task re-runs).

I'm slightly leaning towards option 2) since there are many cases where the lower priority task can complete on its own (i.e.: has no dependencies on the pending higher-priority tasks), and we have an allocation in hand to start working on that task.
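To make option 2) concrete, here is a minimal sketch (illustrative only, not actual Tez scheduler code; the method and class names are hypothetical) of the matching rule: a newly allocated container whose priority is lower than the top pending request would be offered to a pending task at or below its own priority rather than held idle until YARN expires it. If no such task exists, the fallback is option 1): release and re-request.

```java
import java.util.List;

public class ContainerAssignSketch {
    // Hypothetical helper, not a Tez API.  In YARN, a lower numeric value
    // means a higher priority.
    static Integer pickTaskPriority(int containerPriority, List<Integer> pendingTaskPriorities) {
        Integer best = null;
        for (int p : pendingTaskPriorities) {
            // Option 2: let a lower-priority (numerically larger) container
            // serve a request at the same or a lower priority...
            if (p >= containerPriority && (best == null || p < best)) {
                best = p; // ...preferring the highest-priority eligible task.
            }
        }
        // null => no eligible task; fall back to option 1 (release + re-request).
        return best;
    }

    public static void main(String[] args) {
        List<Integer> pending = List.of(1, 3, 5);
        // A priority-3 container can serve the priority-3 request.
        System.out.println(pickTaskPriority(3, pending)); // prints 3
        // A priority-6 container matches nothing pending and should be released.
        System.out.println(pickTaskPriority(6, pending)); // prints null
    }
}
```

Under this rule the container never sits unlaunched long enough to expire, which is the failure mode described in the issue.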

Note another related problem that should be addressed: losing containers to expiration.  Currently, if any container allocation expires, the Tez AM drops it without re-requesting it.  This leads either to reduced performance, if container reuse allows the AM to funnel the tasks through fewer containers, or to an outright hang if it cannot reuse other containers.
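A sketch of the re-request fix for that expiration case (again illustrative, assuming a simple per-priority ask counter rather than the real Tez bookkeeping): when the AM learns a container completed with an expired/never-launched status and requests at that priority are still outstanding, it should add an equivalent ask back rather than silently dropping the allocation.

```java
import java.util.HashMap;
import java.util.Map;

public class ExpiryRerequestSketch {
    // Outstanding ask count per priority, as the AM might track it
    // (hypothetical field, not actual Tez state).
    final Map<Integer, Integer> pendingAsks = new HashMap<>();

    // Called when YARN reports the container expired before launch.
    void onContainerExpired(int priority, boolean requestsStillPending) {
        if (requestsStillPending) {
            // Re-request instead of dropping the allocation on the floor.
            pendingAsks.merge(priority, 1, Integer::sum);
        }
    }
}
```

With this in place an expired container costs one extra allocation round trip instead of a permanently missing container, so the job can no longer hang waiting on an allocation the RM already gave up on.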


> Tez job can hang due to container priority inversion
> ----------------------------------------------------
>
>                 Key: TEZ-3491
>                 URL: https://issues.apache.org/jira/browse/TEZ-3491
>             Project: Apache Tez
>          Issue Type: Bug
>    Affects Versions: 0.7.1
>            Reporter: Jason Lowe
>            Priority: Critical
>
> If the Tez AM receives containers at a lower priority than the highest priority task being requested then it fails to assign the container to any task.  In addition if the container is new then it refuses to release it if there are any pending tasks.  If it takes too long for the higher priority requests to be fulfilled (e.g.: the lower priority containers are filling the queue) then eventually YARN will expire the unused lower priority containers since they were never launched.  The Tez AM then never re-requests these lower priority containers and the job hangs because the AM is waiting for containers from the RM that the RM already sent and expired.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)