Posted to yarn-issues@hadoop.apache.org by "Jason Lowe (JIRA)" <ji...@apache.org> on 2018/11/28 15:24:00 UTC

[jira] [Commented] (YARN-7086) Release all containers asynchronously

    [ https://issues.apache.org/jira/browse/YARN-7086?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16702010#comment-16702010 ] 

Jason Lowe commented on YARN-7086:
----------------------------------

Sorry for the long delay.  It's good to see the performance number variance mostly eliminated.

I'm still not convinced this is something we want to do.  The performance numbers show that async release is almost 3x more expensive in terms of release latency than what we have today.  I think we need a clear use case showing that the increased latency is buying us something worth that increased cost, both in terms of latency and code complexity.  "Given we want to release containers async" was based on the old code where there was a very expensive lock being acquired for each container release, but that does not appear to be the case in recent builds.  Now that the expensive lock is out of this critical path, I'm not sure we want or need this added complexity.

Are others seeing issues with bulk container releases in recent builds?  Is there still a general demand for this feature?


> Release all containers asynchronously
> -------------------------------------
>
>                 Key: YARN-7086
>                 URL: https://issues.apache.org/jira/browse/YARN-7086
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: resourcemanager
>            Reporter: Arun Suresh
>            Assignee: Manikandan R
>            Priority: Major
>         Attachments: YARN-7086.001.patch, YARN-7086.002.patch, YARN-7086.Perf-test-case.patch
>
>
> We have noticed two situations in production that can cause deadlocks and bring the scheduling of new containers to a halt, especially for applications that have a lot of live containers:
> # When these applications release their containers in bulk.
> # When these applications terminate abruptly due to some failure, and the scheduler releases all of their live containers in a loop.
> To handle the issues mentioned above, we have a patch in production that makes sure ALL container releases happen asynchronously - and it has served us well.
> Opening this JIRA to gather feedback on whether this is a good idea generally (cc [~leftnoteasy], [~jlowe], [~curino], [~kasha], [~subru], [~roniburd])
> BTW, in YARN-6251 we already have an asyncReleaseContainer() in the AbstractYarnScheduler and a corresponding scheduler event, which is currently used specifically for the container-update code paths (where the scheduler releases the temp containers that it creates for the update)
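
For readers unfamiliar with the trade-off under discussion, here is a minimal sketch of the two release styles. It is not taken from any of the attached patches and does not use YARN's real classes: AsyncReleaseSketch, schedulerLock, releaseAllSync, releaseAllAsync and drain are all hypothetical names, and a single-threaded executor stands in for the scheduler's event dispatcher. The synchronous path holds one lock for the whole loop, while the asynchronous path queues one event per container and takes the lock once per event, so other scheduler work can interleave at the cost of extra per-release latency.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical toy scheduler; none of these names are YARN APIs.
class AsyncReleaseSketch {
    // Stand-in for the scheduler-wide lock.
    private final Object schedulerLock = new Object();
    // Containers the toy scheduler currently considers live (guarded by schedulerLock).
    private final Set<String> live = new HashSet<>();
    // Single thread, so queued release events are handled one at a time, in order.
    private final ExecutorService dispatcher = Executors.newSingleThreadExecutor();

    void allocate(String containerId) {
        synchronized (schedulerLock) {
            live.add(containerId);
        }
    }

    // Synchronous bulk release: the lock is held for the entire loop,
    // so nothing else can use the scheduler until every container is gone.
    void releaseAllSync(List<String> containerIds) {
        synchronized (schedulerLock) {
            for (String id : containerIds) {
                live.remove(id);
            }
        }
    }

    // Asynchronous bulk release: one event per container. The caller returns
    // immediately and the lock is taken and dropped once per event.
    void releaseAllAsync(List<String> containerIds) {
        for (String id : containerIds) {
            dispatcher.execute(() -> {
                synchronized (schedulerLock) {
                    live.remove(id);
                }
            });
        }
    }

    int liveCount() {
        synchronized (schedulerLock) {
            return live.size();
        }
    }

    // Stops accepting events and waits for queued releases to finish.
    // Returns false if the timeout elapses or the wait is interrupted.
    boolean drain(long timeoutMs) {
        dispatcher.shutdown();
        try {
            return dispatcher.awaitTermination(timeoutMs, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }
}
```

In this sketch, every asynchronous release pays for queueing and a separate lock acquisition, which is the kind of added latency the comment above weighs against the benefit of not holding the lock for the whole loop.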


