Posted to yarn-issues@hadoop.apache.org by "Prabhu Joseph (Jira)" <ji...@apache.org> on 2022/08/11 12:02:00 UTC
[jira] [Created] (YARN-11251) Separate ThreadPool for AMLauncher Launch and Clean Events
Prabhu Joseph created YARN-11251:
------------------------------------
Summary: Separate ThreadPool for AMLauncher Launch and Clean Events
Key: YARN-11251
URL: https://issues.apache.org/jira/browse/YARN-11251
Project: Hadoop YARN
Issue Type: Improvement
Components: yarn
Affects Versions: 3.4.0
Reporter: Prabhu Joseph
Assignee: Samrat Deb
We have seen many AM launch failures caused by expired tokens or container liveliness expiry when the AM launcher threads are busy retrying connections to AM hosts (Spot Instances) that are down. Using separate thread pools for launch and cleanup events would keep launches from waiting behind those retries and should reduce the AM launch failures.
*Token Expired*
{code}
2022-07-19 14:56:33,486 ERROR org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl (IPC Server handler 39 on 8041): Unauthorized request to start container.
This token is expired. current time is 1658242593486 found 1658242289457
Note: System times on machines may be out of sync. Check system time and time zones.
{code}
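The two numbers in the message above are epoch milliseconds, so their difference shows how late the start request reached the NodeManager. Below is a minimal sketch of that arithmetic, reading the "found" value as the token's expiry timestamp (an assumption). The class and method names are illustrative only.

```java
import java.time.Duration;

// Illustrative helper, not part of YARN.
final class TokenExpiryGap {

    // How long after the token's expiry the request was processed.
    static Duration lateness(long currentTimeMs, long expiryTimeMs) {
        return Duration.ofMillis(currentTimeMs - expiryTimeMs);
    }

    public static void main(String[] args) {
        // Values copied from the log line above.
        Duration late = lateness(1658242593486L, 1658242289457L);
        System.out.println(late.toMillis() + " ms late"); // prints "304029 ms late"
    }
}
```

Under that reading, the launch request arrived roughly five minutes after the token had expired.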
*Container Liveliness Expiry*
{code}
2022-07-19 16:06:48,663 INFO org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl (ResourceManager Event Processor): container_1656573205571_2357731_01_000001 Container Transitioned from ACQUIRED to EXPIRED
2022-07-19 16:10:08,663 INFO org.apache.hadoop.yarn.util.AbstractLivelinessMonitor (Ping Checker): Expired:<container=container_1656573205571_2357773_01_000001, increase=false> Timed out after 600 secs
{code}
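The proposal in the description can be sketched roughly as follows. This is a hypothetical illustration, not the actual ApplicationMasterLauncher code: the class name, the EventType enum, and the pool sizes are all assumptions chosen for the example. It routes launch and cleanup work to separate fixed-size pools, then checks that a launch still completes while every cleanup thread is blocked (standing in for retries against a host that is down).

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch; names and sizes are assumptions, not YARN's API.
final class SplitAmLauncherSketch {

    enum EventType { LAUNCH, CLEANUP }

    private final ExecutorService launchPool;
    private final ExecutorService cleanupPool;

    SplitAmLauncherSketch(int launchThreads, int cleanupThreads) {
        this.launchPool = Executors.newFixedThreadPool(launchThreads);
        this.cleanupPool = Executors.newFixedThreadPool(cleanupThreads);
    }

    // Dispatch each event to the pool dedicated to its type.
    void handle(EventType type, Runnable work) {
        switch (type) {
            case LAUNCH:
                launchPool.execute(work);
                break;
            case CLEANUP:
                cleanupPool.execute(work);
                break;
            default:
                throw new IllegalArgumentException("Unknown event type: " + type);
        }
    }

    void shutdown() {
        // shutdownNow interrupts workers that are still blocked.
        launchPool.shutdownNow();
        cleanupPool.shutdownNow();
    }

    // Returns true if a launch completes while all cleanup threads are stuck.
    static boolean launchRunsWhileCleanupBlocked() {
        SplitAmLauncherSketch launcher = new SplitAmLauncherSketch(2, 2);
        CountDownLatch hostNeverAnswers = new CountDownLatch(1);
        CountDownLatch launched = new CountDownLatch(1);
        try {
            // Saturate the cleanup pool with tasks that never finish on their own.
            for (int i = 0; i < 4; i++) {
                launcher.handle(EventType.CLEANUP, () -> {
                    try {
                        hostNeverAnswers.await();
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    }
                });
            }
            launcher.handle(EventType.LAUNCH, launched::countDown);
            return launched.await(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        } finally {
            launcher.shutdown();
        }
    }
}
```

With a single shared pool of the same total size, the blocked cleanup tasks could occupy every worker and the launch would sit in the queue until they gave up. With the split, the launch pool is unaffected by stuck cleanups.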