Posted to yarn-issues@hadoop.apache.org by "Tao Yang (JIRA)" <ji...@apache.org> on 2019/02/20 09:36:00 UTC
[jira] [Comment Edited] (YARN-9313) Support asynchronized scheduling mode and multi-node lookup mechanism for scheduler activities

    [ https://issues.apache.org/jira/browse/YARN-9313?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16772781#comment-16772781 ] 

Tao Yang edited comment on YARN-9313 at 2/20/19 9:35 AM:
---------------------------------------------------------

The key changes in this patch are described below; I would appreciate it if someone could help review them:
 1. Add a fake node id named MULTI_NODES_AGENT in ActivitiesManager to represent multiple nodes.
 2. Move the start/finish points of scheduler activities from CapacityScheduler#nodeUpdate to CapacityScheduler#allocateContainersToNode, placing them immediately before and after the allocation. The allocation may be based on a single node (the input node is a real node) or on multiple nodes (the input node is ActivitiesManager#MULTI_NODES_AGENT). This provides a unified entry and exit point and widens the applicable scenarios.
 3. After initializing activities, ActivitiesManager#startNodeUpdateRecording should remove the current active node from activeRecordedNodes, so that the current activities process can be started only once.
 4. Maintain the relationship between the input node and the recording key. In the multi-node placement scenario, the input node can be a specific node or null; the nodeId in recordingNodesAllocation should be ActivitiesManager#MULTI_NODES_AGENT, while the nodeId in the activities info should be either that specific node or ActivitiesManager#MULTI_NODES_AGENT. We therefore need to derive the correct nodeId for the recording key and for the activities info from the input node: (1) The nodeId should be that of the input node when the input node is not null, and ActivitiesManager#MULTI_NODES_AGENT when the input node is null and multi-node placement is enabled; the relevant places in ActivitiesLogger should be updated accordingly. (2) When recording activities, the nodeId in the activities info may be a specific node while the nodeId in recordingNodesAllocation should be ActivitiesManager#MULTI_NODES_AGENT, so we need to resolve the correct recording key at the head of ActivitiesManager#getCurrentNodeAllocation and still record the nodeId of the input node in the activities info.
 5. Update the if clauses at the head of several methods in ActivitiesLogger to relax the restrictions on scheduler activities (activities are currently recorded only for a non-null node).
 6. Change ActivitiesManager#recordingNodesAllocation to a thread-local variable, so that activities from multiple scheduling processes are not mixed together in asynchronous scheduling mode.
 7. Add TestActivitiesManager to verify that multiple threads can run without interfering with each other, in both the normal scenario and the multi-nodes-enabled scenario.
 8. Update the check logic in TestRMWebServicesSchedulerActivities#testAssignMultipleContainersPerNodeHeartbeat, because the collection logic of scheduler activities changes with this patch and only one allocation should be recorded in all scenarios.
 9. Add TestRMWebServicesSchedulerActivitiesWithMultiNodesEnabled to test the recording of scheduler activities with multi-nodes enabled.
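
To make items 4 and 6 easier to follow, here is a minimal, self-contained sketch of the idea. It is not the code in the patch. The class ActivitiesRecordingSketch, its method names, and the use of plain strings for node ids are all assumptions made for illustration; only the MULTI_NODES_AGENT name and the thread-local approach come from the description above.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; the real logic lives in ActivitiesManager and ActivitiesLogger.
class ActivitiesRecordingSketch {
    // Stand-in for ActivitiesManager#MULTI_NODES_AGENT (the real type and value are assumptions here).
    static final String MULTI_NODES_AGENT = "MULTI_NODES_AGENT";

    // Item 6: each scheduling thread gets its own buffer, so activities from
    // concurrent scheduling processes are not mixed.
    private static final ThreadLocal<List<String>> RECORDING =
        ThreadLocal.withInitial(ArrayList::new);

    // Item 4 (1): nodeId stored in the activities info. Use the input node when it
    // is not null; fall back to the agent when it is null and multi-nodes is enabled.
    static String activityNodeId(String inputNodeId, boolean multiNodesEnabled) {
        if (inputNodeId != null) {
            return inputNodeId;
        }
        return multiNodesEnabled ? MULTI_NODES_AGENT : null;
    }

    // Item 4 (2): key used to look up the recording. With multi-nodes enabled it is
    // always the agent, even if the input node is a specific node.
    static String recordingKey(String inputNodeId, boolean multiNodesEnabled) {
        return multiNodesEnabled ? MULTI_NODES_AGENT : inputNodeId;
    }

    // Records one activity as "recordingKey|activityNodeId|activity" in the
    // current thread's buffer; does nothing when there is no key to record under.
    static void record(String inputNodeId, boolean multiNodesEnabled, String activity) {
        String key = recordingKey(inputNodeId, multiNodesEnabled);
        if (key == null) {
            return;
        }
        String nodeId = activityNodeId(inputNodeId, multiNodesEnabled);
        RECORDING.get().add(key + "|" + nodeId + "|" + activity);
    }

    // Returns and clears the activities recorded by the current thread.
    static List<String> drain() {
        List<String> out = new ArrayList<>(RECORDING.get());
        RECORDING.remove();
        return out;
    }

    public static void main(String[] args) throws InterruptedException {
        // Two scheduling threads record independently; each sees only its own entries.
        Thread singleNode = new Thread(() -> {
            record("host1:8041", false, "ALLOCATED");
            System.out.println("single-node thread: " + drain());
        });
        Thread multiNode = new Thread(() -> {
            record(null, true, "SKIPPED");
            System.out.println("multi-node thread: " + drain());
        });
        singleNode.start();
        multiNode.start();
        singleNode.join();
        multiNode.join();
    }
}
```

In this sketch the recording key and the nodeId kept in the activities info are resolved separately, which is the distinction item 4 draws, and the per-thread buffer is the reason two scheduling threads cannot see each other's entries.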


> Support asynchronized scheduling mode and multi-node lookup mechanism for scheduler activities
> ----------------------------------------------------------------------------------------------
>
>                 Key: YARN-9313
>                 URL: https://issues.apache.org/jira/browse/YARN-9313
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>            Reporter: Tao Yang
>            Assignee: Tao Yang
>            Priority: Major
>         Attachments: YARN-9313.001.patch
>
>
> [Design doc|https://docs.google.com/document/d/1pwf-n3BCLW76bGrmNPM4T6pQ3vC4dVMcN2Ud1hq1t2M/edit#heading=h.d2ru7sigsi7j]


