Posted to yarn-issues@hadoop.apache.org by "Bikas Saha (JIRA)" <ji...@apache.org> on 2012/12/28 23:48:12 UTC

[jira] [Commented] (YARN-275) Make NodeManagers to NOT blindly heartbeat irrespective of whether previous heartbeat is processed or not.

    [ https://issues.apache.org/jira/browse/YARN-275?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13540621#comment-13540621 ] 

Bikas Saha commented on YARN-275:
---------------------------------

I briefly looked at the patch. The general approach seems promising. I have some comments on how we can structure these changes.
We could break this work into 2 parts:
1) Protocol changes in the heartbeat to transfer control of the heartbeat frequency from the NM to the RM. After this, in every heartbeat response the RM will tell the NM when to send the next heartbeat. That value can be hardcoded (as it is currently), but preferably we would have an RM config that defines the minimum heartbeat interval and use that. For this part, I don't think we need both backoff and heartbeatinterval in the heartbeat response. We can have just heartbeatinterval, which the NM always respects.
2) Add some logic/heuristic to the RM so that it can dynamically change the heartbeat interval based on its current processing load/rate. This way the interval can be made longer when the RM is not keeping up with heartbeats.
If you think this breakdown of the work makes sense, then we can create 2 sub-tasks under this jira for the 2 parts.
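
To make the two parts concrete, here is a rough sketch of what the RM side could look like. It is not taken from the patch: the class names, the backlog threshold, the upper bound on the interval, and the proportional scaling rule are all assumptions made up for illustration.

```java
// Illustrative sketch only. None of these types exist in YARN under these names.

/** Hypothetical heartbeat response carrying the interval chosen by the RM. */
final class HeartbeatResponse {
    final long nextHeartbeatIntervalMs;

    HeartbeatResponse(long nextHeartbeatIntervalMs) {
        this.nextHeartbeatIntervalMs = nextHeartbeatIntervalMs;
    }
}

/** Hypothetical RM-side policy that decides when the NM should heartbeat next. */
final class HeartbeatIntervalPolicy {
    private final long minIntervalMs;    // part 1: configured minimum interval
    private final long maxIntervalMs;    // assumed ceiling so an NM is never told to wait too long
    private final int backlogThreshold;  // assumed backlog size above which the RM counts as "behind"

    HeartbeatIntervalPolicy(long minIntervalMs, long maxIntervalMs, int backlogThreshold) {
        if (minIntervalMs <= 0 || maxIntervalMs < minIntervalMs || backlogThreshold <= 0) {
            throw new IllegalArgumentException("invalid heartbeat policy settings");
        }
        this.minIntervalMs = minIntervalMs;
        this.maxIntervalMs = maxIntervalMs;
        this.backlogThreshold = backlogThreshold;
    }

    /**
     * Part 1: with a small backlog this always returns the configured minimum.
     * Part 2: once the backlog exceeds the threshold, the interval grows in
     * proportion to the backlog (an assumed heuristic) up to the ceiling.
     */
    HeartbeatResponse onHeartbeat(int pendingEvents) {
        long interval = minIntervalMs;
        if (pendingEvents > backlogThreshold) {
            interval = minIntervalMs * pendingEvents / backlogThreshold;
        }
        return new HeartbeatResponse(Math.min(interval, maxIntervalMs));
    }
}
```

On the NM side, the only change in this sketch would be to wait for nextHeartbeatIntervalMs from the last response before sending the next heartbeat, instead of using a locally fixed value.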

I also have some additional ideas on part 1.
When a heartbeat arrives at the RM at time T, the RM can choose to either:
A) Accept the request at time T and ask the NM to heartbeat again after time T+K with new information. This adds to the current RM load. This is what the current code does, so no change is required for it.
B) Reject the request at time T and ask the NM to heartbeat again after time T+K with current+new information. This does not increase the load on the RM, but it makes the NM more complex because it needs to hold onto the last heartbeat data and merge new data into it.
What do you think about these alternatives?
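
To illustrate the extra NM-side bookkeeping that option B implies, here is a minimal sketch. The PendingHeartbeat class, the use of plain strings for container ids and states, and the accepted/rejected flag are all hypothetical; the real heartbeat payload carries more than container states.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Hypothetical NM-side buffer for option B. It keeps the latest known state
 * per container until the RM has accepted a heartbeat that reported it.
 */
final class PendingHeartbeat {
    private final Map<String, String> containerStates = new LinkedHashMap<>();

    /** Record new information; a newer state for the same container replaces the older one. */
    void record(String containerId, String state) {
        containerStates.put(containerId, state);
    }

    /** Copy of everything that still has to be reported (current + new information). */
    Map<String, String> snapshot() {
        return new LinkedHashMap<>(containerStates);
    }

    /**
     * Handle the RM's answer to a heartbeat that carried the given snapshot.
     * Accepted: drop the entries that were sent, unless they changed in the meantime.
     * Rejected: keep everything so it is merged into the next heartbeat.
     */
    void onResponse(boolean accepted, Map<String, String> sent) {
        if (!accepted) {
            return;
        }
        for (Map.Entry<String, String> entry : sent.entrySet()) {
            // Removes the entry only if the stored state still equals the one sent.
            containerStates.remove(entry.getKey(), entry.getValue());
        }
    }
}
```

With option A this buffer is unnecessary, since every heartbeat is accepted and the NM only ever sends new information. That is the complexity trade-off described above.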
                
> Make NodeManagers to NOT blindly heartbeat irrespective of whether previous heartbeat is processed or not.
> ----------------------------------------------------------------------------------------------------------
>
>                 Key: YARN-275
>                 URL: https://issues.apache.org/jira/browse/YARN-275
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: nodemanager, resourcemanager
>            Reporter: Vinod Kumar Vavilapalli
>            Assignee: Xuan Gong
>         Attachments: YARN-270.1.patch
>
>
> We need NMs to back off. The event handler mechanism is very scalable but not infinitely so :)
