You are viewing a plain text version of this content. The canonical link for it is here.
Posted to common-dev@hadoop.apache.org by "dhruba borthakur (JIRA)" <ji...@apache.org> on 2007/01/25 20:34:49 UTC

[jira] Commented: (HADOOP-923) DFS Scalability: datanode heartbeat timeouts cause cascading timeouts of other datanodes

    [ https://issues.apache.org/jira/browse/HADOOP-923?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12467507 ] 

dhruba borthakur commented on HADOOP-923:
-----------------------------------------

I gathered simulated measurements on the time to process a single heartbeat on the namenode versus the number of blocks to replicate. Here is the data:

pending Blocks to replicate           time to process one heartbeat (millisec)
100,000                                         2
500,000                                         5
600,000                                         6

100,000 blocks typically corresponds to about 12 TB. This is analogous to the capacity of three typical datanodes., Thus, if three datanodes go down at the same time, the namenode spends about 2 ms of CPU time to process a single incoming heartbeat. The global FSNamesystem lock is kept for this entire 2 ms.

In the current implementation, heartbeats are sent by the datanode once every 3 seconds. A 1500 node cluster will cause the namenode to spend about 3 seconds (2ms * 1500) processing just heartbeat requests. The current DFS's scalabilty could be limited to 1500 datanodes.

The above results could vary depending on the type of hardware and communcation link that is being used.

 



> DFS Scalability: datanode heartbeat timeouts cause cascading timeouts of other datanodes
> ----------------------------------------------------------------------------------------
>
>                 Key: HADOOP-923
>                 URL: https://issues.apache.org/jira/browse/HADOOP-923
>             Project: Hadoop
>          Issue Type: Bug
>          Components: dfs
>    Affects Versions: 0.10.1
>            Reporter: dhruba borthakur
>         Assigned To: dhruba borthakur
>
> The datanode sends a heartbeat to the namenode every 3 seconds. The namenode processes the heartbeat and sends  a list of block-to-be-replicated and blocks-to-be-deleted as part of the heartbeat response.
> At times when a couple of datanodes fail, the heartbeat processing on the namenode becomes pretty heavyweight. It acquires the global FSNamesystem lock, traverses the neededReplication structure, generates a list of blocks to be replicated and responds to the heartbeat message. Determining the list of blocks-to-be-replciated is pretty heavyweight, takes plenty of CPU and blocks processing of other heartbeats because of the global FSNamesystem lock.
> It would improve scalability a lot if heartbeat processing does not require the FSNamesystem lock. In fact, the pre-existing "heartbeat" lock already exists for this purpose. 
> I propose that the Heartbeat message be separate from the "retrieve blocks-to-replicate and blocks-to-delete" messages. The datanode can continue to heartbeat once every 3 seconds while it can afford to "retrieve blocks-to-replicate" at a much coarser interval. Heartbeat processing on the namenode will be fast because it does not require the global FSNamesystem lock. Moreover, a datanode failure will not aggrevate the heartbeat processing time on the namenode.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.