Posted to common-dev@hadoop.apache.org by "Raghu Angadi (JIRA)" <ji...@apache.org> on 2009/01/06 21:25:44 UTC

[jira] Commented: (HADOOP-4971) Block report times from datanodes could converge to same time.

    [ https://issues.apache.org/jira/browse/HADOOP-4971?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12661299#action_12661299 ] 

Raghu Angadi commented on HADOOP-4971:
--------------------------------------

The delay need not be (and in fact was not) in the block report RPC itself. It was elsewhere in the offerService loop. Something at the NameNode caused this.

Proposed fix (comment from the patch): {noformat}
            /* Say the last block report was at 8:20:14. The current report should have
             * occurred around 9:20:14 (with the default 1-hour block report interval).
             * If the current time is:
             * 1) normal, like 9:20:18, the next report should be at 10:20:14
             * 2) unexpected, like 11:35:43, the next report should be at 12:20:14
             */
{noformat}
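A minimal sketch of the rule in that comment, assuming times are plain seconds and that the next report is placed on the first slot of the original schedule (last report + k * interval) strictly after the whole intervals already elapsed. The class and method names (BlockReportSchedule, nextBlockReport) are hypothetical and are not the identifiers used in the actual patch.

```java
/**
 * Hypothetical sketch of phase-preserving block report rescheduling.
 * Times are in seconds; names are illustrative, not Hadoop's.
 */
public class BlockReportSchedule {

    /**
     * Returns the next report time, aligned to lastReport + k * interval
     * rather than to "now", so a long stall does not shift the schedule.
     */
    static long nextBlockReport(long lastReport, long interval, long now) {
        // Whole intervals that have elapsed since the last report.
        long elapsedIntervals = (now - lastReport) / interval;
        // First slot on the original schedule after those elapsed intervals.
        return lastReport + (elapsedIntervals + 1) * interval;
    }

    // Seconds since midnight for h:mm:ss.
    static long t(int h, int m, int s) {
        return h * 3600L + m * 60L + s;
    }

    // Formats seconds since midnight as h:mm:ss.
    static String fmt(long secs) {
        return String.format("%d:%02d:%02d", secs / 3600, (secs % 3600) / 60, secs % 60);
    }

    public static void main(String[] args) {
        long interval = 3600;     // default 1-hour interval, in seconds
        long last = t(8, 20, 14); // last block report at 8:20:14
        System.out.println(fmt(nextBlockReport(last, interval, t(9, 20, 18))));  // 10:20:14
        System.out.println(fmt(nextBlockReport(last, interval, t(11, 35, 43)))); // 12:20:14
    }
}
```

Both cases from the patch comment land back on the :20:14 phase, which is what keeps each DN's originally randomized offset intact after a delay.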


> Block report times from datanodes could converge to same time.   
> -----------------------------------------------------------------
>
>                 Key: HADOOP-4971
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4971
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: dfs
>    Affects Versions: 0.18.0
>            Reporter: Raghu Angadi
>            Assignee: Raghu Angadi
>            Priority: Blocker
>             Fix For: 0.18.3
>
>
> Datanode block reports take quite a bit of memory to process at the namenode. After the initial report, DNs pick a random time for their subsequent reports to spread this load out over time at the NN. This normally works fine. 
> Block reports are sent inside the "offerService()" thread in the DN. If for some reason this thread is stuck for a long time (comparable to the block report interval), and the same thing happens on many DNs, all of them get back to the loop at the same time, send their block reports together, and then keep sending them at the same time every hour. 
> The RPC server and clients in 0.18 handle this situation fine. But since this is a memory-intensive RPC, it led to large GC delays at the NN. We don't yet know why the offerService threads seemed to be stuck, but the DN should re-randomize its block report time in such cases.
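The convergence described above can be illustrated with a toy simulation. This is a sketch under simplifying assumptions, not Hadoop code: every DN gets a random initial report time within one interval, all offerService loops stall and resume at the same instant, and we count how many distinct next-report times result. The "naive" rule (next = now + interval) is an assumed stand-in for the pre-fix behavior; the "aligned" rule follows the patch comment. All names are hypothetical.

```java
import java.util.Random;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical toy model of block report times after a shared stall. */
public class ConvergenceDemo {
    static final long INTERVAL = 3600; // block report interval, in seconds

    // Assumed naive rescheduling: one interval after the moment the loop resumes.
    static long naiveNext(long now) {
        return now + INTERVAL;
    }

    // Phase-preserving rescheduling, as described in the patch comment.
    static long alignedNext(long last, long now) {
        return last + ((now - last) / INTERVAL + 1) * INTERVAL;
    }

    /** Number of distinct next-report times across all nodes after the stall. */
    static int distinctAfterStall(boolean aligned, int nodes, long seed) {
        Random rnd = new Random(seed);
        long resume = 5 * INTERVAL; // every node's loop resumes at this same moment
        Set<Long> times = new TreeSet<>();
        for (int i = 0; i < nodes; i++) {
            long last = rnd.nextInt((int) INTERVAL); // random initial report time
            times.add(aligned ? alignedNext(last, resume) : naiveNext(resume));
        }
        return times.size();
    }

    public static void main(String[] args) {
        System.out.println("naive:   " + distinctAfterStall(false, 100, 42));
        System.out.println("aligned: " + distinctAfterStall(true, 100, 42));
    }
}
```

Under the naive rule all 100 nodes collapse onto a single report time; under the aligned rule they keep (up to random collisions) their original spread across the interval.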

-- 
This message is automatically generated by JIRA.