You are viewing a plain text version of this content. The canonical link for it is here.
Posted to common-dev@hadoop.apache.org by "Suresh Srinivas (JIRA)" <ji...@apache.org> on 2009/03/12 23:42:50 UTC

[jira] Updated: (HADOOP-4584) Slow generation of blockReport at DataNode causes delay of sending heartbeat to NameNode

     [ https://issues.apache.org/jira/browse/HADOOP-4584?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Suresh Srinivas updated HADOOP-4584:
------------------------------------

    Attachment: 4584.brthread.2.patch

Couple of changes that requires careful review:
# Currently metadata file exists without a corresponding block is deleted. Could this cause a problem (that is if metadata file was created before a block file is created)?
# DirectoryScanner is added to DataBlockScanner. While reconciling the difference between the disk and in-memory blocks, existing scanner functionality is suspended. Would this cause problem for DataBlockScanner in ensuring its constraints such as scanning with in certain period of time and using configured bandwidth?

> Slow generation of blockReport at DataNode causes delay of sending heartbeat to NameNode
> ----------------------------------------------------------------------------------------
>
>                 Key: HADOOP-4584
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4584
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: dfs
>            Reporter: Hairong Kuang
>            Assignee: Suresh Srinivas
>             Fix For: 0.20.0
>
>         Attachments: 4584.brthread.2.patch, 4584.hbthread.patch, 4584.patch, 4584.patch, 4584.patch, 4584.patch, 4584.patch, 4584.patch
>
>
> sometimes due to disk or some other problems, datanode takes minutes or tens of minutes to generate a block report. It causes the datanode not able to send heartbeat to NameNode every 3 seconds. In the worst case, it makes NameNode to detect a lost heartbeat and wrongly decide that the datanode is dead.
> It would be nice to have two threads instead. One thread is for scanning data directories and generating block report, and executes the requests sent by NameNode; Another thread is for sending heartbeats, block reports, and picking up the requests from NameNode. By having these two threads, the sending of heartbeats will not get delayed by any slow block report or slow execution of NameNode requests.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.