Posted to hdfs-dev@hadoop.apache.org by Patrick Kling <pk...@cs.uwaterloo.ca> on 2010/12/10 04:52:13 UTC

Re: Review Request: Populate needed replication queues before leaving safe mode.

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/105/
-----------------------------------------------------------

(Updated 2010-12-09 19:52:13.188412)


Review request for hadoop-hdfs.


Changes
-------

- Updated patch to apply to current trunk.
- In BlockManager.markBlockAsCorrupt, only update the needed replication queues if they have been initialized.


Summary
-------

This patch introduces a new configuration variable, dfs.namenode.replqueue.threshold-pct, which sets the fraction of blocks for which block reports must be received before the NameNode starts initializing the needed replication queues. Once that fraction has been reached, the queues are initialized while the NameNode is still in safe mode. After the queues are initialized, subsequent block reports are handled by updating the queues incrementally.
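
To make the intended behavior concrete, here is a minimal, self-contained sketch of the idea. It is not the code from the patch: the class ReplQueueSketch, its methods, and the way thresholds are rounded are all hypothetical, and the real logic lives in FSNamesystem and BlockManager. The sketch only assumes that each block is reported together with a live replica count.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch only; names and rounding are assumptions, not the patch. */
final class ReplQueueSketch {
    private final long replQueueThreshold; // reported blocks needed to build the queue
    private final long safeModeThreshold;  // reported blocks needed to leave safe mode
    private final int expectedReplicas;
    private final Map<Long, Integer> liveReplicasByBlock = new HashMap<>();
    private final Set<Long> neededReplication = new HashSet<>();
    private boolean queuesInitialized = false;

    ReplQueueSketch(long totalBlocks, double replQueuePct, double safeModePct,
                    int expectedReplicas) {
        // Assumption: thresholds are truncated to whole blocks.
        this.replQueueThreshold = (long) (totalBlocks * replQueuePct);
        this.safeModeThreshold = (long) (totalBlocks * safeModePct);
        this.expectedReplicas = expectedReplicas;
    }

    /** Records one reported block while the (simulated) NameNode is in safe mode. */
    void reportBlock(long blockId, int liveReplicas) {
        liveReplicasByBlock.put(blockId, liveReplicas);
        if (queuesInitialized) {
            // Queue already exists: update it incrementally for this block only.
            updateQueue(blockId, liveReplicas);
        } else if (liveReplicasByBlock.size() >= replQueueThreshold) {
            // Threshold reached: one full traversal of the blocks seen so far.
            for (Map.Entry<Long, Integer> e : liveReplicasByBlock.entrySet()) {
                updateQueue(e.getKey(), e.getValue());
            }
            queuesInitialized = true;
        }
    }

    private void updateQueue(long blockId, int liveReplicas) {
        if (liveReplicas < expectedReplicas) {
            neededReplication.add(blockId);
        } else {
            neededReplication.remove(blockId);
        }
    }

    boolean queuesInitialized() { return queuesInitialized; }

    /** No queue computation is left to do here if the queue was built earlier. */
    boolean canLeaveSafeMode() {
        return liveReplicasByBlock.size() >= safeModeThreshold;
    }

    Set<Long> neededReplication() { return new HashSet<>(neededReplication); }

    public static void main(String[] args) {
        ReplQueueSketch nn = new ReplQueueSketch(4, 0.5, 1.0, 3);
        nn.reportBlock(1L, 3);
        nn.reportBlock(2L, 1); // second of four blocks: queue is built here
        nn.reportBlock(3L, 2); // handled incrementally
        nn.reportBlock(4L, 3);
        System.out.println("initialized=" + nn.queuesInitialized()
                + " canLeave=" + nn.canLeaveSafeMode()
                + " needed=" + nn.neededReplication());
    }
}
```

In this sketch the full traversal happens once, when the lower threshold is crossed; every later report touches only the reported block, so nothing remains to be computed at the moment the safe mode threshold is reached.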

The benefit of this is twofold:
- It allows us to compute the replication queues while we are waiting for the last few block reports (when the NameNode is mostly idle). Once these block reports have been received, we can immediately leave safe mode without having to wait for the computation of the needed replication queues (which requires a full traversal of the blocks map).
- With Raid, it may not be necessary to stay in safe mode until all blocks have been reported. With this change, we could monitor whether all of the missing blocks can be recreated using parity information and, if so, leave safe mode early. For this monitoring to work, we need access to the needed replication queues while the NameNode is still in safe mode.


This addresses bug HDFS-1476.
    https://issues.apache.org/jira/browse/HDFS-1476


Diffs (updated)
-----

  http://svn.apache.org/repos/asf/hadoop/hdfs/trunk/src/java/org/apache/hadoop/hdfs/DFSConfigKeys.java 1044182 
  http://svn.apache.org/repos/asf/hadoop/hdfs/trunk/src/java/org/apache/hadoop/hdfs/server/namenode/BlockManager.java 1044182 
  http://svn.apache.org/repos/asf/hadoop/hdfs/trunk/src/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java 1044182 
  http://svn.apache.org/repos/asf/hadoop/hdfs/trunk/src/test/hdfs/org/apache/hadoop/hdfs/MiniDFSCluster.java 1044182 
  http://svn.apache.org/repos/asf/hadoop/hdfs/trunk/src/test/hdfs/org/apache/hadoop/hdfs/server/namenode/TestListCorruptFileBlocks.java 1044182 

Diff: https://reviews.apache.org/r/105/diff


Testing
-------

New test case in TestListCorruptFileBlocks.


Thanks,

Patrick