Posted to hdfs-dev@hadoop.apache.org by Patrick Kling <pk...@cs.uwaterloo.ca> on 2010/11/17 01:39:18 UTC
Review Request: Populate needed replication queues before leaving safe mode.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/105/
-----------------------------------------------------------
Review request for hadoop-hdfs.
Summary
-------
This patch introduces a new configuration variable dfs.namenode.replqueue.threshold-pct that determines the fraction of blocks for which block reports have to be received before the NameNode will start initializing the needed replication queues. Once a sufficient number of block reports have been received, the queues are initialized while the NameNode is still in safe mode. After the queues are initialized, subsequent block reports are handled by updating the queues incrementally.
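The mechanism described above can be illustrated with a minimal Java sketch. It assumes a simplified in-memory model: `ReplQueueSketch`, `addStoredBlock`, and `initializeReplQueues` are hypothetical names, not the patch's code, and a block counts toward the threshold once any one of its replicas has been reported. Before the threshold is reached, reports only update replica counts. The first report that crosses the threshold triggers one full pass to build the queue, and later reports adjust the queue incrementally.

```java
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical model of a NameNode deciding when to build its needed-replication queue. */
class ReplQueueSketch {
    private final List<Long> allBlocks;        // every block in the namespace
    private final int expectedReplication;     // target replica count per block
    private final int replQueueThreshold;      // reported blocks needed before initialization
    private final Map<Long, Integer> replicas = new HashMap<>();
    private final Set<Long> neededReplication = new LinkedHashSet<>();
    private int blockSafe = 0;                 // blocks with at least one reported replica
    private boolean queuesInitialized = false;

    ReplQueueSketch(List<Long> allBlocks, int expectedReplication, double thresholdPct) {
        this.allBlocks = allBlocks;
        this.expectedReplication = expectedReplication;
        this.replQueueThreshold = (int) (allBlocks.size() * thresholdPct);
    }

    /** Called once for every replica that a DataNode reports. */
    void addStoredBlock(long blockId) {
        int count = replicas.merge(blockId, 1, Integer::sum);
        if (count == 1) {
            blockSafe++; // first replica of this block seen
        }
        if (queuesInitialized) {
            // Incremental update: drop the block once it has enough replicas.
            if (count >= expectedReplication) {
                neededReplication.remove(blockId);
            }
        } else if (blockSafe >= replQueueThreshold) {
            initializeReplQueues();
        }
    }

    /** One full traversal of all blocks, done while still in safe mode. */
    private void initializeReplQueues() {
        for (long id : allBlocks) {
            if (replicas.getOrDefault(id, 0) < expectedReplication) {
                neededReplication.add(id);
            }
        }
        queuesInitialized = true;
    }

    boolean queuesInitialized() { return queuesInitialized; }

    Set<Long> neededReplication() { return neededReplication; }

    public static void main(String[] args) {
        ReplQueueSketch nn = new ReplQueueSketch(List.of(1L, 2L, 3L, 4L), 2, 0.75);
        nn.addStoredBlock(1L);
        nn.addStoredBlock(2L);
        System.out.println(nn.queuesInitialized()); // false: 2 of 4 blocks reported
        nn.addStoredBlock(3L);
        System.out.println(nn.queuesInitialized()); // true: 3 >= (int) (4 * 0.75)
        System.out.println(nn.neededReplication()); // [1, 2, 3, 4]
        nn.addStoredBlock(1L);
        System.out.println(nn.neededReplication()); // [2, 3, 4]
    }
}
```

The queue is a `LinkedHashSet`, so incremental removals stay cheap and iteration order follows insertion order. The real NameNode uses prioritized queues, and this sketch does not model them.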
The benefit of this is twofold:
- It allows us to compute the replication queues while we are waiting for the last few block reports (when the NameNode is mostly idle). Once these block reports have been received, we can then immediately leave safe mode without having to wait for the computation of the needed replication queues (which requires a full traversal of the blocks map).
- With Raid, it may not be necessary to stay in safe mode until all blocks have been reported. With this change, we could monitor whether all of the missing blocks can be recreated from parity information and, if so, leave safe mode early. For this monitoring to work, we need access to the needed replication queues while the NameNode is still in safe mode.
This addresses bug HDFS-1476.
https://issues.apache.org/jira/browse/HDFS-1476
Diffs
-----
http://svn.apache.org/repos/asf/hadoop/hdfs/trunk/src/java/org/apache/hadoop/hdfs/DFSConfigKeys.java 1035545
http://svn.apache.org/repos/asf/hadoop/hdfs/trunk/src/java/org/apache/hadoop/hdfs/server/namenode/BlockManager.java 1035545
http://svn.apache.org/repos/asf/hadoop/hdfs/trunk/src/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java 1035545
http://svn.apache.org/repos/asf/hadoop/hdfs/trunk/src/test/hdfs/org/apache/hadoop/hdfs/MiniDFSCluster.java 1035545
http://svn.apache.org/repos/asf/hadoop/hdfs/trunk/src/test/hdfs/org/apache/hadoop/hdfs/server/namenode/TestListCorruptFileBlocks.java 1035545
Diff: https://reviews.apache.org/r/105/diff
Testing
-------
new test case in TestListCorruptFileBlocks
Thanks,
Patrick
Re: Review Request: Populate needed replication queues before leaving safe mode.
Posted by Dhruba Borthakur <dh...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/105/#review39
-----------------------------------------------------------
http://svn.apache.org/repos/asf/hadoop/hdfs/trunk/src/java/org/apache/hadoop/hdfs/DFSConfigKeys.java
<https://reviews.apache.org/r/105/#comment27>
Please change the default to 1, so that it is backward compatible.
http://svn.apache.org/repos/asf/hadoop/hdfs/trunk/src/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java
<https://reviews.apache.org/r/105/#comment29>
We could check canInitializeReplQueue first to save CPU.
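The comment does not say which condition `canInitializeReplQueue` should be checked before, so the following hypothetical Java sketch only illustrates the ordering idea. A cheap threshold comparison comes first in an `&&` chain, so the costlier condition is evaluated only after the threshold has been met. `SafeModeCheckSketch` and `expensiveCheck` are made-up names.

```java
/** Hypothetical model: put the cheap threshold test first in the && chain. */
class SafeModeCheckSketch {
    private final int blockReplQueueThreshold;
    private int blockSafe = 0;
    boolean populatingReplQueues = false;
    int expensiveCalls = 0; // how many times the costly condition was evaluated

    SafeModeCheckSketch(int blockReplQueueThreshold) {
        this.blockReplQueueThreshold = blockReplQueueThreshold;
    }

    /** Cheap: a single integer comparison. */
    boolean canInitializeReplQueues() {
        return blockSafe >= blockReplQueueThreshold;
    }

    /** Hypothetical stand-in for a condition that costs more to evaluate. */
    private boolean expensiveCheck() {
        expensiveCalls++;
        return !populatingReplQueues;
    }

    /** Called each time a newly reported block becomes safe. */
    void incrementSafeBlockCount() {
        blockSafe++;
        // && short-circuits, so expensiveCheck() is skipped while below the threshold.
        if (canInitializeReplQueues() && expensiveCheck()) {
            populatingReplQueues = true; // queue initialization would start here
        }
    }
}
```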
http://svn.apache.org/repos/asf/hadoop/hdfs/trunk/src/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java
<https://reviews.apache.org/r/105/#comment28>
This can move to after the SafeMode daemon is created.
- Dhruba
Re: Review Request: Populate needed replication queues before leaving safe mode.
Posted by Patrick Kling <pk...@cs.uwaterloo.ca>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/105/
-----------------------------------------------------------
(Updated 2010-12-09 19:52:13.188412)
Review request for hadoop-hdfs.
Changes
-------
- Updated patch to apply to current trunk.
- In BlockManager.markBlockAsCorrupt, only update the needed replication queues if they have been initialized.
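Here is a hypothetical Java sketch of the guard described above. It is not the patch's code. It assumes a simplified model in which the initialization pass queues every block that has a corrupt replica, so skipping the queue update before initialization loses nothing. `CorruptBlockGuardSketch` and its fields are made-up names.

```java
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.Set;

/** Hypothetical model of markBlockAsCorrupt guarding queue updates. */
class CorruptBlockGuardSketch {
    final Set<Long> corruptReplicas = new HashSet<>();
    final Set<Long> neededReplication = new LinkedHashSet<>();
    boolean replQueuesInitialized = false;

    void markBlockAsCorrupt(long blockId) {
        corruptReplicas.add(blockId); // always record the corrupt replica
        if (replQueuesInitialized) {
            // Queues exist: update them incrementally.
            neededReplication.add(blockId);
        }
        // Otherwise skip; initializeReplQueues() will pick this block up.
    }

    /** Simplified full pass: every block with a corrupt replica needs work. */
    void initializeReplQueues() {
        neededReplication.addAll(corruptReplicas);
        replQueuesInitialized = true;
    }
}
```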
Summary
-------
This patch introduces a new configuration variable dfs.namenode.replqueue.threshold-pct that determines the fraction of blocks for which block reports have to be received before the NameNode will start initializing the needed replication queues. Once a sufficient number of block reports have been received, the queues are initialized while the NameNode is still in safe mode. After the queues are initialized, subsequent block reports are handled by updating the queues incrementally.
The benefit of this is twofold:
- It allows us to compute the replication queues while we are waiting for the last few block reports (when the NameNode is mostly idle). Once these block reports have been received, we can then immediately leave safe mode without having to wait for the computation of the needed replication queues (which requires a full traversal of the blocks map).
- With Raid, it may not be necessary to stay in safe mode until all blocks have been reported. Using this change, we could monitor if all of the missing blocks can be recreated using parity information and if so leave safe mode early. In order for this monitoring to work, we need access to the needed replication queues while the NameNode is still in safe mode.
This addresses bug HDFS-1476.
https://issues.apache.org/jira/browse/HDFS-1476
Diffs (updated)
-----
http://svn.apache.org/repos/asf/hadoop/hdfs/trunk/src/java/org/apache/hadoop/hdfs/DFSConfigKeys.java 1044182
http://svn.apache.org/repos/asf/hadoop/hdfs/trunk/src/java/org/apache/hadoop/hdfs/server/namenode/BlockManager.java 1044182
http://svn.apache.org/repos/asf/hadoop/hdfs/trunk/src/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java 1044182
http://svn.apache.org/repos/asf/hadoop/hdfs/trunk/src/test/hdfs/org/apache/hadoop/hdfs/MiniDFSCluster.java 1044182
http://svn.apache.org/repos/asf/hadoop/hdfs/trunk/src/test/hdfs/org/apache/hadoop/hdfs/server/namenode/TestListCorruptFileBlocks.java 1044182
Diff: https://reviews.apache.org/r/105/diff
Testing
-------
new test case in TestListCorruptFileBlocks
Thanks,
Patrick
Re: Review Request: Populate needed replication queues before leaving safe mode.
Posted by Patrick Kling <pk...@cs.uwaterloo.ca>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/105/
-----------------------------------------------------------
(Updated 2010-11-19 13:07:20.231197)
Review request for hadoop-hdfs.
Changes
-------
Updated test case to play nice with HDFS-1482.
Diffs (updated)
-----
http://svn.apache.org/repos/asf/hadoop/hdfs/trunk/src/java/org/apache/hadoop/hdfs/DFSConfigKeys.java 1035545
http://svn.apache.org/repos/asf/hadoop/hdfs/trunk/src/java/org/apache/hadoop/hdfs/server/namenode/BlockManager.java 1035545
http://svn.apache.org/repos/asf/hadoop/hdfs/trunk/src/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java 1035545
http://svn.apache.org/repos/asf/hadoop/hdfs/trunk/src/test/hdfs/org/apache/hadoop/hdfs/MiniDFSCluster.java 1035545
http://svn.apache.org/repos/asf/hadoop/hdfs/trunk/src/test/hdfs/org/apache/hadoop/hdfs/server/namenode/TestListCorruptFileBlocks.java 1035545
Diff: https://reviews.apache.org/r/105/diff
Testing
-------
new test case in TestListCorruptFileBlocks
Thanks,
Patrick
Re: Review Request: Populate needed replication queues before leaving safe mode.
Posted by Patrick Kling <pk...@cs.uwaterloo.ca>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/105/
-----------------------------------------------------------
(Updated 2010-11-18 10:49:38.102334)
Review request for hadoop-hdfs.
Changes
-------
Changed default value of replication queue threshold to safe mode threshold.
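This hypothetical Java sketch shows how that default could be resolved. If `dfs.namenode.replqueue.threshold-pct` is unset, it falls back to whatever the safe mode threshold is. It uses `java.util.Properties` in place of Hadoop's `Configuration`. The safe mode key `dfs.namenode.safemode.threshold-pct` and its 0.999 default are assumptions for illustration. With this fallback, the queues are initialized at roughly the point where safe mode could previously have ended, which is close to the old behavior.

```java
import java.util.Properties;

/** Hypothetical resolution of the replication-queue threshold with a fallback default. */
class ReplQueueThresholdSketch {
    static final String SAFEMODE_THRESHOLD_KEY = "dfs.namenode.safemode.threshold-pct"; // assumed key
    static final float SAFEMODE_THRESHOLD_DEFAULT = 0.999f;                             // assumed default
    static final String REPLQUEUE_THRESHOLD_KEY = "dfs.namenode.replqueue.threshold-pct";

    /** Reads a float property, or returns defaultValue if the key is absent. */
    static float getFloat(Properties conf, String key, float defaultValue) {
        String v = conf.getProperty(key);
        return v == null ? defaultValue : Float.parseFloat(v.trim());
    }

    /** An unset replqueue threshold defaults to the effective safe mode threshold. */
    static float replQueueThreshold(Properties conf) {
        float safeMode = getFloat(conf, SAFEMODE_THRESHOLD_KEY, SAFEMODE_THRESHOLD_DEFAULT);
        return getFloat(conf, REPLQUEUE_THRESHOLD_KEY, safeMode);
    }
}
```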
Diffs (updated)
-----
http://svn.apache.org/repos/asf/hadoop/hdfs/trunk/src/java/org/apache/hadoop/hdfs/DFSConfigKeys.java 1035545
http://svn.apache.org/repos/asf/hadoop/hdfs/trunk/src/java/org/apache/hadoop/hdfs/server/namenode/BlockManager.java 1035545
http://svn.apache.org/repos/asf/hadoop/hdfs/trunk/src/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java 1035545
http://svn.apache.org/repos/asf/hadoop/hdfs/trunk/src/test/hdfs/org/apache/hadoop/hdfs/MiniDFSCluster.java 1035545
http://svn.apache.org/repos/asf/hadoop/hdfs/trunk/src/test/hdfs/org/apache/hadoop/hdfs/server/namenode/TestListCorruptFileBlocks.java 1035545
Diff: https://reviews.apache.org/r/105/diff
Testing
-------
new test case in TestListCorruptFileBlocks
Thanks,
Patrick
Re: Review Request: Populate needed replication queues before leaving safe mode.
Posted by Patrick Kling <pk...@cs.uwaterloo.ca>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/105/
-----------------------------------------------------------
(Updated 2010-11-16 18:01:44.268029)
Review request for hadoop-hdfs.
Changes
-------
Incorporated Dhruba's feedback. Thank you!
Diffs (updated)
-----
http://svn.apache.org/repos/asf/hadoop/hdfs/trunk/src/java/org/apache/hadoop/hdfs/DFSConfigKeys.java 1035545
http://svn.apache.org/repos/asf/hadoop/hdfs/trunk/src/java/org/apache/hadoop/hdfs/server/namenode/BlockManager.java 1035545
http://svn.apache.org/repos/asf/hadoop/hdfs/trunk/src/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java 1035545
http://svn.apache.org/repos/asf/hadoop/hdfs/trunk/src/test/hdfs/org/apache/hadoop/hdfs/MiniDFSCluster.java 1035545
http://svn.apache.org/repos/asf/hadoop/hdfs/trunk/src/test/hdfs/org/apache/hadoop/hdfs/server/namenode/TestListCorruptFileBlocks.java 1035545
Diff: https://reviews.apache.org/r/105/diff
Testing
-------
new test case in TestListCorruptFileBlocks
Thanks,
Patrick