Posted to common-issues@hadoop.apache.org by GitBox <gi...@apache.org> on 2021/02/04 09:47:08 UTC

[GitHub] [hadoop] tasanuma commented on a change in pull request #2588: HDFS-15761. Dead NORMAL DN shouldn't transit to DECOMMISSIONED immediately

tasanuma commented on a change in pull request #2588:
URL: https://github.com/apache/hadoop/pull/2588#discussion_r570072022



##########
File path: hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestDecommissioningStatus.java
##########
@@ -383,30 +383,70 @@ public void testDecommissionStatusAfterDNRestart() throws Exception {
 
   /**
    * Verify the support for decommissioning a datanode that is already dead.
-   * Under this scenario the datanode should immediately be marked as
-   * DECOMMISSIONED
+   * Under this scenario the datanode should be marked as
+   * DECOMMISSION_IN_PROGRESS first. When pendingReplicationBlocksCount and
+   * underReplicatedBlocksCount are both 0, it becomes DECOMMISSIONED.
    */
   @Test(timeout=120000)
   public void testDecommissionDeadDN() throws Exception {
     Logger log = Logger.getLogger(DatanodeAdminManager.class);
     log.setLevel(Level.DEBUG);
-    DatanodeID dnID = cluster.getDataNodes().get(0).getDatanodeId();
-    String dnName = dnID.getXferAddr();
-    DataNodeProperties stoppedDN = cluster.stopDataNode(0);
-    DFSTestUtil.waitForDatanodeState(cluster, dnID.getDatanodeUuid(),
-        false, 30000);
+
+    DistributedFileSystem fileSystem = cluster.getFileSystem();
+
+    // Create a file with one block. That block has one replica.
+    Path f = new Path("decommission.dat");
+    DFSTestUtil.createFile(fileSystem, f, fileSize, fileSize, fileSize,
+        (short)1, seed);
+
+    // Find the DN that owns the only replica.
+    RemoteIterator<LocatedFileStatus> fileList =
+        fileSystem.listLocatedStatus(f);
+    BlockLocation[] blockLocations = fileList.next().getBlockLocations();
+    String[] dnNames = blockLocations[0].getNames();

Review comment:
       Since the target DN is a single host, we may not need the String array and for-loop.
   ```java
   String dnName = blockLocations[0].getNames()[0];
   ```
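
    The stop loop further down could then presumably collapse the same way, reusing the `cluster.stopDataNode(String)` overload the loop already calls (just a sketch):
    ```java
    DataNodeProperties stoppedDN = cluster.stopDataNode(dnName);
    ```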

##########
File path: hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestDecommissioningStatus.java
##########
@@ -383,30 +383,70 @@ public void testDecommissionStatusAfterDNRestart() throws Exception {
 
   /**
    * Verify the support for decommissioning a datanode that is already dead.
-   * Under this scenario the datanode should immediately be marked as
-   * DECOMMISSIONED
+   * Under this scenario the datanode should be marked as
+   * DECOMMISSION_IN_PROGRESS first. When pendingReplicationBlocksCount and
+   * underReplicatedBlocksCount are both 0, it becomes DECOMMISSIONED.
    */
   @Test(timeout=120000)
   public void testDecommissionDeadDN() throws Exception {
     Logger log = Logger.getLogger(DatanodeAdminManager.class);
     log.setLevel(Level.DEBUG);
-    DatanodeID dnID = cluster.getDataNodes().get(0).getDatanodeId();
-    String dnName = dnID.getXferAddr();
-    DataNodeProperties stoppedDN = cluster.stopDataNode(0);
-    DFSTestUtil.waitForDatanodeState(cluster, dnID.getDatanodeUuid(),
-        false, 30000);
+
+    DistributedFileSystem fileSystem = cluster.getFileSystem();
+
+    // Create a file with one block. That block has one replica.
+    Path f = new Path("decommission.dat");
+    DFSTestUtil.createFile(fileSystem, f, fileSize, fileSize, fileSize,
+        (short)1, seed);
+
+    // Find the DN that owns the only replica.
+    RemoteIterator<LocatedFileStatus> fileList =
+        fileSystem.listLocatedStatus(f);
+    BlockLocation[] blockLocations = fileList.next().getBlockLocations();
+    String[] dnNames = blockLocations[0].getNames();
+
+    // Stop the DN leads to 1 block under-replicated
+    DataNodeProperties[] stoppedDNs = new DataNodeProperties[dnNames.length];
+    for (int i = 0; i < dnNames.length; i++) {
+      stoppedDNs[i] = cluster.stopDataNode(dnNames[i]);
+    }
+
     FSNamesystem fsn = cluster.getNamesystem();
     final DatanodeManager dm = fsn.getBlockManager().getDatanodeManager();
-    DatanodeDescriptor dnDescriptor = dm.getDatanode(dnID);
-    decommissionNode(dnName);
+    final List<DatanodeDescriptor> dead = new ArrayList<DatanodeDescriptor>();
+    while (true) {
+      dm.fetchDatanodes(null, dead, false);
+      if (dead.size() == 3) {

Review comment:
       Why wait for `dead.size() == 3`? The entries all seem to be the same host.
   
    Also, it would be better to use `GenericTestUtils.waitFor` instead of the `while (true)` loop.
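
    Roughly something like the following (only a sketch; `expectedDeadCount`, the 100ms poll interval, and the 30s timeout are placeholders, and `GenericTestUtils` here means `org.apache.hadoop.test.GenericTestUtils`):
    ```java
    GenericTestUtils.waitFor(() -> {
      dead.clear();  // fetchDatanodes appends to the list, so reset it between polls
      dm.fetchDatanodes(null, dead, false);
      return dead.size() == expectedDeadCount;  // placeholder for the intended count
    }, 100, 30000);
    ```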

##########
File path: hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestDecommissioningStatus.java
##########
@@ -453,10 +493,10 @@ public void testDecommissionLosingData() throws Exception {
     BlockManagerTestUtil.recheckDecommissionState(dm);
     // Block until the admin's monitor updates the number of tracked nodes.
     waitForDecommissionedNodes(dm.getDatanodeAdminManager(), 0);

Review comment:
       `java.util.concurrent.TimeoutException` was thrown at this line in my environment. Could you check it?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


