You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@ozone.apache.org by GitBox <gi...@apache.org> on 2020/07/25 07:24:55 UTC

[GitHub] [hadoop-ozone] avijayanhwx opened a new pull request #1258: HDDS-4029. Recon unable to add a new container which is in CLOSED state.

avijayanhwx opened a new pull request #1258:
URL: https://github.com/apache/hadoop-ozone/pull/1258


   ## What changes were proposed in this pull request?
   Whenever there is a state change (OPEN -> CLOSING/CLOSED) while Recon is down, when it starts up again, it is not able to process reports for that container.
   
   ## What is the link to the Apache JIRA
   https://issues.apache.org/jira/browse/HDDS-4029
   
   ## How was this patch tested?
   Manually tested.
   Added unit tests.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: ozone-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: ozone-issues-help@hadoop.apache.org


[GitHub] [hadoop-ozone] xiaoyuyao commented on pull request #1258: HDDS-4029. Recon unable to add a new container which is in CLOSED state.

Posted by GitBox <gi...@apache.org>.
xiaoyuyao commented on pull request #1258:
URL: https://github.com/apache/hadoop-ozone/pull/1258#issuecomment-667749339


   +1. Thanks @avijayanhwx  for the patience. The latest LGTM, +1. 


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: ozone-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: ozone-issues-help@hadoop.apache.org


[GitHub] [hadoop-ozone] avijayanhwx merged pull request #1258: HDDS-4029. Recon unable to add a new container which is in CLOSED state.

Posted by GitBox <gi...@apache.org>.
avijayanhwx merged pull request #1258:
URL: https://github.com/apache/hadoop-ozone/pull/1258


   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: ozone-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: ozone-issues-help@hadoop.apache.org


[GitHub] [hadoop-ozone] avijayanhwx commented on a change in pull request #1258: HDDS-4029. Recon unable to add a new container which is in CLOSED state.

Posted by GitBox <gi...@apache.org>.
avijayanhwx commented on a change in pull request #1258:
URL: https://github.com/apache/hadoop-ozone/pull/1258#discussion_r463337964



##########
File path: hadoop-ozone/recon/src/test/java/org/apache/hadoop/ozone/recon/scm/AbstractReconContainerManagerTest.java
##########
@@ -129,4 +132,46 @@ private StorageContainerServiceProvider getScmServiceProvider()
         .thenReturn(containerWithPipeline);
     return scmServiceProviderMock;
   }
+
+  protected Table<ContainerID, ContainerInfo> getContainerTable()
+      throws IOException {
+    return CONTAINERS.getTable(store);
+  }
+
+  protected ContainerWithPipeline getTestContainer(LifeCycleState state)
+      throws IOException {
+    ContainerID containerID = new ContainerID(100L);
+    Pipeline pipeline = getRandomPipeline();

Review comment:
       Sure, will do.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: ozone-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: ozone-issues-help@hadoop.apache.org


[GitHub] [hadoop-ozone] xiaoyuyao commented on a change in pull request #1258: HDDS-4029. Recon unable to add a new container which is in CLOSED state.

Posted by GitBox <gi...@apache.org>.
xiaoyuyao commented on a change in pull request #1258:
URL: https://github.com/apache/hadoop-ozone/pull/1258#discussion_r464145469



##########
File path: hadoop-ozone/recon/src/test/java/org/apache/hadoop/ozone/recon/scm/TestReconContainerManager.java
##########
@@ -86,12 +104,39 @@ public void testCheckAndAddNewContainer() throws IOException {
     ReconContainerManager containerManager = getContainerManager();
     assertFalse(containerManager.exists(containerID));
     DatanodeDetails datanodeDetails = randomDatanodeDetails();
-    containerManager.checkAndAddNewContainer(containerID, datanodeDetails);
+    containerManager.checkAndAddNewContainer(containerID,
+        OPEN, datanodeDetails);
     assertTrue(containerManager.exists(containerID));
 
     // Doing it one more time should not change any state.
-    containerManager.checkAndAddNewContainer(containerID, datanodeDetails);
+    containerManager.checkAndAddNewContainer(containerID, OPEN,
+        datanodeDetails);
     assertTrue(containerManager.exists(containerID));
+    assertEquals(LifeCycleState.OPEN,
+        getContainerManager().getContainer(containerID).getState());
   }
 
+  @Test
+  public void testUpdateContainerStateFromOpen() throws IOException {
+    ContainerWithPipeline containerWithPipeline =
+        getTestContainer(LifeCycleState.OPEN);
+
+    long id = containerWithPipeline.getContainerInfo().getContainerID();
+    ContainerID containerID =
+        containerWithPipeline.getContainerInfo().containerID();
+
+    // Adding container #100.
+    getContainerManager().addNewContainer(id, containerWithPipeline);
+    assertEquals(LifeCycleState.OPEN,
+        getContainerManager().getContainer(containerID).getState());
+
+    DatanodeDetails datanodeDetails = randomDatanodeDetails();
+
+    // First report with "CLOSED" replica state moves container state to
+    // "CLOSING".
+    getContainerManager().checkAndAddNewContainer(containerID, State.CLOSED,
+        datanodeDetails);
+    assertEquals(CLOSING,
+        getContainerManager().getContainer(containerID).getState());

Review comment:
       LGTM.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: ozone-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: ozone-issues-help@hadoop.apache.org


[GitHub] [hadoop-ozone] xiaoyuyao commented on a change in pull request #1258: HDDS-4029. Recon unable to add a new container which is in CLOSED state.

Posted by GitBox <gi...@apache.org>.
xiaoyuyao commented on a change in pull request #1258:
URL: https://github.com/apache/hadoop-ozone/pull/1258#discussion_r462484422



##########
File path: hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/container/ContainerStateManager.java
##########
@@ -325,8 +325,12 @@ public void addContainerInfo(long containerID,
                                Pipeline pipeline) throws IOException {
     Preconditions.checkNotNull(containerInfo);
     containers.addContainer(containerInfo);
-    pipelineManager.addContainerToPipeline(pipeline.getId(),
-        ContainerID.valueof(containerID));
+    if (pipeline != null) {

Review comment:
       My point is that Recon listen to container report and pipeline report from DNs similar to SCM. Why do we need to rely on SCM to provide the pipeline info given Recon is a "passive SCM"? 

##########
File path: hadoop-ozone/recon/src/main/java/org/apache/hadoop/ozone/recon/scm/ReconContainerManager.java
##########
@@ -77,22 +82,42 @@ public ReconContainerManager(
    * @throws IOException on Error.
    */
   public void checkAndAddNewContainer(ContainerID containerID,
+      ContainerReplicaProto.State replicaState,
       DatanodeDetails datanodeDetails)
       throws IOException {
     if (!exists(containerID)) {
       LOG.info("New container {} got from {}.", containerID,
           datanodeDetails.getHostName());
       ContainerWithPipeline containerWithPipeline =
           scmClient.getContainerWithPipeline(containerID.getId());
-      LOG.debug("Verified new container from SCM {} ",
-          containerWithPipeline.getContainerInfo().containerID());
+      LOG.info("Verified new container from SCM {}, part of {} ",
+          containerID, containerWithPipeline.getPipeline().getId());
       // If no other client added this, go ahead and add this container.
       if (!exists(containerID)) {
         addNewContainer(containerID.getId(), containerWithPipeline);
       }
+    } else {
+      // Check if container state is not open. In SCM, container state
+      // changes to CLOSING first, and then the close command is pushed down
+      // to Datanodes. Recon 'learns' this from DN, and hence replica state
+      // will move container state to 'CLOSING'.
+      ContainerInfo containerInfo = getContainer(containerID);
+      if (containerInfo.getState().equals(HddsProtos.LifeCycleState.OPEN)
+          && !replicaState.equals(ContainerReplicaProto.State.OPEN)

Review comment:
       Thanks for the details of state transition handling on Recon side. It makes sense to me. 




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: ozone-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: ozone-issues-help@hadoop.apache.org


[GitHub] [hadoop-ozone] xiaoyuyao commented on a change in pull request #1258: HDDS-4029. Recon unable to add a new container which is in CLOSED state.

Posted by GitBox <gi...@apache.org>.
xiaoyuyao commented on a change in pull request #1258:
URL: https://github.com/apache/hadoop-ozone/pull/1258#discussion_r463334255



##########
File path: hadoop-ozone/recon/src/test/java/org/apache/hadoop/ozone/recon/scm/AbstractReconContainerManagerTest.java
##########
@@ -129,4 +132,46 @@ private StorageContainerServiceProvider getScmServiceProvider()
         .thenReturn(containerWithPipeline);
     return scmServiceProviderMock;
   }
+
+  protected Table<ContainerID, ContainerInfo> getContainerTable()
+      throws IOException {
+    return CONTAINERS.getTable(store);
+  }
+
+  protected ContainerWithPipeline getTestContainer(LifeCycleState state)
+      throws IOException {
+    ContainerID containerID = new ContainerID(100L);
+    Pipeline pipeline = getRandomPipeline();

Review comment:
       Can we reuse the getTestContainer below to reduce duplicated code?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: ozone-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: ozone-issues-help@hadoop.apache.org


[GitHub] [hadoop-ozone] avijayanhwx commented on pull request #1258: HDDS-4029. Recon unable to add a new container which is in CLOSED state.

Posted by GitBox <gi...@apache.org>.
avijayanhwx commented on pull request #1258:
URL: https://github.com/apache/hadoop-ozone/pull/1258#issuecomment-666919370


   /retest


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: ozone-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: ozone-issues-help@hadoop.apache.org


[GitHub] [hadoop-ozone] xiaoyuyao commented on a change in pull request #1258: HDDS-4029. Recon unable to add a new container which is in CLOSED state.

Posted by GitBox <gi...@apache.org>.
xiaoyuyao commented on a change in pull request #1258:
URL: https://github.com/apache/hadoop-ozone/pull/1258#discussion_r461795457



##########
File path: hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/container/ContainerStateManager.java
##########
@@ -325,8 +325,12 @@ public void addContainerInfo(long containerID,
                                Pipeline pipeline) throws IOException {
     Preconditions.checkNotNull(containerInfo);
     containers.addContainer(containerInfo);
-    pipelineManager.addContainerToPipeline(pipeline.getId(),
-        ContainerID.valueof(containerID));
+    if (pipeline != null) {

Review comment:
       I'm trying to understand why this is not an issue in SCM? 
   Does Recon have similar idea like SCM safemode to ensure Pipeline meet the minimal requirement? 

##########
File path: hadoop-ozone/recon/src/main/java/org/apache/hadoop/ozone/recon/scm/ReconContainerManager.java
##########
@@ -77,22 +82,42 @@ public ReconContainerManager(
    * @throws IOException on Error.
    */
   public void checkAndAddNewContainer(ContainerID containerID,
+      ContainerReplicaProto.State replicaState,
       DatanodeDetails datanodeDetails)
       throws IOException {
     if (!exists(containerID)) {
       LOG.info("New container {} got from {}.", containerID,
           datanodeDetails.getHostName());
       ContainerWithPipeline containerWithPipeline =
           scmClient.getContainerWithPipeline(containerID.getId());
-      LOG.debug("Verified new container from SCM {} ",
-          containerWithPipeline.getContainerInfo().containerID());
+      LOG.info("Verified new container from SCM {}, part of {} ",

Review comment:
       NIT: "part of " => "pipeline"

##########
File path: hadoop-ozone/recon/src/main/java/org/apache/hadoop/ozone/recon/scm/ReconContainerManager.java
##########
@@ -77,22 +82,42 @@ public ReconContainerManager(
    * @throws IOException on Error.
    */
   public void checkAndAddNewContainer(ContainerID containerID,
+      ContainerReplicaProto.State replicaState,
       DatanodeDetails datanodeDetails)
       throws IOException {
     if (!exists(containerID)) {
       LOG.info("New container {} got from {}.", containerID,
           datanodeDetails.getHostName());
       ContainerWithPipeline containerWithPipeline =
           scmClient.getContainerWithPipeline(containerID.getId());
-      LOG.debug("Verified new container from SCM {} ",
-          containerWithPipeline.getContainerInfo().containerID());
+      LOG.info("Verified new container from SCM {}, part of {} ",
+          containerID, containerWithPipeline.getPipeline().getId());
       // If no other client added this, go ahead and add this container.
       if (!exists(containerID)) {
         addNewContainer(containerID.getId(), containerWithPipeline);
       }
+    } else {
+      // Check if container state is not open. In SCM, container state
+      // changes to CLOSING first, and then the close command is pushed down
+      // to Datanodes. Recon 'learns' this from DN, and hence replica state
+      // will move container state to 'CLOSING'.
+      ContainerInfo containerInfo = getContainer(containerID);
+      if (containerInfo.getState().equals(HddsProtos.LifeCycleState.OPEN)
+          && !replicaState.equals(ContainerReplicaProto.State.OPEN)

Review comment:
       should we more preciously check for ContainerReplicaProto.State.CLOSE state?

##########
File path: hadoop-ozone/recon/src/main/java/org/apache/hadoop/ozone/recon/scm/ReconContainerManager.java
##########
@@ -105,18 +130,32 @@ public void addNewContainer(long containerId,
     ContainerInfo containerInfo = containerWithPipeline.getContainerInfo();
     getLock().lock();
     try {
-      if (getPipelineManager().containsPipeline(
-          containerWithPipeline.getPipeline().getId())) {
-        getContainerStateManager().addContainerInfo(containerId, containerInfo,
-            getPipelineManager(), containerWithPipeline.getPipeline());
+      boolean success = false;
+      if (containerInfo.getState().equals(HddsProtos.LifeCycleState.OPEN)) {
+        PipelineID pipelineID = containerWithPipeline.getPipeline().getId();
+        if (getPipelineManager().containsPipeline(pipelineID)) {
+          getContainerStateManager().addContainerInfo(containerId,
+              containerInfo, getPipelineManager(),
+              containerWithPipeline.getPipeline());
+          success = true;
+        } else {
+          // Get open container for a pipeline that Recon does not know
+          // about yet. Cannot update internal state until pipeline is synced.
+          LOG.warn(String.format(

Review comment:
       This can be simplified as 
   LOG.warn("Pipeline {} not found. Cannot add container {}",
                 pipelineID, containerInfo.containerID());




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: ozone-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: ozone-issues-help@hadoop.apache.org


[GitHub] [hadoop-ozone] avijayanhwx commented on a change in pull request #1258: HDDS-4029. Recon unable to add a new container which is in CLOSED state.

Posted by GitBox <gi...@apache.org>.
avijayanhwx commented on a change in pull request #1258:
URL: https://github.com/apache/hadoop-ozone/pull/1258#discussion_r462733996



##########
File path: hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/container/ContainerStateManager.java
##########
@@ -325,8 +325,12 @@ public void addContainerInfo(long containerID,
                                Pipeline pipeline) throws IOException {
     Preconditions.checkNotNull(containerInfo);
     containers.addContainer(containerInfo);
-    pipelineManager.addContainerToPipeline(pipeline.getId(),
-        ContainerID.valueof(containerID));
+    if (pipeline != null) {

Review comment:
       @xiaoyuyao Good question. Although we get pipeline report for a new pipeline through DN, it does not contain all the information that was used by SCM to create the pipeline (Factor, other nodes). While implementing this, we considered 2 ways to handle this, either through an elaborate pipeline creation ACK command which Recon can pick up, or through Recon requesting SCM for a confirmation whenever it sees a new pipeline. We went with the latter.  
   
   In general, any state change that "originates" in the SCM needs to be synced with Recon. But anything that originates in the Datanode works automatically. 

##########
File path: hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/container/ContainerStateManager.java
##########
@@ -325,8 +325,12 @@ public void addContainerInfo(long containerID,
                                Pipeline pipeline) throws IOException {
     Preconditions.checkNotNull(containerInfo);
     containers.addContainer(containerInfo);
-    pipelineManager.addContainerToPipeline(pipeline.getId(),
-        ContainerID.valueof(containerID));
+    if (pipeline != null) {

Review comment:
       @xiaoyuyao Good question. Although we get pipeline report for a new pipeline through DN, it does not contain all the information that was used by SCM to create the pipeline (Factor, other nodes). While implementing this, we considered 2 ways to handle this, either through an elaborate pipeline creation ACK command which Recon can pick up, or through Recon requesting SCM for a confirmation whenever it sees a new pipeline. We went with the latter.  
   
   In general, any state change that "originates" in the SCM needs to be synced with Recon. But anything that originates in the Datanode works automatically. Recon is like a "read-only" follower here. In the future, the plan is to have Recon become a silent Ratis follower in the OM and SCM HA rings. 




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: ozone-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: ozone-issues-help@hadoop.apache.org


[GitHub] [hadoop-ozone] avijayanhwx commented on a change in pull request #1258: HDDS-4029. Recon unable to add a new container which is in CLOSED state.

Posted by GitBox <gi...@apache.org>.
avijayanhwx commented on a change in pull request #1258:
URL: https://github.com/apache/hadoop-ozone/pull/1258#discussion_r463338176



##########
File path: hadoop-ozone/recon/src/test/java/org/apache/hadoop/ozone/recon/scm/TestReconContainerManager.java
##########
@@ -86,12 +104,39 @@ public void testCheckAndAddNewContainer() throws IOException {
     ReconContainerManager containerManager = getContainerManager();
     assertFalse(containerManager.exists(containerID));
     DatanodeDetails datanodeDetails = randomDatanodeDetails();
-    containerManager.checkAndAddNewContainer(containerID, datanodeDetails);
+    containerManager.checkAndAddNewContainer(containerID,
+        OPEN, datanodeDetails);
     assertTrue(containerManager.exists(containerID));
 
     // Doing it one more time should not change any state.
-    containerManager.checkAndAddNewContainer(containerID, datanodeDetails);
+    containerManager.checkAndAddNewContainer(containerID, OPEN,
+        datanodeDetails);
     assertTrue(containerManager.exists(containerID));
+    assertEquals(LifeCycleState.OPEN,
+        getContainerManager().getContainer(containerID).getState());
   }
 
+  @Test
+  public void testUpdateContainerStateFromOpen() throws IOException {
+    ContainerWithPipeline containerWithPipeline =
+        getTestContainer(LifeCycleState.OPEN);
+
+    long id = containerWithPipeline.getContainerInfo().getContainerID();
+    ContainerID containerID =
+        containerWithPipeline.getContainerInfo().containerID();
+
+    // Adding container #100.
+    getContainerManager().addNewContainer(id, containerWithPipeline);
+    assertEquals(LifeCycleState.OPEN,
+        getContainerManager().getContainer(containerID).getState());
+
+    DatanodeDetails datanodeDetails = randomDatanodeDetails();
+
+    // First report with "CLOSED" replica state moves container state to
+    // "CLOSING".
+    getContainerManager().checkAndAddNewContainer(containerID, State.CLOSED,
+        datanodeDetails);
+    assertEquals(CLOSING,
+        getContainerManager().getContainer(containerID).getState());

Review comment:
       @xiaoyuyao I have added the tests for all the states in TestReconICR. Is that OK? 




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: ozone-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: ozone-issues-help@hadoop.apache.org


[GitHub] [hadoop-ozone] xiaoyuyao commented on a change in pull request #1258: HDDS-4029. Recon unable to add a new container which is in CLOSED state.

Posted by GitBox <gi...@apache.org>.
xiaoyuyao commented on a change in pull request #1258:
URL: https://github.com/apache/hadoop-ozone/pull/1258#discussion_r463335640



##########
File path: hadoop-ozone/recon/src/test/java/org/apache/hadoop/ozone/recon/scm/TestReconContainerManager.java
##########
@@ -86,12 +104,39 @@ public void testCheckAndAddNewContainer() throws IOException {
     ReconContainerManager containerManager = getContainerManager();
     assertFalse(containerManager.exists(containerID));
     DatanodeDetails datanodeDetails = randomDatanodeDetails();
-    containerManager.checkAndAddNewContainer(containerID, datanodeDetails);
+    containerManager.checkAndAddNewContainer(containerID,
+        OPEN, datanodeDetails);
     assertTrue(containerManager.exists(containerID));
 
     // Doing it one more time should not change any state.
-    containerManager.checkAndAddNewContainer(containerID, datanodeDetails);
+    containerManager.checkAndAddNewContainer(containerID, OPEN,
+        datanodeDetails);
     assertTrue(containerManager.exists(containerID));
+    assertEquals(LifeCycleState.OPEN,
+        getContainerManager().getContainer(containerID).getState());
   }
 
+  @Test
+  public void testUpdateContainerStateFromOpen() throws IOException {
+    ContainerWithPipeline containerWithPipeline =
+        getTestContainer(LifeCycleState.OPEN);
+
+    long id = containerWithPipeline.getContainerInfo().getContainerID();
+    ContainerID containerID =
+        containerWithPipeline.getContainerInfo().containerID();
+
+    // Adding container #100.
+    getContainerManager().addNewContainer(id, containerWithPipeline);
+    assertEquals(LifeCycleState.OPEN,
+        getContainerManager().getContainer(containerID).getState());
+
+    DatanodeDetails datanodeDetails = randomDatanodeDetails();
+
+    // First report with "CLOSED" replica state moves container state to
+    // "CLOSING".
+    getContainerManager().checkAndAddNewContainer(containerID, State.CLOSED,
+        datanodeDetails);
+    assertEquals(CLOSING,
+        getContainerManager().getContainer(containerID).getState());

Review comment:
       Can we add an additional case from CLOSING to QUASI_CLOSED upon receiving State.QUASI_CLOSED?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: ozone-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: ozone-issues-help@hadoop.apache.org


[GitHub] [hadoop-ozone] github-actions[bot] commented on pull request #1258: HDDS-4029. Recon unable to add a new container which is in CLOSED state.

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on pull request #1258:
URL: https://github.com/apache/hadoop-ozone/pull/1258#issuecomment-666919507


   To re-run CI checks, please follow these steps with the source branch checked out:
   ```
   git commit --allow-empty -m 'trigger new CI check'
   git push
   ```


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: ozone-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: ozone-issues-help@hadoop.apache.org


[GitHub] [hadoop-ozone] avijayanhwx commented on pull request #1258: HDDS-4029. Recon unable to add a new container which is in CLOSED state.

Posted by GitBox <gi...@apache.org>.
avijayanhwx commented on pull request #1258:
URL: https://github.com/apache/hadoop-ozone/pull/1258#issuecomment-668113598


   Thank you for the review @xiaoyuyao.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: ozone-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: ozone-issues-help@hadoop.apache.org


[GitHub] [hadoop-ozone] avijayanhwx commented on a change in pull request #1258: HDDS-4029. Recon unable to add a new container which is in CLOSED state.

Posted by GitBox <gi...@apache.org>.
avijayanhwx commented on a change in pull request #1258:
URL: https://github.com/apache/hadoop-ozone/pull/1258#discussion_r462320563



##########
File path: hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/container/ContainerStateManager.java
##########
@@ -325,8 +325,12 @@ public void addContainerInfo(long containerID,
                                Pipeline pipeline) throws IOException {
     Preconditions.checkNotNull(containerInfo);
     containers.addContainer(containerInfo);
-    pipelineManager.addContainerToPipeline(pipeline.getId(),
-        ContainerID.valueof(containerID));
+    if (pipeline != null) {

Review comment:
       @xiaoyuyao This is a specific use case to Recon where it may be directly trying to add a new container to its "passive SCM" metadata. For example, Recon may have been down during an extended period of time, when a container was created, opened and then eventually closed by SCM. At that point in time, if Recon is started up, it receives container reports for that container, it verifies that it was created by SCM, along with the pipeline information. If it is a closed container, then the pipeline ID returned by SCM is spurious. All that matters is the datanode which hosts the replica. 
   
   In SCM this is not possible since it already has the container metadata, and the 'addContainerInfo' method is called during "createContainer" step.  

##########
File path: hadoop-ozone/recon/src/main/java/org/apache/hadoop/ozone/recon/scm/ReconContainerManager.java
##########
@@ -105,18 +130,32 @@ public void addNewContainer(long containerId,
     ContainerInfo containerInfo = containerWithPipeline.getContainerInfo();
     getLock().lock();
     try {
-      if (getPipelineManager().containsPipeline(
-          containerWithPipeline.getPipeline().getId())) {
-        getContainerStateManager().addContainerInfo(containerId, containerInfo,
-            getPipelineManager(), containerWithPipeline.getPipeline());
+      boolean success = false;
+      if (containerInfo.getState().equals(HddsProtos.LifeCycleState.OPEN)) {
+        PipelineID pipelineID = containerWithPipeline.getPipeline().getId();
+        if (getPipelineManager().containsPipeline(pipelineID)) {
+          getContainerStateManager().addContainerInfo(containerId,
+              containerInfo, getPipelineManager(),
+              containerWithPipeline.getPipeline());
+          success = true;
+        } else {
+          // Get open container for a pipeline that Recon does not know
+          // about yet. Cannot update internal state until pipeline is synced.
+          LOG.warn(String.format(

Review comment:
       Sure, will do. 

##########
File path: hadoop-ozone/recon/src/main/java/org/apache/hadoop/ozone/recon/scm/ReconContainerManager.java
##########
@@ -77,22 +82,42 @@ public ReconContainerManager(
    * @throws IOException on Error.
    */
   public void checkAndAddNewContainer(ContainerID containerID,
+      ContainerReplicaProto.State replicaState,
       DatanodeDetails datanodeDetails)
       throws IOException {
     if (!exists(containerID)) {
       LOG.info("New container {} got from {}.", containerID,
           datanodeDetails.getHostName());
       ContainerWithPipeline containerWithPipeline =
           scmClient.getContainerWithPipeline(containerID.getId());
-      LOG.debug("Verified new container from SCM {} ",
-          containerWithPipeline.getContainerInfo().containerID());
+      LOG.info("Verified new container from SCM {}, part of {} ",
+          containerID, containerWithPipeline.getPipeline().getId());
       // If no other client added this, go ahead and add this container.
       if (!exists(containerID)) {
         addNewContainer(containerID.getId(), containerWithPipeline);
       }
+    } else {
+      // Check if container state is not open. In SCM, container state
+      // changes to CLOSING first, and then the close command is pushed down
+      // to Datanodes. Recon 'learns' this from DN, and hence replica state
+      // will move container state to 'CLOSING'.
+      ContainerInfo containerInfo = getContainer(containerID);
+      if (containerInfo.getState().equals(HddsProtos.LifeCycleState.OPEN)
+          && !replicaState.equals(ContainerReplicaProto.State.OPEN)

Review comment:
       @xiaoyuyao The container could be in any non-open & healthy state (CLOSING, QUASI_CLOSED, CLOSED). When the container was OPEN Recon may have added it to its metadata, but then it may have gone down. When it comes back up, the container could be in any state. A unit test (testProcessICRStateMismatch ) has been added for all the states.

##########
File path: hadoop-ozone/recon/src/main/java/org/apache/hadoop/ozone/recon/scm/ReconContainerManager.java
##########
@@ -77,22 +82,42 @@ public ReconContainerManager(
    * @throws IOException on Error.
    */
   public void checkAndAddNewContainer(ContainerID containerID,
+      ContainerReplicaProto.State replicaState,
       DatanodeDetails datanodeDetails)
       throws IOException {
     if (!exists(containerID)) {
       LOG.info("New container {} got from {}.", containerID,
           datanodeDetails.getHostName());
       ContainerWithPipeline containerWithPipeline =
           scmClient.getContainerWithPipeline(containerID.getId());
-      LOG.debug("Verified new container from SCM {} ",
-          containerWithPipeline.getContainerInfo().containerID());
+      LOG.info("Verified new container from SCM {}, part of {} ",
+          containerID, containerWithPipeline.getPipeline().getId());
       // If no other client added this, go ahead and add this container.
       if (!exists(containerID)) {
         addNewContainer(containerID.getId(), containerWithPipeline);
       }
+    } else {
+      // Check if container state is not open. In SCM, container state
+      // changes to CLOSING first, and then the close command is pushed down
+      // to Datanodes. Recon 'learns' this from DN, and hence replica state
+      // will move container state to 'CLOSING'.
+      ContainerInfo containerInfo = getContainer(containerID);
+      if (containerInfo.getState().equals(HddsProtos.LifeCycleState.OPEN)
+          && !replicaState.equals(ContainerReplicaProto.State.OPEN)

Review comment:
       A sample flow of events.
   
   - Recon has containerState = OPEN, Recon gets container report with replicaState = QUASI_CLOSED
   - Before processing the replica, it calls on the above logic (ReconContainerManager#checkAndAddNewContainer) which updates containerState to ‘CLOSING’ through the "FINALIZE" event. This is like a pre-processing step for Recon where the container replica state is showing up as unexpected. We have to do this since SCM is where the state transition was initiated, but Recon has to "learn" it later from DNs. 
   - Further along in the same process container report flow, there is a process replica step which sees that the containerState = ‘CLOSING’, and replicaState = QUASI_CLOSED, hence moves container state to ‘QUASI_CLOSED’ (AbstractContainerReportHandler#updateContainerState). This is the same code as used by SCM to transition container state by looking at the replica state.
   
   The same sequence of events would be true if starting replicaState = “CLOSING" or "CLOSED” as well




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: ozone-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: ozone-issues-help@hadoop.apache.org