Posted to issues@ozone.apache.org by GitBox <gi...@apache.org> on 2021/08/24 07:55:55 UTC

[GitHub] [ozone] bharatviswa504 opened a new pull request #2569: HDDS-5655. SCM terminates when allocatecontainer happens on closed pipeline.

bharatviswa504 opened a new pull request #2569:
URL: https://github.com/apache/ozone/pull/2569


   ## What changes were proposed in this pull request?
   
   Acquire the pipelineManager read lock during allocateContainer so that the pipeline state cannot be updated while a container is being allocated.
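The following is a minimal sketch of that locking pattern, not the Ozone code. SketchPipelineManager and SketchContainerManager are hypothetical, simplified stand-ins for PipelineManagerImpl and ContainerManagerImpl, and tryClose is an assumed non-blocking stand-in for whatever path closes a pipeline. It assumes, as the description implies, that pipeline state updates take the write lock. While an allocation holds the read lock, such an update cannot proceed, so the pipeline picked from the OPEN list is still OPEN when the container is recorded on it.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.locks.ReentrantLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical, simplified stand-in for PipelineManagerImpl: pipeline IDs
// mapped to a state, guarded by a ReentrantReadWriteLock.
class SketchPipelineManager {
  enum State { OPEN, CLOSED }

  private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
  private final Map<String, State> pipelines = new LinkedHashMap<>();

  void acquireReadLock() {
    lock.readLock().lock();
  }

  void releaseReadLock() {
    lock.readLock().unlock();
  }

  // State updates take the write lock, so they wait for all readers.
  void setState(String id, State state) {
    lock.writeLock().lock();
    try {
      pipelines.put(id, state);
    } finally {
      lock.writeLock().unlock();
    }
  }

  // Hypothetical non-blocking close: fails while any read lock is held,
  // e.g. by an in-flight allocateContainer().
  boolean tryClose(String id) {
    if (!lock.writeLock().tryLock()) {
      return false;
    }
    try {
      pipelines.put(id, State.CLOSED);
      return true;
    } finally {
      lock.writeLock().unlock();
    }
  }

  List<String> getOpenPipelines() {
    lock.readLock().lock(); // reentrant for a caller already holding it
    try {
      List<String> open = new ArrayList<>();
      pipelines.forEach((id, s) -> {
        if (s == State.OPEN) {
          open.add(id);
        }
      });
      return open;
    } finally {
      lock.readLock().unlock();
    }
  }
}

// Hypothetical stand-in for ContainerManagerImpl#allocateContainer.
class SketchContainerManager {
  private final SketchPipelineManager pipelineManager;
  private final ReentrantLock lock = new ReentrantLock();
  private final Map<Long, String> containers = new LinkedHashMap<>();
  private long nextContainerId = 1;

  SketchContainerManager(SketchPipelineManager pipelineManager) {
    this.pipelineManager = pipelineManager;
  }

  // Returns the pipeline the new container was placed on, or null if no
  // pipeline is OPEN (the real code would then create one).
  String allocateContainer() {
    // Same order as the PR: pipeline manager read lock first, then own lock.
    pipelineManager.acquireReadLock();
    lock.lock();
    try {
      List<String> open = pipelineManager.getOpenPipelines();
      if (open.isEmpty()) {
        return null;
      }
      // No pipeline can be closed between the lookup and this put.
      String pipelineId = open.get(0);
      containers.put(nextContainerId++, pipelineId);
      return pipelineId;
    } finally {
      lock.unlock();
      pipelineManager.releaseReadLock();
    }
  }
}
```

The sketch picks the first OPEN pipeline to stay deterministic, whereas the diff picks one at random. ReentrantReadWriteLock does not let a thread upgrade a read lock to a write lock, which is why tryClose fails even when called by the thread that holds the read lock.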
   
   ## What is the link to the Apache JIRA
   
   https://issues.apache.org/jira/browse/HDDS-5655
   
   ## How was this patch tested?
   
   Existing tests
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] bharatviswa504 commented on pull request #2569: HDDS-5655. SCM terminates when allocatecontainer happens on closed pipeline.

bharatviswa504 commented on pull request #2569:
URL: https://github.com/apache/ozone/pull/2569#issuecomment-905623623


   Thank you @bshashikant and @JacksonYao287 for the review.
   I have verified that the failed tests pass locally.
   For TestOzoneConfigurationFields, I have opened HDDS-5668 (Fix TestOzoneConfigurationFields), with PR #2576.
   For TestSCMInstallSnapshot, a Jira is already open to fix its flakiness: https://issues.apache.org/jira/browse/HDDS-5631


[GitHub] [ozone] JacksonYao287 commented on a change in pull request #2569: HDDS-5655. SCM terminates when allocatecontainer happens on closed pipeline.

JacksonYao287 commented on a change in pull request #2569:
URL: https://github.com/apache/ozone/pull/2569#discussion_r695588583



##########
File path: hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/pipeline/PipelineManagerImpl.java
##########
@@ -69,7 +68,7 @@
       LoggerFactory.getLogger(PipelineManagerImpl.class);
 
   // Limit the number of on-going ratis operation to be 1.
-  private final Lock lock;
+  private final ReentrantReadWriteLock lock;

Review comment:
       `PipelineStateManagerV2Impl` has an internal lock too; write operations such as `removePipeline` are protected by this lock:
   ```
     public void removePipeline(HddsProtos.PipelineID pipelineIDProto)
         throws IOException {
       lock.writeLock().lock();
       try {
         // ...
       } finally {
         lock.writeLock().unlock();
       }
     }
   ```
   
   Can we take it into account?
   Since `PipelineStateManagerV2Impl` is a private member of `PipelineManagerImpl`, it can be fully protected by `PipelineManagerImpl`'s lock. If we add a lock to `PipelineManagerImpl`, can we remove the internal lock of `PipelineStateManagerV2Impl`?
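A minimal sketch of the reviewer's idea, under the assumption stated in the comment (not verified here) that the state manager is only ever reached through its owning manager. UnlockedPipelineState and GuardedPipelineManager are hypothetical stand-ins for PipelineStateManagerV2Impl and PipelineManagerImpl. Because the state object is private and never handed out, the outer ReentrantReadWriteLock alone serializes every write, so concurrent adds lose no updates even though the inner HashMap is not thread-safe.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical stand-in for PipelineStateManagerV2Impl with its internal
// lock removed. HashMap is not thread-safe, so this relies on its owner.
class UnlockedPipelineState {
  private final Map<String, String> pipelines = new HashMap<>();

  void addPipeline(String id) { pipelines.put(id, "OPEN"); }
  void removePipeline(String id) { pipelines.remove(id); }
  int size() { return pipelines.size(); }
}

// Hypothetical stand-in for PipelineManagerImpl. The state object is private
// and never escapes, so this lock guards every access to it.
class GuardedPipelineManager {
  private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
  private final UnlockedPipelineState state = new UnlockedPipelineState();

  void addPipeline(String id) {
    lock.writeLock().lock();
    try {
      state.addPipeline(id);
    } finally {
      lock.writeLock().unlock();
    }
  }

  void removePipeline(String id) {
    lock.writeLock().lock();
    try {
      state.removePipeline(id);
    } finally {
      lock.writeLock().unlock();
    }
  }

  int pipelineCount() {
    lock.readLock().lock();
    try {
      return state.size();
    } finally {
      lock.readLock().unlock();
    }
  }

  // Adds threads * perThread distinct pipelines concurrently, returns the count.
  static int concurrentAdds(int threads, int perThread) {
    GuardedPipelineManager manager = new GuardedPipelineManager();
    List<Thread> workers = new ArrayList<>();
    for (int t = 0; t < threads; t++) {
      final int prefix = t;
      Thread worker = new Thread(() -> {
        for (int i = 0; i < perThread; i++) {
          manager.addPipeline(prefix + "-" + i);
        }
      });
      workers.add(worker);
      worker.start();
    }
    for (Thread worker : workers) {
      try {
        worker.join();
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
        throw new IllegalStateException(e);
      }
    }
    return manager.pipelineCount();
  }
}
```

The sketch only holds if no other code path reaches the inner object directly; if one does, the inner lock is still needed.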

##########
File path: hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/container/ContainerManagerImpl.java
##########
@@ -176,33 +176,64 @@ public ContainerInfo getContainer(final ContainerID id)
   public ContainerInfo allocateContainer(
       final ReplicationConfig replicationConfig, final String owner)
       throws IOException {
+    // Acquire pipeline manager lock, to avoid any updates to pipeline
+    // while allocate container happens. This is to avoid scenario like
+    // mentioned in HDDS-5655.
+    pipelineManager.acquireReadLock();
     lock.lock();
+    List<Pipeline> pipelines;
+    Pipeline pipeline;
+    ContainerInfo containerInfo = null;
     try {
-      final List<Pipeline> pipelines = pipelineManager
+      pipelines = pipelineManager
           .getPipelines(replicationConfig, Pipeline.PipelineState.OPEN);
+      if (!pipelines.isEmpty()) {
+        pipeline = pipelines.get(random.nextInt(pipelines.size()));
+        containerInfo = createContainer(pipeline, owner);
+      }
+    } finally {
+      lock.unlock();
+      pipelineManager.releaseReadLock();
+    }
 
-      final Pipeline pipeline;
-      if (pipelines.isEmpty()) {
-        try {
-          pipeline = pipelineManager.createPipeline(replicationConfig);
-          pipelineManager.waitPipelineReady(pipeline.getId(), 0);
-        } catch (IOException e) {
-          scmContainerManagerMetrics.incNumFailureCreateContainers();
+    if (pipelines.isEmpty()) {
+      try {
+        pipeline = pipelineManager.createPipeline(replicationConfig);
+        pipelineManager.waitPipelineReady(pipeline.getId(), 0);
+      } catch (IOException e) {
+        scmContainerManagerMetrics.incNumFailureCreateContainers();
+        throw new IOException("Could not allocate container. Cannot get any" +
+            " matching pipeline for replicationConfig: " + replicationConfig
+            + ", State:PipelineState.OPEN", e);
+      }
+      pipelineManager.acquireReadLock();
+      lock.lock();
+      try {
+        pipelines = pipelineManager
+            .getPipelines(replicationConfig, Pipeline.PipelineState.OPEN);
+        if (!pipelines.isEmpty()) {
+          pipeline = pipelines.get(random.nextInt(pipelines.size()));
+          containerInfo = createContainer(pipeline, owner);
+        } else {
           throw new IOException("Could not allocate container. Cannot get any" +
               " matching pipeline for replicationConfig: " + replicationConfig
-              + ", State:PipelineState.OPEN", e);
+              + ", State:PipelineState.OPEN");
         }
-      } else {
-        pipeline = pipelines.get(random.nextInt(pipelines.size()));
-      }
-      final ContainerInfo containerInfo = allocateContainer(pipeline, owner);
-      if (LOG.isTraceEnabled()) {
-        LOG.trace("New container allocated: {}", containerInfo);
+      } finally {
+        lock.unlock();
+        pipelineManager.releaseReadLock();

Review comment:
       Can you refactor this code into something like:
   ```
   pipelineManager.acquireReadLock();
   lock.lock();
   try {
     pipelines = pipelineManager
         .getPipelines(replicationConfig, Pipeline.PipelineState.OPEN);
     if (pipelines.isEmpty()) {
       // ...
     } else {
       // ...
     }
   } finally {
     lock.unlock();
     pipelineManager.releaseReadLock();
   }
   ```
   I think this would remove some redundant code and make the logic clearer.




[GitHub] [ozone] bharatviswa504 merged pull request #2569: HDDS-5655. SCM terminates when allocatecontainer happens on closed pipeline.

bharatviswa504 merged pull request #2569:
URL: https://github.com/apache/ozone/pull/2569


   



[GitHub] [ozone] bharatviswa504 commented on a change in pull request #2569: HDDS-5655. SCM terminates when allocatecontainer happens on closed pipeline.

bharatviswa504 commented on a change in pull request #2569:
URL: https://github.com/apache/ozone/pull/2569#discussion_r695603286



##########
File path: hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/container/ContainerManagerImpl.java
##########
@@ -176,33 +176,64 @@ public ContainerInfo getContainer(final ContainerID id)
   public ContainerInfo allocateContainer(
       final ReplicationConfig replicationConfig, final String owner)
       throws IOException {
+    // Acquire pipeline manager lock, to avoid any updates to pipeline
+    // while allocate container happens. This is to avoid scenario like
+    // mentioned in HDDS-5655.
+    pipelineManager.acquireReadLock();
     lock.lock();
+    List<Pipeline> pipelines;
+    Pipeline pipeline;
+    ContainerInfo containerInfo = null;
     try {
-      final List<Pipeline> pipelines = pipelineManager
+      pipelines = pipelineManager
           .getPipelines(replicationConfig, Pipeline.PipelineState.OPEN);
+      if (!pipelines.isEmpty()) {
+        pipeline = pipelines.get(random.nextInt(pipelines.size()));
+        containerInfo = createContainer(pipeline, owner);
+      }
+    } finally {
+      lock.unlock();
+      pipelineManager.releaseReadLock();
+    }
 
-      final Pipeline pipeline;
-      if (pipelines.isEmpty()) {
-        try {
-          pipeline = pipelineManager.createPipeline(replicationConfig);
-          pipelineManager.waitPipelineReady(pipeline.getId(), 0);
-        } catch (IOException e) {
-          scmContainerManagerMetrics.incNumFailureCreateContainers();
+    if (pipelines.isEmpty()) {
+      try {
+        pipeline = pipelineManager.createPipeline(replicationConfig);
+        pipelineManager.waitPipelineReady(pipeline.getId(), 0);
+      } catch (IOException e) {
+        scmContainerManagerMetrics.incNumFailureCreateContainers();
+        throw new IOException("Could not allocate container. Cannot get any" +
+            " matching pipeline for replicationConfig: " + replicationConfig
+            + ", State:PipelineState.OPEN", e);
+      }
+      pipelineManager.acquireReadLock();
+      lock.lock();
+      try {
+        pipelines = pipelineManager
+            .getPipelines(replicationConfig, Pipeline.PipelineState.OPEN);
+        if (!pipelines.isEmpty()) {
+          pipeline = pipelines.get(random.nextInt(pipelines.size()));
+          containerInfo = createContainer(pipeline, owner);
+        } else {
           throw new IOException("Could not allocate container. Cannot get any" +
               " matching pipeline for replicationConfig: " + replicationConfig
-              + ", State:PipelineState.OPEN", e);
+              + ", State:PipelineState.OPEN");
         }
-      } else {
-        pipeline = pipelines.get(random.nextInt(pipelines.size()));
-      }
-      final ContainerInfo containerInfo = allocateContainer(pipeline, owner);
-      if (LOG.isTraceEnabled()) {
-        LOG.trace("New container allocated: {}", containerInfo);
+      } finally {
+        lock.unlock();
+        pipelineManager.releaseReadLock();

Review comment:
       We cannot do that; that is the reason for releasing the lock and reacquiring it.
   If the pipeline list is empty, we create a pipeline and wait for it to become ready. If we held the pipeline manager lock at that point, then when a DN reports the pipeline, the pipeline report handler would need the pipeline manager lock to update the pipeline state. The report handler could not update the pipeline state, while the block manager's allocate block call would keep waiting for the pipeline to become ready; this is effectively a deadlock. To avoid this, I have taken this approach.
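The interaction described in this comment can be modeled in a minimal sketch. It assumes, as the comment describes, that the pipeline report handler needs the pipeline manager's write lock to move a new pipeline to OPEN, and that the allocator blocks until that happens. ReacquireSketch, startReportHandler and the timeouts are hypothetical, and a CountDownLatch stands in for waitPipelineReady(). Holding the read lock across the wait starves the handler, so the wait can only time out; releasing the lock before creating and waiting, then reacquiring and re-checking, lets the handler run.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical model of the lock interaction; names are illustrative only.
class ReacquireSketch {
  private final ReentrantReadWriteLock pipelineLock = new ReentrantReadWriteLock();
  private final CountDownLatch pipelineReady = new CountDownLatch(1);
  private volatile boolean pipelineOpen = false;

  // Simulates the DN's pipeline report: the handler needs the write lock
  // to move the newly created pipeline to OPEN.
  private void startReportHandler() {
    Thread handler = new Thread(() -> {
      pipelineLock.writeLock().lock();
      try {
        pipelineOpen = true;
      } finally {
        pipelineLock.writeLock().unlock();
      }
      pipelineReady.countDown();
    });
    handler.setDaemon(true);
    handler.start();
  }

  // Stand-in for waitPipelineReady(); returns false on timeout.
  private boolean waitPipelineReady(long timeoutMs) {
    try {
      return pipelineReady.await(timeoutMs, TimeUnit.MILLISECONDS);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      return false;
    }
  }

  // Returns true if an OPEN pipeline was available for the new container.
  boolean allocate(boolean holdLockWhileWaiting, long timeoutMs) {
    if (holdLockWhileWaiting) {
      // Single critical section: the handler blocks on the write lock,
      // so this wait can only time out (without a timeout it would hang).
      pipelineLock.readLock().lock();
      try {
        startReportHandler();
        return waitPipelineReady(timeoutMs);
      } finally {
        pipelineLock.readLock().unlock();
      }
    }
    // Shape used in the PR: create and wait with no lock held, then
    // reacquire and re-check the pipeline state before using it.
    startReportHandler();
    if (!waitPipelineReady(timeoutMs)) {
      return false;
    }
    pipelineLock.readLock().lock();
    try {
      return pipelineOpen;
    } finally {
      pipelineLock.readLock().unlock();
    }
  }
}
```

The timeout exists only so the sketch terminates. The re-check after reacquiring mirrors the PR's second getPipelines(...) call, since the set of OPEN pipelines may have changed while no lock was held.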
   
   





[GitHub] [ozone] JacksonYao287 commented on a change in pull request #2569: HDDS-5655. SCM terminates when allocatecontainer happens on closed pipeline.

JacksonYao287 commented on a change in pull request #2569:
URL: https://github.com/apache/ozone/pull/2569#discussion_r695736071



##########
File path: hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/pipeline/PipelineManagerImpl.java
##########
@@ -69,7 +68,7 @@
       LoggerFactory.getLogger(PipelineManagerImpl.class);
 
   // Limit the number of on-going ratis operation to be 1.
-  private final Lock lock;
+  private final ReentrantReadWriteLock lock;

Review comment:
       > Can we do that in separate jira if we can remove it. I can open a new jira for that.
   
   Sure, agreed.
   


[GitHub] [ozone] bharatviswa504 commented on a change in pull request #2569: HDDS-5655. SCM terminates when allocatecontainer happens on closed pipeline.

bharatviswa504 commented on a change in pull request #2569:
URL: https://github.com/apache/ozone/pull/2569#discussion_r695604756



##########
File path: hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/pipeline/PipelineManagerImpl.java
##########
@@ -69,7 +68,7 @@
       LoggerFactory.getLogger(PipelineManagerImpl.class);
 
   // Limit the number of on-going ratis operation to be 1.
-  private final Lock lock;
+  private final ReentrantReadWriteLock lock;

Review comment:
       If we can remove it, can we do that in a separate Jira? I can open a new Jira for that.
   For this Jira, let's stick with fixing the problem.





[GitHub] [ozone] JacksonYao287 commented on a change in pull request #2569: HDDS-5655. SCM terminates when allocatecontainer happens on closed pipeline.

JacksonYao287 commented on a change in pull request #2569:
URL: https://github.com/apache/ozone/pull/2569#discussion_r695742123



##########
File path: hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/container/ContainerManagerImpl.java
##########
@@ -176,33 +176,64 @@ public ContainerInfo getContainer(final ContainerID id)
   public ContainerInfo allocateContainer(
       final ReplicationConfig replicationConfig, final String owner)
       throws IOException {
+    // Acquire pipeline manager lock, to avoid any updates to pipeline
+    // while allocate container happens. This is to avoid scenario like
+    // mentioned in HDDS-5655.
+    pipelineManager.acquireReadLock();
     lock.lock();
+    List<Pipeline> pipelines;
+    Pipeline pipeline;
+    ContainerInfo containerInfo = null;
     try {
-      final List<Pipeline> pipelines = pipelineManager
+      pipelines = pipelineManager
           .getPipelines(replicationConfig, Pipeline.PipelineState.OPEN);
+      if (!pipelines.isEmpty()) {
+        pipeline = pipelines.get(random.nextInt(pipelines.size()));
+        containerInfo = createContainer(pipeline, owner);
+      }
+    } finally {
+      lock.unlock();
+      pipelineManager.releaseReadLock();
+    }
 
-      final Pipeline pipeline;
-      if (pipelines.isEmpty()) {
-        try {
-          pipeline = pipelineManager.createPipeline(replicationConfig);
-          pipelineManager.waitPipelineReady(pipeline.getId(), 0);
-        } catch (IOException e) {
-          scmContainerManagerMetrics.incNumFailureCreateContainers();
+    if (pipelines.isEmpty()) {
+      try {
+        pipeline = pipelineManager.createPipeline(replicationConfig);
+        pipelineManager.waitPipelineReady(pipeline.getId(), 0);
+      } catch (IOException e) {
+        scmContainerManagerMetrics.incNumFailureCreateContainers();
+        throw new IOException("Could not allocate container. Cannot get any" +
+            " matching pipeline for replicationConfig: " + replicationConfig
+            + ", State:PipelineState.OPEN", e);
+      }
+      pipelineManager.acquireReadLock();
+      lock.lock();
+      try {
+        pipelines = pipelineManager
+            .getPipelines(replicationConfig, Pipeline.PipelineState.OPEN);
+        if (!pipelines.isEmpty()) {
+          pipeline = pipelines.get(random.nextInt(pipelines.size()));
+          containerInfo = createContainer(pipeline, owner);
+        } else {
           throw new IOException("Could not allocate container. Cannot get any" +
               " matching pipeline for replicationConfig: " + replicationConfig
-              + ", State:PipelineState.OPEN", e);
+              + ", State:PipelineState.OPEN");
         }
-      } else {
-        pipeline = pipelines.get(random.nextInt(pipelines.size()));
-      }
-      final ContainerInfo containerInfo = allocateContainer(pipeline, owner);
-      if (LOG.isTraceEnabled()) {
-        LOG.trace("New container allocated: {}", containerInfo);
+      } finally {
+        lock.unlock();
+        pipelineManager.releaseReadLock();

Review comment:
       Yes, that makes sense. Please add this explanation as a comment in the code.





[GitHub] [ozone] bharatviswa504 closed pull request #2569: HDDS-5655. SCM terminates when allocatecontainer happens on closed pipeline.

bharatviswa504 closed pull request #2569:
URL: https://github.com/apache/ozone/pull/2569


   

