Posted to issues@ozone.apache.org by "adoroszlai (via GitHub)" <gi...@apache.org> on 2023/06/22 07:40:14 UTC

[GitHub] [ozone] adoroszlai opened a new pull request, #4955: HDDS-8897. Avoid synchronization on WritableECContainerProvider

adoroszlai opened a new pull request, #4955:
URL: https://github.com/apache/ozone/pull/4955

   ## What changes were proposed in this pull request?
   
   Keep track of pending EC pipeline allocations in `WritableECContainerProvider` to avoid the need for synchronization. Only requests that would exceed the pipeline limit are forced to wait.
   
   https://issues.apache.org/jira/browse/HDDS-8897
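
A minimal sketch of the pending-allocation idea described above, assuming an `AtomicInteger` counter as the `pendingAllocations.getAndIncrement()` call in the diff suggests. The class `PendingAllocationLimiter` and its method names are hypothetical and only illustrate the counting; this is not the code of the PR.

```java
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical helper: admits an allocation only while open + pending stays below a limit. */
final class PendingAllocationLimiter {
  private final AtomicInteger pending = new AtomicInteger();
  private final int max;

  PendingAllocationLimiter(int max) {
    this.max = max;
  }

  /**
   * Tries to reserve a slot for a new allocation without taking a lock.
   *
   * @param current number of pipelines that are already open
   * @return true if the caller may allocate; it must call release() when done
   */
  boolean tryReserve(int current) {
    // Count this request first, then check the limit against the
    // requests that were already pending before it.
    int before = pending.getAndIncrement();
    if (current + before < max) {
      return true;
    }
    // Over the limit: undo the reservation so later requests are not blocked by it.
    pending.decrementAndGet();
    return false;
  }

  /** Releases a slot obtained from a successful tryReserve(). */
  void release() {
    pending.decrementAndGet();
  }

  /** Number of reservations currently held. */
  int pendingCount() {
    return pending.get();
  }
}
```

A caller would wrap the real allocation in `try { ... } finally { limiter.release(); }` after a successful `tryReserve(openPipelineCount)`, so that a failed allocation does not leak a reservation.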
   
   ## How was this patch tested?
   
   https://github.com/adoroszlai/hadoop-ozone/actions/runs/5342367241


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] sodonnel commented on a diff in pull request #4955: HDDS-8897. Avoid synchronization on WritableECContainerProvider

Posted by "sodonnel (via GitHub)" <gi...@apache.org>.
sodonnel commented on code in PR #4955:
URL: https://github.com/apache/ozone/pull/4955#discussion_r1238690687


##########
hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/pipeline/WritableECContainerProvider.java:
##########
@@ -150,21 +153,70 @@ public ContainerInfo getContainer(final long size,
         }
       }
     }
+
     // If we get here, all the pipelines we tried were no good. So try to
     // allocate a new one.
+    container = allocateContainerIfWithinLimit(
+        maximumPipelines, openPipelineCount, true,
+        repConfig, size, owner, excludeList);
+
+    if (container != null) {
+      return container;
+    }
+
+    String msg = "Unable to allocate a pipeline for " + repConfig + ":"
+        + " the maximum of " + maximumPipelines + " has been reached";
+    if (openPipelineCount > 0) {
+      msg += ", and none of the " + openPipelineCount
+          + " existing ones are suitable";
+    }
+
+    throw new IOException(msg);
+  }
+
+  @Nullable
+  private ContainerInfo allocateContainerIfWithinLimit(
+      int max, int current, boolean finalAttempt,
+      ECReplicationConfig repConfig, long size, String owner,
+      ExcludeList excludeList) throws IOException, TimeoutException {
+
+    final String msg = "Unable to allocate a container for {} as {} existing "
+        + "containers and {} pending allocations have reached the limit of {}";
+
+    final int pending = pendingAllocations.getAndIncrement();
     try {
-      synchronized (this) {
-        if (openPipelineCount < maximumPipelines) {
-          return allocateContainer(repConfig, size, owner, excludeList);
+      if (current + pending < max) {
+        ContainerInfo containerInfo =
+            allocateContainer(repConfig, size, owner, excludeList);
+        allocation.signal();

Review Comment:
   Is there a potential race condition here?
   
   Say we are one below the maximum, and one pipeline allocation is pending.
   
   Another thread falls into the "else if" branch. Just before it calls `allocation.await()`, the previous allocation completes and calls `allocation.signal()`.
   
   Then the "other thread" will get blocked on `allocation.await()` with nothing to wake it up?
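
One common way to avoid this kind of lost wakeup is to pair the condition with state that is checked under the same lock, so a signal that arrives before the wait is still observed. The sketch below assumes `allocation` is a `java.util.concurrent.locks.Condition`; the class `AllocationSignal`, the `completed` counter, and the method names are hypothetical and not part of the PR.

```java
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.ReentrantLock;

/** Hypothetical sketch: a wait that cannot miss a completion that happened just before it. */
final class AllocationSignal {
  private final ReentrantLock lock = new ReentrantLock();
  private final Condition allocation = lock.newCondition();
  private long completed; // number of finished allocations, guarded by lock

  /** Reads the completion count; call this before deciding to wait. */
  long snapshot() {
    lock.lock();
    try {
      return completed;
    } finally {
      lock.unlock();
    }
  }

  /** Called by the allocating thread once its allocation has finished. */
  void allocationCompleted() {
    lock.lock();
    try {
      completed++;
      allocation.signalAll();
    } finally {
      lock.unlock();
    }
  }

  /** Blocks until at least one allocation has completed after the given snapshot. */
  void awaitCompletionAfter(long snapshot) {
    lock.lock();
    try {
      // The loop re-checks the counter, so a completion that was signalled
      // between snapshot() and this call makes the wait return immediately.
      while (completed == snapshot) {
        allocation.awaitUninterruptibly();
      }
    } finally {
      lock.unlock();
    }
  }
}
```

The design choice here is that the waiter never relies on the signal itself, only on the counter, which also makes it robust against spurious wakeups.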





[GitHub] [ozone] adoroszlai commented on a diff in pull request #4955: HDDS-8897. Avoid synchronization on WritableECContainerProvider

Posted by "adoroszlai (via GitHub)" <gi...@apache.org>.
adoroszlai commented on code in PR #4955:
URL: https://github.com/apache/ozone/pull/4955#discussion_r1238771240


Review Comment:
   You are right.  It would be woken up by the next allocation, though.  We could switch to a timed `await(time)` to bound that wait.
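
A rough sketch of what bounding the wait could look like, assuming `allocation` is a `java.util.concurrent.locks.Condition`. The class `BoundedWait`, its methods, and the polling interval are hypothetical. The waiter re-checks a caller-supplied condition after every timed wait, so a missed signal costs at most one polling interval.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.ReentrantLock;
import java.util.function.BooleanSupplier;

/** Hypothetical sketch: timed waits so that a missed signal delays the caller only briefly. */
final class BoundedWait {
  private final ReentrantLock lock = new ReentrantLock();
  private final Condition allocation = lock.newCondition();

  /** Wakes up waiters early, for example after an allocation completes. */
  void signalAllocation() {
    lock.lock();
    try {
      allocation.signalAll();
    } finally {
      lock.unlock();
    }
  }

  /**
   * Waits until withinLimit returns true or the timeout elapses.
   * The condition is re-evaluated at least once per polling interval.
   *
   * @return true if the condition held, false on timeout or interruption
   */
  boolean awaitUntil(BooleanSupplier withinLimit, long timeoutMillis, long pollMillis) {
    long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
    lock.lock();
    try {
      while (!withinLimit.getAsBoolean()) {
        long remaining = deadline - System.nanoTime();
        if (remaining <= 0L) {
          return false;
        }
        long slice = Math.min(remaining, TimeUnit.MILLISECONDS.toNanos(pollMillis));
        try {
          // Returns on signal, on timeout of this slice, or spuriously;
          // in every case the loop re-checks the condition.
          allocation.awaitNanos(slice);
        } catch (InterruptedException e) {
          Thread.currentThread().interrupt();
          return false;
        }
      }
      return true;
    } finally {
      lock.unlock();
    }
  }
}
```

The trade-off is extra wakeups while a request is over the limit, in exchange for an upper bound on how long a missed signal can stall it.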





Re: [PR] HDDS-8897. Avoid synchronization on WritableECContainerProvider [ozone]

Posted by "adoroszlai (via GitHub)" <gi...@apache.org>.
adoroszlai closed pull request #4955: HDDS-8897. Avoid synchronization on WritableECContainerProvider
URL: https://github.com/apache/ozone/pull/4955




[GitHub] [ozone] sodonnel commented on pull request #4955: HDDS-8897. Avoid synchronization on WritableECContainerProvider

Posted by "sodonnel (via GitHub)" <gi...@apache.org>.
sodonnel commented on PR #4955:
URL: https://github.com/apache/ozone/pull/4955#issuecomment-1602259584

   We need to see a profile of the original code and the changed code to see exactly where the problem lies. There is synchronization inside the pipeline manager and the container manager too, and the original code has several synchronized blocks, so we are not certain where the problem is and the changes here might not help much.




[GitHub] [ozone] adoroszlai commented on pull request #4955: HDDS-8897. Avoid synchronization on WritableECContainerProvider

Posted by "adoroszlai (via GitHub)" <gi...@apache.org>.
adoroszlai commented on PR #4955:
URL: https://github.com/apache/ozone/pull/4955#issuecomment-1602165033

   @guohao-rosicky please check if this helps improve latency




[GitHub] [ozone] sodonnel commented on pull request #4955: HDDS-8897. Avoid synchronization on WritableECContainerProvider

Posted by "sodonnel (via GitHub)" <gi...@apache.org>.
sodonnel commented on PR #4955:
URL: https://github.com/apache/ozone/pull/4955#issuecomment-1605995936

   @guohao-rosicky We really need to profile the code to see the hot spots before jumping into solutions. As it stands, we don't know whether the problem is:
   
   1. Too few pipelines - see my comment on the Jira about this.
   2. Pipeline creation time (which could be made worse by too few pipelines).
   3. Selecting pipelines from the open set.
   
   If you could reproduce your test with the async profiler attached to SCM, it would be a great help. It is difficult to profile this in a simulation, as the need to open pipelines is driven by pipelines filling and closing, so it really needs a workload on a real cluster.




[GitHub] [ozone] guohao-rosicky commented on pull request #4955: HDDS-8897. Avoid synchronization on WritableECContainerProvider

Posted by "guohao-rosicky (via GitHub)" <gi...@apache.org>.
guohao-rosicky commented on PR #4955:
URL: https://github.com/apache/ozone/pull/4955#issuecomment-1605947570

   @adoroszlai
   I have an idea: close pipelines asynchronously and pre-create containers, so that the lock is held for a much shorter time.
   
   I now have an initial implementation of this. Should I submit a PR so that we can discuss it based on the code?
   




[GitHub] [ozone] guohao-rosicky commented on pull request #4955: HDDS-8897. Avoid synchronization on WritableECContainerProvider

Posted by "guohao-rosicky (via GitHub)" <gi...@apache.org>.
guohao-rosicky commented on PR #4955:
URL: https://github.com/apache/ozone/pull/4955#issuecomment-1605946019

   Hi @adoroszlai
   I am also following this issue. This lock contention degrades the performance of block allocation; I have submitted a detailed report to Jira.
   Please see: https://issues.apache.org/jira/browse/HDDS-8897

