You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@gobblin.apache.org by "homatthew (via GitHub)" <gi...@apache.org> on 2023/05/08 22:29:24 UTC

[GitHub] [gobblin] homatthew commented on a diff in pull request #3692: [GOBBLIN-1823] Improving Container Calculation and Allocation Methodology

homatthew commented on code in PR #3692:
URL: https://github.com/apache/gobblin/pull/3692#discussion_r1187743610


##########
gobblin-yarn/src/main/java/org/apache/gobblin/yarn/YarnService.java:
##########
@@ -501,12 +514,30 @@ public synchronized boolean requestTargetNumberOfContainers(YarnContainerRequest
       }
     }
 
+    //We go through all the containers we have now and check whether the assigned participant is still alive, if not, we should put them in idle container Map

Review Comment:
   "Check whether assigned participant is still alive". There is nothing here to suggest that these instances are actually "assigned" anything. 
   
   I think a more accurate comment is instead something like:
   > iterate through all containers allocated and check whether the corresponding helix instance is still LIVE within the helix cluster. A container that has a bad connection to zookeeper will be dropped from the Helix cluster if the disconnection is greater than the specified timeout. In these cases, we want to release the container to get a new container because these containers won't be assigned tasks by Helix



##########
gobblin-yarn/src/main/java/org/apache/gobblin/yarn/YarnService.java:
##########
@@ -501,12 +514,30 @@ public synchronized boolean requestTargetNumberOfContainers(YarnContainerRequest
       }
     }
 
+    //We go through all the containers we have now and check whether the assigned participant is still alive, if not, we should put them in idle container Map
+    //And we will release the container if the assigned participant still offline after a given time
+
+    List<Container> containersToRelease = new ArrayList<>();
+    for (Map.Entry<ContainerId, ContainerInfo> entry : this.containerMap.entrySet()) {
+      ContainerInfo containerInfo = entry.getValue();
+      if (!HelixUtils.isInstanceLive(helixManager, containerInfo.getHelixParticipantId())) {
+        containerIdleSince.putIfAbsent(entry.getKey(), System.currentTimeMillis());
+        if (System.currentTimeMillis() - containerIdleSince.get(entry.getKey())
+            >= TimeUnit.MINUTES.toMillis(YarnAutoScalingManager.DEFAULT_MAX_CONTAINER_IDLE_TIME_BEFORE_SCALING_DOWN_MINUTES)) {
+          LOGGER.info("Releasing Container {} because the assigned participant {} has been in-active for more than {} minutes",
+              entry.getKey(), containerInfo.getHelixParticipantId(), YarnAutoScalingManager.DEFAULT_MAX_CONTAINER_IDLE_TIME_BEFORE_SCALING_DOWN_MINUTES);
+          containersToRelease.add(containerInfo.getContainer());
+        }
+      } else {
+        containerIdleSince.remove(entry.getKey());
+      }
+    }
+
     // If the total desired is lower than the currently allocated amount then release free containers.
     // This is based on the currently allocated amount since containers may still be in the process of being allocated
     // and assigned work. Resizing based on numRequestedContainers at this point may release a container right before
     // or soon after it is assigned work.
-    if (numTargetContainers < totalAllocatedContainers) {
-      List<Container> containersToRelease = new ArrayList<>();
+    if (containersToRelease.isEmpty() && numTargetContainers < totalAllocatedContainers) {

Review Comment:
   Why do we entirely skip this block if there are already containers to release? Hopefully the previous block doesn't happen that often, but this if statement is still a bit strange to read.
   
   To me, this should instead be:
   ```
   if (numTargetContainers < totalAllocatedContainers - containersToRelease.size())
   ```



##########
gobblin-yarn/src/main/java/org/apache/gobblin/yarn/YarnService.java:
##########
@@ -473,6 +475,17 @@ public synchronized boolean requestTargetNumberOfContainers(YarnContainerRequest
       return false;
     }
 
+    //Correct the containerMap first as there is cases that handleContainerCompletion() is called before onContainersAllocated()
+    for (ContainerId removedId :this.removedContainerID.keySet()) {

Review Comment:
   nit: whitespace after `:`



##########
gobblin-yarn/src/main/java/org/apache/gobblin/yarn/YarnService.java:
##########
@@ -473,6 +475,17 @@ public synchronized boolean requestTargetNumberOfContainers(YarnContainerRequest
       return false;
     }
 
+    //Correct the containerMap first as there is cases that handleContainerCompletion() is called before onContainersAllocated()
+    for (ContainerId removedId :this.removedContainerID.keySet()) {
+      ContainerInfo containerInfo = this.containerMap.remove(removedId);
+      if (containerInfo != null) {
+        String helixTag = containerInfo.getHelixTag();
+        allocatedContainerCountMap.putIfAbsent(helixTag, new AtomicInteger(0));
+        this.allocatedContainerCountMap.get(helixTag).decrementAndGet();

Review Comment:
   put if absent 0 and then decrementing means the resulting value would be -1. That does not seem correct to me



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@gobblin.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org