You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@samza.apache.org by GitBox <gi...@apache.org> on 2020/08/13 01:31:21 UTC

[GitHub] [samza] Sanil15 opened a new pull request #1414: SAMZA-2579: Force restart feature for Container Placements

Sanil15 opened a new pull request #1414:
URL: https://github.com/apache/samza/pull/1414


   **Changes**: The current restart ability for container placements works in the following way:
   1. Tries to fetch resources on a host
   2. Stops the active container if resources are accrued
   3. Tried to start the container on host accrued
   
   In production, we have seen the following observation at Linkedin
   1. Some jobs are configured to use resources for the peak which leads to no headroom left on a host for requesting additional resources
   2. This leads to restart requests failing due to not able to get resources on that host
   
   A fix to this is to implement a force-restart utility , in this version we will stop the container first and then accrue resources. The upside being we will at least free up the resources on the host before issuing resource request, the downside being it will be a best-effort scenario to bring that container back up on that host
   
   **API Changes:** Added new param values to destinationHost param for container placement request message
   
   LAST_SEEN: Tries to restart a container on last seen host with RESERVE -> STOP -> MOVE policy
   
   FORCE_RESTART_LAST_SEEN: Tries to restart a container on last seen host with STOP -> RESERVE -> MOVE policy
   
   **Upgrade Instructions:** None
   
   **Usage Instructions:** None
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [samza] Sanil15 commented on a change in pull request #1414: SAMZA-2579: Force restart feature for Container Placements

Posted by GitBox <gi...@apache.org>.
Sanil15 commented on a change in pull request #1414:
URL: https://github.com/apache/samza/pull/1414#discussion_r474334769



##########
File path: samza-core/src/main/java/org/apache/samza/clustermanager/ContainerManager.java
##########
@@ -54,6 +54,8 @@
 
   private static final Logger LOG = LoggerFactory.getLogger(ContainerManager.class);
   private static final String ANY_HOST = ResourceRequestState.ANY_HOST;
+  private static final String LAST_SEEN = "LAST_SEEN";
+  private static final String FORCE_RESTART_LAST_SEEN = "FORCE_RESTART_LAST_SEEN";

Review comment:
       discussed offline, its the destinationHost param which is.
   
   **All its really doing is issuing a stop and relying on the AM to bring it back up?**
   Yup that is what is it doing
   
   **In a corner case, AM will not be able to bring it back up on the same host, rather on a different available host?**
   Yes and that is documented
   




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [samza] rmatharu commented on a change in pull request #1414: SAMZA-2579: Force restart feature for Container Placements

Posted by GitBox <gi...@apache.org>.
rmatharu commented on a change in pull request #1414:
URL: https://github.com/apache/samza/pull/1414#discussion_r473262481



##########
File path: samza-core/src/main/java/org/apache/samza/clustermanager/ContainerManager.java
##########
@@ -54,6 +54,8 @@
 
   private static final Logger LOG = LoggerFactory.getLogger(ContainerManager.class);
   private static final String ANY_HOST = ResourceRequestState.ANY_HOST;
+  private static final String LAST_SEEN = "LAST_SEEN";
+  private static final String FORCE_RESTART_LAST_SEEN = "FORCE_RESTART_LAST_SEEN";

Review comment:
       Isnt force-restart a misnomer here?
   All its really doing is issuing a stop and relying on the AM to bring it back up?
   
   In a corner case, AM will not be able to bring it back up on the same host, rather on a different available host?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [samza] rmatharu commented on a change in pull request #1414: SAMZA-2579: Force restart feature for Container Placements

Posted by GitBox <gi...@apache.org>.
rmatharu commented on a change in pull request #1414:
URL: https://github.com/apache/samza/pull/1414#discussion_r473263178



##########
File path: samza-core/src/main/java/org/apache/samza/clustermanager/ContainerManager.java
##########
@@ -391,6 +402,28 @@ public void registerContainerPlacementAction(ContainerPlacementRequestMessage re
       return;
     }
 
+    /*
+     * When destination host is {@code FORCE_RESTART_LAST_SEEN} its treated as eqvivalent to kill -9 operation for the container
+     * In this scenario container is stopped first and we fallback to normal restart path
+     */
+    if (destinationHost.equals(FORCE_RESTART_LAST_SEEN)) {
+      LOG.info("Issuing a force restart for Processor ID: {} for ContainerPlacement action request {}", processorId, requestMessage);
+      clusterResourceManager.stopStreamProcessor(samzaApplicationState.runningProcessors.get(processorId));
+      writeContainerPlacementResponseMessage(requestMessage, ContainerPlacementMessage.StatusCode.SUCCEEDED,
+          "Successfully issued a stop container request falling back to normal restart path");
+      return;
+    }
+
+    /**
+     * When destination host is {@code LAST_SEEN} its treated as a restart request on the host where container is running
+     * on or has been seen last
+     */

Review comment:
       This comment has the same meaning as the one above 
   "if (destinationHost.equals(FORCE_RESTART_LAST_SEEN))" 
   
   Could we write the difference 




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [samza] mynameborat merged pull request #1414: SAMZA-2579: Force restart feature for Container Placements

Posted by GitBox <gi...@apache.org>.
mynameborat merged pull request #1414:
URL: https://github.com/apache/samza/pull/1414


   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [samza] Sanil15 commented on a change in pull request #1414: SAMZA-2579: Force restart feature for Container Placements

Posted by GitBox <gi...@apache.org>.
Sanil15 commented on a change in pull request #1414:
URL: https://github.com/apache/samza/pull/1414#discussion_r474334828



##########
File path: samza-core/src/main/java/org/apache/samza/clustermanager/ContainerManager.java
##########
@@ -391,6 +402,28 @@ public void registerContainerPlacementAction(ContainerPlacementRequestMessage re
       return;
     }
 
+    /*
+     * When destination host is {@code FORCE_RESTART_LAST_SEEN} its treated as eqvivalent to kill -9 operation for the container
+     * In this scenario container is stopped first and we fallback to normal restart path
+     */
+    if (destinationHost.equals(FORCE_RESTART_LAST_SEEN)) {
+      LOG.info("Issuing a force restart for Processor ID: {} for ContainerPlacement action request {}", processorId, requestMessage);
+      clusterResourceManager.stopStreamProcessor(samzaApplicationState.runningProcessors.get(processorId));
+      writeContainerPlacementResponseMessage(requestMessage, ContainerPlacementMessage.StatusCode.SUCCEEDED,
+          "Successfully issued a stop container request falling back to normal restart path");
+      return;
+    }
+
+    /**
+     * When destination host is {@code LAST_SEEN} its treated as a restart request on the host where container is running
+     * on or has been seen last
+     */

Review comment:
       sure




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [samza] Sanil15 commented on pull request #1414: SAMZA-2579: Force restart feature for Container Placements

Posted by GitBox <gi...@apache.org>.
Sanil15 commented on pull request #1414:
URL: https://github.com/apache/samza/pull/1414#issuecomment-677967621


   > need to call it kill-container maybe
   
   actual tool will be sometime like: `samza-tool kill-container ....`


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [samza] rmatharu commented on a change in pull request #1414: SAMZA-2579: Force restart feature for Container Placements

Posted by GitBox <gi...@apache.org>.
rmatharu commented on a change in pull request #1414:
URL: https://github.com/apache/samza/pull/1414#discussion_r475768614



##########
File path: samza-core/src/main/java/org/apache/samza/clustermanager/ContainerManager.java
##########
@@ -391,6 +402,30 @@ public void registerContainerPlacementAction(ContainerPlacementRequestMessage re
       return;
     }
 
+    /*
+     * When destination host is {@code FORCE_RESTART_LAST_SEEN} its treated as eqvivalent to kill -9 operation for the container

Review comment:
       small thing: 
   I think its equivalent to a "kill"  (i.e., sigstop), not "kill -9" (sigkill). 
   Could you verify using the yarn docs




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [samza] Sanil15 commented on pull request #1414: SAMZA-2579: Force restart feature for Container Placements

Posted by GitBox <gi...@apache.org>.
Sanil15 commented on pull request #1414:
URL: https://github.com/apache/samza/pull/1414#issuecomment-674178360


   @rmatharu please have a look


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org