You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@storm.apache.org by GitBox <gi...@apache.org> on 2020/06/26 16:26:21 UTC

[GitHub] [storm] kishorvpatil opened a new pull request #3297: STORM-3662: Use Pacemaker if cluster has set up uses it

kishorvpatil opened a new pull request #3297:
URL: https://github.com/apache/storm/pull/3297


   ## What is the purpose of the change
   
   Currently, the storm version of the topology is used to determine the RPC heartbeats usage.  For  large clusters with beefier machines, each supervisor can have 100s of workers and multiple supervisor daemons going down can cause a lot of load on nimbus. 
   Currently, 
   * While using 2.x topologies, the RPC heartbeats ignore Pacemaker availability.
   * The call _sendSupervisorWorkerHeartbeat_ is just checking supervisor is up.
   * The worker should kill itself if assignment has changed. ( regression)
   * Supervisor timer threads are not named.
   * Nimbus should check if using Pacemaker and expecting heartbeat calls.
   
   With this change, if Pacemaker is used, the behavior is :
   
   1.  Worker does not call supervisor
   2.  Worker sends heartbeat to pacemaker periodically
   3.  Supervisor does not send worker heartbeats to nimbus.
   4. Nimbus checks if heartbeats should be expected from RPC calls or not.
   5. If supervisor is down, the worker kills itself on reassignment. So worker does hang around without checking the reassignments.
   6. Worker should restart itself if its assignments have changed. ( typically supervisor should notice the change in assignment and  restart worker.) But if supervisor is down, then this is a good backup.
   
   
   ## How was the change tested
   
   Setup cluster with Pacemaker and validate that:
   1.  Worker does sends heartbeat to Pacemaker instead of calling __sendSupervisorWorkerHeartbeat_.
   2. Stop Supervisor, re-balance topology- and worker dies (as assignments have changed - logs message about change in assignment worker.log)
   3. Supervisor does not send executor heartbeats to  nimbus.
   
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [storm] kishorvpatil commented on a change in pull request #3297: STORM-3662: Use Pacemaker if cluster has set up uses it

Posted by GitBox <gi...@apache.org>.
kishorvpatil commented on a change in pull request #3297:
URL: https://github.com/apache/storm/pull/3297#discussion_r447163184



##########
File path: storm-client/src/jvm/org/apache/storm/cluster/IStormClusterState.java
##########
@@ -74,6 +74,13 @@
      */
     boolean isAssignmentsBackendSynchronized();
 
+    /**
+     * Flag to indicate if the Pacameker is backup store.

Review comment:
       Addressed

##########
File path: storm-client/src/jvm/org/apache/storm/daemon/worker/Worker.java
##########
@@ -361,7 +364,11 @@ public void doHeartBeat() throws IOException {
         state.setWorkerHeartBeat(lsWorkerHeartbeat);
         state.cleanup(60); // this is just in case supervisor is down so that disk doesn't fill up.
         // it shouldn't take supervisor 120 seconds between listing dir and reading it
-        heartbeatToMasterIfLocalbeatFail(lsWorkerHeartbeat);
+
+        if (!workerState.stormClusterState.isPacemakerStateStore()) {
+            LOG.debug("If pacemaker is not used, send supervisor");

Review comment:
       Addressed




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [storm] Ethanlm commented on a change in pull request #3297: STORM-3662: Use Pacemaker if cluster has set up uses it

Posted by GitBox <gi...@apache.org>.
Ethanlm commented on a change in pull request #3297:
URL: https://github.com/apache/storm/pull/3297#discussion_r447054327



##########
File path: storm-client/src/jvm/org/apache/storm/daemon/worker/Worker.java
##########
@@ -215,8 +215,11 @@ private Object loadWorker(IStateStorage stateStorage, IStormClusterState stormCl
                 }
             });
 
+        Integer execHeartBeatFreqSecs = workerState.stormClusterState.isPacemakerStateStore()
+            ? (Integer) conf.get(Config.TASK_HEARTBEAT_FREQUENCY_SECS)

Review comment:
       We might want to update the comment here https://github.com/apache/storm/blob/master/storm-client/src/jvm/org/apache/storm/Config.java#L1678-L1684
   since this config is no longer deprecated. and it is used when Pacemaker is in use.

##########
File path: storm-client/src/jvm/org/apache/storm/cluster/IStormClusterState.java
##########
@@ -74,6 +74,13 @@
      */
     boolean isAssignmentsBackendSynchronized();
 
+    /**
+     * Flag to indicate if the Pacameker is backup store.

Review comment:
       Do you mean `backend store`? 

##########
File path: storm-client/src/jvm/org/apache/storm/daemon/worker/Worker.java
##########
@@ -361,7 +364,11 @@ public void doHeartBeat() throws IOException {
         state.setWorkerHeartBeat(lsWorkerHeartbeat);
         state.cleanup(60); // this is just in case supervisor is down so that disk doesn't fill up.
         // it shouldn't take supervisor 120 seconds between listing dir and reading it
-        heartbeatToMasterIfLocalbeatFail(lsWorkerHeartbeat);
+
+        if (!workerState.stormClusterState.isPacemakerStateStore()) {
+            LOG.debug("If pacemaker is not used, send supervisor");

Review comment:
       We should delete `If`. Since if this message is logged, it means pacemaker is not being used.

##########
File path: storm-client/src/jvm/org/apache/storm/daemon/worker/WorkerState.java
##########
@@ -377,15 +378,31 @@ public StormTimer getUserTimer() {
     public SmartThread makeTransferThread() {
         return workerTransfer.makeTransferThread();
     }
-    
+
+    public void suicideIfLocalAssignmentsChanged(Assignment assignment) {
+        if (assignment != null) {
+            Set<List<Long>> assignedExecutors = new HashSet<>(readWorkerExecutors(assignmentId, port, assignment));
+            if (!localExecutors.equals(assignedExecutors)) {
+                LOG.info("Found conflicting assignments. We shouldn't be alive!"
+                         + " Assigned: " + assignedExecutors + ", Current: "
+                         + localExecutors);
+                if (!ConfigUtils.isLocalMode(conf)) {
+                    suicideCallback.run();
+                } else {
+                    LOG.info("Local worker tried to commit suicide!");
+                }
+            }
+        }
+    }
+
     public void refreshConnections() {
         Assignment assignment = null;
         try {
             assignment = getLocalAssignment(stormClusterState, topologyId);
         } catch (Exception e) {
             LOG.warn("Failed to read assignment. This should only happen when topology is shutting down.", e);
         }
-
+        suicideIfLocalAssignmentsChanged(assignment);

Review comment:
       We can remove this from this PR since it is being addressed in your other PR https://github.com/apache/storm/pull/3291




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [storm] Ethanlm merged pull request #3297: STORM-3662: Use Pacemaker if cluster has set up uses it

Posted by GitBox <gi...@apache.org>.
Ethanlm merged pull request #3297:
URL: https://github.com/apache/storm/pull/3297


   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [storm] kishorvpatil commented on a change in pull request #3297: STORM-3662: Use Pacemaker if cluster has set up uses it

Posted by GitBox <gi...@apache.org>.
kishorvpatil commented on a change in pull request #3297:
URL: https://github.com/apache/storm/pull/3297#discussion_r447163341



##########
File path: storm-client/src/jvm/org/apache/storm/daemon/worker/Worker.java
##########
@@ -215,8 +215,11 @@ private Object loadWorker(IStateStorage stateStorage, IStormClusterState stormCl
                 }
             });
 
+        Integer execHeartBeatFreqSecs = workerState.stormClusterState.isPacemakerStateStore()
+            ? (Integer) conf.get(Config.TASK_HEARTBEAT_FREQUENCY_SECS)

Review comment:
       Modified Config.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org