You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@storm.apache.org by GitBox <gi...@apache.org> on 2020/11/09 16:28:23 UTC

[GitHub] [storm] RuiLi8080 opened a new pull request #3349: [STORM-3713] Fix the NPE and potential memory leak for the race-condition when making assignment

RuiLi8080 opened a new pull request #3349:
URL: https://github.com/apache/storm/pull/3349


   ## What is the purpose of the change
   
   We observed an NPE issue when making assignments in our cluster. Further we found that this is a race-condition between assignment sync-up and killing topology. This is to help prevent the NPE and memory leak by cleaning up again while the race-condition was noticed.
   Details: https://issues.apache.org/jira/browse/STORM-3713
   
   ## How was the change tested
   Rebuild storm locally.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [storm] RuiLi8080 closed pull request #3349: [STORM-3713] Fix the NPE and potential memory leak for the race-condition when making assignment

Posted by GitBox <gi...@apache.org>.
RuiLi8080 closed pull request #3349:
URL: https://github.com/apache/storm/pull/3349


   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [storm] kishorvpatil commented on a change in pull request #3349: [STORM-3713] Fix the NPE and potential memory leak for the race-condition when making assignment

Posted by GitBox <gi...@apache.org>.
kishorvpatil commented on a change in pull request #3349:
URL: https://github.com/apache/storm/pull/3349#discussion_r520088529



##########
File path: storm-server/src/main/java/org/apache/storm/daemon/nimbus/Nimbus.java
##########
@@ -2440,6 +2440,15 @@ private void mkAssignments(String scratchTopoId) throws Exception {
             List<String> assignedTopologyIds = state.assignments(null);
             Map<String, Assignment> existingAssignments = new HashMap<>();
             for (String id : assignedTopologyIds) {
+                // A rare race condition between IStormClusterState#syncRemoteAssignments and REMOVE_TRANSITION for killing topo.
+                // For example, if a topo were killed when syncRemoteAssignments is happening.
+                // Local assignment-backend might still has the killed topo while the bases does not.
+                // Scheduling should consider the zookeeper as source of truth.
+                // So we clean up again for the set as (in-memory backend - remote state).
+                if (!bases.containsKey(id)) {
+                    state.removeStorm(id);
+                    continue;
+                }
                 //for the topology which wants rebalance (specified by the scratchTopoId)

Review comment:
       I am not sure it resolves the issue. The _IStormClusterState#syncRemoteAssignments_ can still notice discrepancy happen past this stage.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org