Posted to dev@storm.apache.org by GitBox <gi...@apache.org> on 2020/11/17 06:39:04 UTC

[GitHub] [storm] RuiLi8080 opened a new pull request #3353: [STORM-3713] fix race-condition by applying submitLock to leaderCallBack

RuiLi8080 opened a new pull request #3353:
URL: https://github.com/apache/storm/pull/3353


   ## What is the purpose of the change
   
   Apply `submitLock` to `leaderCallBack` so that it cannot race with concurrent topology operations such as kill.
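
To illustrate the idea, here is a minimal, self-contained sketch of how a shared lock can serialize a leader callback against a concurrent kill. It is not the actual Nimbus code: apart from the `submitLock` and `leaderCallback` names, every class, method, and value below (`SubmitLockSketch`, `killTopology`, `runScenario`, the `wc-1` key) is hypothetical, and the short sleep only stands in for the injected delay used in testing.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for the leader component; not the real Storm classes.
class SubmitLockSketch {
    private final Object submitLock = new Object();
    private final Map<String, String> assignments = new ConcurrentHashMap<>();

    SubmitLockSketch() {
        assignments.put("wc-1", "assignment-data");
    }

    // Leader callback: checks that the topology exists, then slowly syncs it.
    int leaderCallback(CountDownLatch started, long syncMillis) throws InterruptedException {
        synchronized (submitLock) {
            boolean present = assignments.containsKey("wc-1");
            started.countDown();      // signal that the lock is now held
            Thread.sleep(syncMillis); // stands in for the slow sync step
            // Without the lock, a concurrent kill could remove the entry here,
            // get() would return null, and length() would throw an NPE.
            return present ? assignments.get("wc-1").length() : 0;
        }
    }

    // Kill path: takes the same lock, so it waits for a running callback.
    void killTopology() {
        synchronized (submitLock) {
            assignments.remove("wc-1");
        }
    }

    boolean hasTopology() {
        return assignments.containsKey("wc-1");
    }

    // Runs the callback on one thread and the kill on another; returns the
    // callback's result, or -1 if the callback failed.
    static int runScenario(long syncMillis) {
        SubmitLockSketch nimbus = new SubmitLockSketch();
        CountDownLatch started = new CountDownLatch(1);
        AtomicInteger result = new AtomicInteger(-1);
        Thread callback = new Thread(() -> {
            try {
                result.set(nimbus.leaderCallback(started, syncMillis));
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        try {
            callback.start();
            started.await();       // the callback now holds submitLock
            nimbus.killTopology(); // blocks until the callback releases it
            callback.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        if (nimbus.hasTopology()) {
            throw new IllegalStateException("kill did not take effect");
        }
        return result.get();
    }

    public static void main(String[] args) {
        System.out.println("callback result: " + runScenario(100));
    }
}
```

In this sketch the kill request is delayed rather than rejected: it completes only after the callback leaves its critical section, so the callback never observes a half-removed topology.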
   
   ## How was the change tested
   
   First, we reproduced the NPE by adding a 60s sleep right before this step: https://github.com/apache/storm/blob/master/storm-client/src/jvm/org/apache/storm/cluster/StormClusterStateImpl.java#L222
   
   Once the sleep started, we restarted ZooKeeper to trigger a leader re-election and killed the test topology.
   
   With the lock in place, the race condition no longer occurs even with the 60s sleep. Note the 60s gap in the timestamps.
   Nimbus log:
   ```
   2020-11-17 06:24:25.114 o.a.s.c.StormClusterStateImpl main-EventThread [INFO] syncRemoteAssignments sleeps for 60s
   2020-11-17 06:24:36.126 o.a.s.d.n.Nimbus pool-34-thread-28 [INFO] TRANSITION: wc-1-1605594107 KILL null true
   ... 60s sleep ...
   2020-11-17 06:25:26.704 o.a.s.d.n.Nimbus timer [INFO] TRANSITION: wc-1-1605594107 GAIN_LEADERSHIP null false
   2020-11-17 06:25:26.742 o.a.s.d.n.Nimbus timer [INFO] Delaying event REMOVE for 30 secs for wc-1-1605594107
   2020-11-17 06:25:55.149 o.a.s.d.n.Nimbus timer [INFO] TRANSITION: wc-1-1605594107 REMOVE null false
   2020-11-17 06:25:55.154 o.a.s.d.n.Nimbus timer [INFO] Killing topology: wc-1-1605594107
   ```
   
   Client console log:
   ```
   -bash-4.2$ storm kill wc
   Running: /home/y/share/yjava_jdk/java/bin/java -client -Ddaemon.name= -Dstorm.options= -Dstorm.home=/home/y/lib64/storm/2.3.0.y -Dstorm.log.dir=/home/y/lib64/storm/2.3.0.y/logs -Djava.library.path=/home/y/lib64:/usr/local/lib64:/usr/lib64:/lib64: -Dstorm.conf.file= -cp /home/y/lib64/storm/2.3.0.y/*:/home/y/lib64/storm/2.3.0.y/lib/*:/home/y/lib64/storm/2.3.0.y/extlib/*:/home/y/lib64/storm/2.3.0.y/extlib-daemon/*:/home/y/lib64/storm/current/conf:/home/y/lib64/storm/2.3.0.y/bin org.apache.storm.command.KillTopology wc
   06:24:35.567 [main] INFO  o.a.s.v.ConfigValidation - Will use [class org.apache.storm.DaemonConfig, class org.apache.storm.Config] for validation
   06:24:35.715 [main] WARN  o.a.s.v.ConfigValidation - Field public static final java.lang.String org.apache.storm.DaemonConfig.STORM_RESOURCE_ISOLATION_PLUGIN does not have validator annotation
   06:24:35.726 [main] WARN  o.a.s.v.ConfigValidation - topology.backpressure.enable is a deprecated config please see class org.apache.storm.Config.TOPOLOGY_BACKPRESSURE_ENABLE for more information.
   06:24:35.868 [main] INFO  o.a.s.m.n.Login - Successfully logged in to context StormClient using /etc/grid-keytabs/jaas.conf
   06:24:35.871 [Refresh-TGT] INFO  o.a.s.m.n.Login - TGT refresh thread started.
   06:24:35.897 [Refresh-TGT] INFO  o.a.s.m.n.Login - TGT valid starting at:        Tue Nov 17 05:56:26 UTC 2020
   06:24:35.897 [Refresh-TGT] INFO  o.a.s.m.n.Login - TGT expires:                  Wed Nov 18 05:56:26 UTC 2020
   06:24:35.898 [Refresh-TGT] INFO  o.a.s.m.n.Login - TGT refresh sleeping until: Wed Nov 18 02:13:43 UTC 2020
   06:24:36.077 [main] INFO  o.a.s.u.NimbusClient - Found leader nimbus : openstorm3blue-n4.blue.ygrid.yahoo.com:50560
   ... 60s sleep ...
   06:25:25.181 [main] INFO  o.a.s.c.KillTopology - Killed topology: wc
   ```

