Posted to dev@storm.apache.org by "Jungtaek Lim (JIRA)" <ji...@apache.org> on 2016/07/01 08:21:11 UTC

[jira] [Commented] (STORM-1940) Storm Topo is auto re-balance after ZK RECONNECTED

    [ https://issues.apache.org/jira/browse/STORM-1940?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15358639#comment-15358639 ] 

Jungtaek Lim commented on STORM-1940:
-------------------------------------

[~happylu]
Could you share the other worker logs as well, if you don't mind? TransactionalState.setData() already checks for node existence before creating the node, so I suspect there's a race condition in it.

Please refer here: https://github.com/apache/storm/blob/v1.0.1/storm-core/src/jvm/org/apache/storm/trident/topology/state/TransactionalState.java#L103-L121
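
To illustrate the kind of race I mean, here is a minimal sketch, not the actual Storm code. It assumes a check-then-create sequence where the node appears between the existence check and the create call. InMemoryStore, NodeExistsException, and the afterExists hook are hypothetical stand-ins for the ZooKeeper client, used only so the example runs on its own.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

class RaceTolerantSetData {
    /** Hypothetical stand-in for ZooKeeper's NodeExistsException. */
    static class NodeExistsException extends RuntimeException {
        NodeExistsException(String path) {
            super("NodeExists for " + path);
        }
    }

    /** Hypothetical in-memory stand-in for a ZooKeeper client. */
    static class InMemoryStore {
        private final Map<String, String> nodes = new ConcurrentHashMap<>();

        /** Hook that runs after exists() has read its answer; simulates another writer. */
        Runnable afterExists = () -> {};

        boolean exists(String path) {
            boolean found = nodes.containsKey(path);
            afterExists.run();
            return found;
        }

        void create(String path, String data) {
            // Fails if the node is already there, like a ZooKeeper create.
            if (nodes.putIfAbsent(path, data) != null) {
                throw new NodeExistsException(path);
            }
        }

        void update(String path, String data) {
            nodes.put(path, data);
        }

        String get(String path) {
            return nodes.get(path);
        }
    }

    /** Check-then-create that tolerates losing the race instead of propagating the error. */
    static void setData(InMemoryStore store, String path, String data) {
        if (store.exists(path)) {
            store.update(path, data);
            return;
        }
        try {
            store.create(path, data);
        } catch (NodeExistsException e) {
            // The node appeared between exists() and create(); overwrite it.
            store.update(path, data);
        }
    }

    public static void main(String[] args) {
        InMemoryStore store = new InMemoryStore();
        // Simulate a node that shows up right after the existence check.
        store.afterExists = () -> store.create("/meta/712285", "old");
        setData(store, "/meta/712285", "new");
        System.out.println(store.get("/meta/712285"));
    }
}
```

Without the catch block, the simulated interleaving would surface NodeExists to the caller even though existence was checked first, which matches the shape of the reported stack trace. Whether this is what actually happens after the reconnect is still a guess until we see the other logs.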


> Storm Topo is auto re-balance after ZK RECONNECTED
> --------------------------------------------------
>
>                 Key: STORM-1940
>                 URL: https://issues.apache.org/jira/browse/STORM-1940
>             Project: Apache Storm
>          Issue Type: Bug
>    Affects Versions: 1.0.1
>            Reporter: happylu
>            Priority: Critical
>
> I have a topology with 2 workers on 2 VMs. When the ZooKeeper connection reaches the RECONNECTED state, the topology is automatically rebalanced.
> The log shows NodeExists for /meta/712285. I guess the cause is this: after reconnecting successfully, TridentSpoutCoordinator creates this node again, but the node was already created before the reconnect.
> Can we check whether the node exists first? Or avoid throwing this exception, so that the whole topology is not rebalanced?
> {code}
> 06-29 05:54:37.515 [Thread-151-$spoutcoord-spout-DataKafkaSpout1466801942228-executor[4 4]-SendThread(ip-10-9-255-26.us-west-2.compute.internal:2181)] shade.org.apache.zookeeper.ClientCnxn [INFO] Session establishment complete on server ip-10-9-255-26.us-west-2.compute.internal/10.9.255.26:2181, sessionid = 0x7a556eeee8c70ae1, negotiated timeout = 10000
> 06-29 05:54:37.515 [Thread-151-$spoutcoord-spout-DataKafkaSpout1466801942228-executor[4 4]-EventThread] apache.curator.framework.state.ConnectionStateManager [INFO] State change: RECONNECTED
> 06-29 05:54:37.519 [Thread-133-spout-DataKafkaSpout1466801942228-executor[154 154]-SendThread(ip-10-9-255-26.us-west-2.compute.internal:2181)] org.apache.zookeeper.ClientCnxn [INFO] Session establishment complete on server ip-10-9-255-26.us-west-2.compute.internal/10.9.255.26:2181, sessionid = 0x7a556eeee8c70ae5, negotiated timeout = 10000
> 06-29 05:54:37.519 [Thread-133-spout-DataKafkaSpout1466801942228-executor[154 154]-EventThread] org.I0Itec.zkclient.ZkClient [INFO] zookeeper state changed (SyncConnected)
> 06-29 05:54:37.524 [Thread-25-spout-DataKafkaSpout1466801942228-executor[156 156]-SendThread(ip-10-9-255-26.us-west-2.compute.internal:2181)] org.apache.zookeeper.ClientCnxn [INFO] Session establishment complete on server ip-10-9-255-26.us-west-2.compute.internal/10.9.255.26:2181, sessionid = 0x7a556eeee8c70ae4, negotiated timeout = 10000
> 06-29 05:54:37.524 [Thread-25-spout-DataKafkaSpout1466801942228-executor[156 156]-EventThread] org.I0Itec.zkclient.ZkClient [INFO] zookeeper state changed (SyncConnected)
> 06-29 05:54:37.528 [main-SendThread(ip-10-9-255-26.us-west-2.compute.internal:2181)] shade.org.apache.zookeeper.ClientCnxn [INFO] Session establishment complete on server ip-10-9-255-26.us-west-2.compute.internal/10.9.255.26:2181, sessionid = 0x7b556f0cc3a40896, negotiated timeout = 10000
> 06-29 05:54:37.528 [main-EventThread] apache.curator.framework.state.ConnectionStateManager [INFO] State change: RECONNECTED
> 06-29 05:54:37.528 [Thread-149-spout-DataKafkaSpout1466801942228-executor[160 160]-SendThread(ip-10-9-255-26.us-west-2.compute.internal:2181)] org.apache.zookeeper.ClientCnxn [INFO] Session establishment complete on server ip-10-9-255-26.us-west-2.compute.internal/10.9.255.26:2181, sessionid = 0x7a556eeee8c70ae3, negotiated timeout = 10000
> 06-29 05:54:37.528 [Thread-149-spout-DataKafkaSpout1466801942228-executor[160 160]-EventThread] org.I0Itec.zkclient.ZkClient [INFO] zookeeper state changed (SyncConnected)
> 06-29 05:54:37.536 [Thread-151-$spoutcoord-spout-DataKafkaSpout1466801942228-executor[4 4]] org.apache.storm.util [ERROR] Async loop died!
> java.lang.RuntimeException: java.lang.RuntimeException: org.apache.storm.shade.org.apache.zookeeper.KeeperException$NodeExistsException: KeeperErrorCode = NodeExists for /meta/712285
> 	at org.apache.storm.utils.DisruptorQueue.consumeBatchToCursor(DisruptorQueue.java:452) ~[storm-core-1.0.1.jar:1.0.1]
> 	at org.apache.storm.utils.DisruptorQueue.consumeBatchWhenAvailable(DisruptorQueue.java:418) ~[storm-core-1.0.1.jar:1.0.1]
> 	at org.apache.storm.disruptor$consume_batch_when_available.invoke(disruptor.clj:73) ~[storm-core-1.0.1.jar:1.0.1]
> 	at org.apache.storm.daemon.executor$fn__7953$fn__7966$fn__8019.invoke(executor.clj:847) ~[storm-core-1.0.1.jar:1.0.1]
> 	at org.apache.storm.util$async_loop$fn__625.invoke(util.clj:484) [storm-core-1.0.1.jar:1.0.1]
> 	at clojure.lang.AFn.run(AFn.java:22) [clojure-1.7.0.jar:?]
> 	at java.lang.Thread.run(Thread.java:745) [?:1.7.0_80]
> Caused by: java.lang.RuntimeException: org.apache.storm.shade.org.apache.zookeeper.KeeperException$NodeExistsException: KeeperErrorCode = NodeExists for /meta/712285
> 	at org.apache.storm.trident.topology.state.TransactionalState.setData(TransactionalState.java:119) ~[storm-core-1.0.1.jar:1.0.1]
> 	at org.apache.storm.trident.topology.state.RotatingTransactionalState.overrideState(RotatingTransactionalState.java:52) ~[storm-core-1.0.1.jar:1.0.1]
> 	at org.apache.storm.trident.spout.TridentSpoutCoordinator.execute(TridentSpoutCoordinator.java:71) ~[storm-core-1.0.1.jar:1.0.1]
> 	at org.apache.storm.topology.BasicBoltExecutor.execute(BasicBoltExecutor.java:50) ~[storm-core-1.0.1.jar:1.0.1]
> 	at org.apache.storm.daemon.executor$fn__7953$tuple_action_fn__7955.invoke(executor.clj:728) ~[storm-core-1.0.1.jar:1.0.1]
> 	at org.apache.storm.daemon.executor$mk_task_receiver$fn__7874.invoke(executor.clj:461) ~[storm-core-1.0.1.jar:1.0.1]
> 	at org.apache.storm.disruptor$clojure_handler$reify__7390.onEvent(disruptor.clj:40) ~[storm-core-1.0.1.jar:1.0.1]
> 	at org.apache.storm.utils.DisruptorQueue.consumeBatchToCursor(DisruptorQueue.java:439) ~[storm-core-1.0.1.jar:1.0.1]
> 	... 6 more
> Caused by: org.apache.storm.shade.org.apache.zookeeper.KeeperException$NodeExistsException: KeeperErrorCode = NodeExists for /meta/712285
> 	at org.apache.storm.shade.org.apache.zookeeper.KeeperException.create(KeeperException.java:119) ~[storm-core-1.0.1.jar:1.0.1]
> 	at org.apache.storm.shade.org.apache.zookeeper.KeeperException.create(KeeperException.java:51) ~[storm-core-1.0.1.jar:1.0.1]
> 	at org.apache.storm.shade.org.apache.zookeeper.ZooKeeper.create(ZooKeeper.java:783) ~[storm-core-1.0.1.jar:1.0.1]
> 	at org.apache.storm.shade.org.apache.curator.framework.imps.CreateBuilderImpl$11.call(CreateBuilderImpl.java:721) ~[storm-core-1.0.1.jar:1.0.1]
> 	at org.apache.storm.shade.org.apache.curator.framework.imps.CreateBuilderImpl$11.call(CreateBuilderImpl.java:704) ~[storm-core-1.0.1.jar:1.0.1]
> 	at org.apache.storm.shade.org.apache.curator.RetryLoop.callWithRetry(RetryLoop.java:108) ~[storm-core-1.0.1.jar:1.0.1]
> 	at org.apache.storm.shade.org.apache.curator.framework.imps.CreateBuilderImpl.pathInForeground(CreateBuilderImpl.java:701) ~[storm-core-1.0.1.jar:1.0.1]
> 	at org.apache.storm.shade.org.apache.curator.framework.imps.CreateBuilderImpl.protectedPathInForeground(CreateBuilderImpl.java:477) ~[storm-core-1.0.1.jar:1.0.1]
> 	at org.apache.storm.shade.org.apache.curator.framework.imps.CreateBuilderImpl.forPath(CreateBuilderImpl.java:467) ~[storm-core-1.0.1.jar:1.0.1]
> 	at org.apache.storm.shade.org.apache.curator.framework.imps.CreateBuilderImpl.forPath(CreateBuilderImpl.java:44) ~[storm-core-1.0.1.jar:1.0.1]
> 	at org.apache.storm.trident.topology.state.TransactionalState.forPath(TransactionalState.java:83) ~[storm-core-1.0.1.jar:1.0.1]
> 	at org.apache.storm.trident.topology.state.TransactionalState.createNode(TransactionalState.java:95) ~[storm-core-1.0.1.jar:1.0.1]
> 	at org.apache.storm.trident.topology.state.TransactionalState.setData(TransactionalState.java:115) ~[storm-core-1.0.1.jar:1.0.1]
> 	at org.apache.storm.trident.topology.state.RotatingTransactionalState.overrideState(RotatingTransactionalState.java:52) ~[storm-core-1.0.1.jar:1.0.1]
> 	at org.apache.storm.trident.spout.TridentSpoutCoordinator.execute(TridentSpoutCoordinator.java:71) ~[storm-core-1.0.1.jar:1.0.1]
> 	at org.apache.storm.topology.BasicBoltExecutor.execute(BasicBoltExecutor.java:50) ~[storm-core-1.0.1.jar:1.0.1]
> 	at org.apache.storm.daemon.executor$fn__7953$tuple_action_fn__7955.invoke(executor.clj:728) ~[storm-core-1.0.1.jar:1.0.1]
> 	at org.apache.storm.daemon.executor$mk_task_receiver$fn__7874.invoke(executor.clj:461) ~[storm-core-1.0.1.jar:1.0.1]
> 	at org.apache.storm.disruptor$clojure_handler$reify__7390.onEvent(disruptor.clj:40) ~[storm-core-1.0.1.jar:1.0.1]
> 	at org.apache.storm.utils.DisruptorQueue.consumeBatchToCursor(DisruptorQueue.java:439) ~[storm-core-1.0.1.jar:1.0.1]
> 	... 6 more
> {code}


