You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@storm.apache.org by "P. Taylor Goetz (JIRA)" <ji...@apache.org> on 2017/04/13 20:17:41 UTC
[jira] [Resolved] (STORM-1114) Racing condition in trident
zookeeper zk-node create/delete
[ https://issues.apache.org/jira/browse/STORM-1114?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
P. Taylor Goetz resolved STORM-1114.
------------------------------------
Resolution: Fixed
Fix Version/s: 1.1.1
1.0.4
0.10.3
2.0.0
Merged to master, 1.x-branch, 1.0.x-branch, and 0.10.x-branch.
> Racing condition in trident zookeeper zk-node create/delete
> -----------------------------------------------------------
>
> Key: STORM-1114
> URL: https://issues.apache.org/jira/browse/STORM-1114
> Project: Apache Storm
> Issue Type: Bug
> Components: storm-core
> Reporter: Zhuo Liu
> Assignee: P. Taylor Goetz
> Fix For: 2.0.0, 0.10.3, 1.0.4, 1.1.1
>
> Time Spent: 50m
> Remaining Estimate: 0h
>
> In production for some trident topology, we met the bug that some workers are trying to create a zk-node that is already existent or delete a zk node that has already been deleted. This causes the worker process to die.
>
> We dissect the problem and figure out that there exists racing condition in trident TransactionalState's zk-node create and delete codes.
> failure stack trace in worker.log:
> {noformat}
> Caused by: org.apache.storm.shade.org.apache.zookeeper.KeeperException$NodeExistsException: KeeperErrorCode = NodeExists for /ignoreStoredMetadata
> at org.apache.storm.shade.org.apache.zookeeper.KeeperException.create(KeeperException.java:119) ~[storm-core-0.10.1.y.jar:0.10.1.y]
> at org.apache.storm.shade.org.apache.zookeeper.KeeperException.create(KeeperException.java:51) ~[storm-core-0.10.1.y.jar:0.10.1.y]
> at org.apache.storm.shade.org.apache.zookeeper.ZooKeeper.create(ZooKeeper.java:783) ~[storm-core-0.10.1.y.jar:0.10.1.y]
> at org.apache.storm.shade.org.apache.curator.framework.imps.CreateBuilderImpl$11.call(CreateBuilderImpl.java:676) ~[storm-core-0.10.1.y.jar:0.10.1.y]
> at org.apache.storm.shade.org.apache.curator.framework.imps.CreateBuilderImpl$11.call(CreateBuilderImpl.java:660) ~[storm-core-0.10.1.y.jar:0.10.1.y]
> at org.apache.storm.shade.org.apache.curator.RetryLoop.callWithRetry(RetryLoop.java:107) ~[storm-core-0.10.1.y.jar:0.10.1.y]
> at org.apache.storm.shade.org.apache.curator.framework.imps.CreateBuilderImpl.pathInForeground(CreateBuilderImpl.java:656) ~[storm-core-0.10.1.y.jar:0.10.1.y]
> at org.apache.storm.shade.org.apache.curator.framework.imps.CreateBuilderImpl.protectedPathInForeground(CreateBuilderImpl.java:441) ~[storm-core-0.10.1.y.jar:0.10.1.y]
> at org.apache.storm.shade.org.apache.curator.framework.imps.CreateBuilderImpl.forPath(CreateBuilderImpl.java:431) ~[storm-core-0.10.1.y.jar:0.10.1.y]
> at org.apache.storm.shade.org.apache.curator.framework.imps.CreateBuilderImpl$3.forPath(CreateBuilderImpl.java:239) ~[storm-core-0.10.1.y.jar:0.10.1.y]
> at org.apache.storm.shade.org.apache.curator.framework.imps.CreateBuilderImpl$3.forPath(CreateBuilderImpl.java:193) ~[storm-core-0.10.1.y.jar:0.10.1.y]
> at storm.trident.topology.state.TransactionalState.forPath(TransactionalState.java:83) ~[storm-core-0.10.1.y.jar:0.10.1.y]
> at storm.trident.topology.state.TransactionalState.createNode(TransactionalState.java:100) ~[storm-core-0.10.1.y.jar:0.10.1.y]
> at storm.trident.topology.state.TransactionalState.setData(TransactionalState.java:115) ~[storm-core-0.10.1.y.jar:0.10.1.y]
> ... 9 more
> 2015-10-14 18:10:43.786 b.s.util [ERROR] Halting process: ("Worker died")
> {noformat}
> {noformat}
> Caused by: org.apache.storm.shade.org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode for /rainbowHdfsPath
> at org.apache.storm.shade.org.apache.zookeeper.KeeperException.create(KeeperException.java:111) ~[storm-core-0.10.1.y.jar:0.10.1.y]
> at org.apache.storm.shade.org.apache.zookeeper.KeeperException.create(KeeperException.java:51) ~[storm-core-0.10.1.y.jar:0.10.1.y]
> at org.apache.storm.shade.org.apache.zookeeper.ZooKeeper.delete(ZooKeeper.java:873) ~[storm-core-0.10.1.y.jar:0.10.1.y]
> at org.apache.storm.shade.org.apache.curator.framework.imps.DeleteBuilderImpl$5.call(DeleteBuilderImpl.java:239) ~[storm-core-0.10.1.y.jar:0.10.1.y]
> at org.apache.storm.shade.org.apache.curator.framework.imps.DeleteBuilderImpl$5.call(DeleteBuilderImpl.java:234) ~[storm-core-0.10.1.y.jar:0.10.1.y]
> at org.apache.storm.shade.org.apache.curator.RetryLoop.callWithRetry(RetryLoop.java:107) ~[storm-core-0.10.1.y.jar:0.10.1.y]
> at org.apache.storm.shade.org.apache.curator.framework.imps.DeleteBuilderImpl.pathInForeground(DeleteBuilderImpl.java:230) ~[storm-core-0.10.1.y.jar:0.10.1.y]
> at org.apache.storm.shade.org.apache.curator.framework.imps.DeleteBuilderImpl.forPath(DeleteBuilderImpl.java:215) ~[storm-core-0.10.1.y.jar:0.10.1.y]
> at org.apache.storm.shade.org.apache.curator.framework.imps.DeleteBuilderImpl.forPath(DeleteBuilderImpl.java:42) ~[storm-core-0.10.1.y.jar:0.10.1.y]
> at storm.trident.topology.state.TransactionalState.delete(TransactionalState.java:126) ~[storm-core-0.10.1.y.jar:0.10.1.y]
> ... 12 more
> 2015-10-14 18:10:28.799 b.s.util [ERROR] Halting process: ("Worker died")
> java.lang.RuntimeException: ("Worker died")
> {noformat}
--
This message was sent by Atlassian JIRA
(v6.3.15#6346)