Posted to dev@storm.apache.org by "Jungtaek Lim (JIRA)" <ji...@apache.org> on 2016/07/01 14:25:11 UTC

[jira] [Created] (STORM-1941) Nimbus discovery can fail when zookeeper reconnect happens.

Jungtaek Lim created STORM-1941:
-----------------------------------

             Summary: Nimbus discovery can fail when zookeeper reconnect happens.
                 Key: STORM-1941
                 URL: https://issues.apache.org/jira/browse/STORM-1941
             Project: Apache Storm
          Issue Type: Bug
          Components: storm-core
    Affects Versions: 1.0.0, 1.0.1
            Reporter: Jungtaek Lim
            Assignee: Jungtaek Lim


When a ZooKeeper reconnect happens, the nimbus registry node can be deleted even though nimbus is alive.

Below is the ZooKeeper node for the nimbus registry, first as created and then after the reconnect.

{code}
get /storm/nimbuses/<host>:6627
?f`d``??????M?-?-.?/??5??/H?+.IL???ON??``b`?|???^^???????
?'h?g?g?g?g
t-?,[??Q
cZxid = 0x4000005ae
ctime = Fri Jul 01 11:43:51 UTC 2016
mZxid = 0x4000005ae
mtime = Fri Jul 01 11:43:51 UTC 2016
pZxid = 0x4000005ae
cversion = 0
dataVersion = 0
aclVersion = 0
ephemeralOwner = 0x255a62e310c0005
dataLength = 98
numChildren = 0
{code}

{code}
get /storm/nimbuses/<host>:6627
?f`d``??????M?-?-.?/??5??/H?+.IL???ON??``b`?|???^^???????
?'h?g?g?g?g
t-?,[??Q
cZxid = 0x4000005ae
ctime = Fri Jul 01 11:43:51 UTC 2016
mZxid = 0x50000000e
mtime = Fri Jul 01 11:46:08 UTC 2016
pZxid = 0x4000005ae
cversion = 0
dataVersion = 1
aclVersion = 0
ephemeralOwner = 0x255a62e310c0005
dataLength = 98
numChildren = 0
{code}

Below is the transaction log for that node.
{code}
7/1/16 11:43:51 AM UTC session 0x255a62e310c0005 cxid 0xd zxid 0x4000005ae create '/storm/nimbuses/<host>:6627,#1fffffff8b80000000ffffffe36660646060ffffff90ffffffcfffffffcaffffffc9ffffffccffffffd54dffffffcc2dffffffd62d2effffffc92fffffffcaffffffd535ffffffd2ffffffcb2f48ffffffcd2b2e494cffffffceffffffceffffffc94f4effffffccffffffe160606260ffffff907cffffffccffffffc1ffffffc01c5e165effffffceffffffc4ffffffc0ffffffc2ffffffc0ffffffcdffffffc0affffffd42768ffffffa867ffffffa067ffffffa867ffffffa467affffffa4d742dffffff8c2c1805b14ffffffc2ffffffaf51000,v{s{31,s{'world,'anyone}}},T,10

7/1/16 11:46:08 AM UTC session 0x355a647bd8c0000 cxid 0x3 zxid 0x50000000e setData '/storm/nimbuses/<host>:6627,#1fffffff8b80000000ffffffe36660646060ffffff90ffffffcfffffffcaffffffc9ffffffccffffffd54dffffffcc2dffffffd62d2effffffc92fffffffcaffffffd535ffffffd2ffffffcb2f48ffffffcd2b2e494cffffffceffffffceffffffc94f4effffffccffffffe160606260ffffff907cffffffccffffffc1ffffffc01c5e165effffffceffffffc4ffffffc0ffffffc2ffffffc0ffffffcdffffffc0affffffd42768ffffffa867ffffffa067ffffffa867ffffffa467affffffa4d742dffffff8c2c1805b14ffffffc2ffffffaf51000,1
{code}

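Please take a look at ctime, mtime, and ephemeralOwner.
The ephemeral owner session (0x255a62e310c0005) was already closed on the nimbus side, but the node is not necessarily deleted immediately. As a result, the new session (0x355a647bd8c0000) does not create a new node; it only sets the value on the existing ephemeral node, which still belongs to the session that is already closed. Once ZooKeeper removes that node, the nimbus registry is gone even though nimbus is alive.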

{code}
2016-07-01 11:45:05.675 o.a.s.s.o.a.z.ClientCnxn [DEBUG] Disconnecting client for session: 0x255a62e310c0005
2016-07-01 11:45:05.675 o.a.s.s.o.a.z.ZooKeeper [INFO] Session: 0x255a62e310c0005 closed
{code}
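
The sequence described above can be reproduced with a small in-memory model. This is only a sketch: FakeZk, its methods, and the "nimbus-info" payload are hypothetical stand-ins invented for illustration, not the ZooKeeper, Curator, or Storm API. The model assumes what the node dumps show, namely that setting data leaves ephemeralOwner unchanged.

```java
import java.util.HashMap;
import java.util.Map;

// Toy in-memory model of ephemeral nodes; NOT the ZooKeeper or Storm API.
final class StaleEphemeralOwnerDemo {

    /** Hypothetical stand-in for a ZooKeeper server. */
    static final class FakeZk {
        // path -> id of the session that created (and therefore owns) the node
        private final Map<String, Long> ownerByPath = new HashMap<>();
        private final Map<String, String> dataByPath = new HashMap<>();

        boolean exists(String path) {
            return ownerByPath.containsKey(path);
        }

        void createEphemeral(long sessionId, String path, String data) {
            if (exists(path)) {
                throw new IllegalStateException("node already exists: " + path);
            }
            ownerByPath.put(path, sessionId);
            dataByPath.put(path, data);
        }

        // Updating the data does not change the ephemeral owner.
        void setData(String path, String data) {
            if (!exists(path)) {
                throw new IllegalStateException("no such node: " + path);
            }
            dataByPath.put(path, data);
        }

        // Server-side cleanup of a closed session: drop every node it owns.
        void cleanUpSession(long sessionId) {
            ownerByPath.values().removeIf(owner -> owner == sessionId);
            dataByPath.keySet().retainAll(ownerByPath.keySet());
        }
    }

    /** Replays the reported sequence and returns true if the registry node is lost. */
    static boolean registryLostAfterCleanup() {
        FakeZk zk = new FakeZk();
        String path = "/storm/nimbuses/<host>:6627";
        long oldSession = 0x255a62e310c0005L;
        long newSession = 0x355a647bd8c0000L;

        // 11:43:51 the original session creates the ephemeral node.
        zk.createEphemeral(oldSession, path, "nimbus-info");

        // 11:46:08 after the reconnect the old node is still visible,
        // so the new session only updates it instead of creating its own.
        if (zk.exists(path)) {
            zk.setData(path, "nimbus-info");
        } else {
            zk.createEphemeral(newSession, path, "nimbus-info");
        }

        // Later the server cleans up the closed session and removes its node.
        zk.cleanUpSession(oldSession);
        return !zk.exists(path);
    }

    public static void main(String[] args) {
        System.out.println("registry lost: " + registryLostAfterCleanup());
    }
}
```

In this model the node disappears as soon as the old session is cleaned up, because the new session never became its owner.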

We can fix this by deleting the node first and then creating the ephemeral node again whenever the reconnect event handler is called.
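
A minimal sketch of that idea, again using a toy in-memory model rather than the real client. FakeZk and onReconnected are hypothetical names made up for this example; the actual patch may be structured differently.

```java
import java.util.HashMap;
import java.util.Map;

// Toy in-memory model of the proposed fix; NOT the ZooKeeper or Storm API.
final class ReRegisterOnReconnectDemo {

    /** Hypothetical stand-in for a ZooKeeper server. */
    static final class FakeZk {
        // path -> id of the session that owns the ephemeral node
        private final Map<String, Long> ownerByPath = new HashMap<>();

        boolean exists(String path) {
            return ownerByPath.containsKey(path);
        }

        Long ownerOf(String path) {
            return ownerByPath.get(path);
        }

        void createEphemeral(long sessionId, String path) {
            if (exists(path)) {
                throw new IllegalStateException("node already exists: " + path);
            }
            ownerByPath.put(path, sessionId);
        }

        void delete(String path) {
            ownerByPath.remove(path);
        }

        // Server-side cleanup of a closed session: drop every node it owns.
        void cleanUpSession(long sessionId) {
            ownerByPath.values().removeIf(owner -> owner == sessionId);
        }
    }

    /** Hypothetical reconnect handler: delete any leftover node, then re-create it. */
    static void onReconnected(FakeZk zk, long newSessionId, String path) {
        if (zk.exists(path)) {
            zk.delete(path); // the leftover node may belong to a closed session
        }
        zk.createEphemeral(newSessionId, path); // the new session is now the owner
    }

    /** Replays the reported sequence with the fix; returns true if the node survives. */
    static boolean registrySurvivesCleanup() {
        FakeZk zk = new FakeZk();
        String path = "/storm/nimbuses/<host>:6627";
        long oldSession = 0x255a62e310c0005L;
        long newSession = 0x355a647bd8c0000L;

        zk.createEphemeral(oldSession, path);
        onReconnected(zk, newSession, path);
        zk.cleanUpSession(oldSession); // late cleanup no longer touches the node

        return zk.exists(path) && zk.ownerOf(path) == newSession;
    }

    public static void main(String[] args) {
        System.out.println("registry survives: " + registrySurvivesCleanup());
    }
}
```

Against a real ZooKeeper ensemble the delete and the create are separate requests, so the handler would presumably also need to tolerate the node vanishing between the existence check and the delete.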


