Posted to commits@helix.apache.org by "Santiago Perez (JIRA)" <ji...@apache.org> on 2013/01/25 22:43:13 UTC

[jira] [Created] (HELIX-29) Not receiving transitions after participant reconnection

Santiago Perez created HELIX-29:
-----------------------------------

             Summary: Not receiving transitions after participant reconnection
                 Key: HELIX-29
                 URL: https://issues.apache.org/jira/browse/HELIX-29
             Project: Apache Helix
          Issue Type: Bug
            Reporter: Santiago Perez


We have nodes whose ZK connections expire because of long GC pauses. We handle the expiration by disconnecting the participant and reconnecting it afterwards. Usually this means the state gets reset to IDLE and we receive the proper transitions to the ideal state (in this case ONLINE).

However, sometimes we don't get any transitions at all, even though the disconnection and reconnection are successful. One interesting side effect is that the IDEALSTATE for that node remains ONLINE and the EXTERNALVIEW remains ONLINE, yet the CURRENTSTATE shows IDLE, and no transitions are sent to the participant.

Here are the ZK contents for one of these nodes:

[zk: localhost:2122(CONNECTED) 41] get /<NAMESPACE>/<CLUSTER>/INSTANCES/<PARTICIPANT-NAME>/CURRENTSTATES/338bfded5e60877/<RESOURCE-NAME>
{
  "id":"<RESOURCE-NAME>"
  ,"simpleFields":{
    "BUCKET_SIZE":"0"
    ,"SESSION_ID":"338bfded5e60877"
    ,"STATE_MODEL_DEF":"Bootstrap"
    ,"STATE_MODEL_FACTORY_NAME":"<FACTORY-NAME>"
  }
  ,"listFields":{
  }
  ,"mapFields":{
    "<RESOURCE-NAME>_17":{
      "CURRENT_STATE":"IDLE"
    }
  }
}
cZxid = 0x2010d26c8
ctime = Sun Jan 20 03:14:57 PST 2013
mZxid = 0x2010d26f5
mtime = Sun Jan 20 03:14:58 PST 2013
pZxid = 0x2010d26c8
cversion = 0
dataVersion = 2
aclVersion = 0
ephemeralOwner = 0x0
dataLength = 281
numChildren = 0


[zk: localhost:2122(CONNECTED) 42] get /<NAMESPACE>/<CLUSTER>/EXTERNALVIEW/<RESOURCE-NAME>
{
  "id" : "<RESOURCE-NAME>",
  "simpleFields" : {
    "BUCKET_SIZE" : "0"
  },
  "mapFields" : {
     
    ... PREVIOUS PARTITIONS ...

    "<RESOURCE-NAME>_17" : {
      "<PARTICIPANT>" : "ONLINE"
    },

    ... FOLLOWING PARTITIONS ...

  },
  "listFields" : {
  }
}
cZxid = 0x200595a78
ctime = Thu Nov 08 18:06:23 PST 2012
mZxid = 0x201077ec6
mtime = Fri Jan 18 16:40:03 PST 2013
pZxid = 0x200595a78
cversion = 0
dataVersion = 4666
aclVersion = 0
ephemeralOwner = 0x0
dataLength = 3367
numChildren = 0

The ideal state is very similar to the EXTERNALVIEW; I can post it too if that would help.
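
For anyone who wants to check other partitions for the same symptom, the comparison between the two znodes above can be done mechanically. The following is only a rough sketch, not part of Helix: the function name find_state_mismatches is hypothetical, and the JSON payloads are trimmed copies of the records shown above with the placeholder names left as-is. It assumes the znode data has already been fetched and only compares CURRENT_STATE against the state the EXTERNALVIEW reports for the same participant.

```python
import json


def find_state_mismatches(current_state, external_view, participant):
    """Return (partition, current, external) tuples where the records disagree.

    current_state: parsed CURRENTSTATES znode for one participant session.
    external_view: parsed EXTERNALVIEW znode for the same resource.
    participant:   the participant's key inside the EXTERNALVIEW mapFields.
    """
    mismatches = []
    view = external_view.get("mapFields", {})
    for partition, fields in current_state.get("mapFields", {}).items():
        current = fields.get("CURRENT_STATE")
        # The EXTERNALVIEW maps partition -> {participant: state}.
        external = view.get(partition, {}).get(participant)
        if current != external:
            mismatches.append((partition, current, external))
    return mismatches


# Trimmed copies of the znode contents above (placeholders kept verbatim).
current_state = json.loads("""
{
  "id": "<RESOURCE-NAME>",
  "simpleFields": {"SESSION_ID": "338bfded5e60877"},
  "mapFields": {"<RESOURCE-NAME>_17": {"CURRENT_STATE": "IDLE"}}
}
""")

external_view = json.loads("""
{
  "id": "<RESOURCE-NAME>",
  "mapFields": {"<RESOURCE-NAME>_17": {"<PARTICIPANT>": "ONLINE"}}
}
""")

for partition, current, external in find_state_mismatches(
    current_state, external_view, "<PARTICIPANT>"
):
    print(f"{partition}: CURRENTSTATE={current} EXTERNALVIEW={external}")
```

A partition that is reported here and stays reported for longer than a normal transition takes is a candidate for the stuck condition described in this issue.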

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira