Posted to commits@helix.apache.org by "kishore gopalakrishna (JIRA)" <ji...@apache.org> on 2013/01/26 01:19:13 UTC

[jira] [Assigned] (HELIX-29) Not receiving transitions after participant reconnection

     [ https://issues.apache.org/jira/browse/HELIX-29?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

kishore gopalakrishna reassigned HELIX-29:
------------------------------------------

    Assignee: dafu
    
> Not receiving transitions after participant reconnection
> --------------------------------------------------------
>
>                 Key: HELIX-29
>                 URL: https://issues.apache.org/jira/browse/HELIX-29
>             Project: Apache Helix
>          Issue Type: Bug
>            Reporter: Santiago Perez
>            Assignee: dafu
>
> We have nodes whose ZK connections expire due to long GC pauses. We handle the expiration by disconnecting the participant and reconnecting it afterwards. Usually this means the state gets reset to IDLE and we get the proper transitions to the ideal state (in this case ONLINE).
> However, sometimes we don't get any transitions at all although the disconnection and reconnection are successful. One interesting side effect is that the IDEALSTATE for that node remains ONLINE, the EXTERNALVIEW remains ONLINE, yet the CURRENTSTATE shows IDLE, and no transitions are sent back to the participant.
> Here are the ZK contents for one of these nodes:
> [zk: localhost:2122(CONNECTED) 41] get /<NAMESPACE>/<CLUSTER>/INSTANCES/<PARTICIPANT-NAME>/CURRENTSTATES/338bfded5e60877/<RESOURCE-NAME>
> {
>   "id":"<RESOURCE-NAME>"
>   ,"simpleFields":{
>     "BUCKET_SIZE":"0"
>     ,"SESSION_ID":"338bfded5e60877"
>     ,"STATE_MODEL_DEF":"Bootstrap"
>     ,"STATE_MODEL_FACTORY_NAME":"<FACTORY-NAME>"
>   }
>   ,"listFields":{
>   }
>   ,"mapFields":{
>     "<RESOURCE-NAME>_17":{
>       "CURRENT_STATE":"IDLE"
>     }
>   }
> }
> cZxid = 0x2010d26c8
> ctime = Sun Jan 20 03:14:57 PST 2013
> mZxid = 0x2010d26f5
> mtime = Sun Jan 20 03:14:58 PST 2013
> pZxid = 0x2010d26c8
> cversion = 0
> dataVersion = 2
> aclVersion = 0
> ephemeralOwner = 0x0
> dataLength = 281
> numChildren = 0
> [zk: localhost:2122(CONNECTED) 42] get /<NAMESPACE>/<CLUSTER>/EXTERNALVIEW/<RESOURCE-NAME>                                 
> {
>   "id" : "<RESOURCE-NAME>",
>   "simpleFields" : {
>     "BUCKET_SIZE" : "0"
>   },
>   "mapFields" : {
>      
>     ... PREVIOUS PARTITIONS ...
>     "<RESOURCE-NAME>_17" : {
>       "<PARTICIPANT>" : "ONLINE"
>     },
>     ... FOLLOWING PARTITIONS ...
>   },
>   "listFields" : {
>   }
> }
> cZxid = 0x200595a78
> ctime = Thu Nov 08 18:06:23 PST 2012
> mZxid = 0x201077ec6
> mtime = Fri Jan 18 16:40:03 PST 2013
> pZxid = 0x200595a78
> cversion = 0
> dataVersion = 4666
> aclVersion = 0
> ephemeralOwner = 0x0
> dataLength = 3367
> numChildren = 0
> The ideal state is very similar to the EXTERNALVIEW; I can post that too if you want.
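
The mismatch described above (CURRENTSTATE at IDLE while EXTERNALVIEW still reports ONLINE) can be checked mechanically once both znodes have been dumped. The following is only a minimal sketch in Python using the standard json module; the function name find_stale_partitions is hypothetical and is not part of Helix, and the records are trimmed copies of the dumps above with the placeholder names left as they appear in the report.

```python
import json


def find_stale_partitions(current_state, external_view, participant):
    """Return (partition, current, external) tuples where the records disagree.

    Hypothetical helper, not a Helix API: it only compares two already-fetched
    ZNRecord-style dicts for one participant.
    """
    stale = []
    ev_map = external_view.get("mapFields", {})
    for partition, fields in current_state.get("mapFields", {}).items():
        current = fields.get("CURRENT_STATE")
        external = ev_map.get(partition, {}).get(participant)
        # Only report partitions the external view knows about for this participant.
        if external is not None and external != current:
            stale.append((partition, current, external))
    return stale


# Trimmed from the CURRENTSTATES dump above (placeholders kept verbatim).
current_state = json.loads("""
{
  "id": "<RESOURCE-NAME>",
  "simpleFields": {"SESSION_ID": "338bfded5e60877", "STATE_MODEL_DEF": "Bootstrap"},
  "listFields": {},
  "mapFields": {"<RESOURCE-NAME>_17": {"CURRENT_STATE": "IDLE"}}
}
""")

# Trimmed from the EXTERNALVIEW dump above; other partitions are omitted.
external_view = json.loads("""
{
  "id": "<RESOURCE-NAME>",
  "simpleFields": {"BUCKET_SIZE": "0"},
  "mapFields": {"<RESOURCE-NAME>_17": {"<PARTICIPANT>": "ONLINE"}},
  "listFields": {}
}
""")

stale = find_stale_partitions(current_state, external_view, "<PARTICIPANT>")
for partition, current, external in stale:
    print(f"{partition}: CURRENTSTATE={current} but EXTERNALVIEW={external}")
```

In the dumps above, the EXTERNALVIEW mtime (Jan 18) is earlier than the CURRENTSTATE ctime (Jan 20), which is consistent with the view not having been rewritten after the reconnection.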

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira