Posted to commits@helix.apache.org by "kishore gopalakrishna (JIRA)" <ji...@apache.org> on 2016/01/09 08:02:39 UTC

[jira] [Commented] (HELIX-621) Missing listener notification of LiveInstances changes (and possibly other state change)

    [ https://issues.apache.org/jira/browse/HELIX-621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15090470#comment-15090470 ] 

kishore gopalakrishna commented on HELIX-621:
---------------------------------------------

There is a subtle difference in the sequence.

We read after setting the watch back. This means that even though we don't get an additional notification of L2 going down, when we read LIVEINSTANCES we won't see L2. In other words, the sequence is as follows:

1) Set watch W on some path P
2) Event E1 modifies P triggering W
3) The callback for W re-sets W on P
4) Read children of P.

For the scenario you described,

1) L1 disconnects
2) S's watch on LIVEINSTANCES fires
3) L2 disconnects
4) S sets watch again on LIVEINSTANCES
5) S reads the children of LIVEINSTANCES: {}

So S will not see L2.
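
To make the difference in ordering concrete, here is a minimal in-memory sketch in Java. It is only an illustration: the Node class is a hypothetical stand-in for a parent node with a one-shot child watch, not the ZooKeeper or Helix API, and the method names are made up for this example. It compares the order from the issue description (read, then re-set the watch) with the order described above (re-set the watch, then read).

```java
import java.util.Set;
import java.util.TreeSet;

public class WatchOrderingSketch {

    /** Hypothetical in-memory stand-in for a parent node with a one-shot child watch. */
    static class Node {
        private final Set<String> children = new TreeSet<>();
        private boolean watchArmed = false;

        Node(String... initial) {
            for (String child : initial) {
                children.add(child);
            }
        }

        void armWatch() {
            watchArmed = true;
        }

        /** Removes a child; an armed watch fires once and is then disarmed. */
        void remove(String child) {
            if (children.remove(child) && watchArmed) {
                watchArmed = false; // one-shot: must be re-armed to fire again
            }
        }

        /** Returns a snapshot of the current children. */
        Set<String> readChildren() {
            return new TreeSet<>(children);
        }
    }

    /** Order from the issue description: read first, re-set the watch afterwards. */
    public static Set<String> readThenRearm() {
        Node live = new Node("L1", "L2");
        live.armWatch();
        live.remove("L1");                      // watch fires and is disarmed
        Set<String> view = live.readChildren(); // S sees {L2}
        live.remove("L2");                      // no watch armed: no notification
        live.armWatch();                        // too late, the change already happened
        return view;                            // stale view
    }

    /** Order described in this comment: re-set the watch first, then read. */
    public static Set<String> rearmThenRead() {
        Node live = new Node("L1", "L2");
        live.armWatch();
        live.remove("L1");          // watch fires and is disarmed
        live.remove("L2");          // no watch armed: no notification
        live.armWatch();            // the callback re-sets the watch
        return live.readChildren(); // read happens last, so L2 is already gone
    }

    public static void main(String[] args) {
        System.out.println("read then re-arm: " + readThenRearm()); // [L2]
        System.out.println("re-arm then read: " + rearmThenRead()); // []
    }
}
```

In this model, if L2 disconnects after the watch is re-set but before the read, the re-set watch fires again, so the change is still not lost.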

Hope this helps.


> Missing listener notification of LiveInstances changes (and possibly other state change)
> ----------------------------------------------------------------------------------------
>
>                 Key: HELIX-621
>                 URL: https://issues.apache.org/jira/browse/HELIX-621
>             Project: Apache Helix
>          Issue Type: Bug
>          Components: helix-core
>    Affects Versions: 0.6.5
>            Reporter: Marco P.
>
> I noticed sometimes my LiveInstanceChangeListener was not notified of an instance disconnecting.
> Digging a little bit I found out:
>  - A reliable way to consistently reproduce this problem
>  - The problem does not seem to be limited to LiveInstances, it can happen to other listeners using the same strategy
> This is bad as an application relies on notifications, and its view of the system (LiveInstances or else) can get very outdated.
> The problem at the core is this logic:
> 1) Set watch W on some path P
> 2) Event E1 modifies P triggering W
> 3) The callback for W re-sets W on P
> If, however, a second event E2 modifies P between steps 2 and 3, W will not trigger (until P is modified again).
> An example of why this is bad:
>  - 2 live instances L1, L2 and a spectator S watching them.
> 1) L1 disconnects
> 2) S's watch on LIVEINSTANCES fires
> 3) S reads the children of LIVEINSTANCES: {L2}
> 4) L2 disconnects
> 5) S notifies LiveInstanceChangeListeners and goes back to watching LIVEINSTANCES
> The application receives a notification that the live instances now consist of {L2}. 
> And no further notification until another instance joins.
> The reality is that no instances are live.
> Again, this is not limited to LIVEINSTANCES, although that's the one I can reliably reproduce.
> Fixing this is not trivial: it requires firing the watch again when re-setting it IF the version of the watched node changed since the last time the watch fired.


