Posted to dev@curator.apache.org by "Jordan Zimmerman (JIRA)" <ji...@apache.org> on 2013/03/26 23:31:16 UTC

[jira] [Created] (CURATOR-4) POST_INITIALIZED_EVENT race conditions / optimizations

Jordan Zimmerman created CURATOR-4:
--------------------------------------

             Summary: POST_INITIALIZED_EVENT race conditions / optimizations
                 Key: CURATOR-4
                 URL: https://issues.apache.org/jira/browse/CURATOR-4
             Project: Apache Curator
          Issue Type: Bug
          Components: Recipes
    Affects Versions: 2.0.0
            Reporter: Jordan Zimmerman
            Assignee: Jordan Zimmerman


From https://github.com/Netflix/curator/pull/261

We've been running a data structure modeled after PathChildrenCache for a while now on a path with ~2500 child nodes, using an async loading strategy very similar to the new POST_INITIALIZED_EVENT startup mode. I noticed a couple of subtle race conditions that we've encountered in our own code and thought I'd share them back.

trun@5229c8c If a node is removed during startup (after getChildren() but before getDataAndStat()), the INITIALIZED_EVENT will never fire. Handling NONODE events fixes this.
trun@fb94530 Though highly unlikely, I think it's possible for the initialSet to appear to be fully initialized before all the getDataAndStat() calls have even been issued. Constructing the initialSet before issuing any getDataAndStat() calls eliminates this possibility.
trun@e4ddc6c Each call to maybeOfferInitializedEvent() loops over the entire initialSet, and since it's called after each updateInitialSet(), this can get pretty expensive (O(n^2)) with thousands of children. There doesn't seem to be much value in keeping the entire initialSet around, so removing each node after it's loaded simplifies this check a great deal.
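
The three changes above can be sketched together in one small class. This is only an illustration under stated assumptions, not the actual PathChildrenCache code: InitialSetTracker, start(), nodeLoaded() and nodeMissing() are hypothetical names, and the ZooKeeper calls themselves are left out. The pending set is fully populated before any load is issued, a missing node counts as finished, and each node is removed once handled, so the "are we done?" check is an isEmpty() call instead of a scan over every child.

```java
import java.util.Collection;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical helper, not part of Curator: tracks which children are still
// waiting for their initial data load.
final class InitialSetTracker {
    private final Set<String> pending = ConcurrentHashMap.newKeySet();
    private final AtomicBoolean fired = new AtomicBoolean(false);
    private final Runnable onInitialized;

    // The whole set is built here, before the caller issues any load, so it
    // cannot look complete while loads are still being issued.
    InitialSetTracker(Collection<String> children, Runnable onInitialized) {
        this.pending.addAll(children);
        this.onInitialized = onInitialized;
    }

    // Call once after construction; covers the case of zero children.
    void start() {
        maybeFire();
    }

    // Call when a child's data has been loaded.
    void nodeLoaded(String node) {
        complete(node);
    }

    // Call when the load reports that the child no longer exists (NONODE).
    // Without this, a child deleted during startup would stay pending forever.
    void nodeMissing(String node) {
        complete(node);
    }

    private void complete(String node) {
        // Removing the entry keeps the completion check cheap: no loop over
        // all children after every single load.
        pending.remove(node);
        maybeFire();
    }

    private void maybeFire() {
        // compareAndSet guarantees the callback runs at most once, even if
        // several threads see the set become empty at the same time.
        if (pending.isEmpty() && fired.compareAndSet(false, true)) {
            onInitialized.run();
        }
    }
}
```

In a real cache, nodeLoaded() and nodeMissing() would be called from the background callback of each data request, depending on its result code.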
Also one simple bugfix (which may not even be necessary any more)...

trun@3385adb initialSet is keyed on node, not fullPath, so this call was just a NOP before.
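
A minimal sketch of why the mismatched key made the call a no-op, assuming the map is keyed by the child's node name. InitialSetKeyDemo, nodeName() and removeByKey() are hypothetical names for illustration only; they are not taken from the Curator source.

```java
import java.util.Map;

// Hypothetical illustration of the key mismatch, not Curator code.
final class InitialSetKeyDemo {
    // Last path segment, e.g. "/base/child" -> "child".
    static String nodeName(String fullPath) {
        return fullPath.substring(fullPath.lastIndexOf('/') + 1);
    }

    // Returns true only if an entry with exactly this key was removed.
    static boolean removeByKey(Map<String, String> initialSet, String key) {
        return initialSet.keySet().remove(key);
    }
}
```

With a map holding the key "child", removeByKey(map, "/base/child") leaves the map untouched and returns false, while removeByKey(map, nodeName("/base/child")) removes the entry.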

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira