Posted to dev@geode.apache.org by Hitesh Khamesra <hk...@pivotal.io> on 2017/05/15 20:04:16 UTC

Re: Review Request 58937: GEODE-2865 data loss in initial-image replication with multicast

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/58937/#review175012
-----------------------------------------------------------


Ship it!

- Hitesh Khamesra


On May 5, 2017, 4:57 p.m., Bruce Schuchardt wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/58937/
> -----------------------------------------------------------
> 
> (Updated May 5, 2017, 4:57 p.m.)
> 
> 
> Review request for geode, Galen O'Sullivan, Hitesh Khamesra, and Udo Kohlmeyer.
> 
> 
> Bugs: GEODE-2865
>     https://issues.apache.org/jira/browse/GEODE-2865
> 
> 
> Repository: geode
> 
> 
> Description
> -------
> 
> The state-flush algorithm relies on MembershipManager.waitForMessageState() to ensure that all operations have been received and applied to the cache before state replication starts.  For multicast, the algorithm had a flaw with two causes: 1) cache operations were being sent out-of-band, allowing them to be processed out of order with respect to the state-flush message, and 2) JGroupsMessenger was only waiting for the messages to be dispatched by NAKACK2, which isn't necessarily the same as being dispatched to the DistributionManager Executor that processes the message.
> 
> Cache operation messages are now sent in-band.
> JGroupsMessenger now tracks NAKACK2 (multicast) sequence numbers of messages dispatched to the DistributionManager and this is used in waitForMessageState() to make sure the messages have been queued.
> If multicast is enabled, we now also flush the serial executor in waitForMessageState() to make sure that all messages queued in it have been applied to the region.
> 
> 
> Diffs
> -----
> 
>   geode-core/src/main/java/org/apache/geode/distributed/internal/membership/gms/messenger/JGroupsMessenger.java e99eff2be344d54da67c257a0bfa73ad8268c4c6 
>   geode-core/src/main/java/org/apache/geode/distributed/internal/membership/gms/mgr/GMSMembershipManager.java 8ae66d0b6839cfbd46b479d896104f54fd11a68d 
>   geode-core/src/test/java/org/apache/geode/distributed/DistributedSystemDUnitTest.java 9a64f531431e714916765d6d6c741841ef719fb8 
>   geode-core/src/test/java/org/apache/geode/distributed/internal/membership/gms/messenger/JGroupsMessengerJUnitTest.java 307b5948c02befee61d61b628c38b8b8b8693c4d 
>   geode-core/src/test/java/org/apache/geode/internal/cache/FixedPRSinglehopDUnitTest.java 7e798c8358aaec070d3dd9d04c2486bd33a21d9e 
> 
> 
> Diff: https://reviews.apache.org/r/58937/diff/2/
> 
> 
> Testing
> -------
> 
> passes precheckin and modified unit tests
> 
> 
> Thanks,
> 
> Bruce Schuchardt
> 
>
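
For readers following the description quoted above, here is a rough sketch of the two waiting steps it mentions: tracking the highest sequence number handed to the executor, and flushing a serial executor. It is not the code from the diff. The class names DispatchTracker, SerialExecutorFlush and StateFlushSketch are hypothetical, and flushing by submitting an empty task is an assumption about one way to do it, not necessarily how GMSMembershipManager does it.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical stand-in for the sequence-number tracking described above;
// not the actual JGroupsMessenger implementation.
class DispatchTracker {
  private long highestDispatched = -1; // guarded by "this"

  // Call after a message with this sequence number has been queued on the executor.
  synchronized void recordDispatched(long seqno) {
    if (seqno > highestDispatched) {
      highestDispatched = seqno;
      notifyAll();
    }
  }

  // Blocks until a message with at least the given seqno has been queued, or times out.
  synchronized void waitForDispatched(long seqno, long timeoutMillis)
      throws InterruptedException, TimeoutException {
    long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
    while (highestDispatched < seqno) {
      long remainingNanos = deadline - System.nanoTime();
      if (remainingNanos <= 0) {
        throw new TimeoutException(
            "seqno " + seqno + " not dispatched; highest=" + highestDispatched);
      }
      // Never call wait(0), which would block without a timeout.
      wait(Math.max(1, TimeUnit.NANOSECONDS.toMillis(remainingNanos)));
    }
  }
}

// Assumed flush technique: a single-threaded executor runs tasks in FIFO order,
// so once an empty task completes, every task queued before it has completed too.
class SerialExecutorFlush {
  static void flush(ExecutorService serialExecutor, long timeoutMillis)
      throws InterruptedException, ExecutionException, TimeoutException {
    serialExecutor.submit(() -> { }).get(timeoutMillis, TimeUnit.MILLISECONDS);
  }
}

class StateFlushSketch {
  public static void main(String[] args) throws Exception {
    DispatchTracker tracker = new DispatchTracker();
    ExecutorService serial = Executors.newSingleThreadExecutor();
    AtomicLong applied = new AtomicLong(); // counts "cache operations" applied

    for (long seqno = 0; seqno < 5; seqno++) {
      serial.execute(applied::incrementAndGet); // queue the operation
      tracker.recordDispatched(seqno);          // then record that it was queued
    }

    tracker.waitForDispatched(4, 1000);      // step 1: everything up to seqno 4 is queued
    SerialExecutorFlush.flush(serial, 1000); // step 2: everything queued has been applied
    System.out.println("applied=" + applied.get()); // prints applied=5
    serial.shutdown();
  }
}
```

The ordering is the point of the sketch: waiting on the sequence number only shows that the messages reached the queue, so the flush is still needed before the operations can be assumed applied.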