Posted to issues@geode.apache.org by "Ryan McMahon (JIRA)" <ji...@apache.org> on 2019/04/05 18:28:00 UTC

[jira] [Created] (GEODE-6607) Possible client subscription data inconsistency due to race between retrieving filter info and distributing event

Ryan McMahon created GEODE-6607:
-----------------------------------

             Summary: Possible client subscription data inconsistency due to race between retrieving filter info and distributing event
                 Key: GEODE-6607
                 URL: https://issues.apache.org/jira/browse/GEODE-6607
             Project: Geode
          Issue Type: Bug
          Components: client queues
            Reporter: Ryan McMahon


A client can miss events from a subscription (either CQ or register interest) in the following scenario:

Consider four servers in a cluster, with redundant copies set to 2 for client subscriptions.  The client has its primary subscription endpoint on server 1, and the redundant copies are on servers 2 and 3.  Server 2 is killed or lost due to a network partition, so we attempt to restore redundancy by copying the client queue from server 3 to server 4.

Two things happen when server 4 gets the client queue from server 3.  First, we request the client's filter info, which represents the CQ and register interest info.  Second, we perform the GII to get the image of the queue.

A race can occur when an event is distributed across the cluster while server 4 is initializing the client queue.  If server 4 processes the distributed event before it has retrieved the filter info, the event will not match the client subscription filter, because the filter does not exist on server 4 yet.  Then, if server 3 processes the same event after GII has started, the event will not be part of the client queue image.  Therefore, the event is never added to the client queue on server 4 and is lost.
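
The interleaving above can be replayed deterministically in a small model.  This is only an illustrative sketch: the class name, the string-valued events, and the list-based queues are hypothetical stand-ins and do not correspond to actual Geode classes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical model of the race; none of these names are Geode classes.
class SubscriptionRaceSketch {

    /** Replays the problematic interleaving and returns server 4's client queue. */
    static List<String> replay() {
        String event = "put(key1)";

        List<String> server3Queue = new ArrayList<>(); // existing redundant queue (GII provider)
        List<String> server4Queue = new ArrayList<>(); // queue being initialized
        Predicate<String> server4Filter = null;        // filter info not yet retrieved

        // 1. The event reaches server 4 before the filter info: nothing matches, so it is dropped.
        if (server4Filter != null && server4Filter.test(event)) {
            server4Queue.add(event);
        }

        // 2. Server 4 retrieves the filter info (CQs and registered interest).
        server4Filter = e -> e.contains("key1");

        // 3. GII starts: server 4 copies an image of server 3's queue, which is still empty.
        List<String> image = new ArrayList<>(server3Queue);

        // 4. Server 3 processes the event only now, after the image was taken.
        server3Queue.add(event);

        // 5. Server 4 installs the image; the event is in neither the image nor its own queue.
        server4Queue.addAll(image);
        return server4Queue;
    }

    public static void main(String[] args) {
        System.out.println("server 4 queue after init: " + replay());
    }
}
```

In this model server 3 ends up holding the event, but server 4's copy of the queue does not, even though the event matches the filter that server 4 eventually retrieves.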

We have a special queue for handling events while a client is initializing, but it sits at too low a level (MessageDispatcher) to handle this scenario.  One possible solution is to move this special queue to a higher level (CacheClientNotifier or CacheClientProxy) so that events are queued before we even attempt to get the filter info.  Then, when initialization finishes, we drain the queue, check each queued event against the initialized client's filter, and send it along if it matches.  A similar solution could be implemented on the GII provider side, but it might be a bit messier.
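
A minimal sketch of the higher-level queueing idea follows.  It is not the actual CacheClientNotifier or CacheClientProxy code; the class, its methods, and the string-valued events are hypothetical.  The duplicate check during the drain is also an assumption, added because an event could in principle appear both in the GII image and in the pending queue.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;
import java.util.function.Predicate;

// Hypothetical sketch; not the actual CacheClientNotifier or CacheClientProxy code.
class InitializingClientBuffer {
    private final Queue<String> pending = new ArrayDeque<>();   // events seen during initialization
    private final List<String> clientQueue = new ArrayList<>(); // the client's subscription queue
    private Predicate<String> filter;                           // null until filter info is retrieved
    private boolean initializing = true;

    /** Called for every distributed event, even before filter info exists. */
    synchronized void onEvent(String event) {
        if (initializing) {
            pending.add(event);          // defer the filter decision
        } else if (filter.test(event)) {
            clientQueue.add(event);
        }
    }

    /** Called once the filter info and the queue image (GII) are both available. */
    synchronized void finishInitialization(Predicate<String> retrievedFilter, List<String> image) {
        filter = retrievedFilter;
        clientQueue.addAll(image);
        String event;
        while ((event = pending.poll()) != null) {
            // Assumption: skip events that already arrived through the image.
            if (filter.test(event) && !clientQueue.contains(event)) {
                clientQueue.add(event);
            }
        }
        initializing = false;
    }

    /** Returns a copy of the client queue for inspection. */
    synchronized List<String> snapshot() {
        return new ArrayList<>(clientQueue);
    }
}
```

Because events are held until the filter is known, an event that arrives before the filter info is retrieved is still evaluated against the filter once initialization completes, instead of being dropped.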

 


