Posted to issues@geode.apache.org by "Xiaojian Zhou (Jira)" <ji...@apache.org> on 2021/04/24 06:07:00 UTC

[jira] [Created] (GEODE-9191) PR clear should not miss clearing bucket which lost primary

Xiaojian Zhou created GEODE-9191:
------------------------------------

             Summary: PR clear should not miss clearing bucket which lost primary
                 Key: GEODE-9191
                 URL: https://issues.apache.org/jira/browse/GEODE-9191
             Project: Geode
          Issue Type: Bug
            Reporter: Xiaojian Zhou


This scenario was found while introducing a GII test case for PR clear. The sequence is:

(1) There are 3 servers: server1 is an accessor; server2 and server3 are datastores.
(2) Shut down server2.
(3) Send a PR clear from server1 (the accessor) and restart server2 at the same time. There is a race in which server2 does not receive the PartitionedRegionClearMessage.
(4) server2 finishes GII.
(5) Only server3 receives the PartitionedRegionClearMessage, and at that point it hosts all the primary buckets. While the PR clear thread iterates through these primary buckets one by one, some of them might lose primary to server2.
(6) BR.cmnClearRegion returns immediately for a bucket that is no longer primary, but clearedBuckets.add(localPrimaryBucketRegion.getId()); is still called. So from the caller's point of view the bucket is cleared, and PartitionedRegionPartialClearException is not even thrown.

The problem and the proposed fix:
Before calling cmnClearRegion, we should call BR.doLockForPrimary to make sure the bucket is still primary, and throw an exception if it is not. Then clearedBuckets.add(localPrimaryBucketRegion.getId()); will not be called for this bucket.
The expected behavior in this scenario is to throw PartitionedRegionPartialClearException.
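
A minimal, self-contained sketch of the proposed check follows. It is not Geode code: Bucket, PartialClearException and clearLocalPrimaries are hypothetical stand-ins, and the methods doLockForPrimary and cmnClearRegion only borrow the names used above (their real signatures, and the lock/unlock handling, are not reproduced here). The point is only the ordering: verify primary first, and add the bucket id to clearedBuckets only after the clear actually ran.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

class PartialClearSketch {

  /** Hypothetical stand-in for a bucket region; not Geode's BucketRegion. */
  static class Bucket {
    final int id;
    boolean primary;
    int entries;

    Bucket(int id, boolean primary, int entries) {
      this.id = id;
      this.primary = primary;
      this.entries = entries;
    }

    /** Assumed behavior: fail loudly when this member no longer holds primary. */
    void doLockForPrimary() {
      if (!primary) {
        throw new IllegalStateException("bucket " + id + " is no longer primary");
      }
    }

    /** Mirrors the behavior described in step (6): a silent no-op when not primary. */
    void cmnClearRegion() {
      if (!primary) {
        return;
      }
      entries = 0;
    }
  }

  /** Hypothetical stand-in for PartitionedRegionPartialClearException. */
  static class PartialClearException extends RuntimeException {
    final Set<Integer> clearedBuckets;

    PartialClearException(String message, Set<Integer> clearedBuckets) {
      super(message);
      this.clearedBuckets = clearedBuckets;
    }
  }

  /** Clears each local bucket, refusing to report a bucket that lost primary as cleared. */
  static Set<Integer> clearLocalPrimaries(List<Bucket> localPrimaryBuckets) {
    Set<Integer> clearedBuckets = new LinkedHashSet<>();
    for (Bucket bucket : localPrimaryBuckets) {
      try {
        // Check primary BEFORE clearing, as proposed in the issue.
        bucket.doLockForPrimary();
      } catch (IllegalStateException e) {
        // Surface a partial clear instead of silently skipping the bucket.
        throw new PartialClearException(e.getMessage(), clearedBuckets);
      }
      bucket.cmnClearRegion();
      // Reached only when the bucket was primary and was really cleared.
      clearedBuckets.add(bucket.id);
    }
    return clearedBuckets;
  }

  public static void main(String[] args) {
    List<Bucket> buckets = new ArrayList<>();
    buckets.add(new Bucket(0, true, 5));
    buckets.add(new Bucket(1, false, 7)); // lost primary to another member
    try {
      clearLocalPrimaries(buckets);
    } catch (PartialClearException e) {
      System.out.println("partial clear, cleared=" + e.clearedBuckets + ": " + e.getMessage());
    }
  }
}
```

Without the doLockForPrimary call, bucket 1 in this sketch would be added to clearedBuckets even though its entries were never removed, which is the behavior reported in step (6).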



