
Posted to issues@geode.apache.org by "Igor Barchak (JIRA)" <ji...@apache.org> on 2017/12/05 16:37:00 UTC

[jira] [Created] (GEODE-4051) Two server JVMs crashed at the same time, clearing some primary and redundant buckets; some buckets remain locked and cannot recover even after bouncing all servers

Igor Barchak created GEODE-4051:
-----------------------------------

             Summary: Two server JVMs crashed at the same time, clearing some primary and redundant buckets; some buckets remain locked and cannot recover even after bouncing all servers
                 Key: GEODE-4051
                 URL: https://issues.apache.org/jira/browse/GEODE-4051
             Project: Geode
          Issue Type: Bug
          Components: core
            Reporter: Igor Barchak
             Fix For: 1.2.0


"Pooled Waiting Message Processor 5" tid=0x162
    java.lang.Thread.State: TIMED_WAITING
        at sun.misc.Unsafe.park(Native Method)
        -  waiting on java.util.concurrent.CountDownLatch$Sync@1993a5
        at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedNanos(AbstractQueuedSynchronizer.java:1037)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer.tryAcquireSharedNanos(AbstractQueuedSynchronizer.java:1328)
        at java.util.concurrent.CountDownLatch.await(CountDownLatch.java:277)
        at org.apache.geode.internal.util.concurrent.StoppableCountDownLatch.await(StoppableCountDownLatch.java:64)
        at org.apache.geode.distributed.internal.ReplyProcessor21.basicWait(ReplyProcessor21.java:715)
        at org.apache.geode.distributed.internal.ReplyProcessor21.waitForReplies(ReplyProcessor21.java:644)
        at org.apache.geode.distributed.internal.ReplyProcessor21.waitForReplies(ReplyProcessor21.java:624)
        at org.apache.geode.distributed.internal.ReplyProcessor21.waitForReplies(ReplyProcessor21.java:519)
        at org.apache.geode.internal.cache.StateFlushOperation.flush(StateFlushOperation.java:243)
        at org.apache.geode.internal.cache.InitialImageOperation.getFromOne(InitialImageOperation.java:349)
        at org.apache.geode.internal.cache.DistributedRegion.getInitialImageAndRecovery(DistributedRegion.java:1168)
        at org.apache.geode.internal.cache.DistributedRegion.initialize(DistributedRegion.java:1023)
        at org.apache.geode.internal.cache.BucketRegion.initialize(BucketRegion.java:253)
        at org.apache.geode.internal.cache.LocalRegion.createSubregion(LocalRegion.java:962)
        at org.apache.geode.internal.cache.PartitionedRegionDataStore.createBucketRegion(PartitionedRegionDataStore.java:726)
        at org.apache.geode.internal.cache.PartitionedRegionDataStore.grabFreeBucket(PartitionedRegionDataStore.java:414)
        -  locked org.apache.geode.internal.cache.ProxyBucketRegion@6820a0b6
        at org.apache.geode.internal.cache.PartitionedRegionDataStore.grabFreeBucketRecursively(PartitionedRegionDataStore.java:272)
        at org.apache.geode.internal.cache.PartitionedRegionDataStore.grabBucket(PartitionedRegionDataStore.java:2815)
        at org.apache.geode.internal.cache.partitioned.ManageBackupBucketMessage.operateOnPartitionedRegion(ManageBackupBucketMessage.java:148)
        at org.apache.geode.internal.cache.partitioned.PartitionMessage.process(PartitionMessage.java:332)





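This appears to have been introduced by the following commit:

https://github.com/apache/geode/commit/3a1062e245b3ded52ea3f6b6de0aff94ce846fa3?diff=split

See StateMarkerMessage.process.

The first if branch has no finally block, while the else branch does.

The first if branch previously had no 'waitFor' operation; that call was introduced in this commit. If the wait throws or is interrupted, that branch presumably never sends its reply, which would be consistent with the thread above still waiting for replies in StateFlushOperation.flush.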
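
To illustrate the suspected pattern, here is a minimal, self-contained sketch. It is not the Geode code: the class and method names (ReplyOnAllPaths, processWithoutFinally, processWithFinally, replyArrives) are hypothetical, and a CountDownLatch stands in for the reply that the remote member is waiting on. It only shows why a wait added to a branch without a finally block can leave the waiter without a reply.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

public class ReplyOnAllPaths {

    // Hypothetical stand-in for a branch that waits and then replies,
    // with no finally block around the wait.
    static void processWithoutFinally(Runnable waitFor, CountDownLatch reply) {
        waitFor.run();      // if this throws, the reply below is skipped
        reply.countDown();  // the "reply" the remote waiter depends on
    }

    // Same logic, but the reply is sent from a finally block.
    static void processWithFinally(Runnable waitFor, CountDownLatch reply) {
        try {
            waitFor.run();
        } finally {
            reply.countDown();  // sent even when the wait fails
        }
    }

    // Runs one variant with a wait that always fails and reports whether
    // the reply arrived within a short timeout.
    static boolean replyArrives(boolean useFinally) throws InterruptedException {
        CountDownLatch reply = new CountDownLatch(1);
        Runnable failingWait = () -> {
            throw new IllegalStateException("simulated failure during wait");
        };
        try {
            if (useFinally) {
                processWithFinally(failingWait, reply);
            } else {
                processWithoutFinally(failingWait, reply);
            }
        } catch (IllegalStateException expected) {
            // The failure itself is not the point; the missing reply is.
        }
        return reply.await(100, TimeUnit.MILLISECONDS);
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("without finally: reply arrived = " + replyArrives(false));
        System.out.println("with finally:    reply arrived = " + replyArrives(true));
    }
}
```

In this sketch the variant without finally never counts down the latch, so the caller only returns on timeout. If the real first branch behaves similarly, wrapping the new wait in try/finally (as the else branch already does) would be one possible fix, but that is an assumption to verify against the actual StateMarkerMessage.process code.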



