You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@geode.apache.org by "Swapnil Bawaskar (JIRA)" <ji...@apache.org> on 2018/02/01 22:52:53 UTC
[jira] [Closed] (GEODE-4051) Two server jvms crashed at same time and caused some primary and redundant buckets to be cleared. Causing some buckets to get locked and not able to recover also after bouncing all servers

     [ https://issues.apache.org/jira/browse/GEODE-4051?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Swapnil Bawaskar closed GEODE-4051.
-----------------------------------

> Two server jvms crashed at same time and caused some primary and redundant buckets to be cleared. Causing some buckets to get locked and not able to recover also after bouncing all servers
> --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: GEODE-4051
>                 URL: https://issues.apache.org/jira/browse/GEODE-4051
>             Project: Geode
>          Issue Type: Bug
>          Components: regions
>    Affects Versions: 1.2.0
>            Reporter: Igor Barchak
>            Assignee: Darrel Schneider
>            Priority: Major
>             Fix For: 1.4.0
>
>
> "Pooled Waiting Message Processor 5" tid=0x162
>     java.lang.Thread.State: TIMED_WAITING
>         at sun.misc.Unsafe.park(Native Method)
>         -  waiting on java.util.concurrent.CountDownLatch$Sync@1993a5
>         at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
>         at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedNanos(AbstractQueuedSynchronizer.java:1037)
>         at java.util.concurrent.locks.AbstractQueuedSynchronizer.tryAcquireSharedNanos(AbstractQueuedSynchronizer.java:1328)
>         at java.util.concurrent.CountDownLatch.await(CountDownLatch.java:277)
>         at org.apache.geode.internal.util.concurrent.StoppableCountDownLatch.await(StoppableCountDownLatch.java:64)
>         at org.apache.geode.distributed.internal.ReplyProcessor21.basicWait(ReplyProcessor21.java:715)
>         at org.apache.geode.distributed.internal.ReplyProcessor21.waitForReplies(ReplyProcessor21.java:644)
>         at org.apache.geode.distributed.internal.ReplyProcessor21.waitForReplies(ReplyProcessor21.java:624)
>         at org.apache.geode.distributed.internal.ReplyProcessor21.waitForReplies(ReplyProcessor21.java:519)
>         at org.apache.geode.internal.cache.StateFlushOperation.flush(StateFlushOperation.java:243)
>         at org.apache.geode.internal.cache.InitialImageOperation.getFromOne(InitialImageOperation.java:349)
>         at org.apache.geode.internal.cache.DistributedRegion.getInitialImageAndRecovery(DistributedRegion.java:1168)
>         at org.apache.geode.internal.cache.DistributedRegion.initialize(DistributedRegion.java:1023)
>         at org.apache.geode.internal.cache.BucketRegion.initialize(BucketRegion.java:253)
>         at org.apache.geode.internal.cache.LocalRegion.createSubregion(LocalRegion.java:962)
>         at org.apache.geode.internal.cache.PartitionedRegionDataStore.createBucketRegion(PartitionedRegionDataStore.java:726)
>         at org.apache.geode.internal.cache.PartitionedRegionDataStore.grabFreeBucket(PartitionedRegionDataStore.java:414)
>         -  locked org.apache.geode.internal.cache.ProxyBucketRegion@6820a0b6
>         at org.apache.geode.internal.cache.PartitionedRegionDataStore.grabFreeBucketRecursively(PartitionedRegionDataStore.java:272)
>         at org.apache.geode.internal.cache.PartitionedRegionDataStore.grabBucket(PartitionedRegionDataStore.java:2815)
>         at org.apache.geode.internal.cache.partitioned.ManageBackupBucketMessage.operateOnPartitionedRegion(ManageBackupBucketMessage.java:148)
>         at org.apache.geode.internal.cache.partitioned.PartitionMessage.process(PartitionMessage.java:332)
> Seems like it was introduced in this fix
> https://github.com/apache/geode/commit/3a1062e245b3ded52ea3f6b6de0aff94ce846fa3?diff=split
> See StateMarkerMessage.process
> The first if condition doesn't have a finally block.
> The else has a finally block.
> The first if condition didn't have a 'waitFor' operation earlier - it was introduced in this commit



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)