Posted to issues@geode.apache.org by "Fred Krone (JIRA)" <ji...@apache.org> on 2017/12/11 19:13:03 UTC

[jira] [Updated] (GEODE-4051) Two server JVMs crashed at the same time and caused some primary and redundant buckets to be cleared, leaving some buckets locked and unable to recover even after bouncing all servers

     [ https://issues.apache.org/jira/browse/GEODE-4051?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Fred Krone updated GEODE-4051:
------------------------------
    Component/s:     (was: core)
                 regions

> Two server JVMs crashed at the same time and caused some primary and redundant buckets to be cleared, leaving some buckets locked and unable to recover even after bouncing all servers
> --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: GEODE-4051
>                 URL: https://issues.apache.org/jira/browse/GEODE-4051
>             Project: Geode
>          Issue Type: Bug
>          Components: regions
>            Reporter: Igor Barchak
>             Fix For: 1.2.0
>
>
> "Pooled Waiting Message Processor 5" tid=0x162
>     java.lang.Thread.State: TIMED_WAITING
>         at sun.misc.Unsafe.park(Native Method)
>         -  waiting on java.util.concurrent.CountDownLatch$Sync@1993a5
>         at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
>         at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedNanos(AbstractQueuedSynchronizer.java:1037)
>         at java.util.concurrent.locks.AbstractQueuedSynchronizer.tryAcquireSharedNanos(AbstractQueuedSynchronizer.java:1328)
>         at java.util.concurrent.CountDownLatch.await(CountDownLatch.java:277)
>         at org.apache.geode.internal.util.concurrent.StoppableCountDownLatch.await(StoppableCountDownLatch.java:64)
>         at org.apache.geode.distributed.internal.ReplyProcessor21.basicWait(ReplyProcessor21.java:715)
>         at org.apache.geode.distributed.internal.ReplyProcessor21.waitForReplies(ReplyProcessor21.java:644)
>         at org.apache.geode.distributed.internal.ReplyProcessor21.waitForReplies(ReplyProcessor21.java:624)
>         at org.apache.geode.distributed.internal.ReplyProcessor21.waitForReplies(ReplyProcessor21.java:519)
>         at org.apache.geode.internal.cache.StateFlushOperation.flush(StateFlushOperation.java:243)
>         at org.apache.geode.internal.cache.InitialImageOperation.getFromOne(InitialImageOperation.java:349)
>         at org.apache.geode.internal.cache.DistributedRegion.getInitialImageAndRecovery(DistributedRegion.java:1168)
>         at org.apache.geode.internal.cache.DistributedRegion.initialize(DistributedRegion.java:1023)
>         at org.apache.geode.internal.cache.BucketRegion.initialize(BucketRegion.java:253)
>         at org.apache.geode.internal.cache.LocalRegion.createSubregion(LocalRegion.java:962)
>         at org.apache.geode.internal.cache.PartitionedRegionDataStore.createBucketRegion(PartitionedRegionDataStore.java:726)
>         at org.apache.geode.internal.cache.PartitionedRegionDataStore.grabFreeBucket(PartitionedRegionDataStore.java:414)
>         -  locked org.apache.geode.internal.cache.ProxyBucketRegion@6820a0b6
>         at org.apache.geode.internal.cache.PartitionedRegionDataStore.grabFreeBucketRecursively(PartitionedRegionDataStore.java:272)
>         at org.apache.geode.internal.cache.PartitionedRegionDataStore.grabBucket(PartitionedRegionDataStore.java:2815)
>         at org.apache.geode.internal.cache.partitioned.ManageBackupBucketMessage.operateOnPartitionedRegion(ManageBackupBucketMessage.java:148)
>         at org.apache.geode.internal.cache.partitioned.PartitionMessage.process(PartitionMessage.java:332)
>
> This appears to have been introduced by this fix:
> https://github.com/apache/geode/commit/3a1062e245b3ded52ea3f6b6de0aff94ce846fa3?diff=split
> See StateMarkerMessage.process: the first if branch has no finally block, while the else branch does.
> The first if branch did not previously perform a 'waitFor' operation; that call was introduced in this commit.
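The missing finally block matters because the thread in the stack trace is parked in ReplyProcessor21.waitForReplies until a reply arrives. The following is a minimal Java sketch of that failure shape, assuming a simplified model in which the requester is released only when the marker handler sends its reply. ReplyWaiter, waitForInFlightOperations, processWithoutFinally, processWithFinally and requesterReleased are all hypothetical stand-ins, not Geode's actual StateMarkerMessage code.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

class StateMarkerReplySketch {

    /** Hypothetical stand-in for the requester's reply processor: released by exactly one reply. */
    static final class ReplyWaiter {
        private final CountDownLatch latch = new CountDownLatch(1);

        /** Called when the (hypothetical) marker handler sends its reply. */
        void replyReceived() {
            latch.countDown();
        }

        /** Bounded wait so the demo terminates; returns true only if a reply arrived in time. */
        boolean waitForReply(long timeoutMillis) {
            try {
                return latch.await(timeoutMillis, TimeUnit.MILLISECONDS);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt status
                return false;
            }
        }
    }

    /** Hypothetical wait step that fails, e.g. because the cache is closing. */
    static void waitForInFlightOperations() {
        throw new IllegalStateException("simulated failure while waiting");
    }

    /** Shape described in the issue: the reply is sent only if the wait returns normally. */
    static void processWithoutFinally(ReplyWaiter requester) {
        waitForInFlightOperations();
        requester.replyReceived(); // skipped when the wait throws
    }

    /** Same branch with the reply sent from a finally block (the issue notes the else branch uses one). */
    static void processWithFinally(ReplyWaiter requester) {
        try {
            waitForInFlightOperations();
        } finally {
            requester.replyReceived(); // runs on every exit path, including the exception
        }
    }

    /** Runs one handler variant and reports whether the requester was released. */
    static boolean requesterReleased(boolean useFinally) {
        ReplyWaiter requester = new ReplyWaiter();
        try {
            if (useFinally) {
                processWithFinally(requester);
            } else {
                processWithoutFinally(requester);
            }
        } catch (IllegalStateException expected) {
            // A message-processing thread would typically log this and move on.
        }
        return requester.waitForReply(200);
    }

    public static void main(String[] args) {
        System.out.println("without finally: " + requesterReleased(false)); // false, after ~200 ms
        System.out.println("with finally: " + requesterReleased(true));     // true
    }
}
```

In the trace above, the waiting thread is TIMED_WAITING inside CountDownLatch.await with a timeout, which suggests it waits in timed intervals rather than giving up, so a lost reply would leave it parked indefinitely; the sketch uses a single bounded wait only so that it terminates. Whether the reply should also report the failure to the requester is a separate design choice this sketch does not address.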



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)