Posted to issues@geode.apache.org by "ASF subversion and git services (Jira)" <ji...@apache.org> on 2021/04/28 17:46:00 UTC

[jira] [Commented] (GEODE-9141) Hang while shutting down a cache server due to corrupted message

    [ https://issues.apache.org/jira/browse/GEODE-9141?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17334899#comment-17334899 ] 

ASF subversion and git services commented on GEODE-9141:
--------------------------------------------------------

Commit 38a3540583a1d0a402b026ee0d33ae4b0a2907d3 in geode's branch refs/heads/develop from Bill Burcham
[ https://gitbox.apache.org/repos/asf?p=geode.git;h=38a3540 ]

GEODE-9141: (1 of 2) rename ByteBufferSharingImpl to ByteBufferVendor


> Hang while shutting down a cache server due to corrupted message
> ----------------------------------------------------------------
>
>                 Key: GEODE-9141
>                 URL: https://issues.apache.org/jira/browse/GEODE-9141
>             Project: Geode
>          Issue Type: Bug
>          Components: membership, messaging
>    Affects Versions: 1.13.2, 1.14.0, 1.15.0
>            Reporter: Bruce J Schuchardt
>            Assignee: Bill Burcham
>            Priority: Major
>              Labels: blocks-1.14.0, blocks-1.15.0, pull-request-available
>             Fix For: 1.15.0
>
>
> We have a test that fails about once in 5000 runs with a corrupted DestroyRegionMessage. It always happens during CacheServer teardown, while a HARegionQueue region is being destroyed.
> {noformat}
> "vm_0_thr_0_bridge_1_1_host1_6920" #144 daemon prio=5 os_prio=0 tid=0x00007fec70058800 nid=0x1d28 waiting on condition [0x00007fec62063000]
>    java.lang.Thread.State: TIMED_WAITING (parking)
> 	at sun.misc.Unsafe.park(Native Method)
> 	- parking to wait for  <0x00000000f4f654f8> (a java.util.concurrent.CountDownLatch$Sync)
> 	at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
> 	at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedNanos(AbstractQueuedSynchronizer.java:1037)
> 	at java.util.concurrent.locks.AbstractQueuedSynchronizer.tryAcquireSharedNanos(AbstractQueuedSynchronizer.java:1328)
> 	at java.util.concurrent.CountDownLatch.await(CountDownLatch.java:277)
> 	at org.apache.geode.internal.util.concurrent.StoppableCountDownLatch.await(StoppableCountDownLatch.java:72)
> 	at org.apache.geode.distributed.internal.ReplyProcessor21.basicWait(ReplyProcessor21.java:723)
> 	at org.apache.geode.distributed.internal.ReplyProcessor21.waitForRepliesUninterruptibly(ReplyProcessor21.java:794)
> 	at org.apache.geode.distributed.internal.ReplyProcessor21.waitForRepliesUninterruptibly(ReplyProcessor21.java:771)
> 	at org.apache.geode.distributed.internal.ReplyProcessor21.waitForRepliesUninterruptibly(ReplyProcessor21.java:857)
> 	at org.apache.geode.internal.cache.DistributedCacheOperation.waitForAckIfNeeded(DistributedCacheOperation.java:779)
> 	at org.apache.geode.internal.cache.DistributedCacheOperation._distribute(DistributedCacheOperation.java:676)
> 	at org.apache.geode.internal.cache.DistributedCacheOperation.startOperation(DistributedCacheOperation.java:277)
> 	at org.apache.geode.internal.cache.DistributedCacheOperation.distribute(DistributedCacheOperation.java:318)
> 	at org.apache.geode.internal.cache.DistributedRegion.distributeDestroyRegion(DistributedRegion.java:1865)
> 	at org.apache.geode.internal.cache.DistributedRegion.basicDestroyRegion(DistributedRegion.java:1844)
> 	at org.apache.geode.internal.cache.LocalRegion.basicDestroyRegion(LocalRegion.java:6180)
> 	at org.apache.geode.internal.cache.HARegion.destroyRegion(HARegion.java:331)
> 	at org.apache.geode.internal.cache.AbstractRegion.destroyRegion(AbstractRegion.java:476)
> 	at org.apache.geode.internal.cache.ha.HARegionQueue.destroy(HARegionQueue.java:3438)
> 	at org.apache.geode.internal.cache.ha.HARegionQueue$BlockingHARegionQueue.destroy(HARegionQueue.java:2272)
> 	at org.apache.geode.internal.cache.tier.sockets.CacheClientProxy.destroyRQ(CacheClientProxy.java:1031)
> 	at org.apache.geode.internal.cache.tier.sockets.CacheClientProxy.terminateDispatching(CacheClientProxy.java:939)
> 	at org.apache.geode.internal.cache.tier.sockets.CacheClientNotifier.shutdown(CacheClientNotifier.java:1306)
> 	- locked <0x00000000f8022800> (a org.apache.geode.internal.cache.tier.sockets.CacheClientNotifier)
> 	at org.apache.geode.internal.cache.tier.sockets.AcceptorImpl.close(AcceptorImpl.java:1630)
> 	- locked <0x00000000f5f7b888> (a java.lang.Object)
> 	at org.apache.geode.internal.cache.CacheServerImpl.stop(CacheServerImpl.java:491)
> 	- locked <0x00000000f7ef2980> (a org.apache.geode.internal.cache.CacheServerImpl)
> 	at org.apache.geode.internal.cache.GemFireCacheImpl.stopServers(GemFireCacheImpl.java:2672)
> 	at org.apache.geode.internal.cache.GemFireCacheImpl.doClose(GemFireCacheImpl.java:2263)
> 	- locked <0x00000000f5a21a08> (a java.lang.Class for org.apache.geode.internal.cache.GemFireCacheImpl)
> 	at org.apache.geode.internal.cache.GemFireCacheImpl.close(GemFireCacheImpl.java:2151)
> 	at org.apache.geode.distributed.internal.InternalDistributedSystem.disconnect(InternalDistributedSystem.java:1559)
> 	- locked <0x00000000f5a21a08> (a java.lang.Class for org.apache.geode.internal.cache.GemFireCacheImpl)
> 	at org.apache.geode.distributed.internal.InternalDistributedSystem.disconnect(InternalDistributedSystem.java:1257)
> 	at hydra.RemoteTestModule$2.run(RemoteTestModule.java:388)
> {noformat}
> Another server logs the corrupted message below. The corruption is almost always the same; when it is not, we see a garbled message header rather than a bad DSFID.
> {noformat}
> [fatal 2021/03/06 09:45:02.796 PST bridgegemfire_1_3_host1_582 <P2P message reader for rs-FullRegression58615648a0i3large-hydra-client-18(bridgegemfire_1_1_host1_6920:6920)<ec><v100>:41007 unshared ordered sender uid=42 dom #1 local port=58695 remote port=52758> tid=0xcd] Error deserializing message
> java.lang.IllegalStateException: unexpected byte: HASH_TABLE while reading dsfid
> 	at org.apache.geode.internal.InternalDataSerializer.readDSFID(InternalDataSerializer.java:2397)
> 	at org.apache.geode.internal.InternalDataSerializer.readDSFID(InternalDataSerializer.java:2403)
> 	at org.apache.geode.internal.tcp.Connection.readMessage(Connection.java:2979)
> 	at org.apache.geode.internal.tcp.Connection.processInputBuffer(Connection.java:2797)
> 	at org.apache.geode.internal.tcp.Connection.readMessages(Connection.java:1651)
> 	at org.apache.geode.internal.tcp.Connection.run(Connection.java:1482)
> 	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> 	at java.lang.Thread.run(Thread.java:748)
> {noformat}

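The hung thread in the first trace above is parked in CountDownLatch.await beneath ReplyProcessor21.waitForRepliesUninterruptibly, so the server tearing down the HARegionQueue region is waiting for acknowledgements of its DestroyRegionMessage. Below is a minimal sketch of why one lost reply turns into a hang. It assumes a simplified model in which each recipient counts down a latch when it acks, and in which a recipient that fails to deserialize the message sends no reply. SimpleReplyProcessor, deliver and simulate are hypothetical names, not Geode's ReplyProcessor21 API.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

/**
 * Hypothetical, simplified model of a sender waiting for acks.
 * Not Geode's ReplyProcessor21; all names here are illustrative.
 */
public class ReplyWaitSketch {

  /** Tracks outstanding replies with one latch count per recipient. */
  static final class SimpleReplyProcessor {
    private final CountDownLatch latch;

    SimpleReplyProcessor(int recipients) {
      this.latch = new CountDownLatch(recipients);
    }

    /** Invoked when a recipient's ack arrives. */
    void process() {
      latch.countDown();
    }

    /** Returns true only if every recipient replied within the timeout. */
    boolean waitForReplies(long timeoutMillis) throws InterruptedException {
      return latch.await(timeoutMillis, TimeUnit.MILLISECONDS);
    }
  }

  /** Assumption: a recipient acks only if it deserialized the message. */
  static void deliver(SimpleReplyProcessor rp, boolean deserializedOk) {
    if (deserializedOk) {
      rp.process();
    }
    // Otherwise the error is logged and no reply is sent back.
  }

  /** Sends to each recipient, then does one bounded wait for all acks. */
  static boolean simulate(boolean... recipientsOk) {
    SimpleReplyProcessor rp = new SimpleReplyProcessor(recipientsOk.length);
    for (boolean ok : recipientsOk) {
      deliver(rp, ok);
    }
    try {
      return rp.waitForReplies(100);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      return false;
    }
  }

  public static void main(String[] args) {
    System.out.println(simulate(true, true));  // true
    System.out.println(simulate(true, false)); // false
  }
}
```

With two healthy recipients the wait completes at once; when one recipient drops the message, the bounded wait returns false. A caller with no overall deadline that simply waits again after each timed slice, which is consistent with the TIMED_WAITING state in the trace, would never return.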

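The second log above shows the receiving side failing in InternalDataSerializer.readDSFID: the byte where a DSFID (DataSerializableFixedID) type code was expected decoded as HASH_TABLE. The sketch below is a hedged illustration of that kind of header-byte dispatch. The code names are modeled loosely on fixed-id type codes, but their numeric values are placeholders and not Geode's real DSCODE values, and readDsfid here is a hypothetical stand-in rather than Geode's implementation. It only shows how bytes read from the wrong place in a buffer surface as an "unexpected byte" error instead of a decodable message.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

/**
 * Hypothetical sketch of dispatching on a type-code byte before reading a
 * fixed id. The numeric codes are placeholders, NOT Geode's DSCODE values.
 */
public class DsfidReadSketch {

  // Placeholder type codes (assumed for illustration only).
  static final byte FIXED_ID_BYTE = 1;
  static final byte FIXED_ID_SHORT = 2;
  static final byte FIXED_ID_INT = 3;
  static final byte HASH_TABLE = 70;

  /** Reads the type code, then an id of the width that code announces. */
  static int readDsfid(DataInput in) throws IOException {
    byte header = in.readByte();
    switch (header) {
      case FIXED_ID_BYTE:
        return in.readByte();
      case FIXED_ID_SHORT:
        return in.readShort();
      case FIXED_ID_INT:
        return in.readInt();
      default:
        // Any other code means the reader is not looking at a DSFID header,
        // e.g. the bytes were overwritten or the read position is wrong.
        throw new IllegalStateException(
            "unexpected byte: " + describe(header) + " while reading dsfid");
    }
  }

  /** Names the one non-DSFID code this sketch knows about. */
  static String describe(byte code) {
    return code == HASH_TABLE ? "HASH_TABLE" : Byte.toString(code);
  }

  /** Convenience wrapper that reads a DSFID from an in-memory buffer. */
  static int read(byte[] bytes) {
    try {
      return readDsfid(new DataInputStream(new ByteArrayInputStream(bytes)));
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  public static void main(String[] args) {
    System.out.println(read(new byte[] {FIXED_ID_SHORT, 0, 42})); // 42
    try {
      read(new byte[] {HASH_TABLE, 0, 0});
    } catch (IllegalStateException e) {
      // unexpected byte: HASH_TABLE while reading dsfid
      System.out.println(e.getMessage());
    }
  }
}
```

In this model the error names the type code actually found, so seeing a non-DSFID code such as HASH_TABLE at a DSFID position suggests misplaced or overwritten bytes rather than a fault in the decoding logic itself.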

--
This message was sent by Atlassian Jira
(v8.3.4#803005)