You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@geode.apache.org by "Geode Integration (Jira)" <ji...@apache.org> on 2022/01/28 00:51:00 UTC

[jira] [Commented] (GEODE-8204) CI Failure: RollingUpgradeQueryReturnsCorrectResultsAfterServersRollOverOnPartitionRegion.luceneQueryReturnsCorrectResultsAfterServersRollOverOnPartitionRegion hang

    [ https://issues.apache.org/jira/browse/GEODE-8204?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17483495#comment-17483495 ] 

Geode Integration commented on GEODE-8204:
------------------------------------------

Seen on support/1.15 in [upgrade-test-openjdk11 #1|https://concourse.apachegeode-ci.info/teams/main/pipelines/apache-support-1-15-main/jobs/upgrade-test-openjdk11/builds/1] ... see [test results|http://files.apachegeode-ci.info/builds/apache-support-1-15-main/1.15.0-build.0831/test-results/upgradeTest/1643239918/] or download [artifacts|http://files.apachegeode-ci.info/builds/apache-support-1-15-main/1.15.0-build.0831/test-artifacts/1643239918/upgradetestfiles-openjdk11-1.15.0-build.0831.tgz].

> CI Failure: RollingUpgradeQueryReturnsCorrectResultsAfterServersRollOverOnPartitionRegion.luceneQueryReturnsCorrectResultsAfterServersRollOverOnPartitionRegion hang
> --------------------------------------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: GEODE-8204
>                 URL: https://issues.apache.org/jira/browse/GEODE-8204
>             Project: Geode
>          Issue Type: Bug
>    Affects Versions: 1.10.0
>            Reporter: Eric Shu
>            Priority: Major
>              Labels: caching-applications, ci
>
> https://concourse.apachegeode-ci.info/teams/main/pipelines/apache-develop-main/jobs/UpgradeTestOpenJDK8/builds/217#A run timed out.
> The dunit-hangs.txt shows two tests not finished, and I think the ClientServerMiscBCDUnitTest giiEventQueueFromOldToCurrentMemberShouldSucceed is not really hang. It is just a little hard to read the stack dumps as they are for both test runs.
> Started @ 2020-05-29 14:11:25.687 +0000
> 2020-05-29 14:59:03.911 +0000  org.apache.geode.internal.cache.tier.sockets.ClientServerMiscBCDUnitTest giiEventQueueFromOldToCurrentMemberShouldSucceed[12]
> Ended @ 2020-05-29 14:59:03.911 +0000
> Started @ 2020-05-29 14:11:25.635 +0000
> 2020-05-29 14:46:47.615 +0000  org.apache.geode.cache.lucene.RollingUpgradeQueryReturnsCorrectResultsAfterServersRollOverOnPartitionRegion luceneQueryReturnsCorrectResultsAfterServersRollOverOnPartitionRegion[from_v1.9.2,
> Ended @ 2020-05-29 14:56:25.635 +0000
> There are multiple DLock requesters for LockGrantor during PartitionedRepositoryManager.computeRepository().
> {noformat}
> "Pooled Waiting Message Processor 40" #163 daemon prio=5 os_prio=0 tid=0x00007fc94c043000 nid=0x955c waiting on condition [0x00007fc9d07ad000]
>    java.lang.Thread.State: TIMED_WAITING (parking)
>         at sun.misc.Unsafe.park(Native Method)
>         - parking to wait for  <0x00000000e351c5c0> (a java.util.concurrent.CountDownLatch$Sync)
>         at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
>         at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedNanos(AbstractQueuedSynchronizer.java:1037)
>         at java.util.concurrent.locks.AbstractQueuedSynchronizer.tryAcquireSharedNanos(AbstractQueuedSynchronizer.java:1328)
>         at java.util.concurrent.CountDownLatch.await(CountDownLatch.java:277)
>         at org.apache.geode.internal.util.concurrent.StoppableCountDownLatch.await(StoppableCountDownLatch.java:72)
>         at org.apache.geode.internal.util.concurrent.FutureResult.get(FutureResult.java:69)
>         at org.apache.geode.distributed.internal.locks.DLockService.waitForLockGrantorFutureResult(DLockService.java:750)
>         at org.apache.geode.distributed.internal.locks.DLockService.getLockGrantorId(DLockService.java:301)
>         at org.apache.geode.distributed.internal.locks.DLockService.lockInterruptibly(DLockService.java:1445)
>         at org.apache.geode.distributed.internal.locks.DLockService.lock(DLockService.java:1241)
>         at org.apache.geode.distributed.internal.locks.DLockService.lock(DLockService.java:1232)
>         at org.apache.geode.distributed.internal.locks.DLockService.lock(DLockService.java:1227)
>         at org.apache.geode.distributed.internal.locks.DLockService.lock(DLockService.java:1223)
>         at org.apache.geode.cache.lucene.internal.IndexRepositoryFactory.finishComputingRepository(IndexRepositoryFactory.java:105)
>         at org.apache.geode.cache.lucene.internal.IndexRepositoryFactory.computeIndexRepository(IndexRepositoryFactory.java:65)
>         at org.apache.geode.cache.lucene.internal.PartitionedRepositoryManager.computeRepository(PartitionedRepositoryManager.java:151)
>         at org.apache.geode.cache.lucene.internal.PartitionedRepositoryManager.lambda$computeRepository$1(PartitionedRepositoryManager.java:170)
>         at org.apache.geode.cache.lucene.internal.PartitionedRepositoryManager$$Lambda$308/437574105.apply(Unknown Source)
>         at java.util.concurrent.ConcurrentHashMap.compute(ConcurrentHashMap.java:1853)
>         - locked <0x00000000e35149f0> (a java.util.concurrent.ConcurrentHashMap$ReservationNode)
>         at org.apache.geode.cache.lucene.internal.PartitionedRepositoryManager.computeRepository(PartitionedRepositoryManager.java:162)
>         at org.apache.geode.cache.lucene.internal.LuceneBucketListener.lambda$afterPrimary$0(LuceneBucketListener.java:40)
>         at org.apache.geode.cache.lucene.internal.LuceneBucketListener$$Lambda$307/1978619753.run(Unknown Source)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>         at org.apache.geode.distributed.internal.ClusterOperationExecutors.runUntilShutdown(ClusterOperationExecutors.java:442)
>         at org.apache.geode.distributed.internal.ClusterOperationExecutors.doWaitingThread(ClusterOperationExecutors.java:411)
>         at org.apache.geode.distributed.internal.ClusterOperationExecutors$$Lambda$86/1114445410.invoke(Unknown Source)
>         at org.apache.geode.logging.internal.executors.LoggingThreadFactory.lambda$newThread$0(LoggingThreadFactory.java:119)
>         at org.apache.geode.logging.internal.executors.LoggingThreadFactory$$Lambda$82/1488044985.run(Unknown Source)
>         at java.lang.Thread.run(Thread.java:748)
>    Locked ownable synchronizers:
>         - <0x00000000e3596b58> (a java.util.concurrent.ThreadPoolExecutor$Worker)
> {noformat}
> The remote process that should processing the requests are being closed and is wait for GrantorRequestProcessor.clearGrantor to finish:
> {noformat}
> "RMI TCP Connection(1)-172.17.0.30" #32 daemon prio=5 os_prio=0 tid=0x00007f8178001800 nid=0x939b waiting on condition [0x00007f826f269000]
>    java.lang.Thread.State: TIMED_WAITING (parking)
>         at sun.misc.Unsafe.park(Native Method)
>         - parking to wait for  <0x00000000e1982990> (a java.util.concurrent.locks.ReentrantLock$NonfairSync)
>         at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
>         at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireNanos(AbstractQueuedSynchronizer.java:934)
>         at java.util.concurrent.locks.AbstractQueuedSynchronizer.tryAcquireNanos(AbstractQueuedSynchronizer.java:1247)
>         at java.util.concurrent.locks.ReentrantLock.tryLock(ReentrantLock.java:442)
>         at org.apache.geode.internal.util.concurrent.StoppableReentrantLock.lockInterruptibly(StoppableReentrantLock.java:85)
>         at org.apache.geode.internal.util.concurrent.StoppableReentrantLock.lock(StoppableReentrantLock.java:71)
>         at org.apache.geode.distributed.internal.ClusterElderManager.initializeElderState(ClusterElderManager.java:102)
>         at org.apache.geode.distributed.internal.ClusterElderManager.getElderState(ClusterElderManager.java:97)
>         at org.apache.geode.distributed.internal.ClusterDistributionManager.getElderState(ClusterDistributionManager.java:2091)
>         at org.apache.geode.distributed.internal.locks.GrantorRequestProcessor.startElderCall(GrantorRequestProcessor.java:227)
>         at org.apache.geode.distributed.internal.locks.GrantorRequestProcessor.basicOp(GrantorRequestProcessor.java:336)
>         at org.apache.geode.distributed.internal.locks.GrantorRequestProcessor.basicOp(GrantorRequestProcessor.java:319)
>         at org.apache.geode.distributed.internal.locks.GrantorRequestProcessor.clearGrantor(GrantorRequestProcessor.java:310)
>         at org.apache.geode.distributed.internal.locks.DLockService.clearGrantor(DLockService.java:874)
>         at org.apache.geode.distributed.internal.locks.DLockGrantor.destroy(DLockGrantor.java:1249)
>         - locked <0x00000000fbd65f58> (a org.apache.geode.distributed.internal.locks.DLockGrantor)
>         at org.apache.geode.distributed.internal.locks.DLockService.nullLockGrantorId(DLockService.java:649)
>         at org.apache.geode.distributed.internal.locks.DLockService.basicDestroy(DLockService.java:2336)
>         at org.apache.geode.distributed.internal.locks.DLockService.destroyAndRemove(DLockService.java:2256)
>         - locked <0x00000000fbc85130> (a java.lang.Object)
>         at org.apache.geode.distributed.internal.locks.DLockService.destroyServiceNamed(DLockService.java:2195)
>         at org.apache.geode.distributed.DistributedLockService.destroy(DistributedLockService.java:84)
>         at org.apache.geode.internal.cache.GemFireCacheImpl.destroyPartitionedRegionLockService(GemFireCacheImpl.java:2042)
>         at org.apache.geode.internal.cache.GemFireCacheImpl.doClose(GemFireCacheImpl.java:2296)
>         - locked <0x00000000e2282030> (a java.lang.Class for org.apache.geode.internal.cache.GemFireCacheImpl)
>         at org.apache.geode.internal.cache.GemFireCacheImpl.close(GemFireCacheImpl.java:2117)
>         at org.apache.geode.internal.cache.GemFireCacheImpl.close(GemFireCacheImpl.java:1985)
>         at org.apache.geode.internal.cache.GemFireCacheImpl.close(GemFireCacheImpl.java:1975)
>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>         at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>         at java.lang.reflect.Method.invoke(Method.java:498)
>         at org.apache.geode.cache.lucene.LuceneSearchWithRollingUpgradeTestBase.closeCache(LuceneSearchWithRollingUpgradeTestBase.java:755)
>         at org.apache.geode.cache.lucene.LuceneSearchWithRollingUpgradeTestBase$12.run2(LuceneSearchWithRollingUpgradeTestBase.java:568)
>         at org.apache.geode.cache30.CacheSerializableRunnable.run(CacheSerializableRunnable.java:55)
>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>         at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>         at java.lang.reflect.Method.invoke(Method.java:498)
>         at org.apache.geode.test.dunit.internal.MethodInvoker.executeObject(MethodInvoker.java:123)
>         at org.apache.geode.test.dunit.internal.RemoteDUnitVM.executeMethodOnObject(RemoteDUnitVM.java:78)
>         at sun.reflect.GeneratedMethodAccessor11.invoke(Unknown Source)
>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>         at java.lang.reflect.Method.invoke(Method.java:498)
>         at sun.rmi.server.UnicastServerRef.dispatch(UnicastServerRef.java:357)
>         at sun.rmi.transport.Transport$1.run(Transport.java:200)
>         at sun.rmi.transport.Transport$1.run(Transport.java:197)
>         at java.security.AccessController.doPrivileged(Native Method)
>         at sun.rmi.transport.Transport.serviceCall(Transport.java:196)
>         at sun.rmi.transport.tcp.TCPTransport.handleMessages(TCPTransport.java:573)
>         at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run0(TCPTransport.java:834)
>         at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.lambda$run$0(TCPTransport.java:688)
>         at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler$$Lambda$15/1032579838.run(Unknown Source)
>         at java.security.AccessController.doPrivileged(Native Method)
>         at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run(TCPTransport.java:687)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>         at java.lang.Thread.run(Thread.java:748)
>    Locked ownable synchronizers:
>         - <0x00000000fb3d0db0> (a java.util.concurrent.ThreadPoolExecutor$Worker)
>         - <0x00000000fbec63a8> (a java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync)
> {noformat}
> The thread holding the lock is still in ElderInitProcessor.init. I can not tell why it is not able to finish it.
> {noformat}
> "Pooled Message Processor 2" #124 daemon prio=10 os_prio=0 tid=0x00007f8100034000 nid=0x9401 waiting on condition [0x00007f82648cc000]
>    java.lang.Thread.State: TIMED_WAITING (parking)
>         at sun.misc.Unsafe.park(Native Method)
>         - parking to wait for  <0x00000000fb403e98> (a java.util.concurrent.CountDownLatch$Sync)
>         at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
>         at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedNanos(AbstractQueuedSynchronizer.java:1037)
>         at java.util.concurrent.locks.AbstractQueuedSynchronizer.tryAcquireSharedNanos(AbstractQueuedSynchronizer.java:1328)
>         at java.util.concurrent.CountDownLatch.await(CountDownLatch.java:277)
>         at org.apache.geode.internal.util.concurrent.StoppableCountDownLatch.await(StoppableCountDownLatch.java:72)
>         at org.apache.geode.distributed.internal.ReplyProcessor21.basicWait(ReplyProcessor21.java:731)
>         at org.apache.geode.distributed.internal.ReplyProcessor21.waitForRepliesUninterruptibly(ReplyProcessor21.java:802)
>         at org.apache.geode.distributed.internal.ReplyProcessor21.waitForRepliesUninterruptibly(ReplyProcessor21.java:779)
>         at org.apache.geode.distributed.internal.ReplyProcessor21.waitForRepliesUninterruptibly(ReplyProcessor21.java:865)
>         at org.apache.geode.distributed.internal.locks.ElderInitProcessor.init(ElderInitProcessor.java:71)
>         at org.apache.geode.distributed.internal.locks.ElderState.<init>(ElderState.java:53)
>         at org.apache.geode.distributed.internal.ClusterElderManager.lambda$new$0(ClusterElderManager.java:41)
>         at org.apache.geode.distributed.internal.ClusterElderManager$$Lambda$76/1497552839.get(Unknown Source)
>         at org.apache.geode.distributed.internal.ClusterElderManager.initializeElderState(ClusterElderManager.java:106)
>         at org.apache.geode.distributed.internal.ClusterElderManager.getElderState(ClusterElderManager.java:97)
>         at org.apache.geode.distributed.internal.ClusterDistributionManager.getElderState(ClusterDistributionManager.java:2091)
>         at org.apache.geode.distributed.internal.locks.GrantorRequestProcessor$GrantorRequestMessage.basicProcess(GrantorRequestProcessor.java:504)
>         at org.apache.geode.distributed.internal.locks.GrantorRequestProcessor$GrantorRequestMessage.process(GrantorRequestProcessor.java:497)
>         at org.apache.geode.distributed.internal.DistributionMessage.scheduleAction(DistributionMessage.java:376)
>         at org.apache.geode.distributed.internal.DistributionMessage$1.run(DistributionMessage.java:441)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>         at org.apache.geode.distributed.internal.ClusterOperationExecutors.runUntilShutdown(ClusterOperationExecutors.java:442)
>         at org.apache.geode.distributed.internal.ClusterOperationExecutors.doProcessingThread(ClusterOperationExecutors.java:389)
>         at org.apache.geode.distributed.internal.ClusterOperationExecutors$$Lambda$81/548873202.invoke(Unknown Source)
>         at org.apache.geode.logging.internal.executors.LoggingThreadFactory.lambda$newThread$0(LoggingThreadFactory.java:119)
>         at org.apache.geode.logging.internal.executors.LoggingThreadFactory$$Lambda$82/1455748764.run(Unknown Source)
>         at java.lang.Thread.run(Thread.java:748)
>    Locked ownable synchronizers:
>         - <0x00000000fb437ee0> (a java.util.concurrent.ThreadPoolExecutor$Worker)
>         - <0x00000000e1982990> (a java.util.concurrent.locks.ReentrantLock$NonfairSync)
> {noformat}



--
This message was sent by Atlassian Jira
(v8.20.1#820001)