Posted to user@ignite.apache.org by Paul Jose <pa...@ugamsolutions.com> on 2021/08/25 22:27:25 UTC

Thread stuck when accessing atomic reference or long on Apache Ignite

Hi all,
This is about an issue we have started facing recently. We run 2 client instances and 26 Apache Ignite instances, all on AWS R4.2xLarge nodes. When a thread tries to fetch an atomicLong or atomicReference, it gets stuck and never returns. This usually happens on only 1 or 2 of the Ignite instances. I am not sure why it happens, so any help would be really appreciated. We use Ignite version 2.7.5.
This is the thread dump of the main thread while it tries to get an atomicReference from a lifecycle bean during node startup:
"main" #1 prio=5 os_prio=0 cpu=3528.41ms elapsed=1067.33s allocated=312M defined_classes=9309 tid=0x00007f4ce4046fc0 nid=0x1537 waiting on condition  [0x00007f4cece90000]
   java.lang.Thread.State: WAITING (parking)
                at jdk.internal.misc.Unsafe.park(java.base@11.0.7/Native Method)
                - parking to wait for  <0x00007f4cbfe7c7d0> (a java.util.concurrent.CountDownLatch$Sync)
                at java.util.concurrent.locks.LockSupport.park(java.base@11.0.7/LockSupport.java:194)
                at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(java.base@11.0.7/AbstractQueuedSynchronizer.java:885)
                at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(java.base@11.0.7/AbstractQueuedSynchronizer.java:1039)
                at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(java.base@11.0.7/AbstractQueuedSynchronizer.java:1345)
                at java.util.concurrent.CountDownLatch.await(java.base@11.0.7/CountDownLatch.java:232)
                at org.apache.ignite.internal.util.IgniteUtils.await(IgniteUtils.java:7612)
                at org.apache.ignite.internal.processors.datastructures.DataStructuresProcessor.awaitInitialization(DataStructuresProcessor.java:1147)
                at org.apache.ignite.internal.processors.datastructures.DataStructuresProcessor.getAtomic(DataStructuresProcessor.java:506)
                at org.apache.ignite.internal.processors.datastructures.DataStructuresProcessor.atomicReference(DataStructuresProcessor.java:744)
                at org.apache.ignite.internal.IgniteKernal.atomicReference(IgniteKernal.java:3743)
                at org.apache.ignite.internal.IgniteKernal.atomicReference(IgniteKernal.java:3732)
                at company.explore.cache.persist.SavedAudienceLocationProvider.getSavedAudienceLocation(SavedAudienceLocationProvider.java:27)
                at company.explore.listeners.lifecycle.LifecycleListener.configureSavedAudienceLocation(LifecycleListener.java:45)
                at company.explore.listeners.lifecycle.LifecycleListener.onLifecycleEvent(LifecycleListener.java:38)
                at org.apache.ignite.internal.IgniteKernal.notifyLifecycleBeans(IgniteKernal.java:725)
                at org.apache.ignite.internal.IgniteKernal.start(IgniteKernal.java:1156)
                at org.apache.ignite.internal.IgnitionEx$IgniteNamedInstance.start0(IgnitionEx.java:2038)
                at org.apache.ignite.internal.IgnitionEx$IgniteNamedInstance.start(IgnitionEx.java:1730)
                - locked <0x00007f4cbf072a38> (a org.apache.ignite.internal.IgnitionEx$IgniteNamedInstance)
                at org.apache.ignite.internal.IgnitionEx.start0(IgnitionEx.java:1158)
                at org.apache.ignite.internal.IgnitionEx.startConfigurations(IgnitionEx.java:1076)
                at org.apache.ignite.internal.IgnitionEx.start(IgnitionEx.java:962)
                at org.apache.ignite.internal.IgnitionEx.start(IgnitionEx.java:861)
                at org.apache.ignite.internal.IgnitionEx.start(IgnitionEx.java:731)
                at org.apache.ignite.internal.IgnitionEx.start(IgnitionEx.java:700)
                at org.apache.ignite.Ignition.start(Ignition.java:348)
                at org.apache.ignite.startup.cmdline.CommandLineStartup.main(CommandLineStartup.java:301)

Because startup is stuck, any Ignition.ignite() call hangs as well, so the compute job never completes:
"pub-#22" #48 prio=5 os_prio=0 cpu=5.76ms elapsed=1036.50s allocated=421K defined_classes=6 tid=0x00007f4ce4cf3990 nid=0x1607 waiting on condition  [0x00007f40375f6000]
   java.lang.Thread.State: WAITING (parking)
                at jdk.internal.misc.Unsafe.park(java.base@11.0.7/Native Method)
                - parking to wait for  <0x00007f4cbf16d9e0> (a java.util.concurrent.CountDownLatch$Sync)
                at java.util.concurrent.locks.LockSupport.park(java.base@11.0.7/LockSupport.java:194)
                at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(java.base@11.0.7/AbstractQueuedSynchronizer.java:885)
                at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(java.base@11.0.7/AbstractQueuedSynchronizer.java:1039)
                at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(java.base@11.0.7/AbstractQueuedSynchronizer.java:1345)
                at java.util.concurrent.CountDownLatch.await(java.base@11.0.7/CountDownLatch.java:232)
                at org.apache.ignite.internal.util.IgniteUtils.awaitQuiet(IgniteUtils.java:7657)
                at org.apache.ignite.internal.IgnitionEx$IgniteNamedInstance.grid(IgnitionEx.java:1671)
                at org.apache.ignite.internal.IgnitionEx.grid(IgnitionEx.java:1389)
                at org.apache.ignite.internal.IgnitionEx.grid(IgnitionEx.java:1258)
                at org.apache.ignite.Ignition.ignite(Ignition.java:489)
                at company.explore.dataload.person.LoadPersonAttributeJob.call(LoadPersonAttributeJob.java:58)
                at company.explore.dataload.person.LoadPersonAttributeJob.call(LoadPersonAttributeJob.java:31)
                at org.apache.ignite.internal.processors.closure.GridClosureProcessor$C2.execute(GridClosureProcessor.java:1855)
                at org.apache.ignite.internal.processors.job.GridJobWorker$2.call(GridJobWorker.java:568)
                at org.apache.ignite.internal.util.IgniteUtils.wrapThreadLoader(IgniteUtils.java:6817)
                at org.apache.ignite.internal.processors.job.GridJobWorker.execute0(GridJobWorker.java:562)
                at org.apache.ignite.internal.processors.job.GridJobWorker.body(GridJobWorker.java:491)
                at org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:120)
                at java.util.concurrent.ThreadPoolExecutor.runWorker(java.base@11.0.7/ThreadPoolExecutor.java:1128)
                at java.util.concurrent.ThreadPoolExecutor$Worker.run(java.base@11.0.7/ThreadPoolExecutor.java:628)
                at java.lang.Thread.run(java.base@11.0.7/Thread.java:834)
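
Reading the two dumps together: the main thread is inside IgniteKernal.start, parked in DataStructuresProcessor.awaitInitialization via our LifecycleListener, while pub-#22 is parked in IgnitionEx$IgniteNamedInstance.grid, presumably waiting for that start to finish. The stdlib-only toy model below illustrates that chain under this assumption. It is not Ignite's code, and StartLatchModel, start and awaitGrid are hypothetical names. It shows that a caller of the grid accessor stays blocked for as long as the lifecycle callback is blocked, and is released as soon as the callback returns.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

/**
 * Toy model (hypothetical, not Ignite source) of the wait chain in the dumps:
 * callers of the grid accessor park on a start latch that is only released
 * after start() has finished running its lifecycle callback.
 */
final class StartLatchModel {
    // Released once start() completes; stands in for the latch grid() parks on.
    private final CountDownLatch startLatch = new CountDownLatch(1);

    /** Runs the lifecycle callback, then marks the node as started. */
    void start(Runnable lifecycleCallback) {
        lifecycleCallback.run(); // if this never returns, startLatch is never released
        startLatch.countDown();
    }

    /**
     * Stand-in for Ignition.ignite(): waits for start() to finish, but with a
     * timeout instead of indefinitely. Returns true if start() completed in time.
     */
    boolean awaitGrid(long timeoutMs) {
        try {
            return startLatch.await(timeoutMs, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt for the caller
            return false;
        }
    }
}
```

In this model, replacing the timed await with an untimed one reproduces the pub-#22 symptom: the caller parks for exactly as long as the lifecycle callback does.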

Similarly, this is an instance where a job thread waits on a CountDownLatch while trying to get an atomicLong:
"pub-#489" #608 prio=5 os_prio=0 cpu=16.80ms elapsed=7076.10s allocated=2409K defined_classes=17 tid=0x00007f48c8014c60 nid=0x5bd5 waiting on condition  [0x00007f48359e1000]
   java.lang.Thread.State: WAITING (parking)
                at jdk.internal.misc.Unsafe.park(java.base@11.0.7/Native Method)
                - parking to wait for  <0x00007f518aba6060> (a java.util.concurrent.CountDownLatch$Sync)
                at java.util.concurrent.locks.LockSupport.park(java.base@11.0.7/LockSupport.java:194)
                at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(java.base@11.0.7/AbstractQueuedSynchronizer.java:885)
                at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(java.base@11.0.7/AbstractQueuedSynchronizer.java:1039)
                at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(java.base@11.0.7/AbstractQueuedSynchronizer.java:1345)
                at java.util.concurrent.CountDownLatch.await(java.base@11.0.7/CountDownLatch.java:232)
                at org.apache.ignite.internal.util.IgniteUtils.await(IgniteUtils.java:7612)
                at org.apache.ignite.internal.processors.datastructures.DataStructuresProcessor.awaitInitialization(DataStructuresProcessor.java:1147)
                at org.apache.ignite.internal.processors.datastructures.DataStructuresProcessor.getAtomic(DataStructuresProcessor.java:506)
                at org.apache.ignite.internal.processors.datastructures.DataStructuresProcessor.atomicLong(DataStructuresProcessor.java:463)
                at org.apache.ignite.internal.IgniteKernal.atomicLong(IgniteKernal.java:3716)
                at org.apache.ignite.internal.IgniteKernal.atomicLong(IgniteKernal.java:3705)
                at company.explore.cache.persist.person.SerializationStatus.getSerializeCounter(SerializationStatus.java:86)
                at company.explore.cache.persist.person.SerializationStatus.startNodeSerialization(SerializationStatus.java:21)
                at company.explore.cache.persist.personv2.PersonSerializationJob.serializePeopleData(PersonSerializationJob.java:98)
                at company.explore.cache.persist.personv2.PersonSerializationJob.run(PersonSerializationJob.java:75)
                at org.apache.ignite.internal.processors.closure.GridClosureProcessor$C4.execute(GridClosureProcessor.java:1944)
                at org.apache.ignite.internal.processors.job.GridJobWorker$2.call(GridJobWorker.java:568)
                at org.apache.ignite.internal.util.IgniteUtils.wrapThreadLoader(IgniteUtils.java:6817)
                at org.apache.ignite.internal.processors.job.GridJobWorker.execute0(GridJobWorker.java:562)
                at org.apache.ignite.internal.processors.job.GridJobWorker.body(GridJobWorker.java:491)
                at org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:120)
                at java.util.concurrent.ThreadPoolExecutor.runWorker(java.base@11.0.7/ThreadPoolExecutor.java:1128)
                at java.util.concurrent.ThreadPoolExecutor$Worker.run(java.base@11.0.7/ThreadPoolExecutor.java:628)
                at java.lang.Thread.run(java.base@11.0.7/Thread.java:834)
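
Until the cause is known, one way to make the hang visible, rather than letting jobs such as PersonSerializationJob park indefinitely, is to run the lookup on a separate thread with a deadline and log (or take a thread dump) when it expires. Below is a minimal stdlib-only sketch, assuming a hypothetical TimedLookup helper. In our code the callable would be something like `() -> ignite.atomicLong("someCounter", 0, true)`, where `someCounter` is a placeholder name. Cancelling the future interrupts the worker thread, but this sketch does not guarantee that the underlying Ignite call reacts to the interrupt.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

/** Hypothetical helper: runs a possibly blocking lookup with a deadline. */
final class TimedLookup {
    /** Thrown when the lookup does not finish in time. */
    static final class LookupTimeoutException extends RuntimeException {
        LookupTimeoutException(String message) {
            super(message);
        }
    }

    // Daemon threads, so a permanently parked lookup cannot keep the JVM alive.
    private static final ExecutorService POOL = Executors.newCachedThreadPool(r -> {
        Thread t = new Thread(r, "timed-lookup");
        t.setDaemon(true);
        return t;
    });

    private TimedLookup() {
    }

    /** Returns the lookup's result, or throws LookupTimeoutException after timeoutMs. */
    static <T> T callWithTimeout(Callable<T> lookup, long timeoutMs) {
        Future<T> future = POOL.submit(lookup);
        try {
            return future.get(timeoutMs, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            future.cancel(true); // interrupts the worker; the lookup may or may not react to it
            throw new LookupTimeoutException("lookup did not finish within " + timeoutMs + " ms");
        } catch (ExecutionException e) {
            // Surface the lookup's own failure unchanged where possible.
            Throwable cause = e.getCause();
            if (cause instanceof RuntimeException) {
                throw (RuntimeException) cause;
            }
            throw new RuntimeException(cause);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            future.cancel(true);
            throw new RuntimeException("interrupted while waiting for lookup", e);
        }
    }
}
```

A dedicated exception type keeps a timeout distinguishable from a lookup that legitimately fails or returns null.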

These issues only started appearing in the past 2 months or so; before that, the system had been very stable for a long time.
Because the issue is intermittent, I am not sure how to create a reproducer project, but I can provide logs or anything else that would help.
The full thread dumps are too large to include here, so I have posted them on pastebin:
Atomic Reference related thread dump: pastebin.com/ydNMFSEP
Atomic Long related thread dump: pastebin.com/psJgwi3F
Any help is much appreciated. Thanks!
Best Regards,
Paul