Posted to issues@ignite.apache.org by "Vipul Thakur (Jira)" <ji...@apache.org> on 2020/07/25 06:29:00 UTC
[jira] [Updated] (IGNITE-13298) Found long running cache at client end
[ https://issues.apache.org/jira/browse/IGNITE-13298?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Vipul Thakur updated IGNITE-13298:
----------------------------------
Issue Type: Task (was: Bug)
Priority: Blocker (was: Major)
> Found long running cache at client end
> ---------------------------------------
>
> Key: IGNITE-13298
> URL: https://issues.apache.org/jira/browse/IGNITE-13298
> Project: Ignite
> Issue Type: Task
> Affects Versions: 2.7.6
> Environment: ========cluster memory config/persistence================
> <property name="gridLogger">
>   <bean class="org.apache.ignite.logger.log4j2.Log4J2Logger">
>     <constructor-arg type="java.lang.String" value="${IGNITE_SCRIPT}/ignite-log4j2.xml" />
>   </bean>
> </property>
> <property name="dataStorageConfiguration">
>   <bean class="org.apache.ignite.configuration.DataStorageConfiguration">
>     <property name="defaultDataRegionConfiguration">
>       <bean class="org.apache.ignite.configuration.DataRegionConfiguration">
>         <property name="metricsEnabled" value="true"/>
>         <property name="persistenceEnabled" value="true" />
>         <!--<property name="maxSize" value="#{10L * 1024 * 1024 * 1024}"/> -->
>         <property name="maxSize" value="400Gb" />
>         <!-- Increasing the buffer size to 4 GB. -->
>         <property name="checkpointPageBufferSize" value="${checkpointPageBufferSize}" />
>       </bean>
>     </property>
>     <property name="storagePath" value="${storagePath}" />
>     <property name="walPath" value="${walPath}" />
>     <property name="walArchivePath" value="${walArchivePath}" />
>     <property name="walMode" value="LOG_ONLY" />
>     <property name="pageSize" value="${pageSize}" />
>     <!-- Enable write throttling. -->
>     <property name="writeThrottlingEnabled" value="true" />
>     <property name="walHistorySize" value="1" />
>     <property name="metricsEnabled" value="true"/>
>   </bean>
> </property>
> ==================Client thread dump ===========================
> 2020-07-20 12:14:43
> Full thread dump Java HotSpot(TM) 64-Bit Server VM (25.211-b12 mixed mode):
>
> "Attach Listener" #788 daemon prio=9 os_prio=0 tid=0x00007fe7f4001000 nid=0x32d waiting on condition [0x0000000000000000]
>    java.lang.Thread.State: RUNNABLE
>
>    Locked ownable synchronizers:
>     - None
>
> "Context_6_jms_314_ConsumerDispatcher" #787 daemon prio=5 os_prio=0 tid=0x00007fe6e805e000 nid=0x31a waiting on condition [0x00007fe2e5bdd000]
>    java.lang.Thread.State: WAITING (parking)
>     at sun.misc.Unsafe.park(Native Method)
>     - parking to wait for <0x00000000cb87d9d0> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
>     at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
>     at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)
>     at java.util.concurrent.ArrayBlockingQueue.take(ArrayBlockingQueue.java:403)
>     at com.solacesystems.jcsmp.protocol.nio.impl.ConsumerNotificationDispatcher.eventLoop(ConsumerNotificationDispatcher.java:110)
>     at com.solacesystems.jcsmp.protocol.nio.impl.ConsumerNotificationDispatcher.run(ConsumerNotificationDispatcher.java:130)
>     at java.lang.Thread.run(Thread.java:748)
>
>    Locked ownable synchronizers:
>     - None
>
> "DefaultMessageListenerContainer-35" #786 prio=5 os_prio=0 tid=0x00007fe460013800 nid=0x319 in Object.wait() [0x00007fe2e5cde000]
>    java.lang.Thread.State: TIMED_WAITING (on object monitor)
>     at java.lang.Object.wait(Native Method)
>     at com.solacesystems.jcsmp.impl.XMLMessageQueue.dequeue(XMLMessageQueue.java:130)
>     at com.solacesystems.jcsmp.impl.flow.FlowHandleImpl.receive(FlowHandleImpl.java:845)
>     - locked <0x00000000cb8cce50> (a com.solacesystems.jcsmp.impl.XMLMessageQueueList)
>     at com.solacesystems.jms.SolMessageConsumer.receive(SolMessageConsumer.java:253)
>     at org.springframework.jms.connection.CachedMessageConsumer.receive(CachedMessageConsumer.java:86)
>     at org.springframework.jms.support.destination.JmsDestinationAccessor.receiveFromConsumer(JmsDestinationAccessor.java:132)
>     at org.springframework.jms.listener.AbstractPollingMessageListenerContainer.receiveMessage(AbstractPollingMessageListenerContainer.java:418)
>     at org.springframework.jms.listener.AbstractPollingMessageListenerContainer.doReceiveAndExecute(AbstractPollingMessageListenerContainer.java:303)
>     at org.springframework.jms.listener.AbstractPollingMessageListenerContainer.receiveAndExecute(AbstractPollingMessageListenerContainer.java:257)
>     at org.springframework.jms.listener.DefaultMessageListenerContainer$AsyncMessageListenerInvoker.invokeListener(DefaultMessageListenerContainer.java:1189)
>     at org.springframework.jms.listener.DefaultMessageListenerContainer$AsyncMessageListenerInvoker.executeOngoingLoop(DefaultMessageListenerContainer.java:1179)
>     at org.springframework.jms.listener.DefaultMessageListenerContainer$AsyncMessageListenerInvoker.run(DefaultMessageListenerContainer.java:1076)
>     at java.lang.Thread.run(Thread.java:748)
>
>    Locked ownable synchronizers:
>     - None
>
> "Context_4_jms_313_ConsumerDispatcher" #785 daemon prio=5 os_prio=0 tid=0x00007fe6f8028000 nid=0x318 waiting on condition [0x00007fe2e5ddf000]
>    java.lang.Thread.State: WAITING (parking)
>     at sun.misc.Unsafe.park(Native Method)
>     - parking to wait for <0x00000000cb8cf8d0> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
>     at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
>     at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)
>     at java.util.concurrent.ArrayBlockingQueue.take(ArrayBlockingQueue.java:403)
>     at com.solacesystems.jcsmp.protocol.nio.impl.ConsumerNotificationDispatcher.eventLoop(ConsumerNotificationDispatcher.java:110)
>     at com.solacesystems.jcsmp.protocol.nio.impl.ConsumerNotificationDispatcher.run(ConsumerNotificationDispatcher.java:130)
>     at java.lang.Thread.run(Thread.java:748)
>
>    Locked ownable synchronizers:
>     - None
>
> "DefaultMessageListenerContainer-27" #784 prio=5 os_prio=0 tid=0x00007fe45800f800 nid=0x317 in Object.wait() [0x00007fe2e5ee0000]
>    java.lang.Thread.State: TIMED_WAITING (on object monitor)
>     at java.lang.Object.wait(Native Method)
>     at com.solacesystems.jcsmp.impl.XMLMessageQueue.dequeue(XMLMessageQueue.java:130)
>     at com.solacesystems.jcsmp.impl.flow.FlowHandleImpl.receive(FlowHandleImpl.java:845)
>     - locked <0x00000000cb8cffc8> (a com.solacesystems.jcsmp.impl.XMLMessageQueueList)
>     at com.solacesystems.jms.SolMessageConsumer.receive(SolMessageConsumer.java:253)
>     at org.springframework.jms.connection.CachedMessageConsumer.receive(CachedMessageConsumer.java:86)
>     at org.springframework.jms.support.destination.JmsDestinationAccessor.receiveFromConsumer(JmsDestinationAccessor.java:132)
>     at org.springframework.jms.listener.AbstractPollingMessageListenerContainer.receiveMessage(AbstractPollingMessageListenerContainer.java:418)
>     at org.springframework.jms.listener.AbstractPollingMessageListenerContainer.doReceiveAndExecute(AbstractPollingMessageListenerContainer.java:303)
>     at org.springframework.jms.listener.AbstractPollingMessageListenerContainer.receiveAndExecute(AbstractPollingMessageListenerContainer.java:257)
>     at org.springframework.jms.listener.DefaultMessageListenerContainer$AsyncMessageListenerInvoker.invokeListener(DefaultMessageListenerContainer.java:1189)
>     at org.springframework.jms.listener.DefaultMessageListenerContainer$AsyncMessageListenerInvoker.executeOngoingLoop(DefaultMessageListenerContainer.java:1179)
>     at org.springframework.jms.listener.DefaultMessageListenerContainer$AsyncMessageListenerInvoker.run(DefaultMessageListenerContainer.java:1076)
>     at java.lang.Thread.run(Thread.java:748)
>
>    Locked ownable synchronizers:
>     - None
>
> "Context_6_jms_312_ConsumerDispatcher" #780 daemon prio=5 os_prio=0 tid=0x00007fe6e805c800 nid=0x313 waiting on condition [0x00007fe2e62e4000]
>    java.lang.Thread.State: WAITING (parking)
>     at sun.misc.Unsafe.park(Native Method)
>     - parking to wait for <0x00000000cb751ad0> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
>     at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
>     at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)
>     at java.util.concurrent.ArrayBlockingQueue.take(ArrayBlockingQueue.java:403)
>     at com.solacesystems.jcsmp.protocol.nio.impl.ConsumerNotificationDispatcher.eventLoop(ConsumerNotificationDispatcher.java:110)
>     at com.solacesystems.jcsmp.protocol.nio.impl.ConsumerNotificationDispatcher.run(ConsumerNotificationDispatcher.java:130)
>     at java.lang.Thread.run(Thread.java:748)
>
>    Locked ownable synchronizers:
>     - None
>
> "DefaultMessageListenerContainer-34" #779 prio=5 os_prio=0 tid=0x00007fe450003800 nid=0x312 waiting on condition [0x00007fe2e63e5000]
>    java.lang.Thread.State: WAITING (parking)
>     at sun.misc.Unsafe.park(Native Method)
>     at java.util.concurrent.locks.LockSupport.park(LockSupport.java:304)
>     at org.apache.ignite.internal.util.future.GridFutureAdapter.get0(GridFutureAdapter.java:178)
>     at org.apache.ignite.internal.util.future.GridFutureAdapter.get(GridFutureAdapter.java:141)
>     at org.apache.ignite.internal.processors.cache.GridCacheAdapter.get0(GridCacheAdapter.java:4723)
>     at org.apache.ignite.internal.processors.cache.GridCacheAdapter.get(GridCacheAdapter.java:4697)
>     at org.apache.ignite.internal.processors.cache.GridCacheAdapter.get(GridCacheAdapter.java:1415)
>     at org.apache.ignite.internal.processors.cache.IgniteCacheProxyImpl.get(IgniteCacheProxyImpl.java:928)
>     at org.apache.ignite.internal.processors.cache.GatewayProtectedCacheProxy.get(GatewayProtectedCacheProxy.java:640)
>     at com.jio.digitalapi.cacheservice.client.impl.DigitalApiIgniteCache.get(DigitalApiIgniteCache.java:87)
>     at com.jio.digitalapi.eventprocessing.service.dataservice.EventManagementApacheIgniteDataService.getCustomerEntity(EventManagementApacheIgniteDataService.java:101)
>     at com.jio.digitalapi.ep.dataservice.impl.AbstractEventManagementDataService.getCustomer(AbstractEventManagementDataService.java:38)
>     at com.jio.digitalapi.eventprocessing.service.base.AbstractMessageEventActionProcessor.getCustomerEntity(AbstractMessageEventActionProcessor.java:154)
>     at com.jio.digitalapi.eventprocessing.service.event.action.processor.PrimeMemberUpdateEventActionProcessor.processEvent(PrimeMemberUpdateEventActionProcessor.java:31)
>     at com.jio.digitalapi.eventprocessing.service.base.AbstractMessageEventActionProcessor.processMessageEvent(AbstractMessageEventActionProcessor.java:112)
>     at com.jio.digitalapi.eventprocessing.service.base.AbstractMessageEventHandler.processMessage(AbstractMessageEventHandler.java:66)
>     at com.jio.digitalapi.platform.core.messaging.jms.receiver.DigitalApiAsyncJmsMessageReceiver.onMessage(DigitalApiAsyncJmsMessageReceiver.java:106)
>     at org.springframework.jms.listener.AbstractMessageListenerContainer.doInvokeListener(AbstractMessageListenerContainer.java:761)
>     at org.springframework.jms.listener.AbstractMessageListenerContainer.invokeListener(AbstractMessageListenerContainer.java:699)
>     at org.springframework.jms.listener.AbstractMessageListenerContainer.doExecuteListener(AbstractMessageListenerContainer.java:674)
>     at org.springframework.jms.listener.AbstractPollingMessageListenerContainer.doReceiveAndExecute(AbstractPollingMessageListenerContainer.java:318)
>     at org.springframework.jms.listener.AbstractPollingMessageListenerContainer.receiveAndExecute(AbstractPollingMessageListenerContainer.java:257)
>     at org.springframework.jms.listener.DefaultMessageListenerContainer$AsyncMessageListenerInvoker.invokeListener(DefaultMessageListenerContainer.java:1189)
>     at org.springframework.jms.listener.DefaultMessageListenerContainer$AsyncMessageListenerInvoker.executeOngoingLoop(DefaultMessageListenerContainer.java:1179)
>     at org.springframework.jms.listener.DefaultMessageListenerContainer$AsyncMessageListenerInvoker.run(DefaultMessageListenerContainer.java:1076)
>     at java.lang.Thread.run(Thread.java:748)
>
>    Locked ownable synchronizers:
>     - None
>
>
>
>
> Reporter: Vipul Thakur
> Priority: Blocker
> Attachments: Ignite_10.143.75.24_threaddump.txt, Ignite_10.143.75.24_threaddump_1.txt, Ignite_10.143.75.24_threaddump_2.txt, Ignite_10.143.75.24_threaddump_3.txt, Ignite_10.143.75.24_threaddump_4.txt
>
>
> Hi
> We have an Ignite cluster with four nodes (each with 400 GB of memory).
> About 2 months after deploying the cluster and clients in an environment, the cluster hangs: initially a few clients get stuck with some processing pending, and after some time everything gets stuck.
> After restarting everything (both clients and cluster), it works fine, processing around 1 crore (10 million) records in 10-15 minutes, including both creating and updating data.
>
> We use transactions on all caches for create and update operations, and no transactions for get calls.
> We have already faced this issue twice in 4-5 months of deployment.
> I am attaching the cluster thread dump and the client thread dump.
> I have seen one ticket already in Jira for the *Found long running cache future* warning, and it has been moved to 2.8.1. Is that the solution? Please confirm.
> 2020-06-04 20:05:55.889 WARN 1 --- [c7fd8b84-d8sdl%] org.apache.ignite.internal.diagnostic : Found long running cache future [startTime=19:59:30.288, curTime=20:05:55.882, fut=GridPartitionedSingleGetFuture [topVer=AffinityTopologyVersion [topVer=10, minorTopVer=0], key=UserKeyCacheObjectImpl [part=105, val=7701112105, hasValBytes=true], readThrough=true, forcePrimary=false, futId=681978f7271-55010ba8-d8d5-475f-97be-ff1c1916cea1, trackable=true, subjId=59d5e3cf-d09c-44d3-82d6-84dd35b64e10, taskName=null, deserializeBinary=true, skipVals=false, expiryPlc=null, canRemap=true, needVer=false, keepCacheObjects=false, recovery=false, node=TcpDiscoveryNode [id=14718baa-35e7-4d61-bde8-1e9c61978e8f, addrs=[10.135.34.67, 127.0.0.1], sockAddrs=[/10.135.34.67:47500, /127.0.0.1:47500], discPort=47500, order=1, intOrder=1, lastExchangeTime=1591277613188, loc=false, ver=2.7.6#20190911-sha1:21f7ca41, isClient=false], postProcessingClos=null]]
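
To make the warning above easier to read, the age of the stuck future can be derived from its `startTime` and `curTime` fields. The following is a minimal sketch in plain Java, assuming the warning keeps the `startTime=HH:mm:ss.SSS, curTime=HH:mm:ss.SSS` layout shown in the log line; the class name `LongRunningFutureAge` and the method `age` are hypothetical helpers, not part of Ignite.

```java
import java.time.Duration;
import java.time.LocalTime;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper (not an Ignite class): extracts how long a cache future
// has been pending from a "Found long running cache future" warning.
class LongRunningFutureAge {
    // Assumes the time-of-day layout seen in the log line: HH:mm:ss.SSS
    private static final Pattern TIMES = Pattern.compile(
        "startTime=(\\d{2}:\\d{2}:\\d{2}\\.\\d{3}), curTime=(\\d{2}:\\d{2}:\\d{2}\\.\\d{3})");

    static Duration age(String logLine) {
        Matcher m = TIMES.matcher(logLine);
        if (!m.find())
            throw new IllegalArgumentException("no startTime/curTime pair found");

        LocalTime start = LocalTime.parse(m.group(1));
        LocalTime cur = LocalTime.parse(m.group(2));
        Duration d = Duration.between(start, cur);

        // The warning carries no date, so a negative value is treated as
        // a single midnight rollover (an assumption of this sketch).
        return d.isNegative() ? d.plusDays(1) : d;
    }

    public static void main(String[] args) {
        // Shortened copy of the warning; only the two time fields matter here.
        String line = "Found long running cache future "
            + "[startTime=19:59:30.288, curTime=20:05:55.882]";
        System.out.println(age(line).toMillis() + " ms"); // prints "385594 ms"
    }
}
```

For the warning in this report, that is roughly 6 minutes 25 seconds during which a single `get` had not completed.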
>
> However, I have also observed this issue in our environment without any load or heavy traffic (my cluster had only 100 MB of data).
>
> We don't have any transaction timeout set as of now. Should we set one?
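
Separately from transaction timeouts, the client thread dump shows a listener thread parked without a timeout inside a non-transactional `get`. One generic way to keep a caller from waiting forever is to run the lookup on a separate executor and wait with a deadline. The sketch below uses only `java.util.concurrent`; `BoundedLookup` and `getOrFail` are hypothetical names, the lambdas stand in for the real cache call, and this is not an Ignite API or a confirmed fix for this issue.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical wrapper: bounds how long a caller waits for a blocking lookup.
class BoundedLookup {
    private final ExecutorService pool;

    BoundedLookup(ExecutorService pool) {
        this.pool = pool;
    }

    // Runs the lookup on the pool and waits at most the given time for it.
    <V> V getOrFail(Callable<V> lookup, long timeout, TimeUnit unit) throws Exception {
        Future<V> f = pool.submit(lookup);
        try {
            return f.get(timeout, unit);
        } catch (TimeoutException e) {
            // Frees the caller. The worker is interrupted, but it only stops
            // if the underlying call actually responds to interruption.
            f.cancel(true);
            throw e;
        } catch (ExecutionException e) {
            // Surface the lookup's own exception where possible.
            Throwable cause = e.getCause();
            if (cause instanceof Exception)
                throw (Exception) cause;
            throw e;
        }
    }

    public static void main(String[] args) throws Exception {
        ExecutorService pool = Executors.newCachedThreadPool();
        BoundedLookup lookups = new BoundedLookup(pool);
        try {
            // Fast lookup: returns normally.
            System.out.println(lookups.getOrFail(() -> "customer-1", 1, TimeUnit.SECONDS));

            // Slow lookup (simulated with sleep): the caller gives up after 100 ms.
            lookups.getOrFail(() -> { Thread.sleep(5_000); return "never"; },
                100, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            System.out.println("lookup timed out");
        } finally {
            pool.shutdownNow();
        }
    }
}
```

This only limits how long the calling thread waits; it does not address why the future never completes on the cluster side, and a worker that ignores interruption stays blocked in the pool.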
>
>
> Thanks
>
>
>
>
>
--
This message was sent by Atlassian Jira
(v8.3.4#803005)