Posted to issues@ignite.apache.org by "Vipul Thakur (Jira)" <ji...@apache.org> on 2020/07/25 06:27:00 UTC

[jira] [Created] (IGNITE-13298) Found long running cache at client end

Vipul Thakur created IGNITE-13298:
-------------------------------------

             Summary: Found long running cache at client end 
                 Key: IGNITE-13298
                 URL: https://issues.apache.org/jira/browse/IGNITE-13298
             Project: Ignite
          Issue Type: Bug
    Affects Versions: 2.7.6
         Environment: ========cluster memory config/persistence================ 

<property name="gridLogger">
    <bean class="org.apache.ignite.logger.log4j2.Log4J2Logger">
        <constructor-arg type="java.lang.String" value="${IGNITE_SCRIPT}/ignite-log4j2.xml" />
    </bean>
</property>
<property name="dataStorageConfiguration">
    <bean class="org.apache.ignite.configuration.DataStorageConfiguration">
        <property name="defaultDataRegionConfiguration">
            <bean class="org.apache.ignite.configuration.DataRegionConfiguration">
                <property name="metricsEnabled" value="true"/>
                <property name="persistenceEnabled" value="true" />
                <!--<property name="maxSize" value="#{10L * 1024 * 1024 * 1024}"/> -->
                <property name="maxSize" value="400Gb" />
                <!-- Increasing the buffer size to 4 GB. -->
                <property name="checkpointPageBufferSize" value="${checkpointPageBufferSize}" />
            </bean>
        </property>
        <property name="storagePath" value="${storagePath}" />
        <property name="walPath" value="${walPath}" />
        <property name="walArchivePath" value="${walArchivePath}" />
        <property name="walMode" value="LOG_ONLY" />
        <property name="pageSize" value="${pageSize}" />
        <!-- Enable write throttling. -->
        <property name="writeThrottlingEnabled" value="true" />
        <property name="walHistorySize" value="1" />
        <property name="metricsEnabled" value="true"/>
    </bean>
</property>

==================Client thread dump ===========================

2020-07-20 12:14:43
Full thread dump Java HotSpot(TM) 64-Bit Server VM (25.211-b12 mixed mode):

"Attach Listener" #788 daemon prio=9 os_prio=0 tid=0x00007fe7f4001000 nid=0x32d waiting on condition [0x0000000000000000]
   java.lang.Thread.State: RUNNABLE

   Locked ownable synchronizers:
        - None

"Context_6_jms_314_ConsumerDispatcher" #787 daemon prio=5 os_prio=0 tid=0x00007fe6e805e000 nid=0x31a waiting on condition [0x00007fe2e5bdd000]
   java.lang.Thread.State: WAITING (parking)
        at sun.misc.Unsafe.park(Native Method)
        - parking to wait for  <0x00000000cb87d9d0> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
        at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)
        at java.util.concurrent.ArrayBlockingQueue.take(ArrayBlockingQueue.java:403)
        at com.solacesystems.jcsmp.protocol.nio.impl.ConsumerNotificationDispatcher.eventLoop(ConsumerNotificationDispatcher.java:110)
        at com.solacesystems.jcsmp.protocol.nio.impl.ConsumerNotificationDispatcher.run(ConsumerNotificationDispatcher.java:130)
        at java.lang.Thread.run(Thread.java:748)

   Locked ownable synchronizers:
        - None

"DefaultMessageListenerContainer-35" #786 prio=5 os_prio=0 tid=0x00007fe460013800 nid=0x319 in Object.wait() [0x00007fe2e5cde000]
   java.lang.Thread.State: TIMED_WAITING (on object monitor)
        at java.lang.Object.wait(Native Method)
        at com.solacesystems.jcsmp.impl.XMLMessageQueue.dequeue(XMLMessageQueue.java:130)
        at com.solacesystems.jcsmp.impl.flow.FlowHandleImpl.receive(FlowHandleImpl.java:845)
        - locked <0x00000000cb8cce50> (a com.solacesystems.jcsmp.impl.XMLMessageQueueList)
        at com.solacesystems.jms.SolMessageConsumer.receive(SolMessageConsumer.java:253)
        at org.springframework.jms.connection.CachedMessageConsumer.receive(CachedMessageConsumer.java:86)
        at org.springframework.jms.support.destination.JmsDestinationAccessor.receiveFromConsumer(JmsDestinationAccessor.java:132)
        at org.springframework.jms.listener.AbstractPollingMessageListenerContainer.receiveMessage(AbstractPollingMessageListenerContainer.java:418)
        at org.springframework.jms.listener.AbstractPollingMessageListenerContainer.doReceiveAndExecute(AbstractPollingMessageListenerContainer.java:303)
        at org.springframework.jms.listener.AbstractPollingMessageListenerContainer.receiveAndExecute(AbstractPollingMessageListenerContainer.java:257)
        at org.springframework.jms.listener.DefaultMessageListenerContainer$AsyncMessageListenerInvoker.invokeListener(DefaultMessageListenerContainer.java:1189)
        at org.springframework.jms.listener.DefaultMessageListenerContainer$AsyncMessageListenerInvoker.executeOngoingLoop(DefaultMessageListenerContainer.java:1179)
        at org.springframework.jms.listener.DefaultMessageListenerContainer$AsyncMessageListenerInvoker.run(DefaultMessageListenerContainer.java:1076)
        at java.lang.Thread.run(Thread.java:748)

   Locked ownable synchronizers:
        - None

"Context_4_jms_313_ConsumerDispatcher" #785 daemon prio=5 os_prio=0 tid=0x00007fe6f8028000 nid=0x318 waiting on condition [0x00007fe2e5ddf000]
   java.lang.Thread.State: WAITING (parking)
        at sun.misc.Unsafe.park(Native Method)
        - parking to wait for  <0x00000000cb8cf8d0> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
        at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)
        at java.util.concurrent.ArrayBlockingQueue.take(ArrayBlockingQueue.java:403)
        at com.solacesystems.jcsmp.protocol.nio.impl.ConsumerNotificationDispatcher.eventLoop(ConsumerNotificationDispatcher.java:110)
        at com.solacesystems.jcsmp.protocol.nio.impl.ConsumerNotificationDispatcher.run(ConsumerNotificationDispatcher.java:130)
        at java.lang.Thread.run(Thread.java:748)

   Locked ownable synchronizers:
        - None

"DefaultMessageListenerContainer-27" #784 prio=5 os_prio=0 tid=0x00007fe45800f800 nid=0x317 in Object.wait() [0x00007fe2e5ee0000]
   java.lang.Thread.State: TIMED_WAITING (on object monitor)
        at java.lang.Object.wait(Native Method)
        at com.solacesystems.jcsmp.impl.XMLMessageQueue.dequeue(XMLMessageQueue.java:130)
        at com.solacesystems.jcsmp.impl.flow.FlowHandleImpl.receive(FlowHandleImpl.java:845)
        - locked <0x00000000cb8cffc8> (a com.solacesystems.jcsmp.impl.XMLMessageQueueList)
        at com.solacesystems.jms.SolMessageConsumer.receive(SolMessageConsumer.java:253)
        at org.springframework.jms.connection.CachedMessageConsumer.receive(CachedMessageConsumer.java:86)
        at org.springframework.jms.support.destination.JmsDestinationAccessor.receiveFromConsumer(JmsDestinationAccessor.java:132)
        at org.springframework.jms.listener.AbstractPollingMessageListenerContainer.receiveMessage(AbstractPollingMessageListenerContainer.java:418)
        at org.springframework.jms.listener.AbstractPollingMessageListenerContainer.doReceiveAndExecute(AbstractPollingMessageListenerContainer.java:303)
        at org.springframework.jms.listener.AbstractPollingMessageListenerContainer.receiveAndExecute(AbstractPollingMessageListenerContainer.java:257)
        at org.springframework.jms.listener.DefaultMessageListenerContainer$AsyncMessageListenerInvoker.invokeListener(DefaultMessageListenerContainer.java:1189)
        at org.springframework.jms.listener.DefaultMessageListenerContainer$AsyncMessageListenerInvoker.executeOngoingLoop(DefaultMessageListenerContainer.java:1179)
        at org.springframework.jms.listener.DefaultMessageListenerContainer$AsyncMessageListenerInvoker.run(DefaultMessageListenerContainer.java:1076)
        at java.lang.Thread.run(Thread.java:748)

   Locked ownable synchronizers:
        - None

"Context_6_jms_312_ConsumerDispatcher" #780 daemon prio=5 os_prio=0 tid=0x00007fe6e805c800 nid=0x313 waiting on condition [0x00007fe2e62e4000]
   java.lang.Thread.State: WAITING (parking)
        at sun.misc.Unsafe.park(Native Method)
        - parking to wait for  <0x00000000cb751ad0> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
        at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)
        at java.util.concurrent.ArrayBlockingQueue.take(ArrayBlockingQueue.java:403)
        at com.solacesystems.jcsmp.protocol.nio.impl.ConsumerNotificationDispatcher.eventLoop(ConsumerNotificationDispatcher.java:110)
        at com.solacesystems.jcsmp.protocol.nio.impl.ConsumerNotificationDispatcher.run(ConsumerNotificationDispatcher.java:130)
        at java.lang.Thread.run(Thread.java:748)

   Locked ownable synchronizers:
        - None

"DefaultMessageListenerContainer-34" #779 prio=5 os_prio=0 tid=0x00007fe450003800 nid=0x312 waiting on condition [0x00007fe2e63e5000]
   java.lang.Thread.State: WAITING (parking)
        at sun.misc.Unsafe.park(Native Method)
        at java.util.concurrent.locks.LockSupport.park(LockSupport.java:304)
        at org.apache.ignite.internal.util.future.GridFutureAdapter.get0(GridFutureAdapter.java:178)
        at org.apache.ignite.internal.util.future.GridFutureAdapter.get(GridFutureAdapter.java:141)
        at org.apache.ignite.internal.processors.cache.GridCacheAdapter.get0(GridCacheAdapter.java:4723)
        at org.apache.ignite.internal.processors.cache.GridCacheAdapter.get(GridCacheAdapter.java:4697)
        at org.apache.ignite.internal.processors.cache.GridCacheAdapter.get(GridCacheAdapter.java:1415)
        at org.apache.ignite.internal.processors.cache.IgniteCacheProxyImpl.get(IgniteCacheProxyImpl.java:928)
        at org.apache.ignite.internal.processors.cache.GatewayProtectedCacheProxy.get(GatewayProtectedCacheProxy.java:640)
        at com.jio.digitalapi.cacheservice.client.impl.DigitalApiIgniteCache.get(DigitalApiIgniteCache.java:87)
        at com.jio.digitalapi.eventprocessing.service.dataservice.EventManagementApacheIgniteDataService.getCustomerEntity(EventManagementApacheIgniteDataService.java:101)
        at com.jio.digitalapi.ep.dataservice.impl.AbstractEventManagementDataService.getCustomer(AbstractEventManagementDataService.java:38)
        at com.jio.digitalapi.eventprocessing.service.base.AbstractMessageEventActionProcessor.getCustomerEntity(AbstractMessageEventActionProcessor.java:154)
        at com.jio.digitalapi.eventprocessing.service.event.action.processor.PrimeMemberUpdateEventActionProcessor.processEvent(PrimeMemberUpdateEventActionProcessor.java:31)
        at com.jio.digitalapi.eventprocessing.service.base.AbstractMessageEventActionProcessor.processMessageEvent(AbstractMessageEventActionProcessor.java:112)
        at com.jio.digitalapi.eventprocessing.service.base.AbstractMessageEventHandler.processMessage(AbstractMessageEventHandler.java:66)
        at com.jio.digitalapi.platform.core.messaging.jms.receiver.DigitalApiAsyncJmsMessageReceiver.onMessage(DigitalApiAsyncJmsMessageReceiver.java:106)
        at org.springframework.jms.listener.AbstractMessageListenerContainer.doInvokeListener(AbstractMessageListenerContainer.java:761)
        at org.springframework.jms.listener.AbstractMessageListenerContainer.invokeListener(AbstractMessageListenerContainer.java:699)
        at org.springframework.jms.listener.AbstractMessageListenerContainer.doExecuteListener(AbstractMessageListenerContainer.java:674)
        at org.springframework.jms.listener.AbstractPollingMessageListenerContainer.doReceiveAndExecute(AbstractPollingMessageListenerContainer.java:318)
        at org.springframework.jms.listener.AbstractPollingMessageListenerContainer.receiveAndExecute(AbstractPollingMessageListenerContainer.java:257)
        at org.springframework.jms.listener.DefaultMessageListenerContainer$AsyncMessageListenerInvoker.invokeListener(DefaultMessageListenerContainer.java:1189)
        at org.springframework.jms.listener.DefaultMessageListenerContainer$AsyncMessageListenerInvoker.executeOngoingLoop(DefaultMessageListenerContainer.java:1179)
        at org.springframework.jms.listener.DefaultMessageListenerContainer$AsyncMessageListenerInvoker.run(DefaultMessageListenerContainer.java:1076)
        at java.lang.Thread.run(Thread.java:748)

   Locked ownable synchronizers:
        - None

            Reporter: Vipul Thakur
         Attachments: Ignite_10.143.75.24_threaddump.txt, Ignite_10.143.75.24_threaddump_1.txt, Ignite_10.143.75.24_threaddump_2.txt, Ignite_10.143.75.24_threaddump_3.txt, Ignite_10.143.75.24_threaddump_4.txt

Hi 

We have an Ignite cluster with four nodes, each with 400 GB of memory.

Nearly two months after deploying the cluster and clients in an environment, the cluster hangs. Initially a few clients get stuck with some processing pending, and after some time everything gets stuck.

After restarting everything (both clients and cluster), it works fine: it processes around 1 crore (10 million) records in 10-15 minutes, including creating and updating data.

 

We use transactions on all caches for create and update calls, and no transaction for get calls.

We have already faced this issue twice in a span of 4-5 months of deployment.

I am attaching the cluster thread dump and client thread dump.

I have seen that there is already one ticket in Jira for the *Found long running cache* warning, and it has been moved to 2.8.1. Is that the solution? Please confirm.

    2020-06-04 20:05:55.889 WARN 1 --- [c7fd8b84-d8sdl%] org.apache.ignite.internal.diagnostic : Found long running cache future [startTime=19:59:30.288, curTime=20:05:55.882, fut=GridPartitionedSingleGetFuture [topVer=AffinityTopologyVersion [topVer=10, minorTopVer=0], key=UserKeyCacheObjectImpl [part=105, val=7701112105, hasValBytes=true], readThrough=true, forcePrimary=false, futId=681978f7271-55010ba8-d8d5-475f-97be-ff1c1916cea1, trackable=true, subjId=59d5e3cf-d09c-44d3-82d6-84dd35b64e10, taskName=null, deserializeBinary=true, skipVals=false, expiryPlc=null, canRemap=true, needVer=false, keepCacheObjects=false, recovery=false, node=TcpDiscoveryNode [id=14718baa-35e7-4d61-bde8-1e9c61978e8f, addrs=[10.135.34.67, 127.0.0.1], sockAddrs=[/10.135.34.67:47500, /127.0.0.1:47500], discPort=47500, order=1, intOrder=1, lastExchangeTime=1591277613188, loc=false, ver=2.7.6#20190911-sha1:21f7ca41, isClient=false], postProcessingClos=null]]

 

However, I have also observed this issue in an environment without any load or heavy traffic (that cluster held just 100 MB of data).

 

We don't have any transaction timeout set as of now. Should we set one?
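
For reference, if we do set one, a minimal sketch of the Spring configuration could look like the fragment below. It assumes the fragment is placed inside the IgniteConfiguration bean, next to dataStorageConfiguration. The timeout values are placeholders chosen for illustration, not tested recommendations. It would only bound the transactional create/update calls; whether it would help with the non-transactional get shown in the thread dump is an open question.

```xml
<!-- Sketch only: the timeout values are assumptions, not tuned for this workload. -->
<property name="transactionConfiguration">
    <bean class="org.apache.ignite.configuration.TransactionConfiguration">
        <!-- Default timeout for transactions, in milliseconds (0 means no timeout). -->
        <property name="defaultTxTimeout" value="30000" />
        <!-- Timeout, in milliseconds, applied to transactions that block partition map exchange. -->
        <property name="txTimeoutOnPartitionMapExchange" value="20000" />
    </bean>
</property>
```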

 

 

Thanks

--
This message was sent by Atlassian Jira
(v8.3.4#803005)