Posted to issues@ignite.apache.org by "Vipul Thakur (Jira)" <ji...@apache.org> on 2020/07/25 06:29:00 UTC

[jira] [Updated] (IGNITE-13298) Found long running cache at client end

     [ https://issues.apache.org/jira/browse/IGNITE-13298?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Vipul Thakur updated IGNITE-13298:
----------------------------------
    Issue Type: Task  (was: Bug)
      Priority: Blocker  (was: Major)

> Found long running cache at client end 
> ---------------------------------------
>
>                 Key: IGNITE-13298
>                 URL: https://issues.apache.org/jira/browse/IGNITE-13298
>             Project: Ignite
>          Issue Type: Task
>    Affects Versions: 2.7.6
>         Environment: ========cluster memory config/persistence================ 
> <property name="gridLogger">
>     <bean class="org.apache.ignite.logger.log4j2.Log4J2Logger">
>         <constructor-arg type="java.lang.String" value="${IGNITE_SCRIPT}/ignite-log4j2.xml" />
>     </bean>
> </property>
> <property name="dataStorageConfiguration">
>     <bean class="org.apache.ignite.configuration.DataStorageConfiguration">
>         <property name="defaultDataRegionConfiguration">
>             <bean class="org.apache.ignite.configuration.DataRegionConfiguration">
>                 <property name="metricsEnabled" value="true"/>
>                 <property name="persistenceEnabled" value="true" />
>                 <!--<property name="maxSize" value="#{10L * 1024 * 1024 * 1024}"/> -->
>                 <property name="maxSize" value="400Gb" />
>                 <!-- Increasing the buffer size to 4 GB. -->
>                 <property name="checkpointPageBufferSize" value="${checkpointPageBufferSize}" />
>             </bean>
>         </property>
>         <property name="storagePath" value="${storagePath}" />
>         <property name="walPath" value="${walPath}" />
>         <property name="walArchivePath" value="${walArchivePath}" />
>         <property name="walMode" value="LOG_ONLY" />
>         <property name="pageSize" value="${pageSize}" />
>         <!-- Enable write throttling. -->
>         <property name="writeThrottlingEnabled" value="true" />
>         <property name="walHistorySize" value="1" />
>         <property name="metricsEnabled" value="true"/>
>     </bean>
> </property>
> ==================Client thread dump ===========================
> 2020-07-20 12:14:43
> Full thread dump Java HotSpot(TM) 64-Bit Server VM (25.211-b12 mixed mode):
>
> "Attach Listener" #788 daemon prio=9 os_prio=0 tid=0x00007fe7f4001000 nid=0x32d waiting on condition [0x0000000000000000]
>    java.lang.Thread.State: RUNNABLE
>    Locked ownable synchronizers: - None
>
> "Context_6_jms_314_ConsumerDispatcher" #787 daemon prio=5 os_prio=0 tid=0x00007fe6e805e000 nid=0x31a waiting on condition [0x00007fe2e5bdd000]
>    java.lang.Thread.State: WAITING (parking)
>         at sun.misc.Unsafe.park(Native Method)
>         - parking to wait for  <0x00000000cb87d9d0> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
>         at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
>         at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)
>         at java.util.concurrent.ArrayBlockingQueue.take(ArrayBlockingQueue.java:403)
>         at com.solacesystems.jcsmp.protocol.nio.impl.ConsumerNotificationDispatcher.eventLoop(ConsumerNotificationDispatcher.java:110)
>         at com.solacesystems.jcsmp.protocol.nio.impl.ConsumerNotificationDispatcher.run(ConsumerNotificationDispatcher.java:130)
>         at java.lang.Thread.run(Thread.java:748)
>    Locked ownable synchronizers: - None
>
> "DefaultMessageListenerContainer-35" #786 prio=5 os_prio=0 tid=0x00007fe460013800 nid=0x319 in Object.wait() [0x00007fe2e5cde000]
>    java.lang.Thread.State: TIMED_WAITING (on object monitor)
>         at java.lang.Object.wait(Native Method)
>         at com.solacesystems.jcsmp.impl.XMLMessageQueue.dequeue(XMLMessageQueue.java:130)
>         at com.solacesystems.jcsmp.impl.flow.FlowHandleImpl.receive(FlowHandleImpl.java:845)
>         - locked <0x00000000cb8cce50> (a com.solacesystems.jcsmp.impl.XMLMessageQueueList)
>         at com.solacesystems.jms.SolMessageConsumer.receive(SolMessageConsumer.java:253)
>         at org.springframework.jms.connection.CachedMessageConsumer.receive(CachedMessageConsumer.java:86)
>         at org.springframework.jms.support.destination.JmsDestinationAccessor.receiveFromConsumer(JmsDestinationAccessor.java:132)
>         at org.springframework.jms.listener.AbstractPollingMessageListenerContainer.receiveMessage(AbstractPollingMessageListenerContainer.java:418)
>         at org.springframework.jms.listener.AbstractPollingMessageListenerContainer.doReceiveAndExecute(AbstractPollingMessageListenerContainer.java:303)
>         at org.springframework.jms.listener.AbstractPollingMessageListenerContainer.receiveAndExecute(AbstractPollingMessageListenerContainer.java:257)
>         at org.springframework.jms.listener.DefaultMessageListenerContainer$AsyncMessageListenerInvoker.invokeListener(DefaultMessageListenerContainer.java:1189)
>         at org.springframework.jms.listener.DefaultMessageListenerContainer$AsyncMessageListenerInvoker.executeOngoingLoop(DefaultMessageListenerContainer.java:1179)
>         at org.springframework.jms.listener.DefaultMessageListenerContainer$AsyncMessageListenerInvoker.run(DefaultMessageListenerContainer.java:1076)
>         at java.lang.Thread.run(Thread.java:748)
>    Locked ownable synchronizers: - None
>
> "Context_4_jms_313_ConsumerDispatcher" #785 daemon prio=5 os_prio=0 tid=0x00007fe6f8028000 nid=0x318 waiting on condition [0x00007fe2e5ddf000]
>    java.lang.Thread.State: WAITING (parking)
>         at sun.misc.Unsafe.park(Native Method)
>         - parking to wait for  <0x00000000cb8cf8d0> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
>         at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
>         at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)
>         at java.util.concurrent.ArrayBlockingQueue.take(ArrayBlockingQueue.java:403)
>         at com.solacesystems.jcsmp.protocol.nio.impl.ConsumerNotificationDispatcher.eventLoop(ConsumerNotificationDispatcher.java:110)
>         at com.solacesystems.jcsmp.protocol.nio.impl.ConsumerNotificationDispatcher.run(ConsumerNotificationDispatcher.java:130)
>         at java.lang.Thread.run(Thread.java:748)
>    Locked ownable synchronizers: - None
>
> "DefaultMessageListenerContainer-27" #784 prio=5 os_prio=0 tid=0x00007fe45800f800 nid=0x317 in Object.wait() [0x00007fe2e5ee0000]
>    java.lang.Thread.State: TIMED_WAITING (on object monitor)
>         at java.lang.Object.wait(Native Method)
>         at com.solacesystems.jcsmp.impl.XMLMessageQueue.dequeue(XMLMessageQueue.java:130)
>         at com.solacesystems.jcsmp.impl.flow.FlowHandleImpl.receive(FlowHandleImpl.java:845)
>         - locked <0x00000000cb8cffc8> (a com.solacesystems.jcsmp.impl.XMLMessageQueueList)
>         at com.solacesystems.jms.SolMessageConsumer.receive(SolMessageConsumer.java:253)
>         at org.springframework.jms.connection.CachedMessageConsumer.receive(CachedMessageConsumer.java:86)
>         at org.springframework.jms.support.destination.JmsDestinationAccessor.receiveFromConsumer(JmsDestinationAccessor.java:132)
>         at org.springframework.jms.listener.AbstractPollingMessageListenerContainer.receiveMessage(AbstractPollingMessageListenerContainer.java:418)
>         at org.springframework.jms.listener.AbstractPollingMessageListenerContainer.doReceiveAndExecute(AbstractPollingMessageListenerContainer.java:303)
>         at org.springframework.jms.listener.AbstractPollingMessageListenerContainer.receiveAndExecute(AbstractPollingMessageListenerContainer.java:257)
>         at org.springframework.jms.listener.DefaultMessageListenerContainer$AsyncMessageListenerInvoker.invokeListener(DefaultMessageListenerContainer.java:1189)
>         at org.springframework.jms.listener.DefaultMessageListenerContainer$AsyncMessageListenerInvoker.executeOngoingLoop(DefaultMessageListenerContainer.java:1179)
>         at org.springframework.jms.listener.DefaultMessageListenerContainer$AsyncMessageListenerInvoker.run(DefaultMessageListenerContainer.java:1076)
>         at java.lang.Thread.run(Thread.java:748)
>    Locked ownable synchronizers: - None
>
> "Context_6_jms_312_ConsumerDispatcher" #780 daemon prio=5 os_prio=0 tid=0x00007fe6e805c800 nid=0x313 waiting on condition [0x00007fe2e62e4000]
>    java.lang.Thread.State: WAITING (parking)
>         at sun.misc.Unsafe.park(Native Method)
>         - parking to wait for  <0x00000000cb751ad0> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
>         at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
>         at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)
>         at java.util.concurrent.ArrayBlockingQueue.take(ArrayBlockingQueue.java:403)
>         at com.solacesystems.jcsmp.protocol.nio.impl.ConsumerNotificationDispatcher.eventLoop(ConsumerNotificationDispatcher.java:110)
>         at com.solacesystems.jcsmp.protocol.nio.impl.ConsumerNotificationDispatcher.run(ConsumerNotificationDispatcher.java:130)
>         at java.lang.Thread.run(Thread.java:748)
>    Locked ownable synchronizers: - None
>
> "DefaultMessageListenerContainer-34" #779 prio=5 os_prio=0 tid=0x00007fe450003800 nid=0x312 waiting on condition [0x00007fe2e63e5000]
>    java.lang.Thread.State: WAITING (parking)
>         at sun.misc.Unsafe.park(Native Method)
>         at java.util.concurrent.locks.LockSupport.park(LockSupport.java:304)
>         at org.apache.ignite.internal.util.future.GridFutureAdapter.get0(GridFutureAdapter.java:178)
>         at org.apache.ignite.internal.util.future.GridFutureAdapter.get(GridFutureAdapter.java:141)
>         at org.apache.ignite.internal.processors.cache.GridCacheAdapter.get0(GridCacheAdapter.java:4723)
>         at org.apache.ignite.internal.processors.cache.GridCacheAdapter.get(GridCacheAdapter.java:4697)
>         at org.apache.ignite.internal.processors.cache.GridCacheAdapter.get(GridCacheAdapter.java:1415)
>         at org.apache.ignite.internal.processors.cache.IgniteCacheProxyImpl.get(IgniteCacheProxyImpl.java:928)
>         at org.apache.ignite.internal.processors.cache.GatewayProtectedCacheProxy.get(GatewayProtectedCacheProxy.java:640)
>         at com.jio.digitalapi.cacheservice.client.impl.DigitalApiIgniteCache.get(DigitalApiIgniteCache.java:87)
>         at com.jio.digitalapi.eventprocessing.service.dataservice.EventManagementApacheIgniteDataService.getCustomerEntity(EventManagementApacheIgniteDataService.java:101)
>         at com.jio.digitalapi.ep.dataservice.impl.AbstractEventManagementDataService.getCustomer(AbstractEventManagementDataService.java:38)
>         at com.jio.digitalapi.eventprocessing.service.base.AbstractMessageEventActionProcessor.getCustomerEntity(AbstractMessageEventActionProcessor.java:154)
>         at com.jio.digitalapi.eventprocessing.service.event.action.processor.PrimeMemberUpdateEventActionProcessor.processEvent(PrimeMemberUpdateEventActionProcessor.java:31)
>         at com.jio.digitalapi.eventprocessing.service.base.AbstractMessageEventActionProcessor.processMessageEvent(AbstractMessageEventActionProcessor.java:112)
>         at com.jio.digitalapi.eventprocessing.service.base.AbstractMessageEventHandler.processMessage(AbstractMessageEventHandler.java:66)
>         at com.jio.digitalapi.platform.core.messaging.jms.receiver.DigitalApiAsyncJmsMessageReceiver.onMessage(DigitalApiAsyncJmsMessageReceiver.java:106)
>         at org.springframework.jms.listener.AbstractMessageListenerContainer.doInvokeListener(AbstractMessageListenerContainer.java:761)
>         at org.springframework.jms.listener.AbstractMessageListenerContainer.invokeListener(AbstractMessageListenerContainer.java:699)
>         at org.springframework.jms.listener.AbstractMessageListenerContainer.doExecuteListener(AbstractMessageListenerContainer.java:674)
>         at org.springframework.jms.listener.AbstractPollingMessageListenerContainer.doReceiveAndExecute(AbstractPollingMessageListenerContainer.java:318)
>         at org.springframework.jms.listener.AbstractPollingMessageListenerContainer.receiveAndExecute(AbstractPollingMessageListenerContainer.java:257)
>         at org.springframework.jms.listener.DefaultMessageListenerContainer$AsyncMessageListenerInvoker.invokeListener(DefaultMessageListenerContainer.java:1189)
>         at org.springframework.jms.listener.DefaultMessageListenerContainer$AsyncMessageListenerInvoker.executeOngoingLoop(DefaultMessageListenerContainer.java:1179)
>         at org.springframework.jms.listener.DefaultMessageListenerContainer$AsyncMessageListenerInvoker.run(DefaultMessageListenerContainer.java:1076)
>         at java.lang.Thread.run(Thread.java:748)
>    Locked ownable synchronizers: - None
>  
>  
>  
>  
>            Reporter: Vipul Thakur
>            Priority: Blocker
>         Attachments: Ignite_10.143.75.24_threaddump.txt, Ignite_10.143.75.24_threaddump_1.txt, Ignite_10.143.75.24_threaddump_2.txt, Ignite_10.143.75.24_threaddump_3.txt, Ignite_10.143.75.24_threaddump_4.txt
>
>
> Hi,
> We have an Ignite cluster with four nodes, each with 400 GB of memory.
> About two months after deploying the cluster and clients in an environment, the cluster hangs: initially a few clients get stuck with some processing pending, and after some time everything gets stuck.
> After restarting everything (both clients and cluster), it works fine, processing around 1 crore (10 million) records in 10-15 minutes, including both creating and updating data.
>  
> We use transactions on all caches for create and update operations, and no transactions for get calls.
> We have already faced this issue twice in 4-5 months of deployment.
> I am attaching the cluster thread dump and the client thread dump.
> I have seen that there is already a Jira ticket about *Found long running cache future* warnings, and that it has been moved to 2.8.1. Is that the fix for this problem? Please confirm.
>     2020-06-04 20:05:55.889 WARN 1 --- [c7fd8b84-d8sdl%] org.apache.ignite.internal.diagnostic : Found long running cache future [startTime=19:59:30.288, curTime=20:05:55.882, fut=GridPartitionedSingleGetFuture [topVer=AffinityTopologyVersion [topVer=10, minorTopVer=0], key=UserKeyCacheObjectImpl [part=105, val=7701112105, hasValBytes=true], readThrough=true, forcePrimary=false, futId=681978f7271-55010ba8-d8d5-475f-97be-ff1c1916cea1, trackable=true, subjId=59d5e3cf-d09c-44d3-82d6-84dd35b64e10, taskName=null, deserializeBinary=true, skipVals=false, expiryPlc=null, canRemap=true, needVer=false, keepCacheObjects=false, recovery=false, node=TcpDiscoveryNode [id=14718baa-35e7-4d61-bde8-1e9c61978e8f, addrs=[10.135.34.67, 127.0.0.1], sockAddrs=[/10.135.34.67:47500, /127.0.0.1:47500], discPort=47500, order=1, intOrder=1, lastExchangeTime=1591277613188, loc=false, ver=2.7.6#20190911-sha1:21f7ca41, isClient=false], postProcessingClos=null]]
>  
> However, I have also observed this issue in an environment without any load or heavy traffic (the cluster held just 100 MB of data).
>  
> We do not have any transaction timeout set as of now. Should we set one?
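
On the transaction timeout question: a minimal sketch of how timeouts could be set in Spring XML, assuming the same bean-style configuration as in the Environment section. defaultTxTimeout and txTimeoutOnPartitionMapExchange are properties of org.apache.ignite.configuration.TransactionConfiguration; when no default timeout is set, transactions have no time limit. The values 30000 and 20000 (milliseconds) are illustrative assumptions, not recommendations. The stuck call in the client thread dump is a non-transactional get, so it is not established that these settings alone would prevent the hang described here.

```xml
<!-- Sketch only: timeout values are illustrative assumptions. -->
<bean class="org.apache.ignite.configuration.IgniteConfiguration">
    <property name="transactionConfiguration">
        <bean class="org.apache.ignite.configuration.TransactionConfiguration">
            <!-- Default timeout (ms) for transactions that do not specify their own. -->
            <property name="defaultTxTimeout" value="30000"/>
            <!-- Timeout (ms) after which transactions that block a partition map exchange are rolled back. -->
            <property name="txTimeoutOnPartitionMapExchange" value="20000"/>
        </bean>
    </property>
</bean>
```

This fragment would be merged into the existing IgniteConfiguration bean rather than declared as a second one.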
>  
>  
> Thanks 
>  
>  
>  
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)