Posted to user@ignite.apache.org by Justin Ji <bi...@foxmail.com> on 2018/09/07 02:13:28 UTC
The system cache size was slowly increased
Hi all -
We use Ignite in our production environment, but I found that the system
cache grows slowly and is never reclaimed. When free system memory drops
below 200 MB, the node seems to stop working and our system cannot
get any response from the server nodes. The image below shows our server's
monitoring data:
<http://apache-ignite-users.70518.x6.nabble.com/file/t2000/WechatIMG86.jpeg>
Our server node configuration is:
<beans xmlns="http://www.springframework.org/schema/beans"
       xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
       xmlns:util="http://www.springframework.org/schema/util"
       xsi:schemaLocation="
           http://www.springframework.org/schema/beans
           http://www.springframework.org/schema/beans/spring-beans.xsd
           http://www.springframework.org/schema/util
           http://www.springframework.org/schema/util/spring-util.xsd">
    <bean class="org.apache.ignite.configuration.IgniteConfiguration">
        <property name="discoverySpi">
            <bean class="org.apache.ignite.spi.discovery.zk.ZookeeperDiscoverySpi">
                <property name="zkConnectionString"
                          value="172.31.34.133:2181,172.31.32.111:2181,172.31.37.6:2181"/>
                <property name="sessionTimeout" value="30000"/>
                <property name="zkRootPath" value="/ignite/discovery"/>
                <property name="joinTimeout" value="10000"/>
            </bean>
        </property>
        <property name="gridLogger">
            <bean class="org.apache.ignite.logger.log4j2.Log4J2Logger">
                <constructor-arg type="java.lang.String" value="log4j2.xml"/>
            </bean>
        </property>
        <property name="communicationSpi">
            <bean class="org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi">
                <property name="localPort" value="47174"/>
                <property name="messageQueueLimit" value="1024"/>
            </bean>
        </property>
        <property name="dataStorageConfiguration">
            <bean class="org.apache.ignite.configuration.DataStorageConfiguration">
                <property name="defaultDataRegionConfiguration">
                    <bean class="org.apache.ignite.configuration.DataRegionConfiguration">
                        <property name="persistenceEnabled" value="true"/>
                        <property name="initialSize" value="#{512L * 1024 * 1024}"/>
                        <property name="maxSize" value="#{3L * 1024 * 1024 * 1024}"/>
                    </bean>
                </property>
                <property name="writeThrottlingEnabled" value="true"/>
                <property name="storagePath" value="/var/lib/ignite/persistence"/>
                <property name="walPath" value="/wal"/>
                <property name="walArchivePath" value="/wal/archive"/>
            </bean>
        </property>
        <property name="includeEventTypes">
            <list>
                <util:constant static-field="org.apache.ignite.events.EventType.EVT_TASK_STARTED"/>
                <util:constant static-field="org.apache.ignite.events.EventType.EVT_TASK_FINISHED"/>
                <util:constant static-field="org.apache.ignite.events.EventType.EVT_TASK_FAILED"/>
            </list>
        </property>
    </bean>
</beans>
And the client node cache configuration is:
TcpCommunicationSpi communicationSpi =
    DefaultIgniteConfiguration.getTcpCommunicationSpi(ignitePort);
cfg.setCommunicationSpi(communicationSpi);
// Device cache configuration
// BinaryObject here is com.tuya.athena.ignite.domain.DeviceStatusIgniteVO
CacheConfiguration<String, BinaryObject> cacheCfg = new CacheConfiguration<>();
cacheCfg.setName("device_status");
// Partitioned storage
cacheCfg.setCacheMode(CacheMode.PARTITIONED);
// Backup count
cacheCfg.setBackups(1);
cacheCfg.setAtomicityMode(CacheAtomicityMode.TRANSACTIONAL);
cacheCfg.setCacheStoreFactory(FactoryBuilder.factoryOf(DeviceStatusCacheStore.class));
cacheCfg.setWriteThrough(true);
cacheCfg.setWriteBehindEnabled(true);
// Flush every minute
cacheCfg.setWriteBehindFlushFrequency(60 * 1000);
cacheCfg.setWriteBehindBatchSize(1024);
cacheCfg.setStoreKeepBinary(true);
cfg.setCacheConfiguration(cacheCfg);
ignite = Ignition.getOrStart(cfg);
ignite.cluster().active(true);
Is there anything inappropriate in my configuration? Looking forward to
your reply.
PS:
--
Sent from: http://apache-ignite-users.70518.x6.nabble.com/
Re: The system cache size was slowly increased
Posted by Justin Ji <bi...@foxmail.com>.
I analyzed the log and found many *checkpoint timeout* entries. The
following are excerpts from the checkpoint log:
[db-checkpoint-thread-#46] INFO
o.a.i.i.p.c.p.GridCacheDatabaseSharedManager:478 - Checkpoint started
[checkpointId=7fdb8ddd-76d7-4ddf-9fd1-cddd8bb093be, startPtr=FileWALPointer
[idx=593, fileOff=30559855, len=15120], checkpointLockWait=0ms,
checkpointLockHoldTime=13ms, walCpRecordFsyncDuration=15ms, pages=37037,
reason='timeout']
[db-checkpoint-thread-#46] INFO
o.a.i.i.p.c.p.GridCacheDatabaseSharedManager:478 - Checkpoint finished
[cpId=7fdb8ddd-76d7-4ddf-9fd1-cddd8bb093be, pages=37037,
markPos=FileWALPointer [idx=593, fileOff=30559855, len=15120],
walSegmentsCleared=0, markDuration=36ms, pagesWrite=318ms, fsync=4370ms,
total=4724ms]
[db-checkpoint-thread-#46] INFO
o.a.i.i.p.c.p.GridCacheDatabaseSharedManager:478 - Checkpoint started
[checkpointId=1078e60f-eb68-4cac-b7a3-15e6ca83dbb6, startPtr=FileWALPointer
[idx=593, fileOff=39744249, len=15120], checkpointLockWait=0ms,
checkpointLockHoldTime=6ms, walCpRecordFsyncDuration=7ms, pages=2145,
reason='timeout']
2018-09-06 08:33:23:909 [db-checkpoint-thread-#46] INFO
o.a.i.i.p.c.p.GridCacheDatabaseSharedManager:478 - Checkpoint finished
[cpId=1078e60f-eb68-4cac-b7a3-15e6ca83dbb6, pages=2145,
markPos=FileWALPointer [idx=593, fileOff=39744249, len=15120],
walSegmentsCleared=1, markDuration=15ms, pagesWrite=20ms, fsync=372ms,
total=407ms]
[db-checkpoint-thread-#46] INFO
o.a.i.i.p.c.p.GridCacheDatabaseSharedManager:478 - Skipping checkpoint (no
pages were modified) [checkpointLockWait=0ms, checkpointLockHoldTime=1ms,
reason='timeout']
[db-checkpoint-thread-#46] INFO
o.a.i.i.p.c.p.GridCacheDatabaseSharedManager:478 - Skipping checkpoint (no
pages were modified) [checkpointLockWait=0ms, checkpointLockHoldTime=1ms,
reason='timeout']
*So, can I assume that the checkpoints are the culprit behind the cache growth?*
If so, how can I deal with it?
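To put numbers on the log entries above, here is a minimal sketch that pulls the `pages`, `fsync` and `total` fields out of a "Checkpoint finished" entry and estimates how much data that checkpoint wrote to disk. It assumes the default Ignite page size of 4 KiB, since the posted configuration does not set `pageSize`; the class and method names are made up for illustration and are not part of Ignite. As far as I know, reason='timeout' only says that the periodic checkpoint timer fired, not that a checkpoint failed.

```java
import java.util.OptionalLong;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CheckpointLogStats {
    // Assumption: default page size of 4 KiB (pageSize is not set in the posted config).
    static final long PAGE_SIZE_BYTES = 4096;

    /** Returns the numeric value of a "name=123" or "name=123ms" field, if present. */
    static OptionalLong field(String logEntry, String name) {
        Matcher m = Pattern.compile("\\b" + Pattern.quote(name) + "=(\\d+)").matcher(logEntry);
        return m.find() ? OptionalLong.of(Long.parseLong(m.group(1))) : OptionalLong.empty();
    }

    /** Estimated bytes written by one checkpoint: number of pages times page size. */
    static long bytesWritten(long pages) {
        return pages * PAGE_SIZE_BYTES;
    }

    public static void main(String[] args) {
        // First "Checkpoint finished" entry from the log, joined onto one line.
        String entry = "Checkpoint finished [cpId=7fdb8ddd-76d7-4ddf-9fd1-cddd8bb093be, "
            + "pages=37037, markPos=FileWALPointer [idx=593, fileOff=30559855, len=15120], "
            + "walSegmentsCleared=0, markDuration=36ms, pagesWrite=318ms, fsync=4370ms, "
            + "total=4724ms]";

        long pages = field(entry, "pages").orElse(0);
        long fsyncMs = field(entry, "fsync").orElse(0);
        long totalMs = field(entry, "total").orElse(0);

        System.out.println("pages written : " + pages);
        System.out.println("approx. MiB   : " + bytesWritten(pages) / (1024 * 1024));
        System.out.println("fsync / total : " + fsyncMs + " ms / " + totalMs + " ms");
    }
}
```

Under that page-size assumption, the first checkpoint writes roughly 145 MiB. Those page writes pass through the operating system's page cache, which is one plausible reason for the cache figure to rise while the node is busy.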
Re: The system cache size was slowly increased
Posted by Justin Ji <bi...@foxmail.com>.
Who can give me some advice?
Re: The system cache size was slowly increased
Posted by Justin Ji <bi...@foxmail.com>.
A second question:
I run the Ignite nodes in a Docker container with the following command:
sudo -u docker docker run \
  -v /mnt/logs/apps/ignite:/mnt/logs/apps/ignite \
  -v /opt/ignite/ext-libs:/opt/ignite/ext-libs \
  -v /opt/ignite/config:/opt/ignite/config \
  -v /var/lib/ignite/persistence:/var/lib/ignite/persistence \
  --name ignite --net=host \
  -e "CONFIG_URI=file:///opt/ignite/config/ignite-config-prod-us.xml" \
  -e "OPTION_LIBS=ignite-zookeeper,ignite-indexing,ignite-log4j2,ignite-rest-http" \
  -e "JVM_OPTS=-Xms2g -Xmx2g -XX:+AlwaysPreTouch -XX:+UseG1GC \
      -XX:+ScavengeBeforeFullGC -XX:+DisableExplicitGC \
      -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/mnt/heapdump/ignite \
      -XX:+ExitOnOutOfMemoryError -XX:+PrintGCDetails -XX:+PrintGCTimeStamps \
      -XX:+PrintGCDateStamps -XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=10 \
      -XX:GCLogFileSize=100M -Xloggc:/mnt/logs/apps/ignite/gc.log" \
  -e "EXTERNAL_LIBS=http://www.*.jar" \
  -d apacheignite/ignite
The command sets the JVM heap size to 2G, but the Docker
container consumes more than 4G:
[jisen@w2_s_ignite_003 ~]$ sudo docker stats ignite
CONTAINER   CPU %   MEM USAGE / LIMIT     MEM %    NET I/O   BLOCK I/O       PIDS
ignite      0.35%   4.119GiB / 7.45GiB    55.29%   0B / 0B   979kB / 250GB   114
So, I want to know why the Ignite container consumes so much memory.
Re: The system cache size was slowly increased
Posted by Justin Ji <bi...@foxmail.com>.
Prem -
Thanks for your reply.
You explained that the Ignite container grew from 2.5G to more than 4G because
it holds 2G of heap memory plus more than 2G of off-heap memory. I think that
is correct, and it solves one of my main problems.
But I have another question: why does the system memory cache also grow slowly
and never get reclaimed? As far as I know, the Linux kernel reclaims the system
memory cache automatically when memory runs low.
The memory cache I mentioned before refers to the 'buff/cache' value printed
by the 'free' command.
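For anyone who wants to watch the same figure from code, here is a minimal sketch that derives a 'buff/cache' value from /proc/meminfo. It assumes a Linux host and assumes that 'free' computes buff/cache as Buffers + Cached + SReclaimable, which is how recent procps versions document it; older versions may differ. The class name is made up for illustration. MemAvailable is printed alongside because it is the kernel's own estimate of how much memory can still be handed out once reclaimable cache is counted.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class BuffCache {
    /** Parses "Key:   12345 kB" lines into a map of key -> value in KiB. */
    static Map<String, Long> parse(List<String> lines) {
        Map<String, Long> kib = new HashMap<>();
        for (String line : lines) {
            String[] parts = line.trim().split("\\s+");
            if (parts.length >= 2 && parts[0].endsWith(":")) {
                try {
                    String key = parts[0].substring(0, parts[0].length() - 1);
                    kib.put(key, Long.parseLong(parts[1]));
                } catch (NumberFormatException ignored) {
                    // Skip lines whose value is not a plain number.
                }
            }
        }
        return kib;
    }

    /** Assumed definition of buff/cache: Buffers + Cached + SReclaimable. */
    static long buffCacheKib(Map<String, Long> meminfo) {
        return meminfo.getOrDefault("Buffers", 0L)
             + meminfo.getOrDefault("Cached", 0L)
             + meminfo.getOrDefault("SReclaimable", 0L);
    }

    public static void main(String[] args) throws IOException {
        Path path = Paths.get("/proc/meminfo");
        if (!Files.isReadable(path)) {
            System.out.println("/proc/meminfo is not readable; this sketch needs Linux.");
            return;
        }
        Map<String, Long> meminfo = parse(Files.readAllLines(path));
        System.out.println("buff/cache (MiB)  : " + buffCacheKib(meminfo) / 1024);
        System.out.println("MemAvailable (MiB): " + meminfo.getOrDefault("MemAvailable", 0L) / 1024);
    }
}
```

A large buff/cache value together with a healthy MemAvailable value usually just means the kernel is using otherwise idle memory for caching.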
Re: The system cache size was slowly increased
Posted by Justin Ji <bi...@foxmail.com>.
Evgenii -
Thanks for your reply!
"System memory cache" means the value printed by the 'free -m' command.
After some testing, I solved the problem. It was caused by my Linux settings,
not by Ignite.
Re: The system cache size was slowly increased
Posted by ezhuravlev <e....@gmail.com>.
Hi,
What do you mean by "system memory cache also grows"? How do you see this?
Evgenii
Re: The system cache size was slowly increased
Posted by Prem Prakash Sharma <pr...@infoworks.io>.
Hi Justin,
If I am not wrong, you have a data region of 3 GiB and your Ignite node heap size is 2 GiB. Since the data region starts at 512 MB, I am guessing your container slowly grew from 2.5 GiB to where it is now. The point in your case is that the off-heap memory and the JVM heap are separate, and Ignite will grow to fill the off-heap space as you put more data into off-heap memory.
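As a rough check of that arithmetic, here is a minimal sketch that adds the JVM heap to the data region size, using the values from the posted configuration and JVM_OPTS. It deliberately ignores other native memory (metaspace, thread stacks, Ignite's checkpoint buffer and so on), so treat the results as approximate bounds rather than exact figures; the class name is made up for illustration.

```java
import java.util.Locale;

public class IgniteMemoryBudget {
    static final long MIB = 1024L * 1024;
    static final long GIB = 1024L * MIB;

    /** Heap and off-heap data region are separate allocations, so they add up. */
    static long footprint(long heapBytes, long dataRegionBytes) {
        return heapBytes + dataRegionBytes;
    }

    static double toGib(long bytes) {
        return (double) bytes / GIB;
    }

    public static void main(String[] args) {
        long heap = 2 * GIB;            // -Xms2g -Xmx2g
        long regionInitial = 512 * MIB; // initialSize in the default data region
        long regionMax = 3 * GIB;       // maxSize in the default data region

        System.out.printf(Locale.ROOT, "at startup     : about %.2f GiB%n",
            toGib(footprint(heap, regionInitial)));
        System.out.printf(Locale.ROOT, "fully populated: about %.2f GiB%n",
            toGib(footprint(heap, regionMax)));
    }
}
```

With these settings the sum starts at about 2.5 GiB and can reach about 5 GiB, so the 4.119 GiB reported by docker stats sits inside the expected range.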
Regards,
Prem
> On 07-Sep-2018, at 8:25 AM, Justin Ji <bi...@foxmail.com> wrote:
>
> Who can give me some advice?