Posted to user@ignite.apache.org by Justin Ji <bi...@foxmail.com> on 2018/09/07 02:13:28 UTC

The system cache size was slowly increased

Hi all -

We use Ignite in our production environment, but I found that the system
cache grows slowly and is never reclaimed. When free system memory drops
below 200M, the node seems to stop working and our system cannot get any
response from the server nodes. The image below shows our server's
monitoring data:
<http://apache-ignite-users.70518.x6.nabble.com/file/t2000/WechatIMG86.jpeg> 

Our server node configuration is:

<beans xmlns="http://www.springframework.org/schema/beans"
       xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
       xmlns:util="http://www.springframework.org/schema/util"
       xsi:schemaLocation="
        http://www.springframework.org/schema/beans
        http://www.springframework.org/schema/beans/spring-beans.xsd
        http://www.springframework.org/schema/util
        http://www.springframework.org/schema/util/spring-util.xsd">
    
    <bean class="org.apache.ignite.configuration.IgniteConfiguration">
        <property name="discoverySpi">
            <bean class="org.apache.ignite.spi.discovery.zk.ZookeeperDiscoverySpi">
                <property name="zkConnectionString" value="172.31.34.133:2181,172.31.32.111:2181,172.31.37.6:2181"/>
                <property name="sessionTimeout" value="30000"/>
                <property name="zkRootPath" value="/ignite/discovery"/>
                <property name="joinTimeout" value="10000"/>
            </bean>
        </property>

        <property name="gridLogger">
            <bean class="org.apache.ignite.logger.log4j2.Log4J2Logger">
                <constructor-arg type="java.lang.String" value="log4j2.xml"/>
            </bean>
        </property>

        <property name="communicationSpi">
            <bean class="org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi">
                <property name="localPort" value="47174"/>
                <property name="messageQueueLimit" value="1024"/>
            </bean>
        </property>

        <property name="dataStorageConfiguration">
            <bean class="org.apache.ignite.configuration.DataStorageConfiguration">
                <property name="defaultDataRegionConfiguration">
                    <bean class="org.apache.ignite.configuration.DataRegionConfiguration">
                        <property name="persistenceEnabled" value="true"/>
                        <property name="initialSize" value="#{512L * 1024 * 1024}"/>
                        <property name="maxSize" value="#{3L * 1024 * 1024 * 1024}"/>
                    </bean>
                </property>
                <property name="writeThrottlingEnabled" value="true"/>
                <property name="storagePath" value="/var/lib/ignite/persistence"/>
                <property name="walPath" value="/wal"/>
                <property name="walArchivePath" value="/wal/archive"/>
            </bean>
        </property>
        <property name="includeEventTypes">
            <list>
                <util:constant static-field="org.apache.ignite.events.EventType.EVT_TASK_STARTED"/>
                <util:constant static-field="org.apache.ignite.events.EventType.EVT_TASK_FINISHED"/>
                <util:constant static-field="org.apache.ignite.events.EventType.EVT_TASK_FAILED"/>
            </list>
        </property>
    </bean>
</beans>

and the cache configuration on the client nodes is:
TcpCommunicationSpi communicationSpi = DefaultIgniteConfiguration.getTcpCommunicationSpi(ignitePort);
cfg.setCommunicationSpi(communicationSpi);

// Device cache configuration
// The BinaryObject here is com.tuya.athena.ignite.domain.DeviceStatusIgniteVO
CacheConfiguration<String, BinaryObject> cacheCfg = new CacheConfiguration<>();
cacheCfg.setName("device_status");
// Partitioned storage
cacheCfg.setCacheMode(CacheMode.PARTITIONED);
// Backup count
cacheCfg.setBackups(1);
cacheCfg.setAtomicityMode(CacheAtomicityMode.TRANSACTIONAL);
cacheCfg.setCacheStoreFactory(FactoryBuilder.factoryOf(DeviceStatusCacheStore.class));
cacheCfg.setWriteThrough(true);
cacheCfg.setWriteBehindEnabled(true);
// Flush every minute
cacheCfg.setWriteBehindFlushFrequency(60 * 1000);
cacheCfg.setWriteBehindBatchSize(1024);
cacheCfg.setStoreKeepBinary(true);

cfg.setCacheConfiguration(cacheCfg);

ignite = Ignition.getOrStart(cfg);
ignite.cluster().active(true);

Is there anything inappropriate in my configuration? I look forward to
your reply.

PS:



--
Sent from: http://apache-ignite-users.70518.x6.nabble.com/

Re: The system cache size was slowly increased

Posted by Justin Ji <bi...@foxmail.com>.
I analyzed the log and found many checkpoints with the reason *timeout*. The
following are some of the checkpoint entries:

[db-checkpoint-thread-#46] INFO 
o.a.i.i.p.c.p.GridCacheDatabaseSharedManager:478 - Checkpoint started
[checkpointId=7fdb8ddd-76d7-4ddf-9fd1-cddd8bb093be, startPtr=FileWALPointer
[idx=593, fileOff=30559855, len=15120], checkpointLockWait=0ms,
checkpointLockHoldTime=13ms, walCpRecordFsyncDuration=15ms, pages=37037,
reason='timeout']
[db-checkpoint-thread-#46] INFO 
o.a.i.i.p.c.p.GridCacheDatabaseSharedManager:478 - Checkpoint finished
[cpId=7fdb8ddd-76d7-4ddf-9fd1-cddd8bb093be, pages=37037,
markPos=FileWALPointer [idx=593, fileOff=30559855, len=15120],
walSegmentsCleared=0, markDuration=36ms, pagesWrite=318ms, fsync=4370ms,
total=4724ms]
[db-checkpoint-thread-#46] INFO 
o.a.i.i.p.c.p.GridCacheDatabaseSharedManager:478 - Checkpoint started
[checkpointId=1078e60f-eb68-4cac-b7a3-15e6ca83dbb6, startPtr=FileWALPointer
[idx=593, fileOff=39744249, len=15120], checkpointLockWait=0ms,
checkpointLockHoldTime=6ms, walCpRecordFsyncDuration=7ms, pages=2145,
reason='timeout']
2018-09-06 08:33:23:909 [db-checkpoint-thread-#46] INFO 
o.a.i.i.p.c.p.GridCacheDatabaseSharedManager:478 - Checkpoint finished
[cpId=1078e60f-eb68-4cac-b7a3-15e6ca83dbb6, pages=2145,
markPos=FileWALPointer [idx=593, fileOff=39744249, len=15120],
walSegmentsCleared=1, markDuration=15ms, pagesWrite=20ms, fsync=372ms,
total=407ms]
[db-checkpoint-thread-#46] INFO 
o.a.i.i.p.c.p.GridCacheDatabaseSharedManager:478 - Skipping checkpoint (no
pages were modified) [checkpointLockWait=0ms, checkpointLockHoldTime=1ms,
reason='timeout']
[db-checkpoint-thread-#46] INFO 
o.a.i.i.p.c.p.GridCacheDatabaseSharedManager:478 - Skipping checkpoint (no
pages were modified) [checkpointLockWait=0ms, checkpointLockHoldTime=1ms,
reason='timeout']

*So, can I assume that the checkpoints are the cause of the cache growth?*
If so, how can I deal with it?
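
A note on reading these entries: as far as I know, reason='timeout' only means that the checkpoint was started by Ignite's periodic checkpoint timer, not that something timed out. To see how long each checkpoint actually took, the "Checkpoint finished" entries can be summarized with a small parser. The following is a minimal Java sketch, not an Ignite API: the class CheckpointLogSummary and its methods are hypothetical helpers written for this illustration, and the regular expression assumes the field layout shown in the excerpt above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper (not part of Ignite): summarizes "Checkpoint finished" log entries. */
final class CheckpointLogSummary {
    /** One parsed "Checkpoint finished" entry. */
    static final class Entry {
        final int pages;
        final long fsyncMs;
        final long totalMs;

        Entry(int pages, long fsyncMs, long totalMs) {
            this.pages = pages;
            this.fsyncMs = fsyncMs;
            this.totalMs = totalMs;
        }
    }

    // Assumes the field order seen in the excerpt; DOTALL lets one entry span wrapped lines.
    private static final Pattern FINISHED = Pattern.compile(
        "Checkpoint finished\\s+\\[.*?pages=(\\d+),.*?fsync=(\\d+)ms,.*?total=(\\d+)ms\\]",
        Pattern.DOTALL);

    /** Extracts pages, fsync time and total time from every "Checkpoint finished" entry. */
    static List<Entry> parse(String log) {
        List<Entry> entries = new ArrayList<>();
        Matcher m = FINISHED.matcher(log);
        while (m.find()) {
            entries.add(new Entry(
                Integer.parseInt(m.group(1)),
                Long.parseLong(m.group(2)),
                Long.parseLong(m.group(3))));
        }
        return entries;
    }

    public static void main(String[] args) {
        // Two entries copied from the log excerpt above, line wrapping included.
        String log =
            "o.a.i.i.p.c.p.GridCacheDatabaseSharedManager:478 - Checkpoint finished\n"
            + "[cpId=7fdb8ddd-76d7-4ddf-9fd1-cddd8bb093be, pages=37037,\n"
            + "markPos=FileWALPointer [idx=593, fileOff=30559855, len=15120],\n"
            + "walSegmentsCleared=0, markDuration=36ms, pagesWrite=318ms, fsync=4370ms,\n"
            + "total=4724ms]\n"
            + "o.a.i.i.p.c.p.GridCacheDatabaseSharedManager:478 - Checkpoint finished\n"
            + "[cpId=1078e60f-eb68-4cac-b7a3-15e6ca83dbb6, pages=2145,\n"
            + "markPos=FileWALPointer [idx=593, fileOff=39744249, len=15120],\n"
            + "walSegmentsCleared=1, markDuration=15ms, pagesWrite=20ms, fsync=372ms,\n"
            + "total=407ms]\n";

        for (Entry e : parse(log)) {
            System.out.println("pages=" + e.pages + " fsync=" + e.fsyncMs
                + "ms total=" + e.totalMs + "ms");
        }
    }
}
```

In both entries above, most of the total time is spent in fsync (4370 of 4724 ms, and 372 of 407 ms), which points at disk flushing rather than at the checkpoint trigger itself.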




Re: The system cache size was slowly increased

Posted by Justin Ji <bi...@foxmail.com>.
Can anyone give me some advice?




Re: The system cache size was slowly increased

Posted by Justin Ji <bi...@foxmail.com>.
My second question:
I run the Ignite nodes in a Docker container with the following command:
sudo -u docker docker run -v /mnt/logs/apps/ignite:/mnt/logs/apps/ignite -v
/opt/ignite/ext-libs:/opt/ignite/ext-libs -v
/opt/ignite/config:/opt/ignite/config -v
/var/lib/ignite/persistence:/var/lib/ignite/persistence --name ignite
--net=host -e
"CONFIG_URI=file:///opt/ignite/config/ignite-config-prod-us.xml" -e
"OPTION_LIBS=ignite-zookeeper,ignite-indexing,ignite-log4j2,ignite-rest-http"
-e "JVM_OPTS=-Xms2g -Xmx2g -XX:+AlwaysPreTouch -XX:+UseG1GC
-XX:+ScavengeBeforeFullGC -XX:+DisableExplicitGC
-XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/mnt/heapdump/ignite
-XX:+ExitOnOutOfMemoryError -XX:+PrintGCDetails -XX:+PrintGCTimeStamps
-XX:+PrintGCDateStamps -XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=10
-XX:GCLogFileSize=100M -Xloggc:/mnt/logs/apps/ignite/gc.log" -e
"EXTERNAL_LIBS=http://www.*.jar" -d apacheignite/ignite

The command shows that the JVM heap size is 2G, but the Docker container
consumes more than 4G:
[jisen@w2_s_ignite_003 ~]$ sudo docker stats ignite
CONTAINER   CPU %   MEM USAGE / LIMIT    MEM %    NET I/O   BLOCK I/O       PIDS
ignite      0.35%   4.119GiB / 7.45GiB   55.29%   0B / 0B   979kB / 250GB   114

So I would like to know why the Ignite container consumes so much memory.




Re: The system cache size was slowly increased

Posted by Justin Ji <bi...@foxmail.com>.
Prem -

Thanks for your reply.
You explained that the Ignite container grew from 2.5G to more than 4G
because it holds 2G of heap memory plus more than 2G of off-heap memory. I
think that is correct, and it solves one of my main problems.

But I have another question: why does the system memory cache also grow
slowly and never get reclaimed? As far as I know, the Linux kernel reclaims
the system memory cache automatically when memory runs low.

The memory cache I mentioned before refers to the 'buff/cache' column printed
by the 'free' command.
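
To watch that value from a program instead of running 'free' by hand, the same figure can be derived from /proc/meminfo. The following is a minimal Java sketch under one assumption: that 'free' comes from procps-ng, where buff/cache is computed as Buffers + Cached + SReclaimable. The class BuffCache and its methods are hypothetical names chosen for this illustration.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper: derives the 'buff/cache' figure of free(1) from /proc/meminfo. */
final class BuffCache {
    /** Parses lines such as "Cached:   2048 kB" into a map of field name to value in KiB. */
    static Map<String, Long> parse(List<String> lines) {
        Map<String, Long> kib = new HashMap<>();
        for (String line : lines) {
            String[] parts = line.trim().split("\\s+");
            if (parts.length >= 2 && parts[0].endsWith(":")) {
                String name = parts[0].substring(0, parts[0].length() - 1);
                try {
                    kib.put(name, Long.parseLong(parts[1]));
                } catch (NumberFormatException ignored) {
                    // Skip lines whose second column is not a number.
                }
            }
        }
        return kib;
    }

    /** Assumption: procps-ng semantics, buff/cache = Buffers + Cached + SReclaimable. */
    static long buffCacheKib(Map<String, Long> meminfo) {
        return meminfo.getOrDefault("Buffers", 0L)
            + meminfo.getOrDefault("Cached", 0L)
            + meminfo.getOrDefault("SReclaimable", 0L);
    }

    public static void main(String[] args) throws IOException {
        // Linux only: /proc/meminfo does not exist on other platforms.
        Map<String, Long> meminfo = parse(Files.readAllLines(Paths.get("/proc/meminfo")));
        System.out.println("buff/cache:   " + buffCacheKib(meminfo) / 1024 + " MiB");
        System.out.println("MemAvailable: " + meminfo.getOrDefault("MemAvailable", 0L) / 1024 + " MiB");
    }
}
```

MemAvailable is printed alongside because it is the kernel's own estimate of how much memory could be handed to applications without swapping, so it is usually a better alarm signal than a shrinking 'free' column.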




Re: The system cache size was slowly increased

Posted by Justin Ji <bi...@foxmail.com>.
Evgenii - 

Thanks for your reply!

By "system memory cache" I mean the value printed by the 'free -m' command.

After some testing, I solved the problem. It was caused by my Linux
settings, not by Ignite.




Re: The system cache size was slowly increased

Posted by ezhuravlev <e....@gmail.com>.
Hi,

What do you mean by "system memory cache also grows"? How do you observe this?

Evgenii





Re: The system cache size was slowly increased

Posted by Prem Prakash Sharma <pr...@infoworks.io>.
Hi Justin,

If I am not mistaken, you have a data region with a maximum size of 3 GiB, and your Ignite node's heap size is 2 GiB. Since the data region starts at 512 MB, I am guessing your container slowly grew from 2.5 GiB to where it is now. The point in your case is that off-heap memory and the JVM heap are separate: on top of the heap, Ignite's off-heap usage grows toward the data region's maximum size as you put more data into it.
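
The arithmetic behind that estimate can be sketched in a few lines of Java. This is only a rough illustration using the values from the posted configuration (-Xmx2g, initialSize 512 MB, maxSize 3 GiB); the class MemoryBudget is a hypothetical name, and the sum deliberately ignores metaspace, thread stacks and other off-heap allocations such as the checkpoint buffer.

```java
/** Hypothetical helper: rough memory footprint of a node as JVM heap plus one data region. */
final class MemoryBudget {
    static final long MIB = 1024L * 1024;
    static final long GIB = 1024L * MIB;

    /** Heap plus data region only; every other kind of JVM or Ignite overhead is ignored. */
    static long footprintBytes(long heapBytes, long regionBytes) {
        return heapBytes + regionBytes;
    }

    public static void main(String[] args) {
        long heap = 2 * GIB;             // -Xms2g -Xmx2g from the docker command
        long regionInitial = 512 * MIB;  // initialSize of the default data region
        long regionMax = 3 * GIB;        // maxSize of the default data region

        System.out.println("at startup (bytes):  " + footprintBytes(heap, regionInitial));
        System.out.println("upper bound (bytes): " + footprintBytes(heap, regionMax));
    }
}
```

With these inputs the lower figure is 2.5 GiB and the upper one is 5 GiB, so the 4.119 GiB reported by docker stats sits between the two, which is consistent with a data region that has partly filled up.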

Regards,
Prem

> On 07-Sep-2018, at 8:25 AM, Justin Ji <bi...@foxmail.com> wrote:
> 
> Who can give me some advice?

