Posted to user@ignite.apache.org by Pavel Vinokurov <vi...@gmail.com> on 2018/05/10 10:37:40 UTC

Re: Read request response time is unstable, often more than 500 milliseconds, but the cluster load is small

An Ignite node should start with any WAL mode. I suppose the same error
would also occur in FSYNC mode.
Could you restart with LOG_ONLY mode and share the logs?
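
For reference, a minimal sketch of that change, assuming the same Spring XML
configuration as in the original message quoted below. Only the walMode value
differs; every other DataStorageConfiguration property is assumed to stay as
it is.

```xml
<!-- Sketch only: the dataStorageConfiguration property from the quoted
     node configuration, with walMode switched from FSYNC to LOG_ONLY. -->
<property name="dataStorageConfiguration">
    <bean class="org.apache.ignite.configuration.DataStorageConfiguration">
        <!-- Keep the existing properties (data region, walSegmentSize, ...) here. -->
        <property name="walMode" value="LOG_ONLY"/>
    </bean>
</property>
```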

2018-05-10 12:39 GMT+03:00 NO <72...@qq.com>:

> In LOG_ONLY mode, I remember running into this problem: after the node
> rebooted, it printed an error message and could not be started. I did not
> save the error message at the time, but from searching the source code it
> may have been one of these two:
> 1. 'Failed to find checkpoint record at the given WAL pointer'
> 2. 'on disk, but checkpoint record is missed in WAL '
>
> Is it possible that in LOG_ONLY mode a node cannot start after a crash?
>
>
> ------------------ Original Message ------------------
> *From:* "Pavel Vinokurov"<vi...@gmail.com>;
> *Sent:* Thursday, May 10, 2018, 5:13 PM
> *To:* "user"<us...@ignite.apache.org>;
> *Subject:* Re: Read request response time is unstable, often more
> than 500 milliseconds, but the cluster load is small
>
> Please check the performance with LOG_ONLY mode.
>
> 2018-05-10 12:03 GMT+03:00 NO <72...@qq.com>:
>
>> Hi,
>>
>> I have tested with -DIGNITE_WAL_FSYNC_WITH_DEDICATED_WORKER=true set, but
>> it seriously reduces the write speed. I do not know what the impact of
>> this parameter is. Is it necessary to set other parameters to increase the
>> write speed?
>>
>>
>> ------------------ Original Message ------------------
>> *From:* "Pavel Vinokurov"<vi...@gmail.com>;
>> *Sent:* Thursday, May 10, 2018, 4:59 PM
>> *To:* "user"<us...@ignite.apache.org>;
>> *Subject:* Re: Read request response time is unstable, often more than
>> 500 milliseconds, but the cluster load is small
>>
>> Hi,
>>
>> I see several exceptions in your logs. They are probably causing the slowdown.
>> >> java.lang.ClassCastException: org.apache.ignite.internal.processors.cache.persistence.wal.FsyncModeFileWriteAheadLogManager
>> cannot be cast to org.apache.ignite.internal.processors.cache.persistence.wal.FileWriteAheadLogManager
>>
>> It seems you have hit the issue described in
>> https://issues.apache.org/jira/browse/IGNITE-7865, which is fixed in version 2.5.
>> As a workaround, you could change the WAL mode to LOG_ONLY or start Ignite with
>> the JVM property -DIGNITE_WAL_FSYNC_WITH_DEDICATED_WORKER=true
>>
>> Thanks,
>> Pavel
>>
>>
>>
>>
>>
>> 2018-05-10 5:42 GMT+03:00 NO <72...@qq.com>:
>>
>>> hi,
>>>
>>> Ignite version : 2.4.0
>>>
>>> Read operations often exceed 500 milliseconds, but the cluster traffic
>>> is very small. I don't know why. Please help me solve this problem. Thank
>>> you very much. Here is some configuration information.
>>>
>>> 8 node : (48 core ,192G RAM, 4TB SSD)
>>> Cluster records : 1.7 billion primary keys , 1.7 billion backup keys
>>> Get requests per second : 100+
>>> Put requests per second : 400+
>>> Each node occupies more than 500GB of disk space.
>>>
>>> 2 node :
>>> LSB Version:    :core-4.1-amd64:core-4.1-noarch:cxx-4.1-amd64:cxx-4.1-noarch:desktop-4.1-amd64:desktop-4.1-noarch:languages-4.1-amd64:languages-4.1-noarch:printing-4.1-amd64:printing-4.1-noarch
>>> Distributor ID:    CentOS
>>> Description:    CentOS Linux release 7.2.1511 (Core)
>>> Release:    7.2.1511
>>> Codename:    Core
>>>
>>> 6 node:
>>> LSB Version:    :base-4.0-amd64:base-4.0-noarch:core-4.0-amd64:core-4.0-noarch:graphics-4.0-amd64:graphics-4.0-noarch:printing-4.0-amd64:printing-4.0-noarch
>>> Distributor ID:    CentOS
>>> Description:    CentOS release 6.7 (Final)
>>> Release:    6.7
>>> Codename:    Final
>>> ============================================================
>>> =============
>>> The node configuration is as follows
>>> <?xml version="1.0" encoding="UTF-8"?>
>>> <beans xmlns="http://www.springframework.org/schema/beans"
>>>        xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
>>>        xmlns:util="http://www.springframework.org/schema/util"
>>>        xsi:schemaLocation="http://www.springframework.org/schema/beans
>>> http://www.springframework.org/schema/beans/spring-beans.xsd
>>>         http://www.springframework.org/schema/util
>>> http://www.springframework.org/schema/util/spring-util.xsd
>>>         ">
>>>     <bean id="ignite.cfg" class="org.apache.ignite.configuration.IgniteConfiguration">
>>>         <property name="failureDetectionTimeout" value="60000"/>
>>>         <property name="clientFailureDetectionTimeout" value="60000"/>
>>>         <property name="segmentationPolicy" value="RESTART_JVM"/>
>>>         <property name="publicThreadPoolSize" value="64"/>
>>>         <property name="systemThreadPoolSize" value="64"/>
>>>         <property name="dataStreamerThreadPoolSize" value="64"/>
>>>         <property name="rebalanceThreadPoolSize" value="4" />
>>>         <property name="dataStorageConfiguration">
>>>             <bean class="org.apache.ignite.configuration.DataStorageConfiguration">
>>>                 <property name="defaultDataRegionConfiguration">
>>>                     <bean class="org.apache.ignite.configuration.DataRegionConfiguration">
>>>                         <property name="name" value="qipu_entity_cache_data_region"/>
>>>                         <property name="initialSize" value="#{10L * 1024 * 1024 * 1024}"/>
>>>                         <property name="maxSize" value="#{100L * 1024 * 1024 * 1024}"/>
>>>                         <property name="persistenceEnabled" value="true"/>
>>>                         <property name="metricsEnabled" value="true"/>
>>>                         <property name="checkpointPageBufferSize" value="#{1 * 1024 * 1024 * 1024}"/>
>>>                     </bean>
>>>                 </property>
>>>                 <property name="walSegmentSize" value="#{64 * 1024 * 1024}"/>
>>>                 <property name="pageSize" value="#{4 * 1024}"/>
>>>                 <property name="walSegments" value="#{20}"/>
>>>                 <property name="walMode" value="FSYNC"/>
>>>                 <property name="metricsEnabled" value="true"/>
>>>                 <property name="writeThrottlingEnabled" value="true"/>
>>>                 <property name="checkpointThreads" value="8"/>
>>>                 <property name="walThreadLocalBufferSize" value="#{1 * 1024 * 1024}"/>
>>>             </bean>
>>>         </property>
>>>
>>>         <property name="cacheConfiguration">
>>>             <bean class="org.apache.ignite.configuration.CacheConfiguration">
>>>                 <property name="dataRegionName" value="qipu_entity_cache_data_region"/>
>>>                 <property name="name" value="qipu_entity_cache"/>
>>>                 <property name="cacheMode" value="PARTITIONED"/>
>>>                 <property name="partitionLossPolicy" value="IGNORE"/>
>>>                 <property name="atomicityMode" value="ATOMIC"/>
>>>                 <property name="backups" value="1"/>
>>>                 <property name="writeSynchronizationMode" value="FULL_SYNC"/>
>>>                 <property name="statisticsEnabled" value="true"/>
>>>                 <property name="rebalanceBatchSize" value="#{20 * 1024 * 1024}"/>
>>>                 <property name="rebalanceThrottle" value="0"/>
>>>                 <property name="rebalanceMode" value="ASYNC"/>
>>>                 <property name="rebalanceBatchesPrefetchCount" value="4"/>
>>>                 <property name="rebalanceTimeout" value="20000"/>
>>>                 <property name="maxConcurrentAsyncOperations" value="#{4 * 500}"/>
>>>             </bean>
>>>         </property>
>>>
>>>         <property name="communicationSpi">
>>>             <bean class="org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi">
>>>                 <property name="messageQueueLimit" value="20480"/>
>>>             </bean>
>>>         </property>
>>>         <property name="discoverySpi">
>>>             <bean class="org.apache.ignite.spi.discovery.tcp.TcpDiscoverySpi">
>>>                 <property name="forceServerMode" value="true"/>
>>>                 <property name="ipFinder">
>>>                     <bean class="org.apache.ignite.spi.discovery.tcp.ipfinder.vm.TcpDiscoveryVmIpFinder">
>>>                         <property name="addresses">
>>>                             <list>
>>>                                 <!-- In distributed environment, replace with actual host IP address. -->
>>>                                 <value>10.13.13.39:47500..47509</value>
>>>                                 <value>10.13.13.49:47500..47509</value>
>>>                                 <value>10.13.13.50:47500..47509</value>
>>>                                 <value>10.13.13.51:47500..47509</value>
>>>                                 <value>10.13.13.59:47500..47509</value>
>>>                                 <value>10.13.13.60:47500..47509</value>
>>>                                 <value>10.13.13.61:47500..47509</value>
>>>                                 <value>10.13.13.63:47500..47509</value>
>>>                             </list>
>>>                         </property>
>>>                     </bean>
>>>                 </property>
>>>             </bean>
>>>         </property>
>>>         <property name="gridLogger">
>>>             <bean class="org.apache.ignite.logger.log4j2.Log4J2Logger">
>>>                 <constructor-arg type="java.lang.String" value="/home/qipu/production/apache-ignite-2.4.0/config/ignite-log4j2.xml"/>
>>>             </bean>
>>>         </property>
>>>     </bean>
>>> </beans>
>>> ============================================================
>>> =====================================
>>> #ignite.sh
>>> JVM config
>>> JVM_OPTS="-Xms24g -Xmx24g -server -XX:+AggressiveOpts -XX:MaxMetaspaceSize=512m"
>>> JVM_OPTS="${JVM_OPTS} -XX:+AlwaysPreTouch"
>>> JVM_OPTS="${JVM_OPTS} -XX:+UseG1GC"
>>> JVM_OPTS="${JVM_OPTS} -XX:+ScavengeBeforeFullGC"
>>> JVM_OPTS="${JVM_OPTS} -XX:+DisableExplicitGC"
>>> JVM_OPTS="${JVM_OPTS} -XX:+HeapDumpOnOutOfMemoryError "
>>> JVM_OPTS="${JVM_OPTS} -XX:HeapDumpPath=${IGNITE_HOME}/work"
>>> JVM_OPTS="${JVM_OPTS} -XX:+PrintGCDetails"
>>> JVM_OPTS="${JVM_OPTS} -XX:+PrintGCTimeStamps"
>>> JVM_OPTS="${JVM_OPTS} -XX:+PrintGCDateStamps"
>>> JVM_OPTS="${JVM_OPTS} -XX:+UseGCLogFileRotation"
>>> JVM_OPTS="${JVM_OPTS} -XX:NumberOfGCLogFiles=10"
>>> JVM_OPTS="${JVM_OPTS} -XX:GCLogFileSize=100M"
>>> JVM_OPTS="${JVM_OPTS} -Xloggc:${IGNITE_HOME}/work/gc.log"
>>> JVM_OPTS="${JVM_OPTS} -XX:+PrintAdaptiveSizePolicy"
>>> JVM_OPTS="${JVM_OPTS} -XX:MaxGCPauseMillis=100"
>>> ============================================================
>>> =========================================
>>> node config
>>> #/etc/sysctl.conf
>>> fs.file-max = 512000
>>> net.core.rmem_max = 67108864
>>> net.core.wmem_max = 67108864
>>> net.core.rmem_default = 65536
>>> net.core.wmem_default = 65536
>>> net.core.netdev_max_backlog = 4096
>>> net.core.somaxconn = 4096
>>> net.ipv4.tcp_syncookies = 1
>>> net.ipv4.tcp_tw_reuse = 1
>>> net.ipv4.tcp_tw_recycle = 0
>>> net.ipv4.tcp_fin_timeout = 30
>>> net.ipv4.tcp_keepalive_time = 1200
>>> net.ipv4.ip_local_port_range = 10000 65000
>>> net.ipv4.tcp_max_syn_backlog = 4096
>>> net.ipv4.tcp_max_tw_buckets = 5000
>>> net.ipv4.tcp_rmem = 4096 87380 67108864
>>> net.ipv4.tcp_wmem = 4096 65536 67108864
>>> net.ipv4.tcp_mtu_probing = 1
>>> vm.swappiness=0
>>> vm.zone_reclaim_mode = 0
>>> vm.dirty_writeback_centisecs = 500
>>> vm.dirty_expire_centisecs = 500
>>> ===============================================
>>> #/etc/security/limits.conf
>>> *       soft    nofile          65535
>>> *       hard    nofile          65535
>>>
>>>
>>> # End of file
>>> *               soft    nofile             65535
>>> *               hard    nofile             65535
>>> *       soft    nofile          81920
>>> *       hard    nofile          81920
>>> *       soft    nproc           81920
>>> *       hard    nproc           81920
>>> *       soft    core            10240
>>> *       hard    core            10240
>>> *    soft    data       unlimited
>>> *    hard    data       unlimited
>>> *    soft    stack      unlimited
>>> *    hard    stack      unlimited
>>> *    soft    memory     unlimited
>>> *    hard    memory     unlimited
>>> *    soft    cpu        unlimited
>>> *    hard    cpu        unlimited
>>> *    soft    memlock    unlimited
>>> *    hard    memlock    unlimited
>>>
>>> * hard memlock      unlimited
>>> * soft memlock      unlimited
>>> ===============================================
>>>
>>> client code
>>> ==============================================
>>>         Ignition.setClientMode(true);
>>>
>>>         IgniteConfiguration cfg = new IgniteConfiguration();
>>>         TcpDiscoverySpi spi = new TcpDiscoverySpi();
>>>
>>>         TcpDiscoveryVmIpFinder finder = new TcpDiscoveryVmIpFinder();
>>>         finder.setAddresses(Arrays.asList(env.getProperty("ignite.server").split(",")));
>>>         spi.setIpFinder(finder);
>>>
>>>         cfg.setDiscoverySpi(spi);
>>>         cfg.setGridLogger(new Slf4jLogger());
>>>         Ignite ignite = Ignition.start(cfg);
>>>         IgniteCache<String, byte[]> igniteCache = ignite.getOrCreateCache("qipu_entity_cache");
>>>
>>>         // get code (read operation response time often exceeds 1s)
>>>         igniteCache.getAllAsync(keySet).get(1000);
>>>
>>>         // put code
>>>         // cache.putAllAsync(map).get(3000);
>>> ==============================================
>>>
>>>
>>> Attachment is a node's gc log and node log
>>>
>>> Please give some suggestions on how to reduce the read operation
>>> response time. Thank you.
>>>
>>>
>>>
>>>
>>
>>
>> --
>>
>> Regards
>>
>> Pavel Vinokurov
>>
>
>
>
> --
>
> Regards
>
> Pavel Vinokurov
>



-- 

Regards

Pavel Vinokurov