You are viewing a plain text version of this content. The canonical link for it is here.
Posted to user@ignite.apache.org by 38797715 <38...@qq.com> on 2020/05/12 13:08:48 UTC
Binary recovery for a very long time
Hi community,
We have 5 servers, 16 cores, 256g memory, and 200g off-heap memory.
We have 7 tables to test, and the data volume is
respectively:31.8M,495.2M,552.3M,33M,873.3K,28M,1.9K(replicated),others
are partitioned(backup = 1)
VM args:-server -Xms20g -Xmx20g -XX:+AlwaysPreTouch -XX:+UseG1GC
-XX:+ScavengeBeforeFullGC -XX:+DisableExplicitGC -XX:+PrintGCDetails
-XX:+PrintGCTimeStamps -XX:+PrintGCDateStamps -XX:+UseGCLogFileRotation
-XX:NumberOfGCLogFiles=10 -XX:GCLogFileSize=100M
-Xloggc:/data/gc/logs/gclog.txt -Djava.net.preferIPv4Stack=true
-XX:MaxDirectMemorySize=256M -XX:+PrintAdaptiveSizePolicy
Today, one of the servers was restarted(kill and then start ignite.sh)
for some reason, but the node took 1.5 hours to start, which was much
longer than expected.
After analyzing the log, the following information is found:
[2020-05-12T17:00:05,138][INFO][main][GridCacheDatabaseSharedManager]
Found last checkpoint marker [cpId=7a0564f2-43e5-400b-9439-746fc68a6ccb,
pos=FileWALPointer [idx=10511, fileOff=51348888, len=61193]]
[2020-05-12T17:00:05,151][INFO][main][GridCacheDatabaseSharedManager]
Binary memory state restored at node startup [restoredPtr=FileWALPointer
[idx=10511, fileOff=51410110, len=0]]
[2020-05-12T17:00:05,152][INFO][main][FileWriteAheadLogManager] Resuming
logging to WAL segment
[file=/appdata/ignite/db/wal/24/0000000000000001.wal, offset=51410110,
ver=2]
[2020-05-12T17:00:06,448][INFO][main][PageMemoryImpl] Started page
memory [memoryAllocated=200.0GiB, pages=50821088, tableSize=3.9GiB,
checkpointBuffer=2.0GiB]
[2020-05-12T17:02:08,528][INFO][main][GridCacheProcessor] Started cache
in recovery mode [name=CO_CO_NEW, id=-189779360, dataRegionName=default,
mode=PARTITIONED, atomicity=ATOMIC, backups=1, mvcc=false]
[2020-05-12T17:50:44,341][INFO][main][GridCacheProcessor] Started cache
in recovery mode [name=CO_CO_LINE, id=-1588248812,
dataRegionName=default, mode=PARTITIONED, atomicity=ATOMIC, backups=1,
mvcc=false]
[2020-05-12T17:50:44,366][INFO][main][GridCacheProcessor] Started cache
in recovery mode [name=ignite-sys-cache, id=-2100569601,
dataRegionName=sysMemPlc, mode=REPLICATED, atomicity=TRANSACTIONAL,
backups=2147483647, mvcc=false]
[2020-05-12T18:17:57,071][INFO][main][GridCacheProcessor] Started cache
in recovery mode [name=CO_CO_LINE_NEW, id=1742991829,
dataRegionName=default, mode=PARTITIONED, atomicity=ATOMIC, backups=1,
mvcc=false]
[2020-05-12T18:19:54,910][INFO][main][GridCacheProcessor] Started cache
in recovery mode [name=PI_COM_DAY, id=-1904194728,
dataRegionName=default, mode=PARTITIONED, atomicity=ATOMIC, backups=1,
mvcc=false]
[2020-05-12T18:19:54,949][INFO][main][GridCacheProcessor] Started cache
in recovery mode [name=PLM_ITEM, id=-1283854143, dataRegionName=default,
mode=REPLICATED, atomicity=ATOMIC, backups=2147483647, mvcc=false]
[2020-05-12T18:22:53,662][INFO][main][GridCacheProcessor] Started cache
in recovery mode [name=CO_CO, id=64322847, dataRegionName=default,
mode=PARTITIONED, atomicity=ATOMIC, backups=1, mvcc=false]
[2020-05-12T18:22:54,876][INFO][main][GridCacheProcessor] Started cache
in recovery mode [name=CO_CUST, id=1684722246, dataRegionName=default,
mode=PARTITIONED, atomicity=ATOMIC, backups=1, mvcc=false]
[2020-05-12T18:22:54,892][INFO][main][GridCacheDatabaseSharedManager]
Binary recovery performed in 4970233ms.
Among them, binary recovery took 4970 seconds.
Our question is:
1.Why is the start time so long?
2.Is the current state of ignite, with the growth of single node data
volume, the restart time will be longer and longer?
3.Do have any suggestions for optimizing the restart time?
Re: Binary recovery for a very long time
Posted by 38797715 <38...@qq.com>.
Hi,
I enabled debug logging and found the following log output:
[2020-05-23T21:43:58,397][DEBUG][nio-acceptor-tcp-comm-#28%ClusterName1%][TcpCommunicationSpi]
Balancing data [min0=0, minIdx=0, max0=-1, maxIdx=-1]
[2020-05-23T21:43:58,405][DEBUG][main][SchemaManager] Creating DB table
with SQL: CREATE TABLE "PUBLIC"."CO_CUST"(_KEY VARCHAR INVISIBLE NOT
NULL,_VAL OTHER INVISIBLE,"CUST_ID"VARCHAR(30) NOT
NULL,"CUST_NAME"VARCHAR(200),"CUST_SHORT_NAME"VARCHAR(200),"CUST_SHORT_ID"VARCHAR(240),"LICENSE_CODE"VARCHAR(30),"STATUS"VARCHAR(16),"COM_ID"VARCHAR(30),"SALE_CENTER_ID"VARCHAR(30),"SALE_DEPT_ID"VARCHAR(30),"SLSMGR_ID"VARCHAR(30),"SLSMAN_ID"VARCHAR(30),"SLSMAN_MOBILE"VARCHAR(16),"MANAGER"VARCHAR(100),"IDENTITY_CARD_ID"VARCHAR(36),"ORDER_TEL"VARCHAR(80),"INV_TYPE"VARCHAR(18),"ORDER_WAY"VARCHAR(18),"PAY_TYPE"VARCHAR(2),"PERIODS"VARCHAR(200),"PRD_ST_DATE"VARCHAR(8),"ORDER_CUST_ID"VARCHAR(30),"BUSI_ADDR"VARCHAR(400),"WORK_PORT"VARCHAR(18),"BASE_TYPE"VARCHAR(24),"SALE_SCOPE"VARCHAR(1),"SCOPE"VARCHAR(200),"COM_CHARA"VARCHAR(2),"INNER_TYPE"VARCHAR(18),"CUST_KIND"VARCHAR(10),"CUST_KIND_NAME"VARCHAR(120),"CUST_TYPE"VARCHAR(6),"CUST_TYPE1"VARCHAR(6),"CUST_TYPE2"VARCHAR(6),"CUST_TYPE3"VARCHAR(6),"CUST_TYPE4"VARCHAR(6),"CUST_TYPE5"VARCHAR(6),"AREA_TYPE"VARCHAR(2),"IS_SEFL_CUST"VARCHAR(1),"IS_FUNC_CUST"VARCHAR(1),"MANAGER_BIRTHDAY"VARCHAR(8),"CELEBRATE_DATE"VARCHAR(8),"NATION_CUST_CODE"VARCHAR(30),"LONGITUDE"DECIMAL,"LATITUDE"DECIMAL,"AGENT_MESSAGE"VARCHAR(400),"NOTE"VARCHAR(1000),"UPDATE_TIME"TIMESTAMP,"INVTY_ID"VARCHAR(30),"STEP_ID"VARCHAR(20),"INV_CUST_ID"VARCHAR(60),"INV_UNIT_NAME"VARCHAR(200),"ACCOUNT"VARCHAR(200),"BANK"VARCHAR(100),"TAX_ACCOUNT"VARCHAR(120),"OTHER_ORDER_WAY"VARCHAR(32),"SALE_AVG"DECIMAL,"ITEM_ORD"DECIMAL,"QTY_SOLD"DECIMAL,"AMT_SOLD"DECIMAL,"MANAGER_TEL"VARCHAR(80),"IS_SALE_LARGE"VARCHAR(1),"TAX_TEL"VARCHAR(60),"TAX_ADDR"VARCHAR(400),"IS_TOR_TAX"VARCHAR(1),"CUST_TYPE6"VARCHAR(6),"CUST_TYPE7"VARCHAR(6),"CUST_TYPE8"VARCHAR(124),"CUST_TYPE9"VARCHAR(60),"CUST_TYPE10"VARCHAR(200),"CANT_ID"VARCHAR(20),"IS_ONLINE_PAY"VARCHAR(1),"IS_RAIL_CUST"VARCHAR(1),"CUST_SEG"VARCHAR(30),"QTY_MULTIPLE"VARCHAR(20),"TAX_ADRR"VARCHAR(200),"COLLECT_STAFF_ID"VARCHAR(30),"ITEM_HEIGHT"DECIMAL,"AREA_ID"VARCHAR(30),"BASE_TYPE_EXT"VARCHAR(30),"AREA_TYPE_EXT"VARCHAR(30),"WORK_PORT_EXT"VARCHAR(30),"CUST_SEG_EXT"VARCHAR(30),"IS_CIGAR_CUST"VARCHAR(1))
[2020-05-23T21:43:58,411][DEBUG][main][IgniteH2Indexing] Creating cache
index [cacheId=1684722246, idxName=_key_PK]
[2020-05-23T21:43:59,081][DEBUG][main][IgniteH2Indexing] Creating cache
index [cacheId=1684722246, idxName=IDX_CO_CUST_SALE_CENTER_ID]
[2020-05-23T21:43:59,088][DEBUG][main][IgniteH2Indexing] Creating cache
index [cacheId=1684722246, idxName=IDX_CO_CUST_STATUS]
[2020-05-23T21:43:59,099][DEBUG][main][IgniteH2Indexing] Creating cache
index [cacheId=1684722246, idxName=IDX_CO_CUST_SALE_DEPT_ID]
[2020-05-23T21:43:59,109][INFO][main][GridCacheProcessor] Started cache
in recovery mode [name=CO_CUST, id=1684722246, dataRegionName=default,
mode=PARTITIONED, atomicity=ATOMIC, backups=1, mvcc=false]
It can be seen from the log that all tables and indexes need to be
rebuilt during node startup? Just don't load data?
在 2020/5/21 下午8:48, Ilya Kasnacheev 写道:
> Hello!
>
> 1. I guess that WAL is read.
> 2. Unfortunately we do not have truly graceful exit as far as my
> understanding goes.
>
> Regards,
> --
> Ilya Kasnacheev
>
>
> вт, 19 мая 2020 г. в 10:22, 38797715 <38797715@qq.com
> <ma...@qq.com>>:
>
> Hi,
>
> the following log message:
>
> [2020-05-12T18:17:57,071][INFO ][main][GridCacheProcessor] Started
> cache in recovery mode [name=CO_CO_LINE_NEW, id=1742991829,
> dataRegionName=default, mode=PARTITIONED, atomicity=ATOMIC,
> backups=1, mvcc=false]
>
> I have the following questions:
>
> 1.What has been done in the startup cache in recovery mode?
>
> 2.After testing, if the node stops normally (non abnormal
> shutdown), the recovery process will also be performed during
> startup. Why?
>
> 在 2020/5/18 下午9:58, Ilya Kasnacheev 写道:
>> Hello!
>>
>> Direct IO module is experimental and should not be used unless
>> performance is tested first, in your specific use case.
>>
>> Regards,
>> --
>> Ilya Kasnacheev
>>
>>
>> пн, 18 мая 2020 г. в 16:47, 38797715 <38797715@qq.com
>> <ma...@qq.com>>:
>>
>> Hi,
>>
>> If direct IO is disabled, the startup speed will be doubled,
>> including some other tests. I find that direct IO has a great
>> impact on the read performance.
>>
>> 在 2020/5/14 上午5:16, Evgenii Zhuravlev 写道:
>>> Can you share full logs from all nodes?
>>>
>>> вт, 12 мая 2020 г. в 18:24, 38797715 <38797715@qq.com
>>> <ma...@qq.com>>:
>>>
>>> Hi Evgenii,
>>>
>>> The storage used is not SSD.
>>>
>>> We will use different versions of ignite for further
>>> testing, such as ignite2.8.
>>> Ignite is configured as follows:
>>>
>>> <?xmlversion="1.0"encoding="UTF-8"?>
>>> <beansxmlns="http://www.springframework.org/schema/beans"
>>> <http://www.springframework.org/schema/beans>
>>> xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
>>> <http://www.w3.org/2001/XMLSchema-instance>xsi:schemaLocation="http://www.springframework.org/schema/beans
>>> http://www.springframework.org/schema/beans/spring-beans.xsd">
>>> <beanid="ignite.cfg"class="org.apache.ignite.configuration.IgniteConfiguration">
>>> <propertyname="peerClassLoadingEnabled"value="true"/>
>>> <propertyname="consistentId"value="20"/>
>>> <propertyname="failureDetectionTimeout"value="120000"/>
>>> <propertyname="workDirectory"value="/appdata/ignite"/>
>>> <propertyname="rebalanceBatchSize"value="#{2 * 1024 *
>>> 1024}"/>
>>> <propertyname="rebalanceThrottle"value="100"/>
>>> <propertyname="rebalanceThreadPoolSize"value="4"/>
>>> <propertyname="gridLogger">
>>> <beanclass="org.apache.ignite.logger.log4j2.Log4J2Logger">
>>> <constructor-argtype="java.lang.String"value="config/ignite-log4j2.xml"/>
>>> </bean>
>>> </property>
>>> <propertyname="cacheConfiguration">
>>> <list>
>>> <beanid="partitioned-cache-template"abstract="true"class="org.apache.ignite.configuration.CacheConfiguration">
>>> <propertyname="name"value="cache-partitioned*"/>
>>> <propertyname="cacheMode"value="PARTITIONED"/>
>>> <propertyname="backups"value="1"/>
>>> <propertyname="queryParallelism"value="16"/>
>>> <propertyname="partitionLossPolicy"value="READ_ONLY_SAFE"/>
>>> </bean>
>>> <beanid="replicated-cache-template"abstract="true"class="org.apache.ignite.configuration.CacheConfiguration">
>>> <propertyname="name"value="cache-replicated*"/>
>>> <propertyname="cacheMode"value="REPLICATED"/>
>>> <propertyname="partitionLossPolicy"value="READ_ONLY_SAFE"/>
>>> </bean>
>>> </list>
>>> </property>
>>> <!-- Enabling Apache Ignite Persistent Store. -->
>>> <propertyname="dataStorageConfiguration">
>>> <beanclass="org.apache.ignite.configuration.DataStorageConfiguration">
>>> <propertyname="defaultDataRegionConfiguration">
>>> <beanclass="org.apache.ignite.configuration.DataRegionConfiguration">
>>> <propertyname="persistenceEnabled"value="true"/>
>>> <propertyname="maxSize"value="#{200L * 1024 * 1024 *
>>> 1024}"/>
>>> </bean>
>>> </property>
>>> </bean>
>>> </property>
>>> </bean>
>>> </beans>
>>> 在 2020/5/13 上午4:45, Evgenii Zhuravlev 写道:
>>>> Hi,
>>>>
>>>> Can you share full logs and configuration? What disk so
>>>> you use?
>>>>
>>>> Evgenii
>>>>
>>>> вт, 12 мая 2020 г. в 06:49, 38797715 <38797715@qq.com
>>>> <ma...@qq.com>>:
>>>>
>>>> Among them:
>>>> CO_CO_NEW: ~ 48 minutes(partitioned,backup=1,33M)
>>>>
>>>> Ignite sys cache: ~ 27 minutes
>>>>
>>>> PLM_ITEM:~3 minutes(repicated,1.9K)
>>>>
>>>>
>>>> 在 2020/5/12 下午9:08, 38797715 写道:
>>>>>
>>>>> Hi community,
>>>>>
>>>>> We have 5 servers, 16 cores, 256g memory, and 200g
>>>>> off-heap memory.
>>>>> We have 7 tables to test, and the data volume is
>>>>> respectively:31.8M,495.2M,552.3M,33M,873.3K,28M,1.9K(replicated),others
>>>>> are partitioned(backup = 1)
>>>>>
>>>>> VM args:-server -Xms20g -Xmx20g
>>>>> -XX:+AlwaysPreTouch -XX:+UseG1GC
>>>>> -XX:+ScavengeBeforeFullGC -XX:+DisableExplicitGC
>>>>> -XX:+PrintGCDetails -XX:+PrintGCTimeStamps
>>>>> -XX:+PrintGCDateStamps -XX:+UseGCLogFileRotation
>>>>> -XX:NumberOfGCLogFiles=10 -XX:GCLogFileSize=100M
>>>>> -Xloggc:/data/gc/logs/gclog.txt
>>>>> -Djava.net.preferIPv4Stack=true
>>>>> -XX:MaxDirectMemorySize=256M
>>>>> -XX:+PrintAdaptiveSizePolicy
>>>>>
>>>>> Today, one of the servers was restarted(kill and
>>>>> then start ignite.sh) for some reason, but the
>>>>> node took 1.5 hours to start, which was much
>>>>> longer than expected.
>>>>>
>>>>> After analyzing the log, the following information
>>>>> is found:
>>>>>
>>>>> [2020-05-12T17:00:05,138][INFO][main][GridCacheDatabaseSharedManager]
>>>>> Found last checkpoint marker
>>>>> [cpId=7a0564f2-43e5-400b-9439-746fc68a6ccb,
>>>>> pos=FileWALPointer [idx=10511, fileOff=51348888,
>>>>> len=61193]]
>>>>> [2020-05-12T17:00:05,151][INFO][main][GridCacheDatabaseSharedManager]
>>>>> Binary memory state restored at node startup
>>>>> [restoredPtr=FileWALPointer [idx=10511,
>>>>> fileOff=51410110, len=0]]
>>>>> [2020-05-12T17:00:05,152][INFO][main][FileWriteAheadLogManager]
>>>>> Resuming logging to WAL segment
>>>>> [file=/appdata/ignite/db/wal/24/0000000000000001.wal,
>>>>> offset=51410110, ver=2]
>>>>> [2020-05-12T17:00:06,448][INFO][main][PageMemoryImpl]
>>>>> Started page memory [memoryAllocated=200.0GiB,
>>>>> pages=50821088, tableSize=3.9GiB,
>>>>> checkpointBuffer=2.0GiB]
>>>>> [2020-05-12T17:02:08,528][INFO][main][GridCacheProcessor]
>>>>> Started cache in recovery mode [name=CO_CO_NEW,
>>>>> id=-189779360, dataRegionName=default,
>>>>> mode=PARTITIONED, atomicity=ATOMIC, backups=1,
>>>>> mvcc=false]
>>>>> [2020-05-12T17:50:44,341][INFO][main][GridCacheProcessor]
>>>>> Started cache in recovery mode [name=CO_CO_LINE,
>>>>> id=-1588248812, dataRegionName=default,
>>>>> mode=PARTITIONED, atomicity=ATOMIC, backups=1,
>>>>> mvcc=false]
>>>>> [2020-05-12T17:50:44,366][INFO][main][GridCacheProcessor]
>>>>> Started cache in recovery mode
>>>>> [name=ignite-sys-cache, id=-2100569601,
>>>>> dataRegionName=sysMemPlc, mode=REPLICATED,
>>>>> atomicity=TRANSACTIONAL, backups=2147483647,
>>>>> mvcc=false]
>>>>> [2020-05-12T18:17:57,071][INFO][main][GridCacheProcessor]
>>>>> Started cache in recovery mode
>>>>> [name=CO_CO_LINE_NEW, id=1742991829,
>>>>> dataRegionName=default, mode=PARTITIONED,
>>>>> atomicity=ATOMIC, backups=1, mvcc=false]
>>>>> [2020-05-12T18:19:54,910][INFO][main][GridCacheProcessor]
>>>>> Started cache in recovery mode [name=PI_COM_DAY,
>>>>> id=-1904194728, dataRegionName=default,
>>>>> mode=PARTITIONED, atomicity=ATOMIC, backups=1,
>>>>> mvcc=false]
>>>>> [2020-05-12T18:19:54,949][INFO][main][GridCacheProcessor]
>>>>> Started cache in recovery mode [name=PLM_ITEM,
>>>>> id=-1283854143, dataRegionName=default,
>>>>> mode=REPLICATED, atomicity=ATOMIC,
>>>>> backups=2147483647, mvcc=false]
>>>>> [2020-05-12T18:22:53,662][INFO][main][GridCacheProcessor]
>>>>> Started cache in recovery mode [name=CO_CO,
>>>>> id=64322847, dataRegionName=default,
>>>>> mode=PARTITIONED, atomicity=ATOMIC, backups=1,
>>>>> mvcc=false]
>>>>> [2020-05-12T18:22:54,876][INFO][main][GridCacheProcessor]
>>>>> Started cache in recovery mode [name=CO_CUST,
>>>>> id=1684722246, dataRegionName=default,
>>>>> mode=PARTITIONED, atomicity=ATOMIC, backups=1,
>>>>> mvcc=false]
>>>>> [2020-05-12T18:22:54,892][INFO][main][GridCacheDatabaseSharedManager]
>>>>> Binary recovery performed in 4970233ms.
>>>>>
>>>>> Among them, binary recovery took 4970 seconds.
>>>>>
>>>>> Our question is:
>>>>>
>>>>> 1.Why is the start time so long?
>>>>>
>>>>> 2.Is the current state of ignite, with the growth
>>>>> of single node data volume, the restart time will
>>>>> be longer and longer?
>>>>>
>>>>> 3.Do have any suggestions for optimizing the
>>>>> restart time?
>>>>>
Re: Binary recovery for a very long time
Posted by Ilya Kasnacheev <il...@gmail.com>.
Hello!
We use H2 engine and we need to create tables within H2. We're not touching
much data in the process. I'm not sure this is a factor of slowdown since
it fits in 2s here.
Regards,
--
Ilya Kasnacheev
ср, 27 мая 2020 г. в 04:20, 38797715 <38...@qq.com>:
> Hi,
>
> I enabled debug logging and found the following log output:
> [2020-05-23T21:43:58,397][DEBUG][nio-acceptor-tcp-comm-#28%ClusterName1%][TcpCommunicationSpi]
> Balancing data [min0=0, minIdx=0, max0=-1, maxIdx=-1]
> [2020-05-23T21:43:58,405][DEBUG][main][SchemaManager] Creating DB table
> with SQL: CREATE TABLE "PUBLIC"."CO_CUST" (_KEY VARCHAR INVISIBLE NOT
> NULL,_VAL OTHER INVISIBLE,"CUST_ID" VARCHAR(30) NOT NULL,"CUST_NAME"
> VARCHAR(200),"CUST_SHORT_NAME" VARCHAR(200),"CUST_SHORT_ID" VARCHAR(240),
> "LICENSE_CODE" VARCHAR(30),"STATUS" VARCHAR(16),"COM_ID" VARCHAR(30),
> "SALE_CENTER_ID" VARCHAR(30),"SALE_DEPT_ID" VARCHAR(30),"SLSMGR_ID"
> VARCHAR(30),"SLSMAN_ID" VARCHAR(30),"SLSMAN_MOBILE" VARCHAR(16),"MANAGER"
> VARCHAR(100),"IDENTITY_CARD_ID" VARCHAR(36),"ORDER_TEL" VARCHAR(80),
> "INV_TYPE" VARCHAR(18),"ORDER_WAY" VARCHAR(18),"PAY_TYPE" VARCHAR(2),
> "PERIODS" VARCHAR(200),"PRD_ST_DATE" VARCHAR(8),"ORDER_CUST_ID" VARCHAR(30
> ),"BUSI_ADDR" VARCHAR(400),"WORK_PORT" VARCHAR(18),"BASE_TYPE" VARCHAR(24
> ),"SALE_SCOPE" VARCHAR(1),"SCOPE" VARCHAR(200),"COM_CHARA" VARCHAR(2),
> "INNER_TYPE" VARCHAR(18),"CUST_KIND" VARCHAR(10),"CUST_KIND_NAME" VARCHAR(
> 120),"CUST_TYPE" VARCHAR(6),"CUST_TYPE1" VARCHAR(6),"CUST_TYPE2" VARCHAR(6
> ),"CUST_TYPE3" VARCHAR(6),"CUST_TYPE4" VARCHAR(6),"CUST_TYPE5" VARCHAR(6),
> "AREA_TYPE" VARCHAR(2),"IS_SEFL_CUST" VARCHAR(1),"IS_FUNC_CUST" VARCHAR(1
> ),"MANAGER_BIRTHDAY" VARCHAR(8),"CELEBRATE_DATE" VARCHAR(8),
> "NATION_CUST_CODE" VARCHAR(30),"LONGITUDE" DECIMAL,"LATITUDE" DECIMAL,
> "AGENT_MESSAGE" VARCHAR(400),"NOTE" VARCHAR(1000),"UPDATE_TIME" TIMESTAMP,
> "INVTY_ID" VARCHAR(30),"STEP_ID" VARCHAR(20),"INV_CUST_ID" VARCHAR(60),
> "INV_UNIT_NAME" VARCHAR(200),"ACCOUNT" VARCHAR(200),"BANK" VARCHAR(100),
> "TAX_ACCOUNT" VARCHAR(120),"OTHER_ORDER_WAY" VARCHAR(32),"SALE_AVG"
> DECIMAL,"ITEM_ORD" DECIMAL,"QTY_SOLD" DECIMAL,"AMT_SOLD" DECIMAL,
> "MANAGER_TEL" VARCHAR(80),"IS_SALE_LARGE" VARCHAR(1),"TAX_TEL" VARCHAR(60
> ),"TAX_ADDR" VARCHAR(400),"IS_TOR_TAX" VARCHAR(1),"CUST_TYPE6" VARCHAR(6),
> "CUST_TYPE7" VARCHAR(6),"CUST_TYPE8" VARCHAR(124),"CUST_TYPE9" VARCHAR(60
> ),"CUST_TYPE10" VARCHAR(200),"CANT_ID" VARCHAR(20),"IS_ONLINE_PAY"
> VARCHAR(1),"IS_RAIL_CUST" VARCHAR(1),"CUST_SEG" VARCHAR(30),"QTY_MULTIPLE"
> VARCHAR(20),"TAX_ADRR" VARCHAR(200),"COLLECT_STAFF_ID" VARCHAR(30),
> "ITEM_HEIGHT" DECIMAL,"AREA_ID" VARCHAR(30),"BASE_TYPE_EXT" VARCHAR(30),
> "AREA_TYPE_EXT" VARCHAR(30),"WORK_PORT_EXT" VARCHAR(30),"CUST_SEG_EXT"
> VARCHAR(30),"IS_CIGAR_CUST" VARCHAR(1))
> [2020-05-23T21:43:58,411][DEBUG][main][IgniteH2Indexing] Creating cache
> index [cacheId=1684722246, idxName=_key_PK]
> [2020-05-23T21:43:59,081][DEBUG][main][IgniteH2Indexing] Creating cache
> index [cacheId=1684722246, idxName=IDX_CO_CUST_SALE_CENTER_ID]
> [2020-05-23T21:43:59,088][DEBUG][main][IgniteH2Indexing] Creating cache
> index [cacheId=1684722246, idxName=IDX_CO_CUST_STATUS]
> [2020-05-23T21:43:59,099][DEBUG][main][IgniteH2Indexing] Creating cache
> index [cacheId=1684722246, idxName=IDX_CO_CUST_SALE_DEPT_ID]
> [2020-05-23T21:43:59,109][INFO ][main][GridCacheProcessor] Started cache
> in recovery mode [name=CO_CUST, id=1684722246, dataRegionName=default,
> mode=PARTITIONED, atomicity=ATOMIC, backups=1, mvcc=false]
>
> It can be seen from the log that all tables and indexes need to be rebuilt
> during node startup? Just don't load data?
>
> 在 2020/5/21 下午8:48, Ilya Kasnacheev 写道:
>
> Hello!
>
> 1. I guess that WAL is read.
> 2. Unfortunately we do not have truly graceful exit as far as my
> understanding goes.
>
> Regards,
> --
> Ilya Kasnacheev
>
>
> вт, 19 мая 2020 г. в 10:22, 38797715 <38...@qq.com>:
>
>> Hi,
>>
>> the following log message:
>>
>> [2020-05-12T18:17:57,071][INFO ][main][GridCacheProcessor] Started cache
>> in recovery mode [name=CO_CO_LINE_NEW, id=1742991829,
>> dataRegionName=default, mode=PARTITIONED, atomicity=ATOMIC, backups=1,
>> mvcc=false]
>>
>> I have the following questions:
>>
>> 1.What has been done in the startup cache in recovery mode?
>>
>> 2.After testing, if the node stops normally (non abnormal shutdown), the
>> recovery process will also be performed during startup. Why?
>> 在 2020/5/18 下午9:58, Ilya Kasnacheev 写道:
>>
>> Hello!
>>
>> Direct IO module is experimental and should not be used unless
>> performance is tested first, in your specific use case.
>>
>> Regards,
>> --
>> Ilya Kasnacheev
>>
>>
>> пн, 18 мая 2020 г. в 16:47, 38797715 <38...@qq.com>:
>>
>>> Hi,
>>>
>>> If direct IO is disabled, the startup speed will be doubled, including
>>> some other tests. I find that direct IO has a great impact on the read
>>> performance.
>>> 在 2020/5/14 上午5:16, Evgenii Zhuravlev 写道:
>>>
>>> Can you share full logs from all nodes?
>>>
>>> вт, 12 мая 2020 г. в 18:24, 38797715 <38...@qq.com>:
>>>
>>>> Hi Evgenii,
>>>>
>>>> The storage used is not SSD.
>>>>
>>>> We will use different versions of ignite for further testing, such as
>>>> ignite2.8.
>>>> Ignite is configured as follows:
>>>> <?xml version="1.0" encoding="UTF-8"?>
>>>> <beans xmlns="http://www.springframework.org/schema/beans"
>>>> <http://www.springframework.org/schema/beans>
>>>> xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
>>>> <http://www.w3.org/2001/XMLSchema-instance> xsi:schemaLocation="
>>>> http://www.springframework.org/schema/beans
>>>> http://www.springframework.org/schema/beans/spring-beans.xsd">
>>>> <bean id="ignite.cfg" class=
>>>> "org.apache.ignite.configuration.IgniteConfiguration">
>>>> <property name="peerClassLoadingEnabled" value="true"/>
>>>> <property name="consistentId" value="20"/>
>>>> <property name="failureDetectionTimeout" value="120000"/>
>>>> <property name="workDirectory" value="/appdata/ignite"/>
>>>> <property name="rebalanceBatchSize" value="#{2 * 1024 * 1024}"/>
>>>> <property name="rebalanceThrottle" value="100"/>
>>>> <property name="rebalanceThreadPoolSize" value="4"/>
>>>> <property name="gridLogger">
>>>> <bean class="org.apache.ignite.logger.log4j2.Log4J2Logger">
>>>> <constructor-arg type="java.lang.String" value=
>>>> "config/ignite-log4j2.xml"/>
>>>> </bean>
>>>> </property>
>>>> <property name="cacheConfiguration">
>>>> <list>
>>>> <bean id="partitioned-cache-template" abstract="true" class=
>>>> "org.apache.ignite.configuration.CacheConfiguration">
>>>> <property name="name" value="cache-partitioned*"/>
>>>> <property name="cacheMode" value="PARTITIONED" />
>>>> <property name="backups" value="1" />
>>>> <property name="queryParallelism" value="16"/>
>>>> <property name="partitionLossPolicy" value="READ_ONLY_SAFE"/>
>>>> </bean>
>>>> <bean id="replicated-cache-template" abstract="true" class=
>>>> "org.apache.ignite.configuration.CacheConfiguration">
>>>> <property name="name" value="cache-replicated*"/>
>>>> <property name="cacheMode" value="REPLICATED" />
>>>> <property name="partitionLossPolicy" value="READ_ONLY_SAFE"/>
>>>> </bean>
>>>> </list>
>>>> </property>
>>>> <!-- Enabling Apache Ignite Persistent Store. -->
>>>> <property name="dataStorageConfiguration">
>>>> <bean class="org.apache.ignite.configuration.DataStorageConfiguration">
>>>> <property name="defaultDataRegionConfiguration">
>>>> <bean class="org.apache.ignite.configuration.DataRegionConfiguration">
>>>> <property name="persistenceEnabled" value="true"/>
>>>> <property name="maxSize" value="#{200L * 1024 * 1024 * 1024}"/>
>>>> </bean>
>>>> </property>
>>>> </bean>
>>>> </property>
>>>> </bean>
>>>> </beans>
>>>> 在 2020/5/13 上午4:45, Evgenii Zhuravlev 写道:
>>>>
>>>> Hi,
>>>>
>>>> Can you share full logs and configuration? What disk so you use?
>>>>
>>>> Evgenii
>>>>
>>>> вт, 12 мая 2020 г. в 06:49, 38797715 <38...@qq.com>:
>>>>
>>>>> Among them:
>>>>> CO_CO_NEW: ~ 48 minutes(partitioned,backup=1,33M)
>>>>>
>>>>> Ignite sys cache: ~ 27 minutes
>>>>>
>>>>> PLM_ITEM:~3 minutes(repicated,1.9K)
>>>>>
>>>>>
>>>>> 在 2020/5/12 下午9:08, 38797715 写道:
>>>>>
>>>>> Hi community,
>>>>>
>>>>> We have 5 servers, 16 cores, 256g memory, and 200g off-heap memory.
>>>>> We have 7 tables to test, and the data volume is
>>>>> respectively:31.8M,495.2M,552.3M,33M,873.3K,28M,1.9K(replicated),others are
>>>>> partitioned(backup = 1)
>>>>>
>>>>> VM args:-server -Xms20g -Xmx20g -XX:+AlwaysPreTouch -XX:+UseG1GC
>>>>> -XX:+ScavengeBeforeFullGC -XX:+DisableExplicitGC -XX:+PrintGCDetails
>>>>> -XX:+PrintGCTimeStamps -XX:+PrintGCDateStamps -XX:+UseGCLogFileRotation
>>>>> -XX:NumberOfGCLogFiles=10 -XX:GCLogFileSize=100M
>>>>> -Xloggc:/data/gc/logs/gclog.txt -Djava.net.preferIPv4Stack=true
>>>>> -XX:MaxDirectMemorySize=256M -XX:+PrintAdaptiveSizePolicy
>>>>>
>>>>> Today, one of the servers was restarted(kill and then start ignite.sh)
>>>>> for some reason, but the node took 1.5 hours to start, which was much
>>>>> longer than expected.
>>>>>
>>>>> After analyzing the log, the following information is found:
>>>>> [2020-05-12T17:00:05,138][INFO
>>>>> ][main][GridCacheDatabaseSharedManager] Found last checkpoint marker [cpId=
>>>>> 7a0564f2-43e5-400b-9439-746fc68a6ccb, pos=FileWALPointer [idx=10511,
>>>>> fileOff=51348888, len=61193]]
>>>>> [2020-05-12T17:00:05,151][INFO
>>>>> ][main][GridCacheDatabaseSharedManager] Binary memory state restored at
>>>>> node startup [restoredPtr=FileWALPointer [idx=10511, fileOff=51410110,
>>>>> len=0]]
>>>>> [2020-05-12T17:00:05,152][INFO ][main][FileWriteAheadLogManager]
>>>>> Resuming logging to WAL segment [file=/appdata/ignite/db/wal/24/
>>>>> 0000000000000001.wal, offset=51410110, ver=2]
>>>>> [2020-05-12T17:00:06,448][INFO ][main][PageMemoryImpl] Started page
>>>>> memory [memoryAllocated=200.0 GiB, pages=50821088, tableSize=3.9 GiB,
>>>>> checkpointBuffer=2.0 GiB]
>>>>> [2020-05-12T17:02:08,528][INFO ][main][GridCacheProcessor] Started
>>>>> cache in recovery mode [name=CO_CO_NEW, id=-189779360,
>>>>> dataRegionName=default, mode=PARTITIONED, atomicity=ATOMIC, backups=1,
>>>>> mvcc=false]
>>>>> [2020-05-12T17:50:44,341][INFO ][main][GridCacheProcessor] Started
>>>>> cache in recovery mode [name=CO_CO_LINE, id=-1588248812,
>>>>> dataRegionName=default, mode=PARTITIONED, atomicity=ATOMIC, backups=1,
>>>>> mvcc=false]
>>>>> [2020-05-12T17:50:44,366][INFO ][main][GridCacheProcessor] Started
>>>>> cache in recovery mode [name=ignite-sys-cache, id=-2100569601,
>>>>> dataRegionName=sysMemPlc, mode=REPLICATED, atomicity=TRANSACTIONAL, backups=
>>>>> 2147483647, mvcc=false]
>>>>> [2020-05-12T18:17:57,071][INFO ][main][GridCacheProcessor] Started
>>>>> cache in recovery mode [name=CO_CO_LINE_NEW, id=1742991829,
>>>>> dataRegionName=default, mode=PARTITIONED, atomicity=ATOMIC, backups=1,
>>>>> mvcc=false]
>>>>> [2020-05-12T18:19:54,910][INFO ][main][GridCacheProcessor] Started
>>>>> cache in recovery mode [name=PI_COM_DAY, id=-1904194728,
>>>>> dataRegionName=default, mode=PARTITIONED, atomicity=ATOMIC, backups=1,
>>>>> mvcc=false]
>>>>> [2020-05-12T18:19:54,949][INFO ][main][GridCacheProcessor] Started
>>>>> cache in recovery mode [name=PLM_ITEM, id=-1283854143,
>>>>> dataRegionName=default, mode=REPLICATED, atomicity=ATOMIC, backups=
>>>>> 2147483647, mvcc=false]
>>>>> [2020-05-12T18:22:53,662][INFO ][main][GridCacheProcessor] Started
>>>>> cache in recovery mode [name=CO_CO, id=64322847,
>>>>> dataRegionName=default, mode=PARTITIONED, atomicity=ATOMIC, backups=1,
>>>>> mvcc=false]
>>>>> [2020-05-12T18:22:54,876][INFO ][main][GridCacheProcessor] Started
>>>>> cache in recovery mode [name=CO_CUST, id=1684722246,
>>>>> dataRegionName=default, mode=PARTITIONED, atomicity=ATOMIC, backups=1,
>>>>> mvcc=false]
>>>>> [2020-05-12T18:22:54,892][INFO
>>>>> ][main][GridCacheDatabaseSharedManager] Binary recovery performed in
>>>>> 4970233 ms.
>>>>>
>>>>> Among them, binary recovery took 4970 seconds.
>>>>>
>>>>> Our question is:
>>>>>
>>>>> 1.Why is the start time so long?
>>>>>
>>>>> 2.Is the current state of ignite, with the growth of single node data
>>>>> volume, the restart time will be longer and longer?
>>>>>
>>>>> 3.Do have any suggestions for optimizing the restart time?
>>>>>
>>>>>
Re: Binary recovery for a very long time
Posted by Ilya Kasnacheev <il...@gmail.com>.
Hello!
1. I guess that WAL is read.
2. Unfortunately we do not have truly graceful exit as far as my
understanding goes.
Regards,
--
Ilya Kasnacheev
вт, 19 мая 2020 г. в 10:22, 38797715 <38...@qq.com>:
> Hi,
>
> the following log message:
>
> [2020-05-12T18:17:57,071][INFO ][main][GridCacheProcessor] Started cache
> in recovery mode [name=CO_CO_LINE_NEW, id=1742991829,
> dataRegionName=default, mode=PARTITIONED, atomicity=ATOMIC, backups=1,
> mvcc=false]
>
> I have the following questions:
>
> 1.What has been done in the startup cache in recovery mode?
>
> 2.After testing, if the node stops normally (non abnormal shutdown), the
> recovery process will also be performed during startup. Why?
> 在 2020/5/18 下午9:58, Ilya Kasnacheev 写道:
>
> Hello!
>
> Direct IO module is experimental and should not be used unless performance
> is tested first, in your specific use case.
>
> Regards,
> --
> Ilya Kasnacheev
>
>
> пн, 18 мая 2020 г. в 16:47, 38797715 <38...@qq.com>:
>
>> Hi,
>>
>> If direct IO is disabled, the startup speed will be doubled, including
>> some other tests. I find that direct IO has a great impact on the read
>> performance.
>> 在 2020/5/14 上午5:16, Evgenii Zhuravlev 写道:
>>
>> Can you share full logs from all nodes?
>>
>> вт, 12 мая 2020 г. в 18:24, 38797715 <38...@qq.com>:
>>
>>> Hi Evgenii,
>>>
>>> The storage used is not SSD.
>>>
>>> We will use different versions of ignite for further testing, such as
>>> ignite2.8.
>>> Ignite is configured as follows:
>>> <?xml version="1.0" encoding="UTF-8"?>
>>> <beans xmlns="http://www.springframework.org/schema/beans"
>>> <http://www.springframework.org/schema/beans>
>>> xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
>>> <http://www.w3.org/2001/XMLSchema-instance> xsi:schemaLocation="
>>> http://www.springframework.org/schema/beans
>>> http://www.springframework.org/schema/beans/spring-beans.xsd">
>>> <bean id="ignite.cfg" class=
>>> "org.apache.ignite.configuration.IgniteConfiguration">
>>> <property name="peerClassLoadingEnabled" value="true"/>
>>> <property name="consistentId" value="20"/>
>>> <property name="failureDetectionTimeout" value="120000"/>
>>> <property name="workDirectory" value="/appdata/ignite"/>
>>> <property name="rebalanceBatchSize" value="#{2 * 1024 * 1024}"/>
>>> <property name="rebalanceThrottle" value="100"/>
>>> <property name="rebalanceThreadPoolSize" value="4"/>
>>> <property name="gridLogger">
>>> <bean class="org.apache.ignite.logger.log4j2.Log4J2Logger">
>>> <constructor-arg type="java.lang.String" value=
>>> "config/ignite-log4j2.xml"/>
>>> </bean>
>>> </property>
>>> <property name="cacheConfiguration">
>>> <list>
>>> <bean id="partitioned-cache-template" abstract="true" class=
>>> "org.apache.ignite.configuration.CacheConfiguration">
>>> <property name="name" value="cache-partitioned*"/>
>>> <property name="cacheMode" value="PARTITIONED" />
>>> <property name="backups" value="1" />
>>> <property name="queryParallelism" value="16"/>
>>> <property name="partitionLossPolicy" value="READ_ONLY_SAFE"/>
>>> </bean>
>>> <bean id="replicated-cache-template" abstract="true" class=
>>> "org.apache.ignite.configuration.CacheConfiguration">
>>> <property name="name" value="cache-replicated*"/>
>>> <property name="cacheMode" value="REPLICATED" />
>>> <property name="partitionLossPolicy" value="READ_ONLY_SAFE"/>
>>> </bean>
>>> </list>
>>> </property>
>>> <!-- Enabling Apache Ignite Persistent Store. -->
>>> <property name="dataStorageConfiguration">
>>> <bean class="org.apache.ignite.configuration.DataStorageConfiguration">
>>> <property name="defaultDataRegionConfiguration">
>>> <bean class="org.apache.ignite.configuration.DataRegionConfiguration">
>>> <property name="persistenceEnabled" value="true"/>
>>> <property name="maxSize" value="#{200L * 1024 * 1024 * 1024}"/>
>>> </bean>
>>> </property>
>>> </bean>
>>> </property>
>>> </bean>
>>> </beans>
>>> 在 2020/5/13 上午4:45, Evgenii Zhuravlev 写道:
>>>
>>> Hi,
>>>
>>> Can you share full logs and configuration? What disk so you use?
>>>
>>> Evgenii
>>>
>>> вт, 12 мая 2020 г. в 06:49, 38797715 <38...@qq.com>:
>>>
>>>> Among them:
>>>> CO_CO_NEW: ~ 48 minutes(partitioned,backup=1,33M)
>>>>
>>>> Ignite sys cache: ~ 27 minutes
>>>>
>>>> PLM_ITEM:~3 minutes(repicated,1.9K)
>>>>
>>>>
>>>> 在 2020/5/12 下午9:08, 38797715 写道:
>>>>
>>>> Hi community,
>>>>
>>>> We have 5 servers, 16 cores, 256g memory, and 200g off-heap memory.
>>>> We have 7 tables to test, and the data volume is
>>>> respectively:31.8M,495.2M,552.3M,33M,873.3K,28M,1.9K(replicated),others are
>>>> partitioned(backup = 1)
>>>>
>>>> VM args:-server -Xms20g -Xmx20g -XX:+AlwaysPreTouch -XX:+UseG1GC
>>>> -XX:+ScavengeBeforeFullGC -XX:+DisableExplicitGC -XX:+PrintGCDetails
>>>> -XX:+PrintGCTimeStamps -XX:+PrintGCDateStamps -XX:+UseGCLogFileRotation
>>>> -XX:NumberOfGCLogFiles=10 -XX:GCLogFileSize=100M
>>>> -Xloggc:/data/gc/logs/gclog.txt -Djava.net.preferIPv4Stack=true
>>>> -XX:MaxDirectMemorySize=256M -XX:+PrintAdaptiveSizePolicy
>>>>
>>>> Today, one of the servers was restarted(kill and then start ignite.sh)
>>>> for some reason, but the node took 1.5 hours to start, which was much
>>>> longer than expected.
>>>>
>>>> After analyzing the log, the following information is found:
>>>> [2020-05-12T17:00:05,138][INFO ][main][GridCacheDatabaseSharedManager]
>>>> Found last checkpoint marker [cpId=7a0564f2-43e5-400b-9439-746fc68a6ccb,
>>>> pos=FileWALPointer [idx=10511, fileOff=51348888, len=61193]]
>>>> [2020-05-12T17:00:05,151][INFO ][main][GridCacheDatabaseSharedManager]
>>>> Binary memory state restored at node startup [restoredPtr=FileWALPointer
>>>> [idx=10511, fileOff=51410110, len=0]]
>>>> [2020-05-12T17:00:05,152][INFO ][main][FileWriteAheadLogManager]
>>>> Resuming logging to WAL segment [file=/appdata/ignite/db/wal/24/
>>>> 0000000000000001.wal, offset=51410110, ver=2]
>>>> [2020-05-12T17:00:06,448][INFO ][main][PageMemoryImpl] Started page
>>>> memory [memoryAllocated=200.0 GiB, pages=50821088, tableSize=3.9 GiB,
>>>> checkpointBuffer=2.0 GiB]
>>>> [2020-05-12T17:02:08,528][INFO ][main][GridCacheProcessor] Started
>>>> cache in recovery mode [name=CO_CO_NEW, id=-189779360,
>>>> dataRegionName=default, mode=PARTITIONED, atomicity=ATOMIC, backups=1,
>>>> mvcc=false]
>>>> [2020-05-12T17:50:44,341][INFO ][main][GridCacheProcessor] Started
>>>> cache in recovery mode [name=CO_CO_LINE, id=-1588248812,
>>>> dataRegionName=default, mode=PARTITIONED, atomicity=ATOMIC, backups=1,
>>>> mvcc=false]
>>>> [2020-05-12T17:50:44,366][INFO ][main][GridCacheProcessor] Started
>>>> cache in recovery mode [name=ignite-sys-cache, id=-2100569601,
>>>> dataRegionName=sysMemPlc, mode=REPLICATED, atomicity=TRANSACTIONAL, backups=
>>>> 2147483647, mvcc=false]
>>>> [2020-05-12T18:17:57,071][INFO ][main][GridCacheProcessor] Started
>>>> cache in recovery mode [name=CO_CO_LINE_NEW, id=1742991829,
>>>> dataRegionName=default, mode=PARTITIONED, atomicity=ATOMIC, backups=1,
>>>> mvcc=false]
>>>> [2020-05-12T18:19:54,910][INFO ][main][GridCacheProcessor] Started
>>>> cache in recovery mode [name=PI_COM_DAY, id=-1904194728,
>>>> dataRegionName=default, mode=PARTITIONED, atomicity=ATOMIC, backups=1,
>>>> mvcc=false]
>>>> [2020-05-12T18:19:54,949][INFO ][main][GridCacheProcessor] Started
>>>> cache in recovery mode [name=PLM_ITEM, id=-1283854143,
>>>> dataRegionName=default, mode=REPLICATED, atomicity=ATOMIC, backups=
>>>> 2147483647, mvcc=false]
>>>> [2020-05-12T18:22:53,662][INFO ][main][GridCacheProcessor] Started
>>>> cache in recovery mode [name=CO_CO, id=64322847,
>>>> dataRegionName=default, mode=PARTITIONED, atomicity=ATOMIC, backups=1,
>>>> mvcc=false]
>>>> [2020-05-12T18:22:54,876][INFO ][main][GridCacheProcessor] Started
>>>> cache in recovery mode [name=CO_CUST, id=1684722246,
>>>> dataRegionName=default, mode=PARTITIONED, atomicity=ATOMIC, backups=1,
>>>> mvcc=false]
>>>> [2020-05-12T18:22:54,892][INFO ][main][GridCacheDatabaseSharedManager]
>>>> Binary recovery performed in 4970233 ms.
>>>>
>>>> Among them, binary recovery took 4970 seconds.
>>>>
>>>> Our question is:
>>>>
>>>> 1.Why is the start time so long?
>>>>
>>>> 2.Is the current state of ignite, with the growth of single node data
>>>> volume, the restart time will be longer and longer?
>>>>
>>>> 3.Do have any suggestions for optimizing the restart time?
>>>>
>>>>
Re: Binary recovery for a very long time
Posted by 38797715 <38...@qq.com>.
Hi,
the following log message:
[2020-05-12T18:17:57,071][INFO ][main][GridCacheProcessor] Started cache
in recovery mode [name=CO_CO_LINE_NEW, id=1742991829,
dataRegionName=default, mode=PARTITIONED, atomicity=ATOMIC, backups=1,
mvcc=false]
I have the following questions:
1.What has been done in the startup cache in recovery mode?
2.After testing, if the node stops normally (non abnormal shutdown), the
recovery process will also be performed during startup. Why?
在 2020/5/18 下午9:58, Ilya Kasnacheev 写道:
> Hello!
>
> Direct IO module is experimental and should not be used unless
> performance is tested first, in your specific use case.
>
> Regards,
> --
> Ilya Kasnacheev
>
>
> пн, 18 мая 2020 г. в 16:47, 38797715 <38797715@qq.com
> <ma...@qq.com>>:
>
> Hi,
>
> If direct IO is disabled, the startup speed will be doubled,
> including some other tests. I find that direct IO has a great
> impact on the read performance.
>
> 在 2020/5/14 上午5:16, Evgenii Zhuravlev 写道:
>> Can you share full logs from all nodes?
>>
>> вт, 12 мая 2020 г. в 18:24, 38797715 <38797715@qq.com
>> <ma...@qq.com>>:
>>
>> Hi Evgenii,
>>
>> The storage used is not SSD.
>>
>> We will use different versions of ignite for further testing,
>> such as ignite2.8.
>> Ignite is configured as follows:
>>
>> <?xmlversion="1.0"encoding="UTF-8"?>
>> <beansxmlns="http://www.springframework.org/schema/beans"
>> <http://www.springframework.org/schema/beans>
>> xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
>> <http://www.w3.org/2001/XMLSchema-instance>xsi:schemaLocation="http://www.springframework.org/schema/beans
>> http://www.springframework.org/schema/beans/spring-beans.xsd">
>> <beanid="ignite.cfg"class="org.apache.ignite.configuration.IgniteConfiguration">
>> <propertyname="peerClassLoadingEnabled"value="true"/>
>> <propertyname="consistentId"value="20"/>
>> <propertyname="failureDetectionTimeout"value="120000"/>
>> <propertyname="workDirectory"value="/appdata/ignite"/>
>> <propertyname="rebalanceBatchSize"value="#{2 * 1024 * 1024}"/>
>> <propertyname="rebalanceThrottle"value="100"/>
>> <propertyname="rebalanceThreadPoolSize"value="4"/>
>> <propertyname="gridLogger">
>> <beanclass="org.apache.ignite.logger.log4j2.Log4J2Logger">
>> <constructor-argtype="java.lang.String"value="config/ignite-log4j2.xml"/>
>> </bean>
>> </property>
>> <propertyname="cacheConfiguration">
>> <list>
>> <beanid="partitioned-cache-template"abstract="true"class="org.apache.ignite.configuration.CacheConfiguration">
>> <propertyname="name"value="cache-partitioned*"/>
>> <propertyname="cacheMode"value="PARTITIONED"/>
>> <propertyname="backups"value="1"/>
>> <propertyname="queryParallelism"value="16"/>
>> <propertyname="partitionLossPolicy"value="READ_ONLY_SAFE"/>
>> </bean>
>> <beanid="replicated-cache-template"abstract="true"class="org.apache.ignite.configuration.CacheConfiguration">
>> <propertyname="name"value="cache-replicated*"/>
>> <propertyname="cacheMode"value="REPLICATED"/>
>> <propertyname="partitionLossPolicy"value="READ_ONLY_SAFE"/>
>> </bean>
>> </list>
>> </property>
>> <!-- Enabling Apache Ignite Persistent Store. -->
>> <propertyname="dataStorageConfiguration">
>> <beanclass="org.apache.ignite.configuration.DataStorageConfiguration">
>> <propertyname="defaultDataRegionConfiguration">
>> <beanclass="org.apache.ignite.configuration.DataRegionConfiguration">
>> <propertyname="persistenceEnabled"value="true"/>
>> <propertyname="maxSize"value="#{200L * 1024 * 1024 * 1024}"/>
>> </bean>
>> </property>
>> </bean>
>> </property>
>> </bean>
>> </beans>
>> 在 2020/5/13 上午4:45, Evgenii Zhuravlev 写道:
>>> Hi,
>>>
>>> Can you share full logs and configuration? What disk so you use?
>>>
>>> Evgenii
>>>
>>> вт, 12 мая 2020 г. в 06:49, 38797715 <38797715@qq.com
>>> <ma...@qq.com>>:
>>>
>>> Among them:
>>> CO_CO_NEW: ~ 48 minutes(partitioned,backup=1,33M)
>>>
>>> Ignite sys cache: ~ 27 minutes
>>>
>>> PLM_ITEM:~3 minutes(repicated,1.9K)
>>>
>>>
>>> 在 2020/5/12 下午9:08, 38797715 写道:
>>>>
>>>> Hi community,
>>>>
>>>> We have 5 servers, 16 cores, 256g memory, and 200g
>>>> off-heap memory.
>>>> We have 7 tables to test, and the data volume is
>>>> respectively:31.8M,495.2M,552.3M,33M,873.3K,28M,1.9K(replicated),others
>>>> are partitioned(backup = 1)
>>>>
>>>> VM args:-server -Xms20g -Xmx20g -XX:+AlwaysPreTouch
>>>> -XX:+UseG1GC -XX:+ScavengeBeforeFullGC
>>>> -XX:+DisableExplicitGC -XX:+PrintGCDetails
>>>> -XX:+PrintGCTimeStamps -XX:+PrintGCDateStamps
>>>> -XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=10
>>>> -XX:GCLogFileSize=100M -Xloggc:/data/gc/logs/gclog.txt
>>>> -Djava.net.preferIPv4Stack=true
>>>> -XX:MaxDirectMemorySize=256M -XX:+PrintAdaptiveSizePolicy
>>>>
>>>> Today, one of the servers was restarted(kill and then
>>>> start ignite.sh) for some reason, but the node took 1.5
>>>> hours to start, which was much longer than expected.
>>>>
>>>> After analyzing the log, the following information is
>>>> found:
>>>>
>>>> [2020-05-12T17:00:05,138][INFO][main][GridCacheDatabaseSharedManager]
>>>> Found last checkpoint marker
>>>> [cpId=7a0564f2-43e5-400b-9439-746fc68a6ccb,
>>>> pos=FileWALPointer [idx=10511, fileOff=51348888,
>>>> len=61193]]
>>>> [2020-05-12T17:00:05,151][INFO][main][GridCacheDatabaseSharedManager]
>>>> Binary memory state restored at node startup
>>>> [restoredPtr=FileWALPointer [idx=10511,
>>>> fileOff=51410110, len=0]]
>>>> [2020-05-12T17:00:05,152][INFO][main][FileWriteAheadLogManager]
>>>> Resuming logging to WAL segment
>>>> [file=/appdata/ignite/db/wal/24/0000000000000001.wal,
>>>> offset=51410110, ver=2]
>>>> [2020-05-12T17:00:06,448][INFO][main][PageMemoryImpl]
>>>> Started page memory [memoryAllocated=200.0GiB,
>>>> pages=50821088, tableSize=3.9GiB, checkpointBuffer=2.0GiB]
>>>> [2020-05-12T17:02:08,528][INFO][main][GridCacheProcessor]
>>>> Started cache in recovery mode [name=CO_CO_NEW,
>>>> id=-189779360, dataRegionName=default,
>>>> mode=PARTITIONED, atomicity=ATOMIC, backups=1, mvcc=false]
>>>> [2020-05-12T17:50:44,341][INFO][main][GridCacheProcessor]
>>>> Started cache in recovery mode [name=CO_CO_LINE,
>>>> id=-1588248812, dataRegionName=default,
>>>> mode=PARTITIONED, atomicity=ATOMIC, backups=1, mvcc=false]
>>>> [2020-05-12T17:50:44,366][INFO][main][GridCacheProcessor]
>>>> Started cache in recovery mode [name=ignite-sys-cache,
>>>> id=-2100569601, dataRegionName=sysMemPlc,
>>>> mode=REPLICATED, atomicity=TRANSACTIONAL,
>>>> backups=2147483647, mvcc=false]
>>>> [2020-05-12T18:17:57,071][INFO][main][GridCacheProcessor]
>>>> Started cache in recovery mode [name=CO_CO_LINE_NEW,
>>>> id=1742991829, dataRegionName=default,
>>>> mode=PARTITIONED, atomicity=ATOMIC, backups=1, mvcc=false]
>>>> [2020-05-12T18:19:54,910][INFO][main][GridCacheProcessor]
>>>> Started cache in recovery mode [name=PI_COM_DAY,
>>>> id=-1904194728, dataRegionName=default,
>>>> mode=PARTITIONED, atomicity=ATOMIC, backups=1, mvcc=false]
>>>> [2020-05-12T18:19:54,949][INFO][main][GridCacheProcessor]
>>>> Started cache in recovery mode [name=PLM_ITEM,
>>>> id=-1283854143, dataRegionName=default,
>>>> mode=REPLICATED, atomicity=ATOMIC, backups=2147483647,
>>>> mvcc=false]
>>>> [2020-05-12T18:22:53,662][INFO][main][GridCacheProcessor]
>>>> Started cache in recovery mode [name=CO_CO,
>>>> id=64322847, dataRegionName=default, mode=PARTITIONED,
>>>> atomicity=ATOMIC, backups=1, mvcc=false]
>>>> [2020-05-12T18:22:54,876][INFO][main][GridCacheProcessor]
>>>> Started cache in recovery mode [name=CO_CUST,
>>>> id=1684722246, dataRegionName=default,
>>>> mode=PARTITIONED, atomicity=ATOMIC, backups=1, mvcc=false]
>>>> [2020-05-12T18:22:54,892][INFO][main][GridCacheDatabaseSharedManager]
>>>> Binary recovery performed in 4970233ms.
>>>>
>>>> Among them, binary recovery took 4970 seconds.
>>>>
>>>> Our question is:
>>>>
>>>> 1.Why is the start time so long?
>>>>
>>>> 2.Is the current state of ignite, with the growth of
>>>> single node data volume, the restart time will be
>>>> longer and longer?
>>>>
>>>> 3.Do have any suggestions for optimizing the restart time?
>>>>
Re: Binary recovery for a very long time
Posted by Ilya Kasnacheev <il...@gmail.com>.
Hello!
Direct IO module is experimental and should not be used unless performance
is tested first, in your specific use case.
Regards,
--
Ilya Kasnacheev
пн, 18 мая 2020 г. в 16:47, 38797715 <38...@qq.com>:
> Hi,
>
> If direct IO is disabled, the startup speed will be doubled, including
> some other tests. I find that direct IO has a great impact on the read
> performance.
> 在 2020/5/14 上午5:16, Evgenii Zhuravlev 写道:
>
> Can you share full logs from all nodes?
>
> вт, 12 мая 2020 г. в 18:24, 38797715 <38...@qq.com>:
>
>> Hi Evgenii,
>>
>> The storage used is not SSD.
>>
>> We will use different versions of ignite for further testing, such as
>> ignite2.8.
>> Ignite is configured as follows:
>> <?xml version="1.0" encoding="UTF-8"?>
>> <beans xmlns="http://www.springframework.org/schema/beans"
>> <http://www.springframework.org/schema/beans>
>> xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
>> <http://www.w3.org/2001/XMLSchema-instance> xsi:schemaLocation="
>> http://www.springframework.org/schema/beans
>> http://www.springframework.org/schema/beans/spring-beans.xsd">
>> <bean id="ignite.cfg" class=
>> "org.apache.ignite.configuration.IgniteConfiguration">
>> <property name="peerClassLoadingEnabled" value="true"/>
>> <property name="consistentId" value="20"/>
>> <property name="failureDetectionTimeout" value="120000"/>
>> <property name="workDirectory" value="/appdata/ignite"/>
>> <property name="rebalanceBatchSize" value="#{2 * 1024 * 1024}"/>
>> <property name="rebalanceThrottle" value="100"/>
>> <property name="rebalanceThreadPoolSize" value="4"/>
>> <property name="gridLogger">
>> <bean class="org.apache.ignite.logger.log4j2.Log4J2Logger">
>> <constructor-arg type="java.lang.String" value="config/ignite-log4j2.xml"
>> />
>> </bean>
>> </property>
>> <property name="cacheConfiguration">
>> <list>
>> <bean id="partitioned-cache-template" abstract="true" class=
>> "org.apache.ignite.configuration.CacheConfiguration">
>> <property name="name" value="cache-partitioned*"/>
>> <property name="cacheMode" value="PARTITIONED" />
>> <property name="backups" value="1" />
>> <property name="queryParallelism" value="16"/>
>> <property name="partitionLossPolicy" value="READ_ONLY_SAFE"/>
>> </bean>
>> <bean id="replicated-cache-template" abstract="true" class=
>> "org.apache.ignite.configuration.CacheConfiguration">
>> <property name="name" value="cache-replicated*"/>
>> <property name="cacheMode" value="REPLICATED" />
>> <property name="partitionLossPolicy" value="READ_ONLY_SAFE"/>
>> </bean>
>> </list>
>> </property>
>> <!-- Enabling Apache Ignite Persistent Store. -->
>> <property name="dataStorageConfiguration">
>> <bean class="org.apache.ignite.configuration.DataStorageConfiguration">
>> <property name="defaultDataRegionConfiguration">
>> <bean class="org.apache.ignite.configuration.DataRegionConfiguration">
>> <property name="persistenceEnabled" value="true"/>
>> <property name="maxSize" value="#{200L * 1024 * 1024 * 1024}"/>
>> </bean>
>> </property>
>> </bean>
>> </property>
>> </bean>
>> </beans>
>> 在 2020/5/13 上午4:45, Evgenii Zhuravlev 写道:
>>
>> Hi,
>>
>> Can you share full logs and configuration? What disk so you use?
>>
>> Evgenii
>>
>> вт, 12 мая 2020 г. в 06:49, 38797715 <38...@qq.com>:
>>
>>> Among them:
>>> CO_CO_NEW: ~ 48 minutes(partitioned,backup=1,33M)
>>>
>>> Ignite sys cache: ~ 27 minutes
>>>
>>> PLM_ITEM:~3 minutes(repicated,1.9K)
>>>
>>>
>>> 在 2020/5/12 下午9:08, 38797715 写道:
>>>
>>> Hi community,
>>>
>>> We have 5 servers, 16 cores, 256g memory, and 200g off-heap memory.
>>> We have 7 tables to test, and the data volume is
>>> respectively:31.8M,495.2M,552.3M,33M,873.3K,28M,1.9K(replicated),others are
>>> partitioned(backup = 1)
>>>
>>> VM args:-server -Xms20g -Xmx20g -XX:+AlwaysPreTouch -XX:+UseG1GC
>>> -XX:+ScavengeBeforeFullGC -XX:+DisableExplicitGC -XX:+PrintGCDetails
>>> -XX:+PrintGCTimeStamps -XX:+PrintGCDateStamps -XX:+UseGCLogFileRotation
>>> -XX:NumberOfGCLogFiles=10 -XX:GCLogFileSize=100M
>>> -Xloggc:/data/gc/logs/gclog.txt -Djava.net.preferIPv4Stack=true
>>> -XX:MaxDirectMemorySize=256M -XX:+PrintAdaptiveSizePolicy
>>>
>>> Today, one of the servers was restarted(kill and then start ignite.sh)
>>> for some reason, but the node took 1.5 hours to start, which was much
>>> longer than expected.
>>>
>>> After analyzing the log, the following information is found:
>>> [2020-05-12T17:00:05,138][INFO ][main][GridCacheDatabaseSharedManager]
>>> Found last checkpoint marker [cpId=7a0564f2-43e5-400b-9439-746fc68a6ccb,
>>> pos=FileWALPointer [idx=10511, fileOff=51348888, len=61193]]
>>> [2020-05-12T17:00:05,151][INFO ][main][GridCacheDatabaseSharedManager]
>>> Binary memory state restored at node startup [restoredPtr=FileWALPointer
>>> [idx=10511, fileOff=51410110, len=0]]
>>> [2020-05-12T17:00:05,152][INFO ][main][FileWriteAheadLogManager]
>>> Resuming logging to WAL segment [file=/appdata/ignite/db/wal/24/
>>> 0000000000000001.wal, offset=51410110, ver=2]
>>> [2020-05-12T17:00:06,448][INFO ][main][PageMemoryImpl] Started page
>>> memory [memoryAllocated=200.0 GiB, pages=50821088, tableSize=3.9 GiB,
>>> checkpointBuffer=2.0 GiB]
>>> [2020-05-12T17:02:08,528][INFO ][main][GridCacheProcessor] Started
>>> cache in recovery mode [name=CO_CO_NEW, id=-189779360,
>>> dataRegionName=default, mode=PARTITIONED, atomicity=ATOMIC, backups=1,
>>> mvcc=false]
>>> [2020-05-12T17:50:44,341][INFO ][main][GridCacheProcessor] Started
>>> cache in recovery mode [name=CO_CO_LINE, id=-1588248812,
>>> dataRegionName=default, mode=PARTITIONED, atomicity=ATOMIC, backups=1,
>>> mvcc=false]
>>> [2020-05-12T17:50:44,366][INFO ][main][GridCacheProcessor] Started
>>> cache in recovery mode [name=ignite-sys-cache, id=-2100569601,
>>> dataRegionName=sysMemPlc, mode=REPLICATED, atomicity=TRANSACTIONAL, backups=
>>> 2147483647, mvcc=false]
>>> [2020-05-12T18:17:57,071][INFO ][main][GridCacheProcessor] Started
>>> cache in recovery mode [name=CO_CO_LINE_NEW, id=1742991829,
>>> dataRegionName=default, mode=PARTITIONED, atomicity=ATOMIC, backups=1,
>>> mvcc=false]
>>> [2020-05-12T18:19:54,910][INFO ][main][GridCacheProcessor] Started
>>> cache in recovery mode [name=PI_COM_DAY, id=-1904194728,
>>> dataRegionName=default, mode=PARTITIONED, atomicity=ATOMIC, backups=1,
>>> mvcc=false]
>>> [2020-05-12T18:19:54,949][INFO ][main][GridCacheProcessor] Started
>>> cache in recovery mode [name=PLM_ITEM, id=-1283854143,
>>> dataRegionName=default, mode=REPLICATED, atomicity=ATOMIC, backups=
>>> 2147483647, mvcc=false]
>>> [2020-05-12T18:22:53,662][INFO ][main][GridCacheProcessor] Started
>>> cache in recovery mode [name=CO_CO, id=64322847,
>>> dataRegionName=default, mode=PARTITIONED, atomicity=ATOMIC, backups=1,
>>> mvcc=false]
>>> [2020-05-12T18:22:54,876][INFO ][main][GridCacheProcessor] Started
>>> cache in recovery mode [name=CO_CUST, id=1684722246,
>>> dataRegionName=default, mode=PARTITIONED, atomicity=ATOMIC, backups=1,
>>> mvcc=false]
>>> [2020-05-12T18:22:54,892][INFO ][main][GridCacheDatabaseSharedManager]
>>> Binary recovery performed in 4970233 ms.
>>>
>>> Among them, binary recovery took 4970 seconds.
>>>
>>> Our question is:
>>>
>>> 1.Why is the start time so long?
>>>
>>> 2.Is the current state of ignite, with the growth of single node data
>>> volume, the restart time will be longer and longer?
>>>
>>> 3.Do have any suggestions for optimizing the restart time?
>>>
>>>
Re: Binary recovery for a very long time
Posted by 38797715 <38...@qq.com>.
Hi,
If direct IO is disabled, the startup speed will be doubled, including
some other tests. I find that direct IO has a great impact on the read
performance.
在 2020/5/14 上午5:16, Evgenii Zhuravlev 写道:
> Can you share full logs from all nodes?
>
> вт, 12 мая 2020 г. в 18:24, 38797715 <38797715@qq.com
> <ma...@qq.com>>:
>
> Hi Evgenii,
>
> The storage used is not SSD.
>
> We will use different versions of ignite for further testing, such
> as ignite2.8.
> Ignite is configured as follows:
>
> <?xmlversion="1.0"encoding="UTF-8"?>
> <beansxmlns="http://www.springframework.org/schema/beans"
> <http://www.springframework.org/schema/beans>
> xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
> <http://www.w3.org/2001/XMLSchema-instance>xsi:schemaLocation="http://www.springframework.org/schema/beans
> http://www.springframework.org/schema/beans/spring-beans.xsd">
> <beanid="ignite.cfg"class="org.apache.ignite.configuration.IgniteConfiguration">
> <propertyname="peerClassLoadingEnabled"value="true"/>
> <propertyname="consistentId"value="20"/>
> <propertyname="failureDetectionTimeout"value="120000"/>
> <propertyname="workDirectory"value="/appdata/ignite"/>
> <propertyname="rebalanceBatchSize"value="#{2 * 1024 * 1024}"/>
> <propertyname="rebalanceThrottle"value="100"/>
> <propertyname="rebalanceThreadPoolSize"value="4"/>
> <propertyname="gridLogger">
> <beanclass="org.apache.ignite.logger.log4j2.Log4J2Logger">
> <constructor-argtype="java.lang.String"value="config/ignite-log4j2.xml"/>
> </bean>
> </property>
> <propertyname="cacheConfiguration">
> <list>
> <beanid="partitioned-cache-template"abstract="true"class="org.apache.ignite.configuration.CacheConfiguration">
> <propertyname="name"value="cache-partitioned*"/>
> <propertyname="cacheMode"value="PARTITIONED"/>
> <propertyname="backups"value="1"/>
> <propertyname="queryParallelism"value="16"/>
> <propertyname="partitionLossPolicy"value="READ_ONLY_SAFE"/>
> </bean>
> <beanid="replicated-cache-template"abstract="true"class="org.apache.ignite.configuration.CacheConfiguration">
> <propertyname="name"value="cache-replicated*"/>
> <propertyname="cacheMode"value="REPLICATED"/>
> <propertyname="partitionLossPolicy"value="READ_ONLY_SAFE"/>
> </bean>
> </list>
> </property>
> <!-- Enabling Apache Ignite Persistent Store. -->
> <propertyname="dataStorageConfiguration">
> <beanclass="org.apache.ignite.configuration.DataStorageConfiguration">
> <propertyname="defaultDataRegionConfiguration">
> <beanclass="org.apache.ignite.configuration.DataRegionConfiguration">
> <propertyname="persistenceEnabled"value="true"/>
> <propertyname="maxSize"value="#{200L * 1024 * 1024 * 1024}"/>
> </bean>
> </property>
> </bean>
> </property>
> </bean>
> </beans>
> 在 2020/5/13 上午4:45, Evgenii Zhuravlev 写道:
>> Hi,
>>
>> Can you share full logs and configuration? What disk so you use?
>>
>> Evgenii
>>
>> вт, 12 мая 2020 г. в 06:49, 38797715 <38797715@qq.com
>> <ma...@qq.com>>:
>>
>> Among them:
>> CO_CO_NEW: ~ 48 minutes(partitioned,backup=1,33M)
>>
>> Ignite sys cache: ~ 27 minutes
>>
>> PLM_ITEM:~3 minutes(repicated,1.9K)
>>
>>
>> 在 2020/5/12 下午9:08, 38797715 写道:
>>>
>>> Hi community,
>>>
>>> We have 5 servers, 16 cores, 256g memory, and 200g off-heap
>>> memory.
>>> We have 7 tables to test, and the data volume is
>>> respectively:31.8M,495.2M,552.3M,33M,873.3K,28M,1.9K(replicated),others
>>> are partitioned(backup = 1)
>>>
>>> VM args:-server -Xms20g -Xmx20g -XX:+AlwaysPreTouch
>>> -XX:+UseG1GC -XX:+ScavengeBeforeFullGC
>>> -XX:+DisableExplicitGC -XX:+PrintGCDetails
>>> -XX:+PrintGCTimeStamps -XX:+PrintGCDateStamps
>>> -XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=10
>>> -XX:GCLogFileSize=100M -Xloggc:/data/gc/logs/gclog.txt
>>> -Djava.net.preferIPv4Stack=true -XX:MaxDirectMemorySize=256M
>>> -XX:+PrintAdaptiveSizePolicy
>>>
>>> Today, one of the servers was restarted(kill and then start
>>> ignite.sh) for some reason, but the node took 1.5 hours to
>>> start, which was much longer than expected.
>>>
>>> After analyzing the log, the following information is found:
>>>
>>> [2020-05-12T17:00:05,138][INFO][main][GridCacheDatabaseSharedManager]
>>> Found last checkpoint marker
>>> [cpId=7a0564f2-43e5-400b-9439-746fc68a6ccb,
>>> pos=FileWALPointer [idx=10511, fileOff=51348888, len=61193]]
>>> [2020-05-12T17:00:05,151][INFO][main][GridCacheDatabaseSharedManager]
>>> Binary memory state restored at node startup
>>> [restoredPtr=FileWALPointer [idx=10511, fileOff=51410110,
>>> len=0]]
>>> [2020-05-12T17:00:05,152][INFO][main][FileWriteAheadLogManager]
>>> Resuming logging to WAL segment
>>> [file=/appdata/ignite/db/wal/24/0000000000000001.wal,
>>> offset=51410110, ver=2]
>>> [2020-05-12T17:00:06,448][INFO][main][PageMemoryImpl]
>>> Started page memory [memoryAllocated=200.0GiB,
>>> pages=50821088, tableSize=3.9GiB, checkpointBuffer=2.0GiB]
>>> [2020-05-12T17:02:08,528][INFO][main][GridCacheProcessor]
>>> Started cache in recovery mode [name=CO_CO_NEW,
>>> id=-189779360, dataRegionName=default, mode=PARTITIONED,
>>> atomicity=ATOMIC, backups=1, mvcc=false]
>>> [2020-05-12T17:50:44,341][INFO][main][GridCacheProcessor]
>>> Started cache in recovery mode [name=CO_CO_LINE,
>>> id=-1588248812, dataRegionName=default, mode=PARTITIONED,
>>> atomicity=ATOMIC, backups=1, mvcc=false]
>>> [2020-05-12T17:50:44,366][INFO][main][GridCacheProcessor]
>>> Started cache in recovery mode [name=ignite-sys-cache,
>>> id=-2100569601, dataRegionName=sysMemPlc, mode=REPLICATED,
>>> atomicity=TRANSACTIONAL, backups=2147483647, mvcc=false]
>>> [2020-05-12T18:17:57,071][INFO][main][GridCacheProcessor]
>>> Started cache in recovery mode [name=CO_CO_LINE_NEW,
>>> id=1742991829, dataRegionName=default, mode=PARTITIONED,
>>> atomicity=ATOMIC, backups=1, mvcc=false]
>>> [2020-05-12T18:19:54,910][INFO][main][GridCacheProcessor]
>>> Started cache in recovery mode [name=PI_COM_DAY,
>>> id=-1904194728, dataRegionName=default, mode=PARTITIONED,
>>> atomicity=ATOMIC, backups=1, mvcc=false]
>>> [2020-05-12T18:19:54,949][INFO][main][GridCacheProcessor]
>>> Started cache in recovery mode [name=PLM_ITEM,
>>> id=-1283854143, dataRegionName=default, mode=REPLICATED,
>>> atomicity=ATOMIC, backups=2147483647, mvcc=false]
>>> [2020-05-12T18:22:53,662][INFO][main][GridCacheProcessor]
>>> Started cache in recovery mode [name=CO_CO, id=64322847,
>>> dataRegionName=default, mode=PARTITIONED, atomicity=ATOMIC,
>>> backups=1, mvcc=false]
>>> [2020-05-12T18:22:54,876][INFO][main][GridCacheProcessor]
>>> Started cache in recovery mode [name=CO_CUST, id=1684722246,
>>> dataRegionName=default, mode=PARTITIONED, atomicity=ATOMIC,
>>> backups=1, mvcc=false]
>>> [2020-05-12T18:22:54,892][INFO][main][GridCacheDatabaseSharedManager]
>>> Binary recovery performed in 4970233ms.
>>>
>>> Among them, binary recovery took 4970 seconds.
>>>
>>> Our question is:
>>>
>>> 1.Why is the start time so long?
>>>
>>> 2.Is the current state of ignite, with the growth of single
>>> node data volume, the restart time will be longer and longer?
>>>
>>> 3.Do have any suggestions for optimizing the restart time?
>>>
Re: Binary recovery for a very long time
Posted by Evgenii Zhuravlev <e....@gmail.com>.
Can you share full logs from all nodes?
вт, 12 мая 2020 г. в 18:24, 38797715 <38...@qq.com>:
> Hi Evgenii,
>
> The storage used is not SSD.
>
> We will use different versions of ignite for further testing, such as
> ignite2.8.
> Ignite is configured as follows:
> <?xml version="1.0" encoding="UTF-8"?>
> <beans xmlns="http://www.springframework.org/schema/beans"
> <http://www.springframework.org/schema/beans>
> xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
> <http://www.w3.org/2001/XMLSchema-instance> xsi:schemaLocation="
> http://www.springframework.org/schema/beans
> http://www.springframework.org/schema/beans/spring-beans.xsd">
> <bean id="ignite.cfg" class=
> "org.apache.ignite.configuration.IgniteConfiguration">
> <property name="peerClassLoadingEnabled" value="true"/>
> <property name="consistentId" value="20"/>
> <property name="failureDetectionTimeout" value="120000"/>
> <property name="workDirectory" value="/appdata/ignite"/>
> <property name="rebalanceBatchSize" value="#{2 * 1024 * 1024}"/>
> <property name="rebalanceThrottle" value="100"/>
> <property name="rebalanceThreadPoolSize" value="4"/>
> <property name="gridLogger">
> <bean class="org.apache.ignite.logger.log4j2.Log4J2Logger">
> <constructor-arg type="java.lang.String" value="config/ignite-log4j2.xml"
> />
> </bean>
> </property>
> <property name="cacheConfiguration">
> <list>
> <bean id="partitioned-cache-template" abstract="true" class=
> "org.apache.ignite.configuration.CacheConfiguration">
> <property name="name" value="cache-partitioned*"/>
> <property name="cacheMode" value="PARTITIONED" />
> <property name="backups" value="1" />
> <property name="queryParallelism" value="16"/>
> <property name="partitionLossPolicy" value="READ_ONLY_SAFE"/>
> </bean>
> <bean id="replicated-cache-template" abstract="true" class=
> "org.apache.ignite.configuration.CacheConfiguration">
> <property name="name" value="cache-replicated*"/>
> <property name="cacheMode" value="REPLICATED" />
> <property name="partitionLossPolicy" value="READ_ONLY_SAFE"/>
> </bean>
> </list>
> </property>
> <!-- Enabling Apache Ignite Persistent Store. -->
> <property name="dataStorageConfiguration">
> <bean class="org.apache.ignite.configuration.DataStorageConfiguration">
> <property name="defaultDataRegionConfiguration">
> <bean class="org.apache.ignite.configuration.DataRegionConfiguration">
> <property name="persistenceEnabled" value="true"/>
> <property name="maxSize" value="#{200L * 1024 * 1024 * 1024}"/>
> </bean>
> </property>
> </bean>
> </property>
> </bean>
> </beans>
> 在 2020/5/13 上午4:45, Evgenii Zhuravlev 写道:
>
> Hi,
>
> Can you share full logs and configuration? What disk so you use?
>
> Evgenii
>
> вт, 12 мая 2020 г. в 06:49, 38797715 <38...@qq.com>:
>
>> Among them:
>> CO_CO_NEW: ~ 48 minutes(partitioned,backup=1,33M)
>>
>> Ignite sys cache: ~ 27 minutes
>>
>> PLM_ITEM:~3 minutes(repicated,1.9K)
>>
>>
>> 在 2020/5/12 下午9:08, 38797715 写道:
>>
>> Hi community,
>>
>> We have 5 servers, 16 cores, 256g memory, and 200g off-heap memory.
>> We have 7 tables to test, and the data volume is
>> respectively:31.8M,495.2M,552.3M,33M,873.3K,28M,1.9K(replicated),others are
>> partitioned(backup = 1)
>>
>> VM args:-server -Xms20g -Xmx20g -XX:+AlwaysPreTouch -XX:+UseG1GC
>> -XX:+ScavengeBeforeFullGC -XX:+DisableExplicitGC -XX:+PrintGCDetails
>> -XX:+PrintGCTimeStamps -XX:+PrintGCDateStamps -XX:+UseGCLogFileRotation
>> -XX:NumberOfGCLogFiles=10 -XX:GCLogFileSize=100M
>> -Xloggc:/data/gc/logs/gclog.txt -Djava.net.preferIPv4Stack=true
>> -XX:MaxDirectMemorySize=256M -XX:+PrintAdaptiveSizePolicy
>>
>> Today, one of the servers was restarted(kill and then start ignite.sh)
>> for some reason, but the node took 1.5 hours to start, which was much
>> longer than expected.
>>
>> After analyzing the log, the following information is found:
>> [2020-05-12T17:00:05,138][INFO ][main][GridCacheDatabaseSharedManager]
>> Found last checkpoint marker [cpId=7a0564f2-43e5-400b-9439-746fc68a6ccb,
>> pos=FileWALPointer [idx=10511, fileOff=51348888, len=61193]]
>> [2020-05-12T17:00:05,151][INFO ][main][GridCacheDatabaseSharedManager]
>> Binary memory state restored at node startup [restoredPtr=FileWALPointer
>> [idx=10511, fileOff=51410110, len=0]]
>> [2020-05-12T17:00:05,152][INFO ][main][FileWriteAheadLogManager]
>> Resuming logging to WAL segment [file=/appdata/ignite/db/wal/24/
>> 0000000000000001.wal, offset=51410110, ver=2]
>> [2020-05-12T17:00:06,448][INFO ][main][PageMemoryImpl] Started page
>> memory [memoryAllocated=200.0 GiB, pages=50821088, tableSize=3.9 GiB,
>> checkpointBuffer=2.0 GiB]
>> [2020-05-12T17:02:08,528][INFO ][main][GridCacheProcessor] Started cache
>> in recovery mode [name=CO_CO_NEW, id=-189779360, dataRegionName=default,
>> mode=PARTITIONED, atomicity=ATOMIC, backups=1, mvcc=false]
>> [2020-05-12T17:50:44,341][INFO ][main][GridCacheProcessor] Started cache
>> in recovery mode [name=CO_CO_LINE, id=-1588248812,
>> dataRegionName=default, mode=PARTITIONED, atomicity=ATOMIC, backups=1,
>> mvcc=false]
>> [2020-05-12T17:50:44,366][INFO ][main][GridCacheProcessor] Started cache
>> in recovery mode [name=ignite-sys-cache, id=-2100569601,
>> dataRegionName=sysMemPlc, mode=REPLICATED, atomicity=TRANSACTIONAL, backups=
>> 2147483647, mvcc=false]
>> [2020-05-12T18:17:57,071][INFO ][main][GridCacheProcessor] Started cache
>> in recovery mode [name=CO_CO_LINE_NEW, id=1742991829,
>> dataRegionName=default, mode=PARTITIONED, atomicity=ATOMIC, backups=1,
>> mvcc=false]
>> [2020-05-12T18:19:54,910][INFO ][main][GridCacheProcessor] Started cache
>> in recovery mode [name=PI_COM_DAY, id=-1904194728,
>> dataRegionName=default, mode=PARTITIONED, atomicity=ATOMIC, backups=1,
>> mvcc=false]
>> [2020-05-12T18:19:54,949][INFO ][main][GridCacheProcessor] Started cache
>> in recovery mode [name=PLM_ITEM, id=-1283854143, dataRegionName=default,
>> mode=REPLICATED, atomicity=ATOMIC, backups=2147483647, mvcc=false]
>> [2020-05-12T18:22:53,662][INFO ][main][GridCacheProcessor] Started cache
>> in recovery mode [name=CO_CO, id=64322847, dataRegionName=default,
>> mode=PARTITIONED, atomicity=ATOMIC, backups=1, mvcc=false]
>> [2020-05-12T18:22:54,876][INFO ][main][GridCacheProcessor] Started cache
>> in recovery mode [name=CO_CUST, id=1684722246, dataRegionName=default,
>> mode=PARTITIONED, atomicity=ATOMIC, backups=1, mvcc=false]
>> [2020-05-12T18:22:54,892][INFO ][main][GridCacheDatabaseSharedManager]
>> Binary recovery performed in 4970233 ms.
>>
>> Among them, binary recovery took 4970 seconds.
>>
>> Our question is:
>>
>> 1.Why is the start time so long?
>>
>> 2.Is the current state of ignite, with the growth of single node data
>> volume, the restart time will be longer and longer?
>>
>> 3.Do have any suggestions for optimizing the restart time?
>>
>>
Re: Binary recovery for a very long time
Posted by 38797715 <38...@qq.com>.
Hi Evgenii,
The storage used is not SSD.
We will use different versions of ignite for further testing, such as
ignite2.8.
Ignite is configured as follows:
<?xmlversion="1.0"encoding="UTF-8"?>
<beansxmlns="http://www.springframework.org/schema/beans"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"xsi:schemaLocation="http://www.springframework.org/schema/beans
http://www.springframework.org/schema/beans/spring-beans.xsd">
<beanid="ignite.cfg"class="org.apache.ignite.configuration.IgniteConfiguration">
<propertyname="peerClassLoadingEnabled"value="true"/>
<propertyname="consistentId"value="20"/>
<propertyname="failureDetectionTimeout"value="120000"/>
<propertyname="workDirectory"value="/appdata/ignite"/>
<propertyname="rebalanceBatchSize"value="#{2 * 1024 * 1024}"/>
<propertyname="rebalanceThrottle"value="100"/>
<propertyname="rebalanceThreadPoolSize"value="4"/>
<propertyname="gridLogger">
<beanclass="org.apache.ignite.logger.log4j2.Log4J2Logger">
<constructor-argtype="java.lang.String"value="config/ignite-log4j2.xml"/>
</bean>
</property>
<propertyname="cacheConfiguration">
<list>
<beanid="partitioned-cache-template"abstract="true"class="org.apache.ignite.configuration.CacheConfiguration">
<propertyname="name"value="cache-partitioned*"/>
<propertyname="cacheMode"value="PARTITIONED"/>
<propertyname="backups"value="1"/>
<propertyname="queryParallelism"value="16"/>
<propertyname="partitionLossPolicy"value="READ_ONLY_SAFE"/>
</bean>
<beanid="replicated-cache-template"abstract="true"class="org.apache.ignite.configuration.CacheConfiguration">
<propertyname="name"value="cache-replicated*"/>
<propertyname="cacheMode"value="REPLICATED"/>
<propertyname="partitionLossPolicy"value="READ_ONLY_SAFE"/>
</bean>
</list>
</property>
<!-- Enabling Apache Ignite Persistent Store. -->
<propertyname="dataStorageConfiguration">
<beanclass="org.apache.ignite.configuration.DataStorageConfiguration">
<propertyname="defaultDataRegionConfiguration">
<beanclass="org.apache.ignite.configuration.DataRegionConfiguration">
<propertyname="persistenceEnabled"value="true"/>
<propertyname="maxSize"value="#{200L * 1024 * 1024 * 1024}"/>
</bean>
</property>
</bean>
</property>
</bean>
</beans>
在 2020/5/13 上午4:45, Evgenii Zhuravlev 写道:
> Hi,
>
> Can you share full logs and configuration? What disk so you use?
>
> Evgenii
>
> вт, 12 мая 2020 г. в 06:49, 38797715 <38797715@qq.com
> <ma...@qq.com>>:
>
> Among them:
> CO_CO_NEW: ~ 48 minutes(partitioned,backup=1,33M)
>
> Ignite sys cache: ~ 27 minutes
>
> PLM_ITEM:~3 minutes(repicated,1.9K)
>
>
> 在 2020/5/12 下午9:08, 38797715 写道:
>>
>> Hi community,
>>
>> We have 5 servers, 16 cores, 256g memory, and 200g off-heap memory.
>> We have 7 tables to test, and the data volume is
>> respectively:31.8M,495.2M,552.3M,33M,873.3K,28M,1.9K(replicated),others
>> are partitioned(backup = 1)
>>
>> VM args:-server -Xms20g -Xmx20g -XX:+AlwaysPreTouch -XX:+UseG1GC
>> -XX:+ScavengeBeforeFullGC -XX:+DisableExplicitGC
>> -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+PrintGCDateStamps
>> -XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=10
>> -XX:GCLogFileSize=100M -Xloggc:/data/gc/logs/gclog.txt
>> -Djava.net.preferIPv4Stack=true -XX:MaxDirectMemorySize=256M
>> -XX:+PrintAdaptiveSizePolicy
>>
>> Today, one of the servers was restarted(kill and then start
>> ignite.sh) for some reason, but the node took 1.5 hours to start,
>> which was much longer than expected.
>>
>> After analyzing the log, the following information is found:
>>
>> [2020-05-12T17:00:05,138][INFO][main][GridCacheDatabaseSharedManager]
>> Found last checkpoint marker
>> [cpId=7a0564f2-43e5-400b-9439-746fc68a6ccb, pos=FileWALPointer
>> [idx=10511, fileOff=51348888, len=61193]]
>> [2020-05-12T17:00:05,151][INFO][main][GridCacheDatabaseSharedManager]
>> Binary memory state restored at node startup
>> [restoredPtr=FileWALPointer [idx=10511, fileOff=51410110, len=0]]
>> [2020-05-12T17:00:05,152][INFO][main][FileWriteAheadLogManager]
>> Resuming logging to WAL segment
>> [file=/appdata/ignite/db/wal/24/0000000000000001.wal,
>> offset=51410110, ver=2]
>> [2020-05-12T17:00:06,448][INFO][main][PageMemoryImpl] Started
>> page memory [memoryAllocated=200.0GiB, pages=50821088,
>> tableSize=3.9GiB, checkpointBuffer=2.0GiB]
>> [2020-05-12T17:02:08,528][INFO][main][GridCacheProcessor] Started
>> cache in recovery mode [name=CO_CO_NEW, id=-189779360,
>> dataRegionName=default, mode=PARTITIONED, atomicity=ATOMIC,
>> backups=1, mvcc=false]
>> [2020-05-12T17:50:44,341][INFO][main][GridCacheProcessor] Started
>> cache in recovery mode [name=CO_CO_LINE, id=-1588248812,
>> dataRegionName=default, mode=PARTITIONED, atomicity=ATOMIC,
>> backups=1, mvcc=false]
>> [2020-05-12T17:50:44,366][INFO][main][GridCacheProcessor] Started
>> cache in recovery mode [name=ignite-sys-cache, id=-2100569601,
>> dataRegionName=sysMemPlc, mode=REPLICATED,
>> atomicity=TRANSACTIONAL, backups=2147483647, mvcc=false]
>> [2020-05-12T18:17:57,071][INFO][main][GridCacheProcessor] Started
>> cache in recovery mode [name=CO_CO_LINE_NEW, id=1742991829,
>> dataRegionName=default, mode=PARTITIONED, atomicity=ATOMIC,
>> backups=1, mvcc=false]
>> [2020-05-12T18:19:54,910][INFO][main][GridCacheProcessor] Started
>> cache in recovery mode [name=PI_COM_DAY, id=-1904194728,
>> dataRegionName=default, mode=PARTITIONED, atomicity=ATOMIC,
>> backups=1, mvcc=false]
>> [2020-05-12T18:19:54,949][INFO][main][GridCacheProcessor] Started
>> cache in recovery mode [name=PLM_ITEM, id=-1283854143,
>> dataRegionName=default, mode=REPLICATED, atomicity=ATOMIC,
>> backups=2147483647, mvcc=false]
>> [2020-05-12T18:22:53,662][INFO][main][GridCacheProcessor] Started
>> cache in recovery mode [name=CO_CO, id=64322847,
>> dataRegionName=default, mode=PARTITIONED, atomicity=ATOMIC,
>> backups=1, mvcc=false]
>> [2020-05-12T18:22:54,876][INFO][main][GridCacheProcessor] Started
>> cache in recovery mode [name=CO_CUST, id=1684722246,
>> dataRegionName=default, mode=PARTITIONED, atomicity=ATOMIC,
>> backups=1, mvcc=false]
>> [2020-05-12T18:22:54,892][INFO][main][GridCacheDatabaseSharedManager]
>> Binary recovery performed in 4970233ms.
>>
>> Among them, binary recovery took 4970 seconds.
>>
>> Our question is:
>>
>> 1.Why is the start time so long?
>>
>> 2.Is the current state of ignite, with the growth of single node
>> data volume, the restart time will be longer and longer?
>>
>> 3.Do have any suggestions for optimizing the restart time?
>>
Re: Binary recovery for a very long time
Posted by Evgenii Zhuravlev <e....@gmail.com>.
Hi,
Can you share full logs and configuration? What disk so you use?
Evgenii
вт, 12 мая 2020 г. в 06:49, 38797715 <38...@qq.com>:
> Among them:
> CO_CO_NEW: ~ 48 minutes(partitioned,backup=1,33M)
>
> Ignite sys cache: ~ 27 minutes
>
> PLM_ITEM:~3 minutes(repicated,1.9K)
>
>
> 在 2020/5/12 下午9:08, 38797715 写道:
>
> Hi community,
>
> We have 5 servers, 16 cores, 256g memory, and 200g off-heap memory.
> We have 7 tables to test, and the data volume is
> respectively:31.8M,495.2M,552.3M,33M,873.3K,28M,1.9K(replicated),others are
> partitioned(backup = 1)
>
> VM args:-server -Xms20g -Xmx20g -XX:+AlwaysPreTouch -XX:+UseG1GC
> -XX:+ScavengeBeforeFullGC -XX:+DisableExplicitGC -XX:+PrintGCDetails
> -XX:+PrintGCTimeStamps -XX:+PrintGCDateStamps -XX:+UseGCLogFileRotation
> -XX:NumberOfGCLogFiles=10 -XX:GCLogFileSize=100M
> -Xloggc:/data/gc/logs/gclog.txt -Djava.net.preferIPv4Stack=true
> -XX:MaxDirectMemorySize=256M -XX:+PrintAdaptiveSizePolicy
>
> Today, one of the servers was restarted(kill and then start ignite.sh) for
> some reason, but the node took 1.5 hours to start, which was much longer
> than expected.
>
> After analyzing the log, the following information is found:
> [2020-05-12T17:00:05,138][INFO ][main][GridCacheDatabaseSharedManager]
> Found last checkpoint marker [cpId=7a0564f2-43e5-400b-9439-746fc68a6ccb,
> pos=FileWALPointer [idx=10511, fileOff=51348888, len=61193]]
> [2020-05-12T17:00:05,151][INFO ][main][GridCacheDatabaseSharedManager]
> Binary memory state restored at node startup [restoredPtr=FileWALPointer
> [idx=10511, fileOff=51410110, len=0]]
> [2020-05-12T17:00:05,152][INFO ][main][FileWriteAheadLogManager] Resuming
> logging to WAL segment [file=/appdata/ignite/db/wal/24/0000000000000001.wal,
> offset=51410110, ver=2]
> [2020-05-12T17:00:06,448][INFO ][main][PageMemoryImpl] Started page
> memory [memoryAllocated=200.0 GiB, pages=50821088, tableSize=3.9 GiB,
> checkpointBuffer=2.0 GiB]
> [2020-05-12T17:02:08,528][INFO ][main][GridCacheProcessor] Started cache
> in recovery mode [name=CO_CO_NEW, id=-189779360, dataRegionName=default,
> mode=PARTITIONED, atomicity=ATOMIC, backups=1, mvcc=false]
> [2020-05-12T17:50:44,341][INFO ][main][GridCacheProcessor] Started cache
> in recovery mode [name=CO_CO_LINE, id=-1588248812,
> dataRegionName=default, mode=PARTITIONED, atomicity=ATOMIC, backups=1,
> mvcc=false]
> [2020-05-12T17:50:44,366][INFO ][main][GridCacheProcessor] Started cache
> in recovery mode [name=ignite-sys-cache, id=-2100569601,
> dataRegionName=sysMemPlc, mode=REPLICATED, atomicity=TRANSACTIONAL, backups=
> 2147483647, mvcc=false]
> [2020-05-12T18:17:57,071][INFO ][main][GridCacheProcessor] Started cache
> in recovery mode [name=CO_CO_LINE_NEW, id=1742991829,
> dataRegionName=default, mode=PARTITIONED, atomicity=ATOMIC, backups=1,
> mvcc=false]
> [2020-05-12T18:19:54,910][INFO ][main][GridCacheProcessor] Started cache
> in recovery mode [name=PI_COM_DAY, id=-1904194728,
> dataRegionName=default, mode=PARTITIONED, atomicity=ATOMIC, backups=1,
> mvcc=false]
> [2020-05-12T18:19:54,949][INFO ][main][GridCacheProcessor] Started cache
> in recovery mode [name=PLM_ITEM, id=-1283854143, dataRegionName=default,
> mode=REPLICATED, atomicity=ATOMIC, backups=2147483647, mvcc=false]
> [2020-05-12T18:22:53,662][INFO ][main][GridCacheProcessor] Started cache
> in recovery mode [name=CO_CO, id=64322847, dataRegionName=default,
> mode=PARTITIONED, atomicity=ATOMIC, backups=1, mvcc=false]
> [2020-05-12T18:22:54,876][INFO ][main][GridCacheProcessor] Started cache
> in recovery mode [name=CO_CUST, id=1684722246, dataRegionName=default,
> mode=PARTITIONED, atomicity=ATOMIC, backups=1, mvcc=false]
> [2020-05-12T18:22:54,892][INFO ][main][GridCacheDatabaseSharedManager]
> Binary recovery performed in 4970233 ms.
>
> Among them, binary recovery took 4970 seconds.
>
> Our question is:
>
> 1.Why is the start time so long?
>
> 2.Is the current state of ignite, with the growth of single node data
> volume, the restart time will be longer and longer?
>
> 3.Do have any suggestions for optimizing the restart time?
>
>
Re: Binary recovery for a very long time
Posted by 38797715 <38...@qq.com>.
Among them:
CO_CO_NEW: ~ 48 minutes(partitioned,backup=1,33M)
Ignite sys cache: ~ 27 minutes
PLM_ITEM:~3 minutes(repicated,1.9K)
在 2020/5/12 下午9:08, 38797715 写道:
>
> Hi community,
>
> We have 5 servers, 16 cores, 256g memory, and 200g off-heap memory.
> We have 7 tables to test, and the data volume is
> respectively:31.8M,495.2M,552.3M,33M,873.3K,28M,1.9K(replicated),others
> are partitioned(backup = 1)
>
> VM args:-server -Xms20g -Xmx20g -XX:+AlwaysPreTouch -XX:+UseG1GC
> -XX:+ScavengeBeforeFullGC -XX:+DisableExplicitGC -XX:+PrintGCDetails
> -XX:+PrintGCTimeStamps -XX:+PrintGCDateStamps
> -XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=10
> -XX:GCLogFileSize=100M -Xloggc:/data/gc/logs/gclog.txt
> -Djava.net.preferIPv4Stack=true -XX:MaxDirectMemorySize=256M
> -XX:+PrintAdaptiveSizePolicy
>
> Today, one of the servers was restarted(kill and then start ignite.sh)
> for some reason, but the node took 1.5 hours to start, which was much
> longer than expected.
>
> After analyzing the log, the following information is found:
>
> [2020-05-12T17:00:05,138][INFO][main][GridCacheDatabaseSharedManager]
> Found last checkpoint marker
> [cpId=7a0564f2-43e5-400b-9439-746fc68a6ccb, pos=FileWALPointer
> [idx=10511, fileOff=51348888, len=61193]]
> [2020-05-12T17:00:05,151][INFO][main][GridCacheDatabaseSharedManager]
> Binary memory state restored at node startup
> [restoredPtr=FileWALPointer [idx=10511, fileOff=51410110, len=0]]
> [2020-05-12T17:00:05,152][INFO][main][FileWriteAheadLogManager]
> Resuming logging to WAL segment
> [file=/appdata/ignite/db/wal/24/0000000000000001.wal, offset=51410110,
> ver=2]
> [2020-05-12T17:00:06,448][INFO][main][PageMemoryImpl] Started page
> memory [memoryAllocated=200.0GiB, pages=50821088, tableSize=3.9GiB,
> checkpointBuffer=2.0GiB]
> [2020-05-12T17:02:08,528][INFO][main][GridCacheProcessor] Started
> cache in recovery mode [name=CO_CO_NEW, id=-189779360,
> dataRegionName=default, mode=PARTITIONED, atomicity=ATOMIC, backups=1,
> mvcc=false]
> [2020-05-12T17:50:44,341][INFO][main][GridCacheProcessor] Started
> cache in recovery mode [name=CO_CO_LINE, id=-1588248812,
> dataRegionName=default, mode=PARTITIONED, atomicity=ATOMIC, backups=1,
> mvcc=false]
> [2020-05-12T17:50:44,366][INFO][main][GridCacheProcessor] Started
> cache in recovery mode [name=ignite-sys-cache, id=-2100569601,
> dataRegionName=sysMemPlc, mode=REPLICATED, atomicity=TRANSACTIONAL,
> backups=2147483647, mvcc=false]
> [2020-05-12T18:17:57,071][INFO][main][GridCacheProcessor] Started
> cache in recovery mode [name=CO_CO_LINE_NEW, id=1742991829,
> dataRegionName=default, mode=PARTITIONED, atomicity=ATOMIC, backups=1,
> mvcc=false]
> [2020-05-12T18:19:54,910][INFO][main][GridCacheProcessor] Started
> cache in recovery mode [name=PI_COM_DAY, id=-1904194728,
> dataRegionName=default, mode=PARTITIONED, atomicity=ATOMIC, backups=1,
> mvcc=false]
> [2020-05-12T18:19:54,949][INFO][main][GridCacheProcessor] Started
> cache in recovery mode [name=PLM_ITEM, id=-1283854143,
> dataRegionName=default, mode=REPLICATED, atomicity=ATOMIC,
> backups=2147483647, mvcc=false]
> [2020-05-12T18:22:53,662][INFO][main][GridCacheProcessor] Started
> cache in recovery mode [name=CO_CO, id=64322847,
> dataRegionName=default, mode=PARTITIONED, atomicity=ATOMIC, backups=1,
> mvcc=false]
> [2020-05-12T18:22:54,876][INFO][main][GridCacheProcessor] Started
> cache in recovery mode [name=CO_CUST, id=1684722246,
> dataRegionName=default, mode=PARTITIONED, atomicity=ATOMIC, backups=1,
> mvcc=false]
> [2020-05-12T18:22:54,892][INFO][main][GridCacheDatabaseSharedManager]
> Binary recovery performed in 4970233ms.
>
> Among them, binary recovery took 4970 seconds.
>
> Our question is:
>
> 1.Why is the start time so long?
>
> 2.Is the current state of ignite, with the growth of single node data
> volume, the restart time will be longer and longer?
>
> 3.Do have any suggestions for optimizing the restart time?
>