You are viewing a plain text version of this content. The canonical link for it is here.

Posted to user@ignite.apache.org by 38797715 <38...@qq.com> on 2020/05/12 13:08:48 UTC

Binary recovery for a very long time

Hi community,

We have 5 servers, 16 cores, 256g memory, and 200g off-heap memory.
We have 7 tables to test, and the data volume is 
respectively:31.8M,495.2M,552.3M,33M,873.3K,28M,1.9K(replicated),others 
are partitioned(backup = 1)

VM args:-server -Xms20g -Xmx20g -XX:+AlwaysPreTouch -XX:+UseG1GC 
-XX:+ScavengeBeforeFullGC -XX:+DisableExplicitGC -XX:+PrintGCDetails 
-XX:+PrintGCTimeStamps -XX:+PrintGCDateStamps -XX:+UseGCLogFileRotation 
-XX:NumberOfGCLogFiles=10 -XX:GCLogFileSize=100M 
-Xloggc:/data/gc/logs/gclog.txt -Djava.net.preferIPv4Stack=true 
-XX:MaxDirectMemorySize=256M -XX:+PrintAdaptiveSizePolicy

Today, one of the servers was restarted(kill and then start ignite.sh) 
for some reason, but the node took 1.5 hours to start, which was much 
longer than expected.

After analyzing the log, the following information is found:

[2020-05-12T17:00:05,138][INFO][main][GridCacheDatabaseSharedManager] 
Found last checkpoint marker [cpId=7a0564f2-43e5-400b-9439-746fc68a6ccb, 
pos=FileWALPointer [idx=10511, fileOff=51348888, len=61193]]
[2020-05-12T17:00:05,151][INFO][main][GridCacheDatabaseSharedManager] 
Binary memory state restored at node startup [restoredPtr=FileWALPointer 
[idx=10511, fileOff=51410110, len=0]]
[2020-05-12T17:00:05,152][INFO][main][FileWriteAheadLogManager] Resuming 
logging to WAL segment 
[file=/appdata/ignite/db/wal/24/0000000000000001.wal, offset=51410110, 
ver=2]
[2020-05-12T17:00:06,448][INFO][main][PageMemoryImpl] Started page 
memory [memoryAllocated=200.0GiB, pages=50821088, tableSize=3.9GiB, 
checkpointBuffer=2.0GiB]
[2020-05-12T17:02:08,528][INFO][main][GridCacheProcessor] Started cache 
in recovery mode [name=CO_CO_NEW, id=-189779360, dataRegionName=default, 
mode=PARTITIONED, atomicity=ATOMIC, backups=1, mvcc=false]
[2020-05-12T17:50:44,341][INFO][main][GridCacheProcessor] Started cache 
in recovery mode [name=CO_CO_LINE, id=-1588248812, 
dataRegionName=default, mode=PARTITIONED, atomicity=ATOMIC, backups=1, 
mvcc=false]
[2020-05-12T17:50:44,366][INFO][main][GridCacheProcessor] Started cache 
in recovery mode [name=ignite-sys-cache, id=-2100569601, 
dataRegionName=sysMemPlc, mode=REPLICATED, atomicity=TRANSACTIONAL, 
backups=2147483647, mvcc=false]
[2020-05-12T18:17:57,071][INFO][main][GridCacheProcessor] Started cache 
in recovery mode [name=CO_CO_LINE_NEW, id=1742991829, 
dataRegionName=default, mode=PARTITIONED, atomicity=ATOMIC, backups=1, 
mvcc=false]
[2020-05-12T18:19:54,910][INFO][main][GridCacheProcessor] Started cache 
in recovery mode [name=PI_COM_DAY, id=-1904194728, 
dataRegionName=default, mode=PARTITIONED, atomicity=ATOMIC, backups=1, 
mvcc=false]
[2020-05-12T18:19:54,949][INFO][main][GridCacheProcessor] Started cache 
in recovery mode [name=PLM_ITEM, id=-1283854143, dataRegionName=default, 
mode=REPLICATED, atomicity=ATOMIC, backups=2147483647, mvcc=false]
[2020-05-12T18:22:53,662][INFO][main][GridCacheProcessor] Started cache 
in recovery mode [name=CO_CO, id=64322847, dataRegionName=default, 
mode=PARTITIONED, atomicity=ATOMIC, backups=1, mvcc=false]
[2020-05-12T18:22:54,876][INFO][main][GridCacheProcessor] Started cache 
in recovery mode [name=CO_CUST, id=1684722246, dataRegionName=default, 
mode=PARTITIONED, atomicity=ATOMIC, backups=1, mvcc=false]
[2020-05-12T18:22:54,892][INFO][main][GridCacheDatabaseSharedManager] 
Binary recovery performed in 4970233ms.

Among them, binary recovery took 4970 seconds.

Our question is:

1.Why is the start time so long?

2.Is the current state of ignite, with the growth of single node data 
volume, the restart time will be longer and longer?

3.Do have any suggestions for optimizing the restart time?

Re: Binary recovery for a very long time

Posted by 38797715 <38...@qq.com>.

Hi,

I enabled debug logging and found the following log output:

[2020-05-23T21:43:58,397][DEBUG][nio-acceptor-tcp-comm-#28%ClusterName1%][TcpCommunicationSpi] 
Balancing data [min0=0, minIdx=0, max0=-1, maxIdx=-1]
[2020-05-23T21:43:58,405][DEBUG][main][SchemaManager] Creating DB table 
with SQL: CREATE TABLE "PUBLIC"."CO_CUST"(_KEY VARCHAR INVISIBLE NOT 
NULL,_VAL OTHER INVISIBLE,"CUST_ID"VARCHAR(30) NOT 
NULL,"CUST_NAME"VARCHAR(200),"CUST_SHORT_NAME"VARCHAR(200),"CUST_SHORT_ID"VARCHAR(240),"LICENSE_CODE"VARCHAR(30),"STATUS"VARCHAR(16),"COM_ID"VARCHAR(30),"SALE_CENTER_ID"VARCHAR(30),"SALE_DEPT_ID"VARCHAR(30),"SLSMGR_ID"VARCHAR(30),"SLSMAN_ID"VARCHAR(30),"SLSMAN_MOBILE"VARCHAR(16),"MANAGER"VARCHAR(100),"IDENTITY_CARD_ID"VARCHAR(36),"ORDER_TEL"VARCHAR(80),"INV_TYPE"VARCHAR(18),"ORDER_WAY"VARCHAR(18),"PAY_TYPE"VARCHAR(2),"PERIODS"VARCHAR(200),"PRD_ST_DATE"VARCHAR(8),"ORDER_CUST_ID"VARCHAR(30),"BUSI_ADDR"VARCHAR(400),"WORK_PORT"VARCHAR(18),"BASE_TYPE"VARCHAR(24),"SALE_SCOPE"VARCHAR(1),"SCOPE"VARCHAR(200),"COM_CHARA"VARCHAR(2),"INNER_TYPE"VARCHAR(18),"CUST_KIND"VARCHAR(10),"CUST_KIND_NAME"VARCHAR(120),"CUST_TYPE"VARCHAR(6),"CUST_TYPE1"VARCHAR(6),"CUST_TYPE2"VARCHAR(6),"CUST_TYPE3"VARCHAR(6),"CUST_TYPE4"VARCHAR(6),"CUST_TYPE5"VARCHAR(6),"AREA_TYPE"VARCHAR(2),"IS_SEFL_CUST"VARCHAR(1),"IS_FUNC_CUST"VARCHAR(1),"MANAGER_BIRTHDAY"VARCHAR(8),"CELEBRATE_DATE"VARCHAR(8),"NATION_CUST_CODE"VARCHAR(30),"LONGITUDE"DECIMAL,"LATITUDE"DECIMAL,"AGENT_MESSAGE"VARCHAR(400),"NOTE"VARCHAR(1000),"UPDATE_TIME"TIMESTAMP,"INVTY_ID"VARCHAR(30),"STEP_ID"VARCHAR(20),"INV_CUST_ID"VARCHAR(60),"INV_UNIT_NAME"VARCHAR(200),"ACCOUNT"VARCHAR(200),"BANK"VARCHAR(100),"TAX_ACCOUNT"VARCHAR(120),"OTHER_ORDER_WAY"VARCHAR(32),"SALE_AVG"DECIMAL,"ITEM_ORD"DECIMAL,"QTY_SOLD"DECIMAL,"AMT_SOLD"DECIMAL,"MANAGER_TEL"VARCHAR(80),"IS_SALE_LARGE"VARCHAR(1),"TAX_TEL"VARCHAR(60),"TAX_ADDR"VARCHAR(400),"IS_TOR_TAX"VARCHAR(1),"CUST_TYPE6"VARCHAR(6),"CUST_TYPE7"VARCHAR(6),"CUST_TYPE8"VARCHAR(124),"CUST_TYPE9"VARCHAR(60),"CUST_TYPE10"VARCHAR(200),"CANT_ID"VARCHAR(20),"IS_ONLINE_PAY"VARCHAR(1),"IS_RAIL_CUST"VARCHAR(1),"CUST_SEG"VARCHAR(30),"QTY_MULTIPLE"VARCHAR(20),"TAX_ADRR"VARCHAR(200),"COLLECT_STAFF_ID"VARCHAR(30),"ITEM_HEIGHT"DECIMAL,"AREA_ID"VARCHAR(30),"BASE_TYPE_EXT"VARCHAR(30),"AREA_TYPE_EXT"VARCHAR(30),"WORK_PORT_EXT"VARCHAR(30),"CUST_SEG_EXT"VARCHAR(30),"IS_CIGAR_CUST"VARCHAR(1))
[2020-05-23T21:43:58,411][DEBUG][main][IgniteH2Indexing] Creating cache 
index [cacheId=1684722246, idxName=_key_PK]
[2020-05-23T21:43:59,081][DEBUG][main][IgniteH2Indexing] Creating cache 
index [cacheId=1684722246, idxName=IDX_CO_CUST_SALE_CENTER_ID]
[2020-05-23T21:43:59,088][DEBUG][main][IgniteH2Indexing] Creating cache 
index [cacheId=1684722246, idxName=IDX_CO_CUST_STATUS]
[2020-05-23T21:43:59,099][DEBUG][main][IgniteH2Indexing] Creating cache 
index [cacheId=1684722246, idxName=IDX_CO_CUST_SALE_DEPT_ID]
[2020-05-23T21:43:59,109][INFO][main][GridCacheProcessor] Started cache 
in recovery mode [name=CO_CUST, id=1684722246, dataRegionName=default, 
mode=PARTITIONED, atomicity=ATOMIC, backups=1, mvcc=false]

It can be seen from the log that all tables and indexes need to be 
rebuilt during node startup? Just don't load data?

在 2020/5/21 下午8:48, Ilya Kasnacheev 写道:
> Hello!
>
> 1. I guess that WAL is read.
> 2. Unfortunately we do not have truly graceful exit as far as my 
> understanding goes.
>
> Regards,
> -- 
> Ilya Kasnacheev
>
>
> вт, 19 мая 2020 г. в 10:22, 38797715 <38797715@qq.com 
> <ma...@qq.com>>:
>
>     Hi,
>
>     the following log message:
>
>     [2020-05-12T18:17:57,071][INFO ][main][GridCacheProcessor] Started
>     cache in recovery mode [name=CO_CO_LINE_NEW, id=1742991829,
>     dataRegionName=default, mode=PARTITIONED, atomicity=ATOMIC,
>     backups=1, mvcc=false]
>
>     I have the following questions:
>
>     1.What has been done in the startup cache in recovery mode?
>
>     2.After testing, if the node stops normally (non abnormal
>     shutdown), the recovery process will also be performed during
>     startup. Why?
>
>     在 2020/5/18 下午9:58, Ilya Kasnacheev 写道:
>>     Hello!
>>
>>     Direct IO module is experimental and should not be used unless
>>     performance is tested first, in your specific use case.
>>
>>     Regards,
>>     -- 
>>     Ilya Kasnacheev
>>
>>
>>     пн, 18 мая 2020 г. в 16:47, 38797715 <38797715@qq.com
>>     <ma...@qq.com>>:
>>
>>         Hi,
>>
>>         If direct IO is disabled, the startup speed will be doubled,
>>         including some other tests. I find that direct IO has a great
>>         impact on the read performance.
>>
>>         在 2020/5/14 上午5:16, Evgenii Zhuravlev 写道:
>>>         Can you share full logs from all nodes?
>>>
>>>         вт, 12 мая 2020 г. в 18:24, 38797715 <38797715@qq.com
>>>         <ma...@qq.com>>:
>>>
>>>             Hi Evgenii,
>>>
>>>             The storage used is not SSD.
>>>
>>>             We will use different versions of ignite for further
>>>             testing, such as ignite2.8.
>>>             Ignite is configured as follows:
>>>
>>>             <?xmlversion="1.0"encoding="UTF-8"?>
>>>             <beansxmlns="http://www.springframework.org/schema/beans"
>>>             <http://www.springframework.org/schema/beans>
>>>             xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
>>>             <http://www.w3.org/2001/XMLSchema-instance>xsi:schemaLocation="http://www.springframework.org/schema/beans
>>>             http://www.springframework.org/schema/beans/spring-beans.xsd">
>>>             <beanid="ignite.cfg"class="org.apache.ignite.configuration.IgniteConfiguration">
>>>             <propertyname="peerClassLoadingEnabled"value="true"/>
>>>             <propertyname="consistentId"value="20"/>
>>>             <propertyname="failureDetectionTimeout"value="120000"/>
>>>             <propertyname="workDirectory"value="/appdata/ignite"/>
>>>             <propertyname="rebalanceBatchSize"value="#{2 * 1024 *
>>>             1024}"/>
>>>             <propertyname="rebalanceThrottle"value="100"/>
>>>             <propertyname="rebalanceThreadPoolSize"value="4"/>
>>>             <propertyname="gridLogger">
>>>             <beanclass="org.apache.ignite.logger.log4j2.Log4J2Logger">
>>>             <constructor-argtype="java.lang.String"value="config/ignite-log4j2.xml"/>
>>>             </bean>
>>>             </property>
>>>             <propertyname="cacheConfiguration">
>>>             <list>
>>>             <beanid="partitioned-cache-template"abstract="true"class="org.apache.ignite.configuration.CacheConfiguration">
>>>             <propertyname="name"value="cache-partitioned*"/>
>>>             <propertyname="cacheMode"value="PARTITIONED"/>
>>>             <propertyname="backups"value="1"/>
>>>             <propertyname="queryParallelism"value="16"/>
>>>             <propertyname="partitionLossPolicy"value="READ_ONLY_SAFE"/>
>>>             </bean>
>>>             <beanid="replicated-cache-template"abstract="true"class="org.apache.ignite.configuration.CacheConfiguration">
>>>             <propertyname="name"value="cache-replicated*"/>
>>>             <propertyname="cacheMode"value="REPLICATED"/>
>>>             <propertyname="partitionLossPolicy"value="READ_ONLY_SAFE"/>
>>>             </bean>
>>>             </list>
>>>             </property>
>>>             <!-- Enabling Apache Ignite Persistent Store. -->
>>>             <propertyname="dataStorageConfiguration">
>>>             <beanclass="org.apache.ignite.configuration.DataStorageConfiguration">
>>>             <propertyname="defaultDataRegionConfiguration">
>>>             <beanclass="org.apache.ignite.configuration.DataRegionConfiguration">
>>>             <propertyname="persistenceEnabled"value="true"/>
>>>             <propertyname="maxSize"value="#{200L * 1024 * 1024 *
>>>             1024}"/>
>>>             </bean>
>>>             </property>
>>>             </bean>
>>>             </property>
>>>             </bean>
>>>             </beans>
>>>             在 2020/5/13 上午4:45, Evgenii Zhuravlev 写道:
>>>>             Hi,
>>>>
>>>>             Can you share full logs and configuration? What disk so
>>>>             you use?
>>>>
>>>>             Evgenii
>>>>
>>>>             вт, 12 мая 2020 г. в 06:49, 38797715 <38797715@qq.com
>>>>             <ma...@qq.com>>:
>>>>
>>>>                 Among them:
>>>>                 CO_CO_NEW: ~ 48 minutes(partitioned,backup=1,33M)
>>>>
>>>>                 Ignite sys cache: ~ 27 minutes
>>>>
>>>>                 PLM_ITEM:~3 minutes(repicated,1.9K)
>>>>
>>>>
>>>>                 在 2020/5/12 下午9:08, 38797715 写道:
>>>>>
>>>>>                 Hi community,
>>>>>
>>>>>                 We have 5 servers, 16 cores, 256g memory, and 200g
>>>>>                 off-heap memory.
>>>>>                 We have 7 tables to test, and the data volume is
>>>>>                 respectively:31.8M,495.2M,552.3M,33M,873.3K,28M,1.9K(replicated),others
>>>>>                 are partitioned(backup = 1)
>>>>>
>>>>>                 VM args:-server -Xms20g -Xmx20g
>>>>>                 -XX:+AlwaysPreTouch -XX:+UseG1GC
>>>>>                 -XX:+ScavengeBeforeFullGC -XX:+DisableExplicitGC
>>>>>                 -XX:+PrintGCDetails -XX:+PrintGCTimeStamps
>>>>>                 -XX:+PrintGCDateStamps -XX:+UseGCLogFileRotation
>>>>>                 -XX:NumberOfGCLogFiles=10 -XX:GCLogFileSize=100M
>>>>>                 -Xloggc:/data/gc/logs/gclog.txt
>>>>>                 -Djava.net.preferIPv4Stack=true
>>>>>                 -XX:MaxDirectMemorySize=256M
>>>>>                 -XX:+PrintAdaptiveSizePolicy
>>>>>
>>>>>                 Today, one of the servers was restarted(kill and
>>>>>                 then start ignite.sh) for some reason, but the
>>>>>                 node took 1.5 hours to start, which was much
>>>>>                 longer than expected.
>>>>>
>>>>>                 After analyzing the log, the following information
>>>>>                 is found:
>>>>>
>>>>>                 [2020-05-12T17:00:05,138][INFO][main][GridCacheDatabaseSharedManager]
>>>>>                 Found last checkpoint marker
>>>>>                 [cpId=7a0564f2-43e5-400b-9439-746fc68a6ccb,
>>>>>                 pos=FileWALPointer [idx=10511, fileOff=51348888,
>>>>>                 len=61193]]
>>>>>                 [2020-05-12T17:00:05,151][INFO][main][GridCacheDatabaseSharedManager]
>>>>>                 Binary memory state restored at node startup
>>>>>                 [restoredPtr=FileWALPointer [idx=10511,
>>>>>                 fileOff=51410110, len=0]]
>>>>>                 [2020-05-12T17:00:05,152][INFO][main][FileWriteAheadLogManager]
>>>>>                 Resuming logging to WAL segment
>>>>>                 [file=/appdata/ignite/db/wal/24/0000000000000001.wal,
>>>>>                 offset=51410110, ver=2]
>>>>>                 [2020-05-12T17:00:06,448][INFO][main][PageMemoryImpl]
>>>>>                 Started page memory [memoryAllocated=200.0GiB,
>>>>>                 pages=50821088, tableSize=3.9GiB,
>>>>>                 checkpointBuffer=2.0GiB]
>>>>>                 [2020-05-12T17:02:08,528][INFO][main][GridCacheProcessor]
>>>>>                 Started cache in recovery mode [name=CO_CO_NEW,
>>>>>                 id=-189779360, dataRegionName=default,
>>>>>                 mode=PARTITIONED, atomicity=ATOMIC, backups=1,
>>>>>                 mvcc=false]
>>>>>                 [2020-05-12T17:50:44,341][INFO][main][GridCacheProcessor]
>>>>>                 Started cache in recovery mode [name=CO_CO_LINE,
>>>>>                 id=-1588248812, dataRegionName=default,
>>>>>                 mode=PARTITIONED, atomicity=ATOMIC, backups=1,
>>>>>                 mvcc=false]
>>>>>                 [2020-05-12T17:50:44,366][INFO][main][GridCacheProcessor]
>>>>>                 Started cache in recovery mode
>>>>>                 [name=ignite-sys-cache, id=-2100569601,
>>>>>                 dataRegionName=sysMemPlc, mode=REPLICATED,
>>>>>                 atomicity=TRANSACTIONAL, backups=2147483647,
>>>>>                 mvcc=false]
>>>>>                 [2020-05-12T18:17:57,071][INFO][main][GridCacheProcessor]
>>>>>                 Started cache in recovery mode
>>>>>                 [name=CO_CO_LINE_NEW, id=1742991829,
>>>>>                 dataRegionName=default, mode=PARTITIONED,
>>>>>                 atomicity=ATOMIC, backups=1, mvcc=false]
>>>>>                 [2020-05-12T18:19:54,910][INFO][main][GridCacheProcessor]
>>>>>                 Started cache in recovery mode [name=PI_COM_DAY,
>>>>>                 id=-1904194728, dataRegionName=default,
>>>>>                 mode=PARTITIONED, atomicity=ATOMIC, backups=1,
>>>>>                 mvcc=false]
>>>>>                 [2020-05-12T18:19:54,949][INFO][main][GridCacheProcessor]
>>>>>                 Started cache in recovery mode [name=PLM_ITEM,
>>>>>                 id=-1283854143, dataRegionName=default,
>>>>>                 mode=REPLICATED, atomicity=ATOMIC,
>>>>>                 backups=2147483647, mvcc=false]
>>>>>                 [2020-05-12T18:22:53,662][INFO][main][GridCacheProcessor]
>>>>>                 Started cache in recovery mode [name=CO_CO,
>>>>>                 id=64322847, dataRegionName=default,
>>>>>                 mode=PARTITIONED, atomicity=ATOMIC, backups=1,
>>>>>                 mvcc=false]
>>>>>                 [2020-05-12T18:22:54,876][INFO][main][GridCacheProcessor]
>>>>>                 Started cache in recovery mode [name=CO_CUST,
>>>>>                 id=1684722246, dataRegionName=default,
>>>>>                 mode=PARTITIONED, atomicity=ATOMIC, backups=1,
>>>>>                 mvcc=false]
>>>>>                 [2020-05-12T18:22:54,892][INFO][main][GridCacheDatabaseSharedManager]
>>>>>                 Binary recovery performed in 4970233ms.
>>>>>
>>>>>                 Among them, binary recovery took 4970 seconds.
>>>>>
>>>>>                 Our question is:
>>>>>
>>>>>                 1.Why is the start time so long?
>>>>>
>>>>>                 2.Is the current state of ignite, with the growth
>>>>>                 of single node data volume, the restart time will
>>>>>                 be longer and longer?
>>>>>
>>>>>                 3.Do have any suggestions for optimizing the
>>>>>                 restart time?
>>>>>

Re: Binary recovery for a very long time

Posted by Ilya Kasnacheev <il...@gmail.com>.

Hello!

We use H2 engine and we need to create tables within H2. We're not touching
much data in the process. I'm not sure this is a factor of slowdown since
it fits in 2s here.

Regards,
-- 
Ilya Kasnacheev


ср, 27 мая 2020 г. в 04:20, 38797715 <38...@qq.com>:

> Hi,
>
> I enabled debug logging and found the following log output:
> [2020-05-23T21:43:58,397][DEBUG][nio-acceptor-tcp-comm-#28%ClusterName1%][TcpCommunicationSpi]
> Balancing data [min0=0, minIdx=0, max0=-1, maxIdx=-1]
> [2020-05-23T21:43:58,405][DEBUG][main][SchemaManager] Creating DB table
> with SQL: CREATE TABLE "PUBLIC"."CO_CUST" (_KEY VARCHAR INVISIBLE NOT
> NULL,_VAL OTHER INVISIBLE,"CUST_ID" VARCHAR(30) NOT NULL,"CUST_NAME"
> VARCHAR(200),"CUST_SHORT_NAME" VARCHAR(200),"CUST_SHORT_ID" VARCHAR(240),
> "LICENSE_CODE" VARCHAR(30),"STATUS" VARCHAR(16),"COM_ID" VARCHAR(30),
> "SALE_CENTER_ID" VARCHAR(30),"SALE_DEPT_ID" VARCHAR(30),"SLSMGR_ID"
> VARCHAR(30),"SLSMAN_ID" VARCHAR(30),"SLSMAN_MOBILE" VARCHAR(16),"MANAGER"
> VARCHAR(100),"IDENTITY_CARD_ID" VARCHAR(36),"ORDER_TEL" VARCHAR(80),
> "INV_TYPE" VARCHAR(18),"ORDER_WAY" VARCHAR(18),"PAY_TYPE" VARCHAR(2),
> "PERIODS" VARCHAR(200),"PRD_ST_DATE" VARCHAR(8),"ORDER_CUST_ID" VARCHAR(30
> ),"BUSI_ADDR" VARCHAR(400),"WORK_PORT" VARCHAR(18),"BASE_TYPE" VARCHAR(24
> ),"SALE_SCOPE" VARCHAR(1),"SCOPE" VARCHAR(200),"COM_CHARA" VARCHAR(2),
> "INNER_TYPE" VARCHAR(18),"CUST_KIND" VARCHAR(10),"CUST_KIND_NAME" VARCHAR(
> 120),"CUST_TYPE" VARCHAR(6),"CUST_TYPE1" VARCHAR(6),"CUST_TYPE2" VARCHAR(6
> ),"CUST_TYPE3" VARCHAR(6),"CUST_TYPE4" VARCHAR(6),"CUST_TYPE5" VARCHAR(6),
> "AREA_TYPE" VARCHAR(2),"IS_SEFL_CUST" VARCHAR(1),"IS_FUNC_CUST" VARCHAR(1
> ),"MANAGER_BIRTHDAY" VARCHAR(8),"CELEBRATE_DATE" VARCHAR(8),
> "NATION_CUST_CODE" VARCHAR(30),"LONGITUDE" DECIMAL,"LATITUDE" DECIMAL,
> "AGENT_MESSAGE" VARCHAR(400),"NOTE" VARCHAR(1000),"UPDATE_TIME" TIMESTAMP,
> "INVTY_ID" VARCHAR(30),"STEP_ID" VARCHAR(20),"INV_CUST_ID" VARCHAR(60),
> "INV_UNIT_NAME" VARCHAR(200),"ACCOUNT" VARCHAR(200),"BANK" VARCHAR(100),
> "TAX_ACCOUNT" VARCHAR(120),"OTHER_ORDER_WAY" VARCHAR(32),"SALE_AVG"
> DECIMAL,"ITEM_ORD" DECIMAL,"QTY_SOLD" DECIMAL,"AMT_SOLD" DECIMAL,
> "MANAGER_TEL" VARCHAR(80),"IS_SALE_LARGE" VARCHAR(1),"TAX_TEL" VARCHAR(60
> ),"TAX_ADDR" VARCHAR(400),"IS_TOR_TAX" VARCHAR(1),"CUST_TYPE6" VARCHAR(6),
> "CUST_TYPE7" VARCHAR(6),"CUST_TYPE8" VARCHAR(124),"CUST_TYPE9" VARCHAR(60
> ),"CUST_TYPE10" VARCHAR(200),"CANT_ID" VARCHAR(20),"IS_ONLINE_PAY"
> VARCHAR(1),"IS_RAIL_CUST" VARCHAR(1),"CUST_SEG" VARCHAR(30),"QTY_MULTIPLE"
> VARCHAR(20),"TAX_ADRR" VARCHAR(200),"COLLECT_STAFF_ID" VARCHAR(30),
> "ITEM_HEIGHT" DECIMAL,"AREA_ID" VARCHAR(30),"BASE_TYPE_EXT" VARCHAR(30),
> "AREA_TYPE_EXT" VARCHAR(30),"WORK_PORT_EXT" VARCHAR(30),"CUST_SEG_EXT"
> VARCHAR(30),"IS_CIGAR_CUST" VARCHAR(1))
> [2020-05-23T21:43:58,411][DEBUG][main][IgniteH2Indexing] Creating cache
> index [cacheId=1684722246, idxName=_key_PK]
> [2020-05-23T21:43:59,081][DEBUG][main][IgniteH2Indexing] Creating cache
> index [cacheId=1684722246, idxName=IDX_CO_CUST_SALE_CENTER_ID]
> [2020-05-23T21:43:59,088][DEBUG][main][IgniteH2Indexing] Creating cache
> index [cacheId=1684722246, idxName=IDX_CO_CUST_STATUS]
> [2020-05-23T21:43:59,099][DEBUG][main][IgniteH2Indexing] Creating cache
> index [cacheId=1684722246, idxName=IDX_CO_CUST_SALE_DEPT_ID]
> [2020-05-23T21:43:59,109][INFO ][main][GridCacheProcessor] Started cache
> in recovery mode [name=CO_CUST, id=1684722246, dataRegionName=default,
> mode=PARTITIONED, atomicity=ATOMIC, backups=1, mvcc=false]
>
> It can be seen from the log that all tables and indexes need to be rebuilt
> during node startup? Just don't load data?
>
> 在 2020/5/21 下午8:48, Ilya Kasnacheev 写道:
>
> Hello!
>
> 1. I guess that WAL is read.
> 2. Unfortunately we do not have truly graceful exit as far as my
> understanding goes.
>
> Regards,
> --
> Ilya Kasnacheev
>
>
> вт, 19 мая 2020 г. в 10:22, 38797715 <38...@qq.com>:
>
>> Hi,
>>
>> the following log message:
>>
>> [2020-05-12T18:17:57,071][INFO ][main][GridCacheProcessor] Started cache
>> in recovery mode [name=CO_CO_LINE_NEW, id=1742991829,
>> dataRegionName=default, mode=PARTITIONED, atomicity=ATOMIC, backups=1,
>> mvcc=false]
>>
>> I have the following questions:
>>
>> 1.What has been done in the startup cache in recovery mode?
>>
>> 2.After testing, if the node stops normally (non abnormal shutdown), the
>> recovery process will also be performed during startup. Why?
>> 在 2020/5/18 下午9:58, Ilya Kasnacheev 写道:
>>
>> Hello!
>>
>> Direct IO module is experimental and should not be used unless
>> performance is tested first, in your specific use case.
>>
>> Regards,
>> --
>> Ilya Kasnacheev
>>
>>
>> пн, 18 мая 2020 г. в 16:47, 38797715 <38...@qq.com>:
>>
>>> Hi,
>>>
>>> If direct IO is disabled, the startup speed will be doubled, including
>>> some other tests. I find that direct IO has a great impact on the read
>>> performance.
>>> 在 2020/5/14 上午5:16, Evgenii Zhuravlev 写道:
>>>
>>> Can you share full logs from all nodes?
>>>
>>> вт, 12 мая 2020 г. в 18:24, 38797715 <38...@qq.com>:
>>>
>>>> Hi Evgenii,
>>>>
>>>> The storage used is not SSD.
>>>>
>>>> We will use different versions of ignite for further testing, such as
>>>> ignite2.8.
>>>> Ignite is configured as follows:
>>>> <?xml version="1.0" encoding="UTF-8"?>
>>>> <beans xmlns="http://www.springframework.org/schema/beans"
>>>> <http://www.springframework.org/schema/beans>
>>>> xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
>>>> <http://www.w3.org/2001/XMLSchema-instance> xsi:schemaLocation="
>>>> http://www.springframework.org/schema/beans
>>>> http://www.springframework.org/schema/beans/spring-beans.xsd">
>>>> <bean id="ignite.cfg" class=
>>>> "org.apache.ignite.configuration.IgniteConfiguration">
>>>> <property name="peerClassLoadingEnabled" value="true"/>
>>>> <property name="consistentId" value="20"/>
>>>> <property name="failureDetectionTimeout" value="120000"/>
>>>> <property name="workDirectory" value="/appdata/ignite"/>
>>>> <property name="rebalanceBatchSize" value="#{2 * 1024 * 1024}"/>
>>>> <property name="rebalanceThrottle" value="100"/>
>>>> <property name="rebalanceThreadPoolSize" value="4"/>
>>>> <property name="gridLogger">
>>>> <bean class="org.apache.ignite.logger.log4j2.Log4J2Logger">
>>>> <constructor-arg type="java.lang.String" value=
>>>> "config/ignite-log4j2.xml"/>
>>>> </bean>
>>>> </property>
>>>> <property name="cacheConfiguration">
>>>> <list>
>>>> <bean id="partitioned-cache-template" abstract="true" class=
>>>> "org.apache.ignite.configuration.CacheConfiguration">
>>>> <property name="name" value="cache-partitioned*"/>
>>>> <property name="cacheMode" value="PARTITIONED" />
>>>> <property name="backups" value="1" />
>>>> <property name="queryParallelism" value="16"/>
>>>> <property name="partitionLossPolicy" value="READ_ONLY_SAFE"/>
>>>> </bean>
>>>> <bean id="replicated-cache-template" abstract="true" class=
>>>> "org.apache.ignite.configuration.CacheConfiguration">
>>>> <property name="name" value="cache-replicated*"/>
>>>> <property name="cacheMode" value="REPLICATED" />
>>>> <property name="partitionLossPolicy" value="READ_ONLY_SAFE"/>
>>>> </bean>
>>>> </list>
>>>> </property>
>>>> <!-- Enabling Apache Ignite Persistent Store. -->
>>>> <property name="dataStorageConfiguration">
>>>> <bean class="org.apache.ignite.configuration.DataStorageConfiguration">
>>>> <property name="defaultDataRegionConfiguration">
>>>> <bean class="org.apache.ignite.configuration.DataRegionConfiguration">
>>>> <property name="persistenceEnabled" value="true"/>
>>>> <property name="maxSize" value="#{200L * 1024 * 1024 * 1024}"/>
>>>> </bean>
>>>> </property>
>>>> </bean>
>>>> </property>
>>>> </bean>
>>>> </beans>
>>>> 在 2020/5/13 上午4:45, Evgenii Zhuravlev 写道:
>>>>
>>>> Hi,
>>>>
>>>> Can you share full logs and configuration? What disk so you use?
>>>>
>>>> Evgenii
>>>>
>>>> вт, 12 мая 2020 г. в 06:49, 38797715 <38...@qq.com>:
>>>>
>>>>> Among them:
>>>>> CO_CO_NEW: ~ 48 minutes(partitioned,backup=1,33M)
>>>>>
>>>>> Ignite sys cache: ~ 27 minutes
>>>>>
>>>>> PLM_ITEM:~3 minutes(repicated,1.9K)
>>>>>
>>>>>
>>>>> 在 2020/5/12 下午9:08, 38797715 写道:
>>>>>
>>>>> Hi community,
>>>>>
>>>>> We have 5 servers, 16 cores, 256g memory, and 200g off-heap memory.
>>>>> We have 7 tables to test, and the data volume is
>>>>> respectively:31.8M,495.2M,552.3M,33M,873.3K,28M,1.9K(replicated),others are
>>>>> partitioned(backup = 1)
>>>>>
>>>>> VM args:-server -Xms20g -Xmx20g -XX:+AlwaysPreTouch -XX:+UseG1GC
>>>>> -XX:+ScavengeBeforeFullGC -XX:+DisableExplicitGC -XX:+PrintGCDetails
>>>>> -XX:+PrintGCTimeStamps -XX:+PrintGCDateStamps -XX:+UseGCLogFileRotation
>>>>> -XX:NumberOfGCLogFiles=10 -XX:GCLogFileSize=100M
>>>>> -Xloggc:/data/gc/logs/gclog.txt -Djava.net.preferIPv4Stack=true
>>>>> -XX:MaxDirectMemorySize=256M -XX:+PrintAdaptiveSizePolicy
>>>>>
>>>>> Today, one of the servers was restarted(kill and then start ignite.sh)
>>>>> for some reason, but the node took 1.5 hours to start, which was much
>>>>> longer than expected.
>>>>>
>>>>> After analyzing the log, the following information is found:
>>>>> [2020-05-12T17:00:05,138][INFO
>>>>> ][main][GridCacheDatabaseSharedManager] Found last checkpoint marker [cpId=
>>>>> 7a0564f2-43e5-400b-9439-746fc68a6ccb, pos=FileWALPointer [idx=10511,
>>>>> fileOff=51348888, len=61193]]
>>>>> [2020-05-12T17:00:05,151][INFO
>>>>> ][main][GridCacheDatabaseSharedManager] Binary memory state restored at
>>>>> node startup [restoredPtr=FileWALPointer [idx=10511, fileOff=51410110,
>>>>> len=0]]
>>>>> [2020-05-12T17:00:05,152][INFO ][main][FileWriteAheadLogManager]
>>>>> Resuming logging to WAL segment [file=/appdata/ignite/db/wal/24/
>>>>> 0000000000000001.wal, offset=51410110, ver=2]
>>>>> [2020-05-12T17:00:06,448][INFO ][main][PageMemoryImpl] Started page
>>>>> memory [memoryAllocated=200.0 GiB, pages=50821088, tableSize=3.9 GiB,
>>>>> checkpointBuffer=2.0 GiB]
>>>>> [2020-05-12T17:02:08,528][INFO ][main][GridCacheProcessor] Started
>>>>> cache in recovery mode [name=CO_CO_NEW, id=-189779360,
>>>>> dataRegionName=default, mode=PARTITIONED, atomicity=ATOMIC, backups=1,
>>>>> mvcc=false]
>>>>> [2020-05-12T17:50:44,341][INFO ][main][GridCacheProcessor] Started
>>>>> cache in recovery mode [name=CO_CO_LINE, id=-1588248812,
>>>>> dataRegionName=default, mode=PARTITIONED, atomicity=ATOMIC, backups=1,
>>>>> mvcc=false]
>>>>> [2020-05-12T17:50:44,366][INFO ][main][GridCacheProcessor] Started
>>>>> cache in recovery mode [name=ignite-sys-cache, id=-2100569601,
>>>>> dataRegionName=sysMemPlc, mode=REPLICATED, atomicity=TRANSACTIONAL, backups=
>>>>> 2147483647, mvcc=false]
>>>>> [2020-05-12T18:17:57,071][INFO ][main][GridCacheProcessor] Started
>>>>> cache in recovery mode [name=CO_CO_LINE_NEW, id=1742991829,
>>>>> dataRegionName=default, mode=PARTITIONED, atomicity=ATOMIC, backups=1,
>>>>> mvcc=false]
>>>>> [2020-05-12T18:19:54,910][INFO ][main][GridCacheProcessor] Started
>>>>> cache in recovery mode [name=PI_COM_DAY, id=-1904194728,
>>>>> dataRegionName=default, mode=PARTITIONED, atomicity=ATOMIC, backups=1,
>>>>> mvcc=false]
>>>>> [2020-05-12T18:19:54,949][INFO ][main][GridCacheProcessor] Started
>>>>> cache in recovery mode [name=PLM_ITEM, id=-1283854143,
>>>>> dataRegionName=default, mode=REPLICATED, atomicity=ATOMIC, backups=
>>>>> 2147483647, mvcc=false]
>>>>> [2020-05-12T18:22:53,662][INFO ][main][GridCacheProcessor] Started
>>>>> cache in recovery mode [name=CO_CO, id=64322847,
>>>>> dataRegionName=default, mode=PARTITIONED, atomicity=ATOMIC, backups=1,
>>>>> mvcc=false]
>>>>> [2020-05-12T18:22:54,876][INFO ][main][GridCacheProcessor] Started
>>>>> cache in recovery mode [name=CO_CUST, id=1684722246,
>>>>> dataRegionName=default, mode=PARTITIONED, atomicity=ATOMIC, backups=1,
>>>>> mvcc=false]
>>>>> [2020-05-12T18:22:54,892][INFO
>>>>> ][main][GridCacheDatabaseSharedManager] Binary recovery performed in
>>>>> 4970233 ms.
>>>>>
>>>>> Among them, binary recovery took 4970 seconds.
>>>>>
>>>>> Our question is:
>>>>>
>>>>> 1.Why is the start time so long?
>>>>>
>>>>> 2.Is the current state of ignite, with the growth of single node data
>>>>> volume, the restart time will be longer and longer?
>>>>>
>>>>> 3.Do have any suggestions for optimizing the restart time?
>>>>>
>>>>>

Re: Binary recovery for a very long time

Posted by Ilya Kasnacheev <il...@gmail.com>.

Hello!

1. I guess that WAL is read.
2. Unfortunately we do not have truly graceful exit as far as my
understanding goes.

Regards,
-- 
Ilya Kasnacheev


вт, 19 мая 2020 г. в 10:22, 38797715 <38...@qq.com>:

> Hi,
>
> the following log message:
>
> [2020-05-12T18:17:57,071][INFO ][main][GridCacheProcessor] Started cache
> in recovery mode [name=CO_CO_LINE_NEW, id=1742991829,
> dataRegionName=default, mode=PARTITIONED, atomicity=ATOMIC, backups=1,
> mvcc=false]
>
> I have the following questions:
>
> 1.What has been done in the startup cache in recovery mode?
>
> 2.After testing, if the node stops normally (non abnormal shutdown), the
> recovery process will also be performed during startup. Why?
> 在 2020/5/18 下午9:58, Ilya Kasnacheev 写道:
>
> Hello!
>
> Direct IO module is experimental and should not be used unless performance
> is tested first, in your specific use case.
>
> Regards,
> --
> Ilya Kasnacheev
>
>
> пн, 18 мая 2020 г. в 16:47, 38797715 <38...@qq.com>:
>
>> Hi,
>>
>> If direct IO is disabled, the startup speed will be doubled, including
>> some other tests. I find that direct IO has a great impact on the read
>> performance.
>> 在 2020/5/14 上午5:16, Evgenii Zhuravlev 写道:
>>
>> Can you share full logs from all nodes?
>>
>> вт, 12 мая 2020 г. в 18:24, 38797715 <38...@qq.com>:
>>
>>> Hi Evgenii,
>>>
>>> The storage used is not SSD.
>>>
>>> We will use different versions of ignite for further testing, such as
>>> ignite2.8.
>>> Ignite is configured as follows:
>>> <?xml version="1.0" encoding="UTF-8"?>
>>> <beans xmlns="http://www.springframework.org/schema/beans"
>>> <http://www.springframework.org/schema/beans>
>>> xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
>>> <http://www.w3.org/2001/XMLSchema-instance> xsi:schemaLocation="
>>> http://www.springframework.org/schema/beans
>>> http://www.springframework.org/schema/beans/spring-beans.xsd">
>>> <bean id="ignite.cfg" class=
>>> "org.apache.ignite.configuration.IgniteConfiguration">
>>> <property name="peerClassLoadingEnabled" value="true"/>
>>> <property name="consistentId" value="20"/>
>>> <property name="failureDetectionTimeout" value="120000"/>
>>> <property name="workDirectory" value="/appdata/ignite"/>
>>> <property name="rebalanceBatchSize" value="#{2 * 1024 * 1024}"/>
>>> <property name="rebalanceThrottle" value="100"/>
>>> <property name="rebalanceThreadPoolSize" value="4"/>
>>> <property name="gridLogger">
>>> <bean class="org.apache.ignite.logger.log4j2.Log4J2Logger">
>>> <constructor-arg type="java.lang.String" value=
>>> "config/ignite-log4j2.xml"/>
>>> </bean>
>>> </property>
>>> <property name="cacheConfiguration">
>>> <list>
>>> <bean id="partitioned-cache-template" abstract="true" class=
>>> "org.apache.ignite.configuration.CacheConfiguration">
>>> <property name="name" value="cache-partitioned*"/>
>>> <property name="cacheMode" value="PARTITIONED" />
>>> <property name="backups" value="1" />
>>> <property name="queryParallelism" value="16"/>
>>> <property name="partitionLossPolicy" value="READ_ONLY_SAFE"/>
>>> </bean>
>>> <bean id="replicated-cache-template" abstract="true" class=
>>> "org.apache.ignite.configuration.CacheConfiguration">
>>> <property name="name" value="cache-replicated*"/>
>>> <property name="cacheMode" value="REPLICATED" />
>>> <property name="partitionLossPolicy" value="READ_ONLY_SAFE"/>
>>> </bean>
>>> </list>
>>> </property>
>>> <!-- Enabling Apache Ignite Persistent Store. -->
>>> <property name="dataStorageConfiguration">
>>> <bean class="org.apache.ignite.configuration.DataStorageConfiguration">
>>> <property name="defaultDataRegionConfiguration">
>>> <bean class="org.apache.ignite.configuration.DataRegionConfiguration">
>>> <property name="persistenceEnabled" value="true"/>
>>> <property name="maxSize" value="#{200L * 1024 * 1024 * 1024}"/>
>>> </bean>
>>> </property>
>>> </bean>
>>> </property>
>>> </bean>
>>> </beans>
>>> 在 2020/5/13 上午4:45, Evgenii Zhuravlev 写道:
>>>
>>> Hi,
>>>
>>> Can you share full logs and configuration? What disk so you use?
>>>
>>> Evgenii
>>>
>>> вт, 12 мая 2020 г. в 06:49, 38797715 <38...@qq.com>:
>>>
>>>> Among them:
>>>> CO_CO_NEW: ~ 48 minutes(partitioned,backup=1,33M)
>>>>
>>>> Ignite sys cache: ~ 27 minutes
>>>>
>>>> PLM_ITEM:~3 minutes(repicated,1.9K)
>>>>
>>>>
>>>> 在 2020/5/12 下午9:08, 38797715 写道:
>>>>
>>>> Hi community,
>>>>
>>>> We have 5 servers, 16 cores, 256g memory, and 200g off-heap memory.
>>>> We have 7 tables to test, and the data volume is
>>>> respectively:31.8M,495.2M,552.3M,33M,873.3K,28M,1.9K(replicated),others are
>>>> partitioned(backup = 1)
>>>>
>>>> VM args:-server -Xms20g -Xmx20g -XX:+AlwaysPreTouch -XX:+UseG1GC
>>>> -XX:+ScavengeBeforeFullGC -XX:+DisableExplicitGC -XX:+PrintGCDetails
>>>> -XX:+PrintGCTimeStamps -XX:+PrintGCDateStamps -XX:+UseGCLogFileRotation
>>>> -XX:NumberOfGCLogFiles=10 -XX:GCLogFileSize=100M
>>>> -Xloggc:/data/gc/logs/gclog.txt -Djava.net.preferIPv4Stack=true
>>>> -XX:MaxDirectMemorySize=256M -XX:+PrintAdaptiveSizePolicy
>>>>
>>>> Today, one of the servers was restarted(kill and then start ignite.sh)
>>>> for some reason, but the node took 1.5 hours to start, which was much
>>>> longer than expected.
>>>>
>>>> After analyzing the log, the following information is found:
>>>> [2020-05-12T17:00:05,138][INFO ][main][GridCacheDatabaseSharedManager]
>>>> Found last checkpoint marker [cpId=7a0564f2-43e5-400b-9439-746fc68a6ccb,
>>>> pos=FileWALPointer [idx=10511, fileOff=51348888, len=61193]]
>>>> [2020-05-12T17:00:05,151][INFO ][main][GridCacheDatabaseSharedManager]
>>>> Binary memory state restored at node startup [restoredPtr=FileWALPointer
>>>> [idx=10511, fileOff=51410110, len=0]]
>>>> [2020-05-12T17:00:05,152][INFO ][main][FileWriteAheadLogManager]
>>>> Resuming logging to WAL segment [file=/appdata/ignite/db/wal/24/
>>>> 0000000000000001.wal, offset=51410110, ver=2]
>>>> [2020-05-12T17:00:06,448][INFO ][main][PageMemoryImpl] Started page
>>>> memory [memoryAllocated=200.0 GiB, pages=50821088, tableSize=3.9 GiB,
>>>> checkpointBuffer=2.0 GiB]
>>>> [2020-05-12T17:02:08,528][INFO ][main][GridCacheProcessor] Started
>>>> cache in recovery mode [name=CO_CO_NEW, id=-189779360,
>>>> dataRegionName=default, mode=PARTITIONED, atomicity=ATOMIC, backups=1,
>>>> mvcc=false]
>>>> [2020-05-12T17:50:44,341][INFO ][main][GridCacheProcessor] Started
>>>> cache in recovery mode [name=CO_CO_LINE, id=-1588248812,
>>>> dataRegionName=default, mode=PARTITIONED, atomicity=ATOMIC, backups=1,
>>>> mvcc=false]
>>>> [2020-05-12T17:50:44,366][INFO ][main][GridCacheProcessor] Started
>>>> cache in recovery mode [name=ignite-sys-cache, id=-2100569601,
>>>> dataRegionName=sysMemPlc, mode=REPLICATED, atomicity=TRANSACTIONAL, backups=
>>>> 2147483647, mvcc=false]
>>>> [2020-05-12T18:17:57,071][INFO ][main][GridCacheProcessor] Started
>>>> cache in recovery mode [name=CO_CO_LINE_NEW, id=1742991829,
>>>> dataRegionName=default, mode=PARTITIONED, atomicity=ATOMIC, backups=1,
>>>> mvcc=false]
>>>> [2020-05-12T18:19:54,910][INFO ][main][GridCacheProcessor] Started
>>>> cache in recovery mode [name=PI_COM_DAY, id=-1904194728,
>>>> dataRegionName=default, mode=PARTITIONED, atomicity=ATOMIC, backups=1,
>>>> mvcc=false]
>>>> [2020-05-12T18:19:54,949][INFO ][main][GridCacheProcessor] Started
>>>> cache in recovery mode [name=PLM_ITEM, id=-1283854143,
>>>> dataRegionName=default, mode=REPLICATED, atomicity=ATOMIC, backups=
>>>> 2147483647, mvcc=false]
>>>> [2020-05-12T18:22:53,662][INFO ][main][GridCacheProcessor] Started
>>>> cache in recovery mode [name=CO_CO, id=64322847,
>>>> dataRegionName=default, mode=PARTITIONED, atomicity=ATOMIC, backups=1,
>>>> mvcc=false]
>>>> [2020-05-12T18:22:54,876][INFO ][main][GridCacheProcessor] Started
>>>> cache in recovery mode [name=CO_CUST, id=1684722246,
>>>> dataRegionName=default, mode=PARTITIONED, atomicity=ATOMIC, backups=1,
>>>> mvcc=false]
>>>> [2020-05-12T18:22:54,892][INFO ][main][GridCacheDatabaseSharedManager]
>>>> Binary recovery performed in 4970233 ms.
>>>>
>>>> Among them, binary recovery took 4970 seconds.
>>>>
>>>> Our question is:
>>>>
>>>> 1.Why is the start time so long?
>>>>
>>>> 2.Is the current state of ignite, with the growth of single node data
>>>> volume, the restart time will be longer and longer?
>>>>
>>>> 3.Do have any suggestions for optimizing the restart time?
>>>>
>>>>

Re: Binary recovery for a very long time

Posted by 38797715 <38...@qq.com>.

Hi,

the following log message:

[2020-05-12T18:17:57,071][INFO ][main][GridCacheProcessor] Started cache 
in recovery mode [name=CO_CO_LINE_NEW, id=1742991829, 
dataRegionName=default, mode=PARTITIONED, atomicity=ATOMIC, backups=1, 
mvcc=false]

I have the following questions:

1.What has been done in the startup cache in recovery mode?

2.After testing, if the node stops normally (non abnormal shutdown), the 
recovery process will also be performed during startup. Why?

在 2020/5/18 下午9:58, Ilya Kasnacheev 写道:
> Hello!
>
> Direct IO module is experimental and should not be used unless 
> performance is tested first, in your specific use case.
>
> Regards,
> -- 
> Ilya Kasnacheev
>
>
> пн, 18 мая 2020 г. в 16:47, 38797715 <38797715@qq.com 
> <ma...@qq.com>>:
>
>     Hi,
>
>     If direct IO is disabled, the startup speed will be doubled,
>     including some other tests. I find that direct IO has a great
>     impact on the read performance.
>
>     在 2020/5/14 上午5:16, Evgenii Zhuravlev 写道:
>>     Can you share full logs from all nodes?
>>
>>     вт, 12 мая 2020 г. в 18:24, 38797715 <38797715@qq.com
>>     <ma...@qq.com>>:
>>
>>         Hi Evgenii,
>>
>>         The storage used is not SSD.
>>
>>         We will use different versions of ignite for further testing,
>>         such as ignite2.8.
>>         Ignite is configured as follows:
>>
>>         <?xmlversion="1.0"encoding="UTF-8"?>
>>         <beansxmlns="http://www.springframework.org/schema/beans"
>>         <http://www.springframework.org/schema/beans>
>>         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
>>         <http://www.w3.org/2001/XMLSchema-instance>xsi:schemaLocation="http://www.springframework.org/schema/beans
>>         http://www.springframework.org/schema/beans/spring-beans.xsd">
>>         <beanid="ignite.cfg"class="org.apache.ignite.configuration.IgniteConfiguration">
>>         <propertyname="peerClassLoadingEnabled"value="true"/>
>>         <propertyname="consistentId"value="20"/>
>>         <propertyname="failureDetectionTimeout"value="120000"/>
>>         <propertyname="workDirectory"value="/appdata/ignite"/>
>>         <propertyname="rebalanceBatchSize"value="#{2 * 1024 * 1024}"/>
>>         <propertyname="rebalanceThrottle"value="100"/>
>>         <propertyname="rebalanceThreadPoolSize"value="4"/>
>>         <propertyname="gridLogger">
>>         <beanclass="org.apache.ignite.logger.log4j2.Log4J2Logger">
>>         <constructor-argtype="java.lang.String"value="config/ignite-log4j2.xml"/>
>>         </bean>
>>         </property>
>>         <propertyname="cacheConfiguration">
>>         <list>
>>         <beanid="partitioned-cache-template"abstract="true"class="org.apache.ignite.configuration.CacheConfiguration">
>>         <propertyname="name"value="cache-partitioned*"/>
>>         <propertyname="cacheMode"value="PARTITIONED"/>
>>         <propertyname="backups"value="1"/>
>>         <propertyname="queryParallelism"value="16"/>
>>         <propertyname="partitionLossPolicy"value="READ_ONLY_SAFE"/>
>>         </bean>
>>         <beanid="replicated-cache-template"abstract="true"class="org.apache.ignite.configuration.CacheConfiguration">
>>         <propertyname="name"value="cache-replicated*"/>
>>         <propertyname="cacheMode"value="REPLICATED"/>
>>         <propertyname="partitionLossPolicy"value="READ_ONLY_SAFE"/>
>>         </bean>
>>         </list>
>>         </property>
>>         <!-- Enabling Apache Ignite Persistent Store. -->
>>         <propertyname="dataStorageConfiguration">
>>         <beanclass="org.apache.ignite.configuration.DataStorageConfiguration">
>>         <propertyname="defaultDataRegionConfiguration">
>>         <beanclass="org.apache.ignite.configuration.DataRegionConfiguration">
>>         <propertyname="persistenceEnabled"value="true"/>
>>         <propertyname="maxSize"value="#{200L * 1024 * 1024 * 1024}"/>
>>         </bean>
>>         </property>
>>         </bean>
>>         </property>
>>         </bean>
>>         </beans>
>>         在 2020/5/13 上午4:45, Evgenii Zhuravlev 写道:
>>>         Hi,
>>>
>>>         Can you share full logs and configuration? What disk so you use?
>>>
>>>         Evgenii
>>>
>>>         вт, 12 мая 2020 г. в 06:49, 38797715 <38797715@qq.com
>>>         <ma...@qq.com>>:
>>>
>>>             Among them:
>>>             CO_CO_NEW: ~ 48 minutes(partitioned,backup=1,33M)
>>>
>>>             Ignite sys cache: ~ 27 minutes
>>>
>>>             PLM_ITEM:~3 minutes(repicated,1.9K)
>>>
>>>
>>>             在 2020/5/12 下午9:08, 38797715 写道:
>>>>
>>>>             Hi community,
>>>>
>>>>             We have 5 servers, 16 cores, 256g memory, and 200g
>>>>             off-heap memory.
>>>>             We have 7 tables to test, and the data volume is
>>>>             respectively:31.8M,495.2M,552.3M,33M,873.3K,28M,1.9K(replicated),others
>>>>             are partitioned(backup = 1)
>>>>
>>>>             VM args:-server -Xms20g -Xmx20g -XX:+AlwaysPreTouch
>>>>             -XX:+UseG1GC -XX:+ScavengeBeforeFullGC
>>>>             -XX:+DisableExplicitGC -XX:+PrintGCDetails
>>>>             -XX:+PrintGCTimeStamps -XX:+PrintGCDateStamps
>>>>             -XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=10
>>>>             -XX:GCLogFileSize=100M -Xloggc:/data/gc/logs/gclog.txt
>>>>             -Djava.net.preferIPv4Stack=true
>>>>             -XX:MaxDirectMemorySize=256M -XX:+PrintAdaptiveSizePolicy
>>>>
>>>>             Today, one of the servers was restarted(kill and then
>>>>             start ignite.sh) for some reason, but the node took 1.5
>>>>             hours to start, which was much longer than expected.
>>>>
>>>>             After analyzing the log, the following information is
>>>>             found:
>>>>
>>>>             [2020-05-12T17:00:05,138][INFO][main][GridCacheDatabaseSharedManager]
>>>>             Found last checkpoint marker
>>>>             [cpId=7a0564f2-43e5-400b-9439-746fc68a6ccb,
>>>>             pos=FileWALPointer [idx=10511, fileOff=51348888,
>>>>             len=61193]]
>>>>             [2020-05-12T17:00:05,151][INFO][main][GridCacheDatabaseSharedManager]
>>>>             Binary memory state restored at node startup
>>>>             [restoredPtr=FileWALPointer [idx=10511,
>>>>             fileOff=51410110, len=0]]
>>>>             [2020-05-12T17:00:05,152][INFO][main][FileWriteAheadLogManager]
>>>>             Resuming logging to WAL segment
>>>>             [file=/appdata/ignite/db/wal/24/0000000000000001.wal,
>>>>             offset=51410110, ver=2]
>>>>             [2020-05-12T17:00:06,448][INFO][main][PageMemoryImpl]
>>>>             Started page memory [memoryAllocated=200.0GiB,
>>>>             pages=50821088, tableSize=3.9GiB, checkpointBuffer=2.0GiB]
>>>>             [2020-05-12T17:02:08,528][INFO][main][GridCacheProcessor]
>>>>             Started cache in recovery mode [name=CO_CO_NEW,
>>>>             id=-189779360, dataRegionName=default,
>>>>             mode=PARTITIONED, atomicity=ATOMIC, backups=1, mvcc=false]
>>>>             [2020-05-12T17:50:44,341][INFO][main][GridCacheProcessor]
>>>>             Started cache in recovery mode [name=CO_CO_LINE,
>>>>             id=-1588248812, dataRegionName=default,
>>>>             mode=PARTITIONED, atomicity=ATOMIC, backups=1, mvcc=false]
>>>>             [2020-05-12T17:50:44,366][INFO][main][GridCacheProcessor]
>>>>             Started cache in recovery mode [name=ignite-sys-cache,
>>>>             id=-2100569601, dataRegionName=sysMemPlc,
>>>>             mode=REPLICATED, atomicity=TRANSACTIONAL,
>>>>             backups=2147483647, mvcc=false]
>>>>             [2020-05-12T18:17:57,071][INFO][main][GridCacheProcessor]
>>>>             Started cache in recovery mode [name=CO_CO_LINE_NEW,
>>>>             id=1742991829, dataRegionName=default,
>>>>             mode=PARTITIONED, atomicity=ATOMIC, backups=1, mvcc=false]
>>>>             [2020-05-12T18:19:54,910][INFO][main][GridCacheProcessor]
>>>>             Started cache in recovery mode [name=PI_COM_DAY,
>>>>             id=-1904194728, dataRegionName=default,
>>>>             mode=PARTITIONED, atomicity=ATOMIC, backups=1, mvcc=false]
>>>>             [2020-05-12T18:19:54,949][INFO][main][GridCacheProcessor]
>>>>             Started cache in recovery mode [name=PLM_ITEM,
>>>>             id=-1283854143, dataRegionName=default,
>>>>             mode=REPLICATED, atomicity=ATOMIC, backups=2147483647,
>>>>             mvcc=false]
>>>>             [2020-05-12T18:22:53,662][INFO][main][GridCacheProcessor]
>>>>             Started cache in recovery mode [name=CO_CO,
>>>>             id=64322847, dataRegionName=default, mode=PARTITIONED,
>>>>             atomicity=ATOMIC, backups=1, mvcc=false]
>>>>             [2020-05-12T18:22:54,876][INFO][main][GridCacheProcessor]
>>>>             Started cache in recovery mode [name=CO_CUST,
>>>>             id=1684722246, dataRegionName=default,
>>>>             mode=PARTITIONED, atomicity=ATOMIC, backups=1, mvcc=false]
>>>>             [2020-05-12T18:22:54,892][INFO][main][GridCacheDatabaseSharedManager]
>>>>             Binary recovery performed in 4970233ms.
>>>>
>>>>             Among them, binary recovery took 4970 seconds.
>>>>
>>>>             Our question is:
>>>>
>>>>             1.Why is the start time so long?
>>>>
>>>>             2.Is the current state of ignite, with the growth of
>>>>             single node data volume, the restart time will be
>>>>             longer and longer?
>>>>
>>>>             3.Do have any suggestions for optimizing the restart time?
>>>>

Re: Binary recovery for a very long time

Posted by Ilya Kasnacheev <il...@gmail.com>.

Hello!

Direct IO module is experimental and should not be used unless performance
is tested first, in your specific use case.

Regards,
-- 
Ilya Kasnacheev


пн, 18 мая 2020 г. в 16:47, 38797715 <38...@qq.com>:

> Hi,
>
> If direct IO is disabled, the startup speed will be doubled, including
> some other tests. I find that direct IO has a great impact on the read
> performance.
> 在 2020/5/14 上午5:16, Evgenii Zhuravlev 写道:
>
> Can you share full logs from all nodes?
>
> вт, 12 мая 2020 г. в 18:24, 38797715 <38...@qq.com>:
>
>> Hi Evgenii,
>>
>> The storage used is not SSD.
>>
>> We will use different versions of ignite for further testing, such as
>> ignite2.8.
>> Ignite is configured as follows:
>> <?xml version="1.0" encoding="UTF-8"?>
>> <beans xmlns="http://www.springframework.org/schema/beans"
>> <http://www.springframework.org/schema/beans>
>> xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
>> <http://www.w3.org/2001/XMLSchema-instance> xsi:schemaLocation="
>> http://www.springframework.org/schema/beans
>> http://www.springframework.org/schema/beans/spring-beans.xsd">
>> <bean id="ignite.cfg" class=
>> "org.apache.ignite.configuration.IgniteConfiguration">
>> <property name="peerClassLoadingEnabled" value="true"/>
>> <property name="consistentId" value="20"/>
>> <property name="failureDetectionTimeout" value="120000"/>
>> <property name="workDirectory" value="/appdata/ignite"/>
>> <property name="rebalanceBatchSize" value="#{2 * 1024 * 1024}"/>
>> <property name="rebalanceThrottle" value="100"/>
>> <property name="rebalanceThreadPoolSize" value="4"/>
>> <property name="gridLogger">
>> <bean class="org.apache.ignite.logger.log4j2.Log4J2Logger">
>> <constructor-arg type="java.lang.String" value="config/ignite-log4j2.xml"
>> />
>> </bean>
>> </property>
>> <property name="cacheConfiguration">
>> <list>
>> <bean id="partitioned-cache-template" abstract="true" class=
>> "org.apache.ignite.configuration.CacheConfiguration">
>> <property name="name" value="cache-partitioned*"/>
>> <property name="cacheMode" value="PARTITIONED" />
>> <property name="backups" value="1" />
>> <property name="queryParallelism" value="16"/>
>> <property name="partitionLossPolicy" value="READ_ONLY_SAFE"/>
>> </bean>
>> <bean id="replicated-cache-template" abstract="true" class=
>> "org.apache.ignite.configuration.CacheConfiguration">
>> <property name="name" value="cache-replicated*"/>
>> <property name="cacheMode" value="REPLICATED" />
>> <property name="partitionLossPolicy" value="READ_ONLY_SAFE"/>
>> </bean>
>> </list>
>> </property>
>> <!-- Enabling Apache Ignite Persistent Store. -->
>> <property name="dataStorageConfiguration">
>> <bean class="org.apache.ignite.configuration.DataStorageConfiguration">
>> <property name="defaultDataRegionConfiguration">
>> <bean class="org.apache.ignite.configuration.DataRegionConfiguration">
>> <property name="persistenceEnabled" value="true"/>
>> <property name="maxSize" value="#{200L * 1024 * 1024 * 1024}"/>
>> </bean>
>> </property>
>> </bean>
>> </property>
>> </bean>
>> </beans>
>> 在 2020/5/13 上午4:45, Evgenii Zhuravlev 写道:
>>
>> Hi,
>>
>> Can you share full logs and configuration? What disk so you use?
>>
>> Evgenii
>>
>> вт, 12 мая 2020 г. в 06:49, 38797715 <38...@qq.com>:
>>
>>> Among them:
>>> CO_CO_NEW: ~ 48 minutes(partitioned,backup=1,33M)
>>>
>>> Ignite sys cache: ~ 27 minutes
>>>
>>> PLM_ITEM:~3 minutes(repicated,1.9K)
>>>
>>>
>>> 在 2020/5/12 下午9:08, 38797715 写道:
>>>
>>> Hi community,
>>>
>>> We have 5 servers, 16 cores, 256g memory, and 200g off-heap memory.
>>> We have 7 tables to test, and the data volume is
>>> respectively:31.8M,495.2M,552.3M,33M,873.3K,28M,1.9K(replicated),others are
>>> partitioned(backup = 1)
>>>
>>> VM args:-server -Xms20g -Xmx20g -XX:+AlwaysPreTouch -XX:+UseG1GC
>>> -XX:+ScavengeBeforeFullGC -XX:+DisableExplicitGC -XX:+PrintGCDetails
>>> -XX:+PrintGCTimeStamps -XX:+PrintGCDateStamps -XX:+UseGCLogFileRotation
>>> -XX:NumberOfGCLogFiles=10 -XX:GCLogFileSize=100M
>>> -Xloggc:/data/gc/logs/gclog.txt -Djava.net.preferIPv4Stack=true
>>> -XX:MaxDirectMemorySize=256M -XX:+PrintAdaptiveSizePolicy
>>>
>>> Today, one of the servers was restarted(kill and then start ignite.sh)
>>> for some reason, but the node took 1.5 hours to start, which was much
>>> longer than expected.
>>>
>>> After analyzing the log, the following information is found:
>>> [2020-05-12T17:00:05,138][INFO ][main][GridCacheDatabaseSharedManager]
>>> Found last checkpoint marker [cpId=7a0564f2-43e5-400b-9439-746fc68a6ccb,
>>> pos=FileWALPointer [idx=10511, fileOff=51348888, len=61193]]
>>> [2020-05-12T17:00:05,151][INFO ][main][GridCacheDatabaseSharedManager]
>>> Binary memory state restored at node startup [restoredPtr=FileWALPointer
>>> [idx=10511, fileOff=51410110, len=0]]
>>> [2020-05-12T17:00:05,152][INFO ][main][FileWriteAheadLogManager]
>>> Resuming logging to WAL segment [file=/appdata/ignite/db/wal/24/
>>> 0000000000000001.wal, offset=51410110, ver=2]
>>> [2020-05-12T17:00:06,448][INFO ][main][PageMemoryImpl] Started page
>>> memory [memoryAllocated=200.0 GiB, pages=50821088, tableSize=3.9 GiB,
>>> checkpointBuffer=2.0 GiB]
>>> [2020-05-12T17:02:08,528][INFO ][main][GridCacheProcessor] Started
>>> cache in recovery mode [name=CO_CO_NEW, id=-189779360,
>>> dataRegionName=default, mode=PARTITIONED, atomicity=ATOMIC, backups=1,
>>> mvcc=false]
>>> [2020-05-12T17:50:44,341][INFO ][main][GridCacheProcessor] Started
>>> cache in recovery mode [name=CO_CO_LINE, id=-1588248812,
>>> dataRegionName=default, mode=PARTITIONED, atomicity=ATOMIC, backups=1,
>>> mvcc=false]
>>> [2020-05-12T17:50:44,366][INFO ][main][GridCacheProcessor] Started
>>> cache in recovery mode [name=ignite-sys-cache, id=-2100569601,
>>> dataRegionName=sysMemPlc, mode=REPLICATED, atomicity=TRANSACTIONAL, backups=
>>> 2147483647, mvcc=false]
>>> [2020-05-12T18:17:57,071][INFO ][main][GridCacheProcessor] Started
>>> cache in recovery mode [name=CO_CO_LINE_NEW, id=1742991829,
>>> dataRegionName=default, mode=PARTITIONED, atomicity=ATOMIC, backups=1,
>>> mvcc=false]
>>> [2020-05-12T18:19:54,910][INFO ][main][GridCacheProcessor] Started
>>> cache in recovery mode [name=PI_COM_DAY, id=-1904194728,
>>> dataRegionName=default, mode=PARTITIONED, atomicity=ATOMIC, backups=1,
>>> mvcc=false]
>>> [2020-05-12T18:19:54,949][INFO ][main][GridCacheProcessor] Started
>>> cache in recovery mode [name=PLM_ITEM, id=-1283854143,
>>> dataRegionName=default, mode=REPLICATED, atomicity=ATOMIC, backups=
>>> 2147483647, mvcc=false]
>>> [2020-05-12T18:22:53,662][INFO ][main][GridCacheProcessor] Started
>>> cache in recovery mode [name=CO_CO, id=64322847,
>>> dataRegionName=default, mode=PARTITIONED, atomicity=ATOMIC, backups=1,
>>> mvcc=false]
>>> [2020-05-12T18:22:54,876][INFO ][main][GridCacheProcessor] Started
>>> cache in recovery mode [name=CO_CUST, id=1684722246,
>>> dataRegionName=default, mode=PARTITIONED, atomicity=ATOMIC, backups=1,
>>> mvcc=false]
>>> [2020-05-12T18:22:54,892][INFO ][main][GridCacheDatabaseSharedManager]
>>> Binary recovery performed in 4970233 ms.
>>>
>>> Among them, binary recovery took 4970 seconds.
>>>
>>> Our question is:
>>>
>>> 1.Why is the start time so long?
>>>
>>> 2.Is the current state of ignite, with the growth of single node data
>>> volume, the restart time will be longer and longer?
>>>
>>> 3.Do have any suggestions for optimizing the restart time?
>>>
>>>

Re: Binary recovery for a very long time

Posted by 38797715 <38...@qq.com>.

Hi,

If direct IO is disabled, the startup speed will be doubled, including 
some other tests. I find that direct IO has a great impact on the read 
performance.

在 2020/5/14 上午5:16, Evgenii Zhuravlev 写道:
> Can you share full logs from all nodes?
>
> вт, 12 мая 2020 г. в 18:24, 38797715 <38797715@qq.com 
> <ma...@qq.com>>:
>
>     Hi Evgenii,
>
>     The storage used is not SSD.
>
>     We will use different versions of ignite for further testing, such
>     as ignite2.8.
>     Ignite is configured as follows:
>
>     <?xmlversion="1.0"encoding="UTF-8"?>
>     <beansxmlns="http://www.springframework.org/schema/beans"
>     <http://www.springframework.org/schema/beans>
>     xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
>     <http://www.w3.org/2001/XMLSchema-instance>xsi:schemaLocation="http://www.springframework.org/schema/beans
>     http://www.springframework.org/schema/beans/spring-beans.xsd">
>     <beanid="ignite.cfg"class="org.apache.ignite.configuration.IgniteConfiguration">
>     <propertyname="peerClassLoadingEnabled"value="true"/>
>     <propertyname="consistentId"value="20"/>
>     <propertyname="failureDetectionTimeout"value="120000"/>
>     <propertyname="workDirectory"value="/appdata/ignite"/>
>     <propertyname="rebalanceBatchSize"value="#{2 * 1024 * 1024}"/>
>     <propertyname="rebalanceThrottle"value="100"/>
>     <propertyname="rebalanceThreadPoolSize"value="4"/>
>     <propertyname="gridLogger">
>     <beanclass="org.apache.ignite.logger.log4j2.Log4J2Logger">
>     <constructor-argtype="java.lang.String"value="config/ignite-log4j2.xml"/>
>     </bean>
>     </property>
>     <propertyname="cacheConfiguration">
>     <list>
>     <beanid="partitioned-cache-template"abstract="true"class="org.apache.ignite.configuration.CacheConfiguration">
>     <propertyname="name"value="cache-partitioned*"/>
>     <propertyname="cacheMode"value="PARTITIONED"/>
>     <propertyname="backups"value="1"/>
>     <propertyname="queryParallelism"value="16"/>
>     <propertyname="partitionLossPolicy"value="READ_ONLY_SAFE"/>
>     </bean>
>     <beanid="replicated-cache-template"abstract="true"class="org.apache.ignite.configuration.CacheConfiguration">
>     <propertyname="name"value="cache-replicated*"/>
>     <propertyname="cacheMode"value="REPLICATED"/>
>     <propertyname="partitionLossPolicy"value="READ_ONLY_SAFE"/>
>     </bean>
>     </list>
>     </property>
>     <!-- Enabling Apache Ignite Persistent Store. -->
>     <propertyname="dataStorageConfiguration">
>     <beanclass="org.apache.ignite.configuration.DataStorageConfiguration">
>     <propertyname="defaultDataRegionConfiguration">
>     <beanclass="org.apache.ignite.configuration.DataRegionConfiguration">
>     <propertyname="persistenceEnabled"value="true"/>
>     <propertyname="maxSize"value="#{200L * 1024 * 1024 * 1024}"/>
>     </bean>
>     </property>
>     </bean>
>     </property>
>     </bean>
>     </beans>
>     在 2020/5/13 上午4:45, Evgenii Zhuravlev 写道:
>>     Hi,
>>
>>     Can you share full logs and configuration? What disk so you use?
>>
>>     Evgenii
>>
>>     вт, 12 мая 2020 г. в 06:49, 38797715 <38797715@qq.com
>>     <ma...@qq.com>>:
>>
>>         Among them:
>>         CO_CO_NEW: ~ 48 minutes(partitioned,backup=1,33M)
>>
>>         Ignite sys cache: ~ 27 minutes
>>
>>         PLM_ITEM:~3 minutes(repicated,1.9K)
>>
>>
>>         在 2020/5/12 下午9:08, 38797715 写道:
>>>
>>>         Hi community,
>>>
>>>         We have 5 servers, 16 cores, 256g memory, and 200g off-heap
>>>         memory.
>>>         We have 7 tables to test, and the data volume is
>>>         respectively:31.8M,495.2M,552.3M,33M,873.3K,28M,1.9K(replicated),others
>>>         are partitioned(backup = 1)
>>>
>>>         VM args:-server -Xms20g -Xmx20g -XX:+AlwaysPreTouch
>>>         -XX:+UseG1GC -XX:+ScavengeBeforeFullGC
>>>         -XX:+DisableExplicitGC -XX:+PrintGCDetails
>>>         -XX:+PrintGCTimeStamps -XX:+PrintGCDateStamps
>>>         -XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=10
>>>         -XX:GCLogFileSize=100M -Xloggc:/data/gc/logs/gclog.txt
>>>         -Djava.net.preferIPv4Stack=true -XX:MaxDirectMemorySize=256M
>>>         -XX:+PrintAdaptiveSizePolicy
>>>
>>>         Today, one of the servers was restarted(kill and then start
>>>         ignite.sh) for some reason, but the node took 1.5 hours to
>>>         start, which was much longer than expected.
>>>
>>>         After analyzing the log, the following information is found:
>>>
>>>         [2020-05-12T17:00:05,138][INFO][main][GridCacheDatabaseSharedManager]
>>>         Found last checkpoint marker
>>>         [cpId=7a0564f2-43e5-400b-9439-746fc68a6ccb,
>>>         pos=FileWALPointer [idx=10511, fileOff=51348888, len=61193]]
>>>         [2020-05-12T17:00:05,151][INFO][main][GridCacheDatabaseSharedManager]
>>>         Binary memory state restored at node startup
>>>         [restoredPtr=FileWALPointer [idx=10511, fileOff=51410110,
>>>         len=0]]
>>>         [2020-05-12T17:00:05,152][INFO][main][FileWriteAheadLogManager]
>>>         Resuming logging to WAL segment
>>>         [file=/appdata/ignite/db/wal/24/0000000000000001.wal,
>>>         offset=51410110, ver=2]
>>>         [2020-05-12T17:00:06,448][INFO][main][PageMemoryImpl]
>>>         Started page memory [memoryAllocated=200.0GiB,
>>>         pages=50821088, tableSize=3.9GiB, checkpointBuffer=2.0GiB]
>>>         [2020-05-12T17:02:08,528][INFO][main][GridCacheProcessor]
>>>         Started cache in recovery mode [name=CO_CO_NEW,
>>>         id=-189779360, dataRegionName=default, mode=PARTITIONED,
>>>         atomicity=ATOMIC, backups=1, mvcc=false]
>>>         [2020-05-12T17:50:44,341][INFO][main][GridCacheProcessor]
>>>         Started cache in recovery mode [name=CO_CO_LINE,
>>>         id=-1588248812, dataRegionName=default, mode=PARTITIONED,
>>>         atomicity=ATOMIC, backups=1, mvcc=false]
>>>         [2020-05-12T17:50:44,366][INFO][main][GridCacheProcessor]
>>>         Started cache in recovery mode [name=ignite-sys-cache,
>>>         id=-2100569601, dataRegionName=sysMemPlc, mode=REPLICATED,
>>>         atomicity=TRANSACTIONAL, backups=2147483647, mvcc=false]
>>>         [2020-05-12T18:17:57,071][INFO][main][GridCacheProcessor]
>>>         Started cache in recovery mode [name=CO_CO_LINE_NEW,
>>>         id=1742991829, dataRegionName=default, mode=PARTITIONED,
>>>         atomicity=ATOMIC, backups=1, mvcc=false]
>>>         [2020-05-12T18:19:54,910][INFO][main][GridCacheProcessor]
>>>         Started cache in recovery mode [name=PI_COM_DAY,
>>>         id=-1904194728, dataRegionName=default, mode=PARTITIONED,
>>>         atomicity=ATOMIC, backups=1, mvcc=false]
>>>         [2020-05-12T18:19:54,949][INFO][main][GridCacheProcessor]
>>>         Started cache in recovery mode [name=PLM_ITEM,
>>>         id=-1283854143, dataRegionName=default, mode=REPLICATED,
>>>         atomicity=ATOMIC, backups=2147483647, mvcc=false]
>>>         [2020-05-12T18:22:53,662][INFO][main][GridCacheProcessor]
>>>         Started cache in recovery mode [name=CO_CO, id=64322847,
>>>         dataRegionName=default, mode=PARTITIONED, atomicity=ATOMIC,
>>>         backups=1, mvcc=false]
>>>         [2020-05-12T18:22:54,876][INFO][main][GridCacheProcessor]
>>>         Started cache in recovery mode [name=CO_CUST, id=1684722246,
>>>         dataRegionName=default, mode=PARTITIONED, atomicity=ATOMIC,
>>>         backups=1, mvcc=false]
>>>         [2020-05-12T18:22:54,892][INFO][main][GridCacheDatabaseSharedManager]
>>>         Binary recovery performed in 4970233ms.
>>>
>>>         Among them, binary recovery took 4970 seconds.
>>>
>>>         Our question is:
>>>
>>>         1.Why is the start time so long?
>>>
>>>         2.Is the current state of ignite, with the growth of single
>>>         node data volume, the restart time will be longer and longer?
>>>
>>>         3.Do have any suggestions for optimizing the restart time?
>>>

Re: Binary recovery for a very long time

Posted by Evgenii Zhuravlev <e....@gmail.com>.

Can you share full logs from all nodes?

вт, 12 мая 2020 г. в 18:24, 38797715 <38...@qq.com>:

> Hi Evgenii,
>
> The storage used is not SSD.
>
> We will use different versions of ignite for further testing, such as
> ignite2.8.
> Ignite is configured as follows:
> <?xml version="1.0" encoding="UTF-8"?>
> <beans xmlns="http://www.springframework.org/schema/beans"
> <http://www.springframework.org/schema/beans>
> xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
> <http://www.w3.org/2001/XMLSchema-instance> xsi:schemaLocation="
> http://www.springframework.org/schema/beans
> http://www.springframework.org/schema/beans/spring-beans.xsd">
> <bean id="ignite.cfg" class=
> "org.apache.ignite.configuration.IgniteConfiguration">
> <property name="peerClassLoadingEnabled" value="true"/>
> <property name="consistentId" value="20"/>
> <property name="failureDetectionTimeout" value="120000"/>
> <property name="workDirectory" value="/appdata/ignite"/>
> <property name="rebalanceBatchSize" value="#{2 * 1024 * 1024}"/>
> <property name="rebalanceThrottle" value="100"/>
> <property name="rebalanceThreadPoolSize" value="4"/>
> <property name="gridLogger">
> <bean class="org.apache.ignite.logger.log4j2.Log4J2Logger">
> <constructor-arg type="java.lang.String" value="config/ignite-log4j2.xml"
> />
> </bean>
> </property>
> <property name="cacheConfiguration">
> <list>
> <bean id="partitioned-cache-template" abstract="true" class=
> "org.apache.ignite.configuration.CacheConfiguration">
> <property name="name" value="cache-partitioned*"/>
> <property name="cacheMode" value="PARTITIONED" />
> <property name="backups" value="1" />
> <property name="queryParallelism" value="16"/>
> <property name="partitionLossPolicy" value="READ_ONLY_SAFE"/>
> </bean>
> <bean id="replicated-cache-template" abstract="true" class=
> "org.apache.ignite.configuration.CacheConfiguration">
> <property name="name" value="cache-replicated*"/>
> <property name="cacheMode" value="REPLICATED" />
> <property name="partitionLossPolicy" value="READ_ONLY_SAFE"/>
> </bean>
> </list>
> </property>
> <!-- Enabling Apache Ignite Persistent Store. -->
> <property name="dataStorageConfiguration">
> <bean class="org.apache.ignite.configuration.DataStorageConfiguration">
> <property name="defaultDataRegionConfiguration">
> <bean class="org.apache.ignite.configuration.DataRegionConfiguration">
> <property name="persistenceEnabled" value="true"/>
> <property name="maxSize" value="#{200L * 1024 * 1024 * 1024}"/>
> </bean>
> </property>
> </bean>
> </property>
> </bean>
> </beans>
> 在 2020/5/13 上午4:45, Evgenii Zhuravlev 写道:
>
> Hi,
>
> Can you share full logs and configuration? What disk so you use?
>
> Evgenii
>
> вт, 12 мая 2020 г. в 06:49, 38797715 <38...@qq.com>:
>
>> Among them:
>> CO_CO_NEW: ~ 48 minutes(partitioned,backup=1,33M)
>>
>> Ignite sys cache: ~ 27 minutes
>>
>> PLM_ITEM:~3 minutes(repicated,1.9K)
>>
>>
>> 在 2020/5/12 下午9:08, 38797715 写道:
>>
>> Hi community,
>>
>> We have 5 servers, 16 cores, 256g memory, and 200g off-heap memory.
>> We have 7 tables to test, and the data volume is
>> respectively:31.8M,495.2M,552.3M,33M,873.3K,28M,1.9K(replicated),others are
>> partitioned(backup = 1)
>>
>> VM args:-server -Xms20g -Xmx20g -XX:+AlwaysPreTouch -XX:+UseG1GC
>> -XX:+ScavengeBeforeFullGC -XX:+DisableExplicitGC -XX:+PrintGCDetails
>> -XX:+PrintGCTimeStamps -XX:+PrintGCDateStamps -XX:+UseGCLogFileRotation
>> -XX:NumberOfGCLogFiles=10 -XX:GCLogFileSize=100M
>> -Xloggc:/data/gc/logs/gclog.txt -Djava.net.preferIPv4Stack=true
>> -XX:MaxDirectMemorySize=256M -XX:+PrintAdaptiveSizePolicy
>>
>> Today, one of the servers was restarted(kill and then start ignite.sh)
>> for some reason, but the node took 1.5 hours to start, which was much
>> longer than expected.
>>
>> After analyzing the log, the following information is found:
>> [2020-05-12T17:00:05,138][INFO ][main][GridCacheDatabaseSharedManager]
>> Found last checkpoint marker [cpId=7a0564f2-43e5-400b-9439-746fc68a6ccb,
>> pos=FileWALPointer [idx=10511, fileOff=51348888, len=61193]]
>> [2020-05-12T17:00:05,151][INFO ][main][GridCacheDatabaseSharedManager]
>> Binary memory state restored at node startup [restoredPtr=FileWALPointer
>> [idx=10511, fileOff=51410110, len=0]]
>> [2020-05-12T17:00:05,152][INFO ][main][FileWriteAheadLogManager]
>> Resuming logging to WAL segment [file=/appdata/ignite/db/wal/24/
>> 0000000000000001.wal, offset=51410110, ver=2]
>> [2020-05-12T17:00:06,448][INFO ][main][PageMemoryImpl] Started page
>> memory [memoryAllocated=200.0 GiB, pages=50821088, tableSize=3.9 GiB,
>> checkpointBuffer=2.0 GiB]
>> [2020-05-12T17:02:08,528][INFO ][main][GridCacheProcessor] Started cache
>> in recovery mode [name=CO_CO_NEW, id=-189779360, dataRegionName=default,
>> mode=PARTITIONED, atomicity=ATOMIC, backups=1, mvcc=false]
>> [2020-05-12T17:50:44,341][INFO ][main][GridCacheProcessor] Started cache
>> in recovery mode [name=CO_CO_LINE, id=-1588248812,
>> dataRegionName=default, mode=PARTITIONED, atomicity=ATOMIC, backups=1,
>> mvcc=false]
>> [2020-05-12T17:50:44,366][INFO ][main][GridCacheProcessor] Started cache
>> in recovery mode [name=ignite-sys-cache, id=-2100569601,
>> dataRegionName=sysMemPlc, mode=REPLICATED, atomicity=TRANSACTIONAL, backups=
>> 2147483647, mvcc=false]
>> [2020-05-12T18:17:57,071][INFO ][main][GridCacheProcessor] Started cache
>> in recovery mode [name=CO_CO_LINE_NEW, id=1742991829,
>> dataRegionName=default, mode=PARTITIONED, atomicity=ATOMIC, backups=1,
>> mvcc=false]
>> [2020-05-12T18:19:54,910][INFO ][main][GridCacheProcessor] Started cache
>> in recovery mode [name=PI_COM_DAY, id=-1904194728,
>> dataRegionName=default, mode=PARTITIONED, atomicity=ATOMIC, backups=1,
>> mvcc=false]
>> [2020-05-12T18:19:54,949][INFO ][main][GridCacheProcessor] Started cache
>> in recovery mode [name=PLM_ITEM, id=-1283854143, dataRegionName=default,
>> mode=REPLICATED, atomicity=ATOMIC, backups=2147483647, mvcc=false]
>> [2020-05-12T18:22:53,662][INFO ][main][GridCacheProcessor] Started cache
>> in recovery mode [name=CO_CO, id=64322847, dataRegionName=default,
>> mode=PARTITIONED, atomicity=ATOMIC, backups=1, mvcc=false]
>> [2020-05-12T18:22:54,876][INFO ][main][GridCacheProcessor] Started cache
>> in recovery mode [name=CO_CUST, id=1684722246, dataRegionName=default,
>> mode=PARTITIONED, atomicity=ATOMIC, backups=1, mvcc=false]
>> [2020-05-12T18:22:54,892][INFO ][main][GridCacheDatabaseSharedManager]
>> Binary recovery performed in 4970233 ms.
>>
>> Among them, binary recovery took 4970 seconds.
>>
>> Our question is:
>>
>> 1.Why is the start time so long?
>>
>> 2.Is the current state of ignite, with the growth of single node data
>> volume, the restart time will be longer and longer?
>>
>> 3.Do have any suggestions for optimizing the restart time?
>>
>>

Re: Binary recovery for a very long time

Posted by 38797715 <38...@qq.com>.

Hi Evgenii,

The storage used is not SSD.

We will use different versions of ignite for further testing, such as 
ignite2.8.
Ignite is configured as follows:

<?xmlversion="1.0"encoding="UTF-8"?>
<beansxmlns="http://www.springframework.org/schema/beans"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"xsi:schemaLocation="http://www.springframework.org/schema/beans
http://www.springframework.org/schema/beans/spring-beans.xsd">
<beanid="ignite.cfg"class="org.apache.ignite.configuration.IgniteConfiguration">
<propertyname="peerClassLoadingEnabled"value="true"/>
<propertyname="consistentId"value="20"/>
<propertyname="failureDetectionTimeout"value="120000"/>
<propertyname="workDirectory"value="/appdata/ignite"/>
<propertyname="rebalanceBatchSize"value="#{2 * 1024 * 1024}"/>
<propertyname="rebalanceThrottle"value="100"/>
<propertyname="rebalanceThreadPoolSize"value="4"/>
<propertyname="gridLogger">
<beanclass="org.apache.ignite.logger.log4j2.Log4J2Logger">
<constructor-argtype="java.lang.String"value="config/ignite-log4j2.xml"/>
</bean>
</property>
<propertyname="cacheConfiguration">
<list>
<beanid="partitioned-cache-template"abstract="true"class="org.apache.ignite.configuration.CacheConfiguration">
<propertyname="name"value="cache-partitioned*"/>
<propertyname="cacheMode"value="PARTITIONED"/>
<propertyname="backups"value="1"/>
<propertyname="queryParallelism"value="16"/>
<propertyname="partitionLossPolicy"value="READ_ONLY_SAFE"/>
</bean>
<beanid="replicated-cache-template"abstract="true"class="org.apache.ignite.configuration.CacheConfiguration">
<propertyname="name"value="cache-replicated*"/>
<propertyname="cacheMode"value="REPLICATED"/>
<propertyname="partitionLossPolicy"value="READ_ONLY_SAFE"/>
</bean>
</list>
</property>
<!-- Enabling Apache Ignite Persistent Store. -->
<propertyname="dataStorageConfiguration">
<beanclass="org.apache.ignite.configuration.DataStorageConfiguration">
<propertyname="defaultDataRegionConfiguration">
<beanclass="org.apache.ignite.configuration.DataRegionConfiguration">
<propertyname="persistenceEnabled"value="true"/>
<propertyname="maxSize"value="#{200L * 1024 * 1024 * 1024}"/>
</bean>
</property>
</bean>
</property>
</bean>
</beans>
在 2020/5/13 上午4:45, Evgenii Zhuravlev 写道:
> Hi,
>
> Can you share full logs and configuration? What disk so you use?
>
> Evgenii
>
> вт, 12 мая 2020 г. в 06:49, 38797715 <38797715@qq.com 
> <ma...@qq.com>>:
>
>     Among them:
>     CO_CO_NEW: ~ 48 minutes(partitioned,backup=1,33M)
>
>     Ignite sys cache: ~ 27 minutes
>
>     PLM_ITEM:~3 minutes(repicated,1.9K)
>
>
>     在 2020/5/12 下午9:08, 38797715 写道:
>>
>>     Hi community,
>>
>>     We have 5 servers, 16 cores, 256g memory, and 200g off-heap memory.
>>     We have 7 tables to test, and the data volume is
>>     respectively:31.8M,495.2M,552.3M,33M,873.3K,28M,1.9K(replicated),others
>>     are partitioned(backup = 1)
>>
>>     VM args:-server -Xms20g -Xmx20g -XX:+AlwaysPreTouch -XX:+UseG1GC
>>     -XX:+ScavengeBeforeFullGC -XX:+DisableExplicitGC
>>     -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+PrintGCDateStamps
>>     -XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=10
>>     -XX:GCLogFileSize=100M -Xloggc:/data/gc/logs/gclog.txt
>>     -Djava.net.preferIPv4Stack=true -XX:MaxDirectMemorySize=256M
>>     -XX:+PrintAdaptiveSizePolicy
>>
>>     Today, one of the servers was restarted(kill and then start
>>     ignite.sh) for some reason, but the node took 1.5 hours to start,
>>     which was much longer than expected.
>>
>>     After analyzing the log, the following information is found:
>>
>>     [2020-05-12T17:00:05,138][INFO][main][GridCacheDatabaseSharedManager]
>>     Found last checkpoint marker
>>     [cpId=7a0564f2-43e5-400b-9439-746fc68a6ccb, pos=FileWALPointer
>>     [idx=10511, fileOff=51348888, len=61193]]
>>     [2020-05-12T17:00:05,151][INFO][main][GridCacheDatabaseSharedManager]
>>     Binary memory state restored at node startup
>>     [restoredPtr=FileWALPointer [idx=10511, fileOff=51410110, len=0]]
>>     [2020-05-12T17:00:05,152][INFO][main][FileWriteAheadLogManager]
>>     Resuming logging to WAL segment
>>     [file=/appdata/ignite/db/wal/24/0000000000000001.wal,
>>     offset=51410110, ver=2]
>>     [2020-05-12T17:00:06,448][INFO][main][PageMemoryImpl] Started
>>     page memory [memoryAllocated=200.0GiB, pages=50821088,
>>     tableSize=3.9GiB, checkpointBuffer=2.0GiB]
>>     [2020-05-12T17:02:08,528][INFO][main][GridCacheProcessor] Started
>>     cache in recovery mode [name=CO_CO_NEW, id=-189779360,
>>     dataRegionName=default, mode=PARTITIONED, atomicity=ATOMIC,
>>     backups=1, mvcc=false]
>>     [2020-05-12T17:50:44,341][INFO][main][GridCacheProcessor] Started
>>     cache in recovery mode [name=CO_CO_LINE, id=-1588248812,
>>     dataRegionName=default, mode=PARTITIONED, atomicity=ATOMIC,
>>     backups=1, mvcc=false]
>>     [2020-05-12T17:50:44,366][INFO][main][GridCacheProcessor] Started
>>     cache in recovery mode [name=ignite-sys-cache, id=-2100569601,
>>     dataRegionName=sysMemPlc, mode=REPLICATED,
>>     atomicity=TRANSACTIONAL, backups=2147483647, mvcc=false]
>>     [2020-05-12T18:17:57,071][INFO][main][GridCacheProcessor] Started
>>     cache in recovery mode [name=CO_CO_LINE_NEW, id=1742991829,
>>     dataRegionName=default, mode=PARTITIONED, atomicity=ATOMIC,
>>     backups=1, mvcc=false]
>>     [2020-05-12T18:19:54,910][INFO][main][GridCacheProcessor] Started
>>     cache in recovery mode [name=PI_COM_DAY, id=-1904194728,
>>     dataRegionName=default, mode=PARTITIONED, atomicity=ATOMIC,
>>     backups=1, mvcc=false]
>>     [2020-05-12T18:19:54,949][INFO][main][GridCacheProcessor] Started
>>     cache in recovery mode [name=PLM_ITEM, id=-1283854143,
>>     dataRegionName=default, mode=REPLICATED, atomicity=ATOMIC,
>>     backups=2147483647, mvcc=false]
>>     [2020-05-12T18:22:53,662][INFO][main][GridCacheProcessor] Started
>>     cache in recovery mode [name=CO_CO, id=64322847,
>>     dataRegionName=default, mode=PARTITIONED, atomicity=ATOMIC,
>>     backups=1, mvcc=false]
>>     [2020-05-12T18:22:54,876][INFO][main][GridCacheProcessor] Started
>>     cache in recovery mode [name=CO_CUST, id=1684722246,
>>     dataRegionName=default, mode=PARTITIONED, atomicity=ATOMIC,
>>     backups=1, mvcc=false]
>>     [2020-05-12T18:22:54,892][INFO][main][GridCacheDatabaseSharedManager]
>>     Binary recovery performed in 4970233ms.
>>
>>     Among them, binary recovery took 4970 seconds.
>>
>>     Our question is:
>>
>>     1.Why is the start time so long?
>>
>>     2.Is the current state of ignite, with the growth of single node
>>     data volume, the restart time will be longer and longer?
>>
>>     3.Do have any suggestions for optimizing the restart time?
>>

Re: Binary recovery for a very long time

Posted by Evgenii Zhuravlev <e....@gmail.com>.

Hi,

Can you share full logs and configuration? What disk so you use?

Evgenii

вт, 12 мая 2020 г. в 06:49, 38797715 <38...@qq.com>:

> Among them:
> CO_CO_NEW: ~ 48 minutes(partitioned,backup=1,33M)
>
> Ignite sys cache: ~ 27 minutes
>
> PLM_ITEM:~3 minutes(repicated,1.9K)
>
>
> 在 2020/5/12 下午9:08, 38797715 写道:
>
> Hi community,
>
> We have 5 servers, 16 cores, 256g memory, and 200g off-heap memory.
> We have 7 tables to test, and the data volume is
> respectively:31.8M,495.2M,552.3M,33M,873.3K,28M,1.9K(replicated),others are
> partitioned(backup = 1)
>
> VM args:-server -Xms20g -Xmx20g -XX:+AlwaysPreTouch -XX:+UseG1GC
> -XX:+ScavengeBeforeFullGC -XX:+DisableExplicitGC -XX:+PrintGCDetails
> -XX:+PrintGCTimeStamps -XX:+PrintGCDateStamps -XX:+UseGCLogFileRotation
> -XX:NumberOfGCLogFiles=10 -XX:GCLogFileSize=100M
> -Xloggc:/data/gc/logs/gclog.txt -Djava.net.preferIPv4Stack=true
> -XX:MaxDirectMemorySize=256M -XX:+PrintAdaptiveSizePolicy
>
> Today, one of the servers was restarted(kill and then start ignite.sh) for
> some reason, but the node took 1.5 hours to start, which was much longer
> than expected.
>
> After analyzing the log, the following information is found:
> [2020-05-12T17:00:05,138][INFO ][main][GridCacheDatabaseSharedManager]
> Found last checkpoint marker [cpId=7a0564f2-43e5-400b-9439-746fc68a6ccb,
> pos=FileWALPointer [idx=10511, fileOff=51348888, len=61193]]
> [2020-05-12T17:00:05,151][INFO ][main][GridCacheDatabaseSharedManager]
> Binary memory state restored at node startup [restoredPtr=FileWALPointer
> [idx=10511, fileOff=51410110, len=0]]
> [2020-05-12T17:00:05,152][INFO ][main][FileWriteAheadLogManager] Resuming
> logging to WAL segment [file=/appdata/ignite/db/wal/24/0000000000000001.wal,
> offset=51410110, ver=2]
> [2020-05-12T17:00:06,448][INFO ][main][PageMemoryImpl] Started page
> memory [memoryAllocated=200.0 GiB, pages=50821088, tableSize=3.9 GiB,
> checkpointBuffer=2.0 GiB]
> [2020-05-12T17:02:08,528][INFO ][main][GridCacheProcessor] Started cache
> in recovery mode [name=CO_CO_NEW, id=-189779360, dataRegionName=default,
> mode=PARTITIONED, atomicity=ATOMIC, backups=1, mvcc=false]
> [2020-05-12T17:50:44,341][INFO ][main][GridCacheProcessor] Started cache
> in recovery mode [name=CO_CO_LINE, id=-1588248812,
> dataRegionName=default, mode=PARTITIONED, atomicity=ATOMIC, backups=1,
> mvcc=false]
> [2020-05-12T17:50:44,366][INFO ][main][GridCacheProcessor] Started cache
> in recovery mode [name=ignite-sys-cache, id=-2100569601,
> dataRegionName=sysMemPlc, mode=REPLICATED, atomicity=TRANSACTIONAL, backups=
> 2147483647, mvcc=false]
> [2020-05-12T18:17:57,071][INFO ][main][GridCacheProcessor] Started cache
> in recovery mode [name=CO_CO_LINE_NEW, id=1742991829,
> dataRegionName=default, mode=PARTITIONED, atomicity=ATOMIC, backups=1,
> mvcc=false]
> [2020-05-12T18:19:54,910][INFO ][main][GridCacheProcessor] Started cache
> in recovery mode [name=PI_COM_DAY, id=-1904194728,
> dataRegionName=default, mode=PARTITIONED, atomicity=ATOMIC, backups=1,
> mvcc=false]
> [2020-05-12T18:19:54,949][INFO ][main][GridCacheProcessor] Started cache
> in recovery mode [name=PLM_ITEM, id=-1283854143, dataRegionName=default,
> mode=REPLICATED, atomicity=ATOMIC, backups=2147483647, mvcc=false]
> [2020-05-12T18:22:53,662][INFO ][main][GridCacheProcessor] Started cache
> in recovery mode [name=CO_CO, id=64322847, dataRegionName=default,
> mode=PARTITIONED, atomicity=ATOMIC, backups=1, mvcc=false]
> [2020-05-12T18:22:54,876][INFO ][main][GridCacheProcessor] Started cache
> in recovery mode [name=CO_CUST, id=1684722246, dataRegionName=default,
> mode=PARTITIONED, atomicity=ATOMIC, backups=1, mvcc=false]
> [2020-05-12T18:22:54,892][INFO ][main][GridCacheDatabaseSharedManager]
> Binary recovery performed in 4970233 ms.
>
> Among them, binary recovery took 4970 seconds.
>
> Our question is:
>
> 1.Why is the start time so long?
>
> 2.Is the current state of ignite, with the growth of single node data
> volume, the restart time will be longer and longer?
>
> 3.Do have any suggestions for optimizing the restart time?
>
>

Re: Binary recovery for a very long time

Posted by 38797715 <38...@qq.com>.

Among them:
CO_CO_NEW: ~ 48 minutes(partitioned,backup=1,33M)

Ignite sys cache: ~ 27 minutes

PLM_ITEM:~3 minutes(repicated,1.9K)


在 2020/5/12 下午9:08, 38797715 写道:
>
> Hi community,
>
> We have 5 servers, 16 cores, 256g memory, and 200g off-heap memory.
> We have 7 tables to test, and the data volume is 
> respectively:31.8M,495.2M,552.3M,33M,873.3K,28M,1.9K(replicated),others 
> are partitioned(backup = 1)
>
> VM args:-server -Xms20g -Xmx20g -XX:+AlwaysPreTouch -XX:+UseG1GC 
> -XX:+ScavengeBeforeFullGC -XX:+DisableExplicitGC -XX:+PrintGCDetails 
> -XX:+PrintGCTimeStamps -XX:+PrintGCDateStamps 
> -XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=10 
> -XX:GCLogFileSize=100M -Xloggc:/data/gc/logs/gclog.txt 
> -Djava.net.preferIPv4Stack=true -XX:MaxDirectMemorySize=256M 
> -XX:+PrintAdaptiveSizePolicy
>
> Today, one of the servers was restarted(kill and then start ignite.sh) 
> for some reason, but the node took 1.5 hours to start, which was much 
> longer than expected.
>
> After analyzing the log, the following information is found:
>
> [2020-05-12T17:00:05,138][INFO][main][GridCacheDatabaseSharedManager] 
> Found last checkpoint marker 
> [cpId=7a0564f2-43e5-400b-9439-746fc68a6ccb, pos=FileWALPointer 
> [idx=10511, fileOff=51348888, len=61193]]
> [2020-05-12T17:00:05,151][INFO][main][GridCacheDatabaseSharedManager] 
> Binary memory state restored at node startup 
> [restoredPtr=FileWALPointer [idx=10511, fileOff=51410110, len=0]]
> [2020-05-12T17:00:05,152][INFO][main][FileWriteAheadLogManager] 
> Resuming logging to WAL segment 
> [file=/appdata/ignite/db/wal/24/0000000000000001.wal, offset=51410110, 
> ver=2]
> [2020-05-12T17:00:06,448][INFO][main][PageMemoryImpl] Started page 
> memory [memoryAllocated=200.0GiB, pages=50821088, tableSize=3.9GiB, 
> checkpointBuffer=2.0GiB]
> [2020-05-12T17:02:08,528][INFO][main][GridCacheProcessor] Started 
> cache in recovery mode [name=CO_CO_NEW, id=-189779360, 
> dataRegionName=default, mode=PARTITIONED, atomicity=ATOMIC, backups=1, 
> mvcc=false]
> [2020-05-12T17:50:44,341][INFO][main][GridCacheProcessor] Started 
> cache in recovery mode [name=CO_CO_LINE, id=-1588248812, 
> dataRegionName=default, mode=PARTITIONED, atomicity=ATOMIC, backups=1, 
> mvcc=false]
> [2020-05-12T17:50:44,366][INFO][main][GridCacheProcessor] Started 
> cache in recovery mode [name=ignite-sys-cache, id=-2100569601, 
> dataRegionName=sysMemPlc, mode=REPLICATED, atomicity=TRANSACTIONAL, 
> backups=2147483647, mvcc=false]
> [2020-05-12T18:17:57,071][INFO][main][GridCacheProcessor] Started 
> cache in recovery mode [name=CO_CO_LINE_NEW, id=1742991829, 
> dataRegionName=default, mode=PARTITIONED, atomicity=ATOMIC, backups=1, 
> mvcc=false]
> [2020-05-12T18:19:54,910][INFO][main][GridCacheProcessor] Started 
> cache in recovery mode [name=PI_COM_DAY, id=-1904194728, 
> dataRegionName=default, mode=PARTITIONED, atomicity=ATOMIC, backups=1, 
> mvcc=false]
> [2020-05-12T18:19:54,949][INFO][main][GridCacheProcessor] Started 
> cache in recovery mode [name=PLM_ITEM, id=-1283854143, 
> dataRegionName=default, mode=REPLICATED, atomicity=ATOMIC, 
> backups=2147483647, mvcc=false]
> [2020-05-12T18:22:53,662][INFO][main][GridCacheProcessor] Started 
> cache in recovery mode [name=CO_CO, id=64322847, 
> dataRegionName=default, mode=PARTITIONED, atomicity=ATOMIC, backups=1, 
> mvcc=false]
> [2020-05-12T18:22:54,876][INFO][main][GridCacheProcessor] Started 
> cache in recovery mode [name=CO_CUST, id=1684722246, 
> dataRegionName=default, mode=PARTITIONED, atomicity=ATOMIC, backups=1, 
> mvcc=false]
> [2020-05-12T18:22:54,892][INFO][main][GridCacheDatabaseSharedManager] 
> Binary recovery performed in 4970233ms.
>
> Among them, binary recovery took 4970 seconds.
>
> Our question is:
>
> 1.Why is the start time so long?
>
> 2.Is the current state of ignite, with the growth of single node data 
> volume, the restart time will be longer and longer?
>
> 3.Do have any suggestions for optimizing the restart time?
>