Posted to issues@ignite.apache.org by "Ivan Bessonov (Jira)" <ji...@apache.org> on 2020/02/07 14:10:00 UTC
[jira] [Updated] (IGNITE-12499) Node took a long time to start after kill

     [ https://issues.apache.org/jira/browse/IGNITE-12499?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ivan Bessonov updated IGNITE-12499:
-----------------------------------
    Description: 
Test scenario:
 1) Start 4 node cluster
 2) Activate
 3) Load 1k rows to each cache
 4) Stop node
 5) Return it back without index.bin files
 6) Wait until start

Somehow the first node takes about 1166 sec to start. The wait message reads: "Waiting for topology snapshot: server(s) 4/4, client(s) 0/*, timeout 1166/1800 sec".

[10:47:21,360][INFO][main][G] Node started : [stage="Configure system pool" (129 ms),stage="Start managers" (440 ms),stage="Configure binary metadata" (86 ms),stage="Start processors" (39341 ms),stage="Start 'GridGain' plugin" (16 ms),stage="Init and start regions" (210 ms),stage="Restore binary memory" (228224 ms),stage="Restore logical state" (859694 ms),stage="Finish recovery" (8938 ms),stage="Join topology" (6024 ms),stage="Await transition" (16 ms),stage="Await exchange" (14855 ms),stage="Total time" (1157973 ms)]
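
To make the breakdown easier to read, the stage timings in the "Node started" line can be pulled out and compared with the total. The sketch below is only an illustration: the class StageTimings and its methods are hypothetical helpers, not part of Ignite, and the regular expression assumes the stage="name" (N ms) layout seen in the log line above. On these numbers "Restore logical state" accounts for roughly 74% of the total and "Restore binary memory" for roughly 20%.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for reading the "Node started" line; not an Ignite class.
class StageTimings {
    // Matches entries such as: stage="Restore logical state" (859694 ms)
    private static final Pattern STAGE =
        Pattern.compile("stage=\"([^\"]+)\" \\((\\d+) ms\\)");

    /** Parses the log line into stage name -> duration in ms, keeping log order. */
    static Map<String, Long> parse(String logLine) {
        Map<String, Long> stages = new LinkedHashMap<>();
        Matcher m = STAGE.matcher(logLine);
        while (m.find())
            stages.put(m.group(1), Long.parseLong(m.group(2)));
        return stages;
    }

    /** Returns the longest stage, ignoring the "Total time" summary entry. */
    static String slowest(Map<String, Long> stages) {
        String name = null;
        long max = -1;
        for (Map.Entry<String, Long> e : stages.entrySet()) {
            if (!"Total time".equals(e.getKey()) && e.getValue() > max) {
                max = e.getValue();
                name = e.getKey();
            }
        }
        return name;
    }

    public static void main(String[] args) {
        String line = "[10:47:21,360][INFO][main][G] Node started : ["
            + "stage=\"Configure system pool\" (129 ms),stage=\"Start managers\" (440 ms),"
            + "stage=\"Configure binary metadata\" (86 ms),stage=\"Start processors\" (39341 ms),"
            + "stage=\"Start 'GridGain' plugin\" (16 ms),stage=\"Init and start regions\" (210 ms),"
            + "stage=\"Restore binary memory\" (228224 ms),stage=\"Restore logical state\" (859694 ms),"
            + "stage=\"Finish recovery\" (8938 ms),stage=\"Join topology\" (6024 ms),"
            + "stage=\"Await transition\" (16 ms),stage=\"Await exchange\" (14855 ms),"
            + "stage=\"Total time\" (1157973 ms)]";

        Map<String, Long> stages = parse(line);
        long total = stages.get("Total time");

        // Print every stage with its share of the total startup time.
        for (Map.Entry<String, Long> e : stages.entrySet())
            System.out.printf("%-28s %9d ms %5.1f%%%n",
                e.getKey(), e.getValue(), 100.0 * e.getValue() / total);

        System.out.println("Slowest stage: " + slowest(stages));
    }
}
```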
h3. Clarification:

The "Restore logical state" stage is the longest one and it uses a single thread, so CPU/IO utilization is very low. Running the same operation on an ExecutorService would drastically speed up the whole node startup process.

  was:
Test scenario:
1) Start 4 node cluster
2) Activate
3) Load 1k rows to each cache
4) Stop node
5) Return it back without index.bin files
6) Wait until start

Somehow the first node takes about 1166 sec to start. The wait message reads: "Waiting for topology snapshot: server(s) 4/4, client(s) 0/*, timeout 1166/1800 sec".

[10:47:21,360][INFO][main][G] Node started : [stage="Configure system pool" (129 ms),stage="Start managers" (440 ms),stage="Configure binary metadata" (86 ms),stage="Start processors" (39341 ms),stage="Start 'GridGain' plugin" (16 ms),stage="Init and start regions" (210 ms),stage="Restore binary memory" (228224 ms),stage="Restore logical state" (859694 ms),stage="Finish recovery" (8938 ms),stage="Join topology" (6024 ms),stage="Await transition" (16 ms),stage="Await exchange" (14855 ms),stage="Total time" (1157973 ms)]


> Node took a long time to start after kill
> -----------------------------------------
>
>                 Key: IGNITE-12499
>                 URL: https://issues.apache.org/jira/browse/IGNITE-12499
>             Project: Ignite
>          Issue Type: Bug
>            Reporter: Ivan Bessonov
>            Assignee: Ivan Bessonov
>            Priority: Major
>             Fix For: 2.9
>
>          Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Test scenario:
>  1) Start 4 node cluster
>  2) Activate
>  3) Load 1k rows to each cache
>  4) Stop node
>  5) Return it back without index.bin files
>  6) Wait until start
> Somehow the first node takes about 1166 sec to start. The wait message reads: "Waiting for topology snapshot: server(s) 4/4, client(s) 0/*, timeout 1166/1800 sec".
> [10:47:21,360][INFO][main][G] Node started : [stage="Configure system pool" (129 ms),stage="Start managers" (440 ms),stage="Configure binary metadata" (86 ms),stage="Start processors" (39341 ms),stage="Start 'GridGain' plugin" (16 ms),stage="Init and start regions" (210 ms),stage="Restore binary memory" (228224 ms),stage="Restore logical state" (859694 ms),stage="Finish recovery" (8938 ms),stage="Join topology" (6024 ms),stage="Await transition" (16 ms),stage="Await exchange" (14855 ms),stage="Total time" (1157973 ms)]
> h3. Clarification:
> The "Restore logical state" stage is the longest one and it uses a single thread, so CPU/IO utilization is very low. Running the same operation on an ExecutorService would drastically speed up the whole node startup process.
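
As a rough illustration of the ExecutorService idea from the clarification, the sketch below applies logged updates in parallel while keeping the original order within each partition. It is not Ignite's recovery code: StripedRestore, LogRecord and the in-memory map are hypothetical stand-ins, and the sketch assumes that updates to different partitions are independent, so only per-partition ordering has to be preserved.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical sketch; none of these types exist in Ignite.
class StripedRestore {
    /** Stand-in for a logical log record: one update to one key of one partition. */
    static final class LogRecord {
        final int partitionId;
        final String key;
        final String value;

        LogRecord(int partitionId, String key, String value) {
            this.partitionId = partitionId;
            this.key = key;
            this.value = value;
        }
    }

    /** Replays the log on a fixed pool; each partition is handled by exactly one task. */
    static Map<String, String> restore(List<LogRecord> log, int threads) {
        // Split the log into stripes by partition id, keeping log order inside each stripe.
        List<List<LogRecord>> stripes = new ArrayList<>();
        for (int i = 0; i < threads; i++)
            stripes.add(new ArrayList<>());
        for (LogRecord r : log)
            stripes.get(Math.floorMod(r.partitionId, threads)).add(r);

        Map<String, String> state = new ConcurrentHashMap<>();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<?>> futures = new ArrayList<>();
            for (List<LogRecord> stripe : stripes) {
                futures.add(pool.submit(() -> {
                    // Sequential inside a stripe, so a later update wins over an earlier one.
                    for (LogRecord r : stripe)
                        state.put(r.partitionId + ":" + r.key, r.value);
                }));
            }
            // Wait for every stripe; a failure in any task is rethrown here.
            for (Future<?> f : futures)
                f.get();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Restore interrupted", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("Restore failed", e.getCause());
        } finally {
            pool.shutdown();
        }
        return state;
    }

    public static void main(String[] args) {
        List<LogRecord> log = new ArrayList<>();
        log.add(new LogRecord(1, "k", "v1"));
        log.add(new LogRecord(2, "k", "x"));
        log.add(new LogRecord(1, "k", "v2")); // later update to the same key
        System.out.println(restore(log, 4));
    }
}
```

Striping by partition id is only one possible way to split the work. It keeps each task free of cross-thread ordering concerns, at the cost of uneven load when a few partitions hold most of the records.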



--
This message was sent by Atlassian Jira
(v8.3.4#803005)