Posted to issues@ignite.apache.org by "Alexander Lapin (Jira)" <ji...@apache.org> on 2021/07/22 09:07:00 UTC

[jira] [Updated] (IGNITE-15148) Implement top level node stop logic

     [ https://issues.apache.org/jira/browse/IGNITE-15148?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Alexander Lapin updated IGNITE-15148:
-------------------------------------
    Description: 
A node can be stopped either after it has fully started or while it is still starting. In the latter case,
stopping the node prevents any further components from starting and stops the ones that have already started.

The following method was added to the Ignition interface:
{code:java}
/**
 * Stops the node with the given name. It's possible to stop either an already started node or a node that is
 * currently starting.
 *
 * @param name Name of the node to stop.
 */
public void stop(@NotNull String name);
{code}
It's also possible to stop a node by calling close() on an already started Ignite instance.

As a starting point, the stop process checks the node status:
 * If it's STOPPING, nothing happens, because a previous request to stop the node has already been registered.
 * If it's STARTING (the node is somewhere in the middle of the startup process), the node status is updated to
STOPPING. The startup process later detects the status change when it attempts to start the next component, aborts
further startup, and stops the already started managers and their inner components.
 * If it's STARTED, the stop is performed explicitly: all components are stopped.
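
The status handling described above can be sketched roughly as follows. This is only an illustration under stated assumptions: the class, the Status enum, the Component interface and all method names are hypothetical and are not the actual Ignite code.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;
import java.util.concurrent.atomic.AtomicReference;

/** Hypothetical sketch of node status handling; none of these names come from Ignite. */
class NodeLifecycleSketch {
    enum Status { STARTING, STARTED, STOPPING }

    /** Hypothetical component contract. */
    interface Component {
        void start();

        void stop();
    }

    private final AtomicReference<Status> status = new AtomicReference<>(Status.STARTING);

    /** Components started so far; the most recently started one is at the head. */
    private final Deque<Component> started = new ArrayDeque<>();

    /** Starts components one by one, checking for a stop request before each of them. */
    void start(List<Component> components) {
        for (Component component : components) {
            if (status.get() == Status.STOPPING) {
                stopStarted(); // A stop was requested during startup: roll back.
                return;
            }
            component.start();
            started.push(component);
        }
        // A stop request may have arrived after the last check.
        if (!status.compareAndSet(Status.STARTING, Status.STARTED)) {
            stopStarted();
        }
    }

    /** Requests node stop; safe to call more than once. */
    void stop() {
        if (status.compareAndSet(Status.STARTING, Status.STOPPING)) {
            return; // The startup thread will notice the change and stop what it started.
        }
        if (status.compareAndSet(Status.STARTED, Status.STOPPING)) {
            stopStarted(); // Explicit stop of a fully started node.
        }
        // Otherwise the status is already STOPPING and there is nothing to do.
    }

    Status status() {
        return status.get();
    }

    /** Stops started components in reverse start order. */
    private void stopStarted() {
        while (!started.isEmpty()) {
            started.pop().stop();
        }
    }
}
```

In this sketch the compare-and-set calls ensure that only one caller performs the transition to STOPPING, so repeated stop requests become no-ops.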

In all cases the node stop process consists of two phases:
 * In phase one, ``onNodeStop()`` is called on all started components in reverse order, meaning that the last
started component runs ``onNodeStop()`` first. For most components ``onNodeStop()`` is a no-op. The core idea
is to stop network communication in ``onNodeStop()`` in order to terminate distributed operations gracefully:
no network communication is allowed, but node-local logic still remains consistent.
 * In phase two, within a write busy lock, ``stop()`` is called on all started components, also in reverse order.
This is where thread stopping, cleanup of inner structures, and other related logic take place. Note that
network communication is no longer possible at this point.
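
A minimal sketch of the two phases might look like the following. The Component interface and the class name are hypothetical, and a standard ReentrantReadWriteLock is used here only as a stand-in for the write busy lock mentioned above; the real lock type is not specified in this description.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.locks.ReentrantReadWriteLock;

/** Hypothetical sketch of the two-phase stop; names are illustrative only. */
class TwoPhaseStopSketch {
    /** Hypothetical component contract. */
    interface Component {
        /** Phase one hook; a no-op for most components. */
        default void onNodeStop() {
        }

        /** Phase two: stop threads, clean up inner structures. */
        void stop();
    }

    /** Components in start order. */
    private final List<Component> started = new ArrayList<>();

    /** Stand-in for the busy lock (assumption). */
    private final ReentrantReadWriteLock busyLock = new ReentrantReadWriteLock();

    void register(Component component) {
        started.add(component);
    }

    void stopAll() {
        // Phase one: notify components in reverse start order.
        for (int i = started.size() - 1; i >= 0; i--) {
            started.get(i).onNodeStop();
        }

        // Phase two: stop components in reverse start order under the write lock.
        busyLock.writeLock().lock();
        try {
            for (int i = started.size() - 1; i >= 0; i--) {
                started.get(i).stop();
            }
            started.clear();
        } finally {
            busyLock.writeLock().unlock();
        }
    }
}
```

With a network component registered first and a table component registered second, this sketch calls ``onNodeStop()`` on the table component and then on the network component, and only afterwards calls ``stop()`` on them in the same reverse order.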

Besides the local node stop logic, two more actions take place in the cluster as a result of the node-left event:
 * Both range and watch cursors are removed on the server side. This process is linearized with meta storage
operations through the meta storage raft group.
 * The baseline is updated and recalculated, along with redeployment of the partition raft groups.

 

 

> Implement top level node stop logic
> -----------------------------------
>
>                 Key: IGNITE-15148
>                 URL: https://issues.apache.org/jira/browse/IGNITE-15148
>             Project: Ignite
>          Issue Type: Bug
>            Reporter: Alexander Lapin
>            Assignee: Alexander Lapin
>            Priority: Major
>              Labels: ignite-3
>   Original Estimate: 72h
>          Time Spent: 10m
>  Remaining Estimate: 71h 50m


