Posted to issues@ozone.apache.org by "Aravindan Vijayan (Jira)" <ji...@apache.org> on 2020/09/09 22:23:00 UTC

[jira] [Updated] (HDDS-4227) Implement a "prepareForUpgrade" step that applies all committed transactions onto the OM state machine.

     [ https://issues.apache.org/jira/browse/HDDS-4227?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Aravindan Vijayan updated HDDS-4227:
------------------------------------
    Description: 
*Why is this needed?*
Through HDDS-4143, we have a generic factory that handles multiple versions of apply-transaction implementations based on layout version. This factory can therefore handle versioned requests across layout versions whenever both versions need to exist in the code (for example, for HDDS-2939).
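
The factory idea can be illustrated with a small registry. This is a minimal sketch, not the HDDS-4143 code. `VersionedRequestFactory`, `register`, `get`, the `"CreateKey"` request type and the version numbers are all hypothetical. The sketch assumes one plausible dispatch rule: use the newest handler whose layout version does not exceed the current one.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical handler signature: takes a request payload, returns a result.
Handler = Callable[[dict], str]


class VersionedRequestFactory:
    """Illustrative registry of apply-transaction handlers keyed by layout version.

    Each handler is registered against the layout version that introduced it.
    Dispatch picks the newest handler whose version does not exceed the
    cluster's current layout version (an assumed rule, for illustration).
    """

    def __init__(self) -> None:
        self._handlers: Dict[str, List[Tuple[int, Handler]]] = {}

    def register(self, request_type: str, since_version: int, handler: Handler) -> None:
        entries = self._handlers.setdefault(request_type, [])
        entries.append((since_version, handler))
        entries.sort(key=lambda e: e[0])  # keep ascending by layout version

    def get(self, request_type: str, layout_version: int) -> Handler:
        # Candidates stay in ascending version order, so the last one is newest.
        candidates = [h for v, h in self._handlers.get(request_type, []) if v <= layout_version]
        if not candidates:
            raise KeyError(f"no handler for {request_type} at layout version {layout_version}")
        return candidates[-1]


factory = VersionedRequestFactory()
factory.register("CreateKey", 0, lambda req: "v0:" + req["key"])
factory.register("CreateKey", 2, lambda req: "v2:" + req["key"])
print(factory.get("CreateKey", 1)({"key": "a"}))  # v0:a
print(factory.get("CreateKey", 3)({"key": "a"}))  # v2:a
```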

However, the OM Ratis requests are still undergoing a lot of minor changes (HDDS-4007, HDDS-3903). In these cases it would be hard to maintain two versions of the code just to support clean upgrades.

Hence, the plan is to build a pre-upgrade utility (client API) that makes sure an OM instance has no "un-applied" transactions in its Raft log. Invoking this client API ensures that the upgrade starts from a clean state. This step is needed only in an HA setup. In a non-HA setup, it can either be skipped or, when invoked, it will be a no-op (non-Ratis) or cause no harm (single-node Ratis).
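
The intended contract of the pre-upgrade API can be sketched as a check on two Raft indices. This is an illustrative sketch under stated assumptions. `OmMode`, `check_prepared` and the returned strings are hypothetical names. "No un-applied transactions" is modeled as the last applied index having caught up with the last committed index.

```python
from enum import Enum


class OmMode(Enum):
    NON_RATIS = "non-ratis"                  # standalone OM, no Raft log
    SINGLE_NODE_RATIS = "single-node-ratis"  # Ratis enabled, one OM
    HA_RATIS = "ha-ratis"                    # Ratis ring of several OMs


def check_prepared(mode: OmMode, last_applied: int, last_committed: int) -> str:
    """Hypothetical outcome of the pre-upgrade check for one OM instance."""
    if mode is OmMode.NON_RATIS:
        # No Raft log exists, so there is nothing to apply: a no-op.
        return "no-op"
    if last_applied >= last_committed:
        # Every committed entry is already in the OM state machine.
        return "clean"
    # Committed-but-unapplied entries remain; the OM must keep applying them.
    return f"pending: {last_committed - last_applied} unapplied txn(s)"


print(check_prepared(OmMode.NON_RATIS, 0, 5))  # no-op
print(check_prepared(OmMode.HA_RATIS, 7, 10))  # pending: 3 unapplied txn(s)
```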

*How does it work?*
Before updating the software bits, our goal is to bring every OM to the latest state with respect to applied transactions. This ensures that the same version of the code executes the apply-transaction (AT) step on all 3 OMs. At a high level, the flow is as follows.

* Before upgrade, *stop* the OMs.
* Start the OMs with a special flag, --prepareUpgrade. (Like --init, this is a special mode in which the ephemeral OM instance stops after doing some work.)
* When an OM is started with the --prepareUpgrade flag, it does not start the RPC server, so no new requests can get in.
* In this state, every OM is given time to apply transactions up to the last transaction.
* We know that at least 2 OMs have the last client request transaction committed in their logs, because Raft commits an entry only once a majority (2 of 3) holds it. Those 2 OMs are therefore expected to apply transactions up to that index faster.
* After this wait period, the Raft log is purged on every OM (so that no replay happens), and a Ratis snapshot is taken at the last txn.
* Even if a lagging OM is unable to reach the last txn index, its logs are purged once the wait time expires.
* When the OMs are then started with the newer version, all of them use the new code.
* Since there are no logs to replay from, the lagging OM receives the new Ratis snapshot.
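
The flow above can be modeled with a small single-process simulation. It is a simplified sketch, not Ozone or Ratis code. `OmNode`, `committed_index`, `prepare_for_upgrade`, `needs_snapshot_install`, the per-tick apply rates and the tick-based wait are all hypothetical, and Raft terms are ignored. The one standard Raft rule it relies on is that an entry is committed once a majority of the ring (2 of 3 OMs) holds it.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class OmNode:
    name: str
    log_last_index: int   # highest index present in this OM's Raft log
    applied_index: int    # highest index applied to this OM's state machine
    apply_rate: int       # entries applied per simulated tick (illustrative)
    snapshot_index: int = -1
    log_purged: bool = False


def committed_index(oms: List[OmNode]) -> int:
    # An entry is committed once a majority holds it, so take the
    # majority-th highest log index across the ring.
    majority = len(oms) // 2 + 1
    ends = sorted((om.log_last_index for om in oms), reverse=True)
    return ends[majority - 1]


def prepare_for_upgrade(oms: List[OmNode], wait_ticks: int) -> int:
    """Simulate the --prepareUpgrade flow and return the last committed txn index."""
    # No RPC server is running, so no new requests arrive and the target is fixed.
    last_txn = committed_index(oms)

    # Wait period: each OM applies the committed entries it holds, at its own pace.
    for _ in range(wait_ticks):
        for om in oms:
            limit = min(om.log_last_index, last_txn)
            om.applied_index = min(limit, om.applied_index + om.apply_rate)

    # Wait expired: snapshot at the applied index, then purge the Raft log so
    # the new software version has nothing to replay.
    for om in oms:
        om.snapshot_index = om.applied_index
        om.log_purged = True
    return last_txn


def needs_snapshot_install(om: OmNode, last_txn: int) -> bool:
    # With logs purged, an OM behind last_txn cannot catch up by log replay;
    # after restart it has to be sent a Ratis snapshot by the leader.
    return om.log_purged and om.snapshot_index < last_txn


oms = [
    OmNode("om1", log_last_index=100, applied_index=95, apply_rate=10),
    OmNode("om2", log_last_index=100, applied_index=90, apply_rate=10),
    OmNode("om3", log_last_index=80, applied_index=40, apply_rate=5),  # lagging OM
]
last = prepare_for_upgrade(oms, wait_ticks=3)
print(last)                                              # 100
print([om.snapshot_index for om in oms])                 # [100, 100, 55]
print([needs_snapshot_install(om, last) for om in oms])  # [False, False, True]
```

The bounded wait is the key design choice here. A slow OM cannot block the upgrade, because once every log is purged it is brought up to date by snapshot installation rather than by replaying entries under the new code.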

> Implement a "prepareForUpgrade" step that applies all committed transactions onto the OM state machine.
> -------------------------------------------------------------------------------------------------------
>
>                 Key: HDDS-4227
>                 URL: https://issues.apache.org/jira/browse/HDDS-4227
>             Project: Hadoop Distributed Data Store
>          Issue Type: Sub-task
>          Components: Ozone Manager
>            Reporter: Aravindan Vijayan
>            Assignee: Aravindan Vijayan
>            Priority: Major
>             Fix For: 1.1.0
>



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
