Posted to issues@ozone.apache.org by "Attila Doroszlai (Jira)" <ji...@apache.org> on 2023/01/04 16:18:00 UTC

[jira] [Resolved] (HDDS-7608) Ensure queued commands with old SCM term are not processed

     [ https://issues.apache.org/jira/browse/HDDS-7608?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Attila Doroszlai resolved HDDS-7608.
------------------------------------
    Resolution: Implemented

Resolving, since all sub-tasks are done.  Please feel free to reopen and add a new sub-task if necessary.

> Ensure queued commands with old SCM term are not processed
> ----------------------------------------------------------
>
>                 Key: HDDS-7608
>                 URL: https://issues.apache.org/jira/browse/HDDS-7608
>             Project: Apache Ozone
>          Issue Type: Improvement
>            Reporter: Stephen O'Donnell
>            Assignee: Attila Doroszlai
>            Priority: Major
>
> With SCM HA, every command sent to a datanode includes the SCM "term". If a new SCM leader is elected due to a failover or restart, the term increases.
> In general, once a datanode notices that the term has changed, it should not process any commands still queued from an old term. This matters most for commands like DeleteContainer, as the new leader may schedule a delete of a different replica and then both deletes complete.
> The DN learns of a new term by inspecting the term in each command. If it dequeues a command to process it and finds that the command has a greater term, it updates its own term to the new value. Any subsequent commands that still have the old term are then dropped.
> There are a few problems here:
> 1) If the DN does not receive any more commands for some reason (unlikely perhaps), then it will never learn the new term and so will not drop any queued commands. Perhaps the term should be included in all heartbeat responses rather than depending on the one in the commands?
> 2) The term is only updated when the first command with the new term reaches the head of the queue. This means all commands ahead of it in the queue, which have the old term, will still get processed as normal. Perhaps we should update the term when the command is added to the queue, or update based on a field in the heartbeat.
> 3) Replicate and delete replica commands (and perhaps others) are taken from the main queue and added to sub-queues where they may stay for some time. If they are in a sub-queue, the term is never checked again, and it should be.
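
The three suggestions in the quoted description can be illustrated with a minimal sketch. This is not the Ozone implementation: the class TermAwareQueue, its Command type, and the methods observeTerm, enqueue, isStale and poll are all hypothetical names invented for this example. It assumes the term is a monotonically increasing long, that the datanode records the highest term it has seen as soon as a command is queued or a heartbeat response reports one, and that every consumer (including sub-queues) re-checks the term immediately before executing a command.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Hypothetical sketch; none of these names come from the Ozone code base. */
public class TermAwareQueue {

  /** A queued command tagged with the SCM term it was issued under. */
  public static final class Command {
    public final String name;
    public final long term;

    public Command(String name, long term) {
      this.name = name;
      this.term = term;
    }
  }

  private final Deque<Command> queue = new ArrayDeque<>();
  private long currentTerm;

  /** Record a term seen in a heartbeat response (problem 1) or a command. */
  public synchronized void observeTerm(long term) {
    if (term > currentTerm) {
      currentTerm = term;
    }
  }

  /** Update the term when the command is queued, not when it is dequeued (problem 2). */
  public synchronized void enqueue(Command cmd) {
    observeTerm(cmd.term);
    queue.addLast(cmd);
  }

  /** Sub-queues would call this again right before executing (problem 3). */
  public synchronized boolean isStale(Command cmd) {
    return cmd.term < currentTerm;
  }

  /** Returns the next command from the current term, or null if none is left. */
  public synchronized Command poll() {
    Command cmd;
    while ((cmd = queue.pollFirst()) != null) {
      if (!isStale(cmd)) {
        return cmd;
      }
      // The command was issued under an older term: drop it and keep looking.
    }
    return null;
  }

  public static void main(String[] args) {
    TermAwareQueue q = new TermAwareQueue();
    q.enqueue(new Command("DeleteContainer#1", 1));
    q.enqueue(new Command("ReplicateContainer#2", 1));
    q.enqueue(new Command("DeleteContainer#3", 2)); // new leader, term 2
    System.out.println(q.poll().name); // prints "DeleteContainer#3"
  }
}
```

In this sketch the two term-1 commands are dropped even though they sit ahead of the term-2 command in the queue, because the term is raised at enqueue time. A command that has already been handed to a sub-queue would need the same isStale check just before it runs.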



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org