Posted to issues@ozone.apache.org by "Ivan Andika (Jira)" <ji...@apache.org> on 2023/12/19 03:40:00 UTC

[jira] [Created] (HDDS-9959) Propagate close pipelines to other datanodes in the pipeline

Ivan Andika created HDDS-9959:
---------------------------------

             Summary: Propagate close pipelines to other datanodes in the pipeline
                 Key: HDDS-9959
                 URL: https://issues.apache.org/jira/browse/HDDS-9959
             Project: Apache Ozone
          Issue Type: Improvement
          Components: DN, Ozone Datanode
            Reporter: Ivan Andika
            Assignee: Ivan Andika


In https://issues.apache.org/jira/browse/RATIS-1947, it was found that Datanodes in the same pipeline can close that pipeline hours apart:
# dn1
2023-11-29 15:22:59,477 [Command processor thread] INFO org.apache.hadoop.ozone.container.common.statemachine.commandhandler.ClosePipelineCommandHandler: Close Pipeline PipelineID=23e46782-6b48-4559-b3ac-0f95993cf0bc command on datanode 1669a7e6-fe3c-4f7e-8fcb-ec5d5027b0eb.

# dn5
2023-11-29 14:07:55,442 [Command processor thread] INFO org.apache.hadoop.ozone.container.common.statemachine.commandhandler.ClosePipelineCommandHandler: Close Pipeline PipelineID=23e46782-6b48-4559-b3ac-0f95993cf0bc command on datanode bd1e72ab-cfd5-4cc1-8fbf-6ec9d9654c98.

# dn8
2023-11-29 16:57:53,894 [Command processor thread] INFO org.apache.hadoop.ozone.container.common.statemachine.commandhandler.ClosePipelineCommandHandler: Close Pipeline PipelineID=23e46782-6b48-4559-b3ac-0f95993cf0bc command on datanode 4a23d1e8-d526-4a4d-8ed1-13ffbab3a5cc.

This can happen when a large number of commands are queued in the commandQueue of some Datanodes, so the close pipeline command is handled much earlier on some Datanodes than on others.
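For scale, a small helper can compute the gap between the earliest and latest close timestamps in the three log lines above. This is an illustrative sketch only: the class and method names are hypothetical, and it assumes all three Datanodes log in the same time zone.

```java
import java.time.Duration;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.util.LinkedHashMap;
import java.util.Map;

public class ClosePipelineSkew {
    // Matches the timestamp prefix of the Datanode log lines, e.g. "2023-11-29 15:22:59,477".
    private static final DateTimeFormatter LOG_TS =
        DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss,SSS");

    /** Returns the gap between the earliest and the latest close time (assumes a non-empty map). */
    static Duration closeSkew(Map<String, String> closeTimes) {
        LocalDateTime earliest = null;
        LocalDateTime latest = null;
        for (String ts : closeTimes.values()) {
            LocalDateTime t = LocalDateTime.parse(ts, LOG_TS);
            if (earliest == null || t.isBefore(earliest)) {
                earliest = t;
            }
            if (latest == null || t.isAfter(latest)) {
                latest = t;
            }
        }
        return Duration.between(earliest, latest);
    }

    public static void main(String[] args) {
        // Close times copied from the dn1, dn5 and dn8 logs for the same PipelineID.
        Map<String, String> closeTimes = new LinkedHashMap<>();
        closeTimes.put("dn1", "2023-11-29 15:22:59,477");
        closeTimes.put("dn5", "2023-11-29 14:07:55,442");
        closeTimes.put("dn8", "2023-11-29 16:57:53,894");
        System.out.println(closeSkew(closeTimes)); // prints PT2H49M58.452S
    }
}
```

So in this example dn8 removed the pipeline almost three hours after dn5 did.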

Furthermore, the Ratis group remove operation is local to the Raft server; it is not propagated to the other Raft peers in the same group.

Therefore, similar to CreatePipelineCommand, whenever a Datanode receives a close pipeline command, it should also propagate the group remove operation to the other Datanodes (Raft peers) in the same pipeline.
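A minimal sketch of the proposed flow follows, assuming a close handler that first removes the Raft group locally and then asks every other peer of the pipeline to remove it too. The RaftPeerHandle interface, FakePeer and closePipeline are hypothetical stand-ins, not Ozone's ClosePipelineCommandHandler or the Ratis client API; a real implementation would issue the remote remove through a Ratis client.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class ClosePipelinePropagation {

    /** Hypothetical handle to a Raft peer's group-management endpoint (not the Ratis API). */
    interface RaftPeerHandle {
        String id();
        void removeGroup(String groupId) throws IOException;
    }

    /** Hypothetical in-memory peer that records which Raft groups it still hosts. */
    static final class FakePeer implements RaftPeerHandle {
        private final String id;
        private final boolean reachable;
        final Set<String> groups = new HashSet<>();

        FakePeer(String id, boolean reachable, String... groupIds) {
            this.id = id;
            this.reachable = reachable;
            Collections.addAll(groups, groupIds);
        }

        @Override public String id() { return id; }

        @Override public void removeGroup(String groupId) throws IOException {
            if (!reachable) {
                throw new IOException("peer " + id + " unreachable");
            }
            groups.remove(groupId);
        }
    }

    /**
     * Removes the group on the local peer, then on every other peer in the pipeline.
     * Returns the ids of peers whose remove call failed, so one unreachable peer
     * does not block the rest; such a peer still depends on its own queued command.
     */
    static List<String> closePipeline(String groupId, RaftPeerHandle self,
                                      List<? extends RaftPeerHandle> pipelinePeers) {
        List<String> failed = new ArrayList<>();
        tryRemove(self, groupId, failed); // local group remove
        for (RaftPeerHandle peer : pipelinePeers) {
            if (!peer.id().equals(self.id())) {
                tryRemove(peer, groupId, failed); // propagate to the other Raft peers
            }
        }
        return failed;
    }

    private static void tryRemove(RaftPeerHandle peer, String groupId, List<String> failed) {
        try {
            peer.removeGroup(groupId);
        } catch (IOException e) {
            failed.add(peer.id());
        }
    }

    public static void main(String[] args) {
        String group = "23e46782-6b48-4559-b3ac-0f95993cf0bc";
        FakePeer dn1 = new FakePeer("dn1", true, group);
        FakePeer dn5 = new FakePeer("dn5", true, group);
        FakePeer dn8 = new FakePeer("dn8", false, group); // simulated unreachable peer
        List<String> failed = closePipeline(group, dn5, List.of(dn1, dn5, dn8));
        System.out.println("failed=" + failed + " dn1=" + dn1.groups + " dn5=" + dn5.groups);
        // prints failed=[dn8] dn1=[] dn5=[]
    }
}
```

Collecting failures instead of aborting is one possible design choice: the Datanode that handles the command first closes the group on every reachable peer, while an unreachable peer still falls back to its own close pipeline command.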



--
This message was sent by Atlassian Jira
(v8.20.10#820010)