Posted to issues@ozone.apache.org by "Attila Doroszlai (Jira)" <ji...@apache.org> on 2023/02/28 12:47:00 UTC
[jira] [Resolved] (HDDS-4511) HDDS-4511: Avoiding StaleNodeHandler to take effect in TestDeleteWithSlowFollower.
[ https://issues.apache.org/jira/browse/HDDS-4511?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Attila Doroszlai resolved HDDS-4511.
------------------------------------
Fix Version/s: 1.1.0
Resolution: Fixed
> HDDS-4511: Avoiding StaleNodeHandler to take effect in TestDeleteWithSlowFollower.
> ----------------------------------------------------------------------------------
>
> Key: HDDS-4511
> URL: https://issues.apache.org/jira/browse/HDDS-4511
> Project: Apache Ozone
> Issue Type: Improvement
> Components: SCM
> Affects Versions: 1.1.0
> Reporter: Glen Geng
> Assignee: Glen Geng
> Priority: Major
> Labels: pull-request-available
> Fix For: 1.1.0
>
>
> This improvement was inspired by the fix for TestDeleteWithSlowFollower in the broken HDDS-2823.
>
> In the test case TestDeleteWithSlowFollower, the following trace appears in the log:
> {code:java}
> 2020-11-24 19:32:13,551 [EventQueue-StaleNodeForStaleNodeHandler] INFO node.StaleNodeHandler (StaleNodeHandler.java:onMessage(58)) - Datanode 132e6d1b-e472-449e-929e-5f42b87114c6{ip: 10.73.23.64, host: 10.73.23.64, networkLocation: /default-rack, certSerialId: null} moved to stale state. Finalizing its pipelines [PipelineID=6f0e173c-b5e2-4dc6-99e1-854aafdc8295, PipelineID=c78bc2fb-dca1-4e09-ba71-dd824e2d4e73]
> 2020-11-24 19:32:13,552 [EventQueue-StaleNodeForStaleNodeHandler] INFO pipeline.SCMPipelineManager (PipelineManagerV2Impl.java:closePipeline(389)) - Pipeline Pipeline[ Id: 6f0e173c-b5e2-4dc6-99e1-854aafdc8295, Nodes: 132e6d1b-e472-449e-929e-5f42b87114c6{ip: 10.73.23.64, host: 10.73.23.64, networkLocation: /default-rack, certSerialId: null}46a77559-9d5c-4a1d-bad7-e7eb7b9c32da{ip: 10.73.23.64, host: 10.73.23.64, networkLocation: /default-rack, certSerialId: null}524fea63-ad85-4a3a-bcfb-ac40dfe3d5e7{ip: 10.73.23.64, host: 10.73.23.64, networkLocation: /default-rack, certSerialId: null}, Type:RATIS, Factor:THREE, State:OPEN, leaderId:46a77559-9d5c-4a1d-bad7-e7eb7b9c32da, CreationTimestamp2020-11-24T11:30:23.805Z] moved to CLOSED state
> {code}
>
> By design of this test case, however, the stale node handler should not take effect:
> {code:java}
> // Make the stale, dead and server failure timeout higher so that a dead
> // node is not detected at SCM as well as the pipeline close action
> // never gets initiated early at Datanode in the test.{code}
>
> This test case relies on ReplicationManager to close the OPEN container in SCM, so that SCM won't hold back the delete blocks command.
> ReplicationManager can send out the close container command either because the container is OPEN but under-replicated, or because it is OPEN but has a CLOSED replica.
> Since the default interval of RM is 5m, the test case actually relies on the "OPEN but under-replicated" condition to avoid triggering the stale node handler.
>
> But that command is never sent, because ReplicationManager#isContainerUnderReplicated does not consider OPEN containers; it only takes care of CLOSED and QUASI_CLOSED containers.
>
> After talking with [~Sammi]: by design, it only needs to explicitly avoid replicating containers in the DELETING or DELETED state.
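
The change described above can be illustrated with a minimal, stand-alone sketch. It is not the actual Ozone ReplicationManager code: the class name, the method names, the simplified State enum and the bare "replicas < factor" comparison are all assumptions made for illustration. It only contrasts a check that considers CLOSED and QUASI_CLOSED containers with one that excludes just DELETING and DELETED.

```java
import java.util.EnumSet;
import java.util.Set;

/** Hypothetical illustration only; not the real ReplicationManager implementation. */
final class UnderReplicationSketch {

  /** Simplified stand-in for the container states named in this issue. */
  enum State { OPEN, CLOSING, QUASI_CLOSED, CLOSED, DELETING, DELETED }

  // Behaviour described in the issue: only CLOSED and QUASI_CLOSED are considered.
  private static final Set<State> CONSIDERED_BEFORE =
      EnumSet.of(State.CLOSED, State.QUASI_CLOSED);

  // Proposed behaviour: every state except DELETING and DELETED is considered.
  private static final Set<State> EXCLUDED_AFTER =
      EnumSet.of(State.DELETING, State.DELETED);

  /** Assumed simplification: under-replicated means fewer replicas than the factor. */
  static boolean isUnderReplicatedBefore(State state, int replicas, int factor) {
    return CONSIDERED_BEFORE.contains(state) && replicas < factor;
  }

  static boolean isUnderReplicatedAfter(State state, int replicas, int factor) {
    return !EXCLUDED_AFTER.contains(state) && replicas < factor;
  }

  public static void main(String[] args) {
    // An OPEN container with 2 of 3 replicas reported.
    System.out.println(isUnderReplicatedBefore(State.OPEN, 2, 3));
    System.out.println(isUnderReplicatedAfter(State.OPEN, 2, 3));
  }
}
```

Under these assumptions, an OPEN container with two of three replicas is ignored by the first check and flagged by the second, which is the condition the test relies on for the close container command to be sent.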