Posted to issues@ozone.apache.org by "Sumit Agrawal (Jira)" <ji...@apache.org> on 2023/05/18 15:20:00 UTC

[jira] [Commented] (HDDS-3277) Datanodes do not close pipeline when pipeline directory is deleted.

    [ https://issues.apache.org/jira/browse/HDDS-3277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17723937#comment-17723937 ] 

Sumit Agrawal commented on HDDS-3277:
-------------------------------------

I verified two scenarios:

Scenario 1: delete the pipeline directory from datanode 1 and stop the other datanodes (2 and 3)
 * Leader election is observed to fail multiple times, and this causes a pipeline closure event
 * When Ratis takes a snapshot and the snapshot fails, the pipeline is stopped successfully

Scenario 2: delete the pipeline directory only from datanode 1
 * No action is taken on the pipeline, and it remains idle
 * When a write happens and fails, it triggers a close pipeline action, and the pipeline is closed successfully

So some activity on the pipeline is needed before the problem is detected; the close pipeline action triggered by the resulting failure then removes the pipeline.

Based on this, no additional handling is required: deleting a pipeline directory is not a normal operation, and any subsequent failure will trigger a close pipeline action, which removes the pipeline within a certain amount of time.
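
To illustrate the behaviour described above, here is a minimal, self-contained sketch. It is not Ozone or Ratis code: PipelineFailureSketch, CloseActions, requestClose and takeSnapshot are hypothetical names chosen for this example, and the snapshot file name is made up. It only shows the general pattern, assuming that a snapshot (or write) attempt against a deleted pipeline directory fails with an IOException and that the failure handler is what requests the close. While the pipeline is idle nothing touches the directory, so nothing is requested.

```java
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Stream;

public class PipelineFailureSketch {

  /** Hypothetical stand-in for the datanode's queue of close pipeline requests. */
  public static final class CloseActions {
    private final List<String> requested = new ArrayList<>();

    public void requestClose(String pipelineId, String reason) {
      System.out.println("close requested for " + pipelineId + ": " + reason);
      requested.add(pipelineId);
    }

    public boolean isCloseRequested(String pipelineId) {
      return requested.contains(pipelineId);
    }
  }

  /**
   * Tries to write a snapshot file under pipelineDir/sm. If the directory has
   * been deleted, FileOutputStream throws FileNotFoundException (an
   * IOException), and only then is a close requested.
   */
  public static boolean takeSnapshot(Path pipelineDir, String pipelineId,
      CloseActions actions) {
    // Illustrative file name, not the real snapshot naming scheme.
    Path snapshot = pipelineDir.resolve("sm").resolve("snapshot.sketch");
    try (OutputStream out = new FileOutputStream(snapshot.toFile())) {
      out.write(0);
      return true;
    } catch (IOException e) {
      actions.requestClose(pipelineId, "snapshot failed: " + e.getMessage());
      return false;
    }
  }

  /** Deletes a directory tree, children first (simulates the chaos failure). */
  public static void deleteRecursively(Path dir) throws IOException {
    try (Stream<Path> walk = Files.walk(dir)) {
      walk.sorted(Comparator.reverseOrder()).forEach(p -> p.toFile().delete());
    }
  }

  public static void main(String[] args) throws IOException {
    Path dir = Files.createTempDirectory("pipeline-sketch");
    Files.createDirectories(dir.resolve("sm"));
    CloseActions actions = new CloseActions();

    // Directory present: the snapshot succeeds and no close is requested.
    System.out.println("snapshot ok: " + takeSnapshot(dir, "p1", actions));

    // Directory deleted: nothing happens until the next snapshot attempt fails.
    deleteRecursively(dir);
    System.out.println("close requested while idle: "
        + actions.isCloseRequested("p1"));
    System.out.println("snapshot ok: " + takeSnapshot(dir, "p1", actions));
    System.out.println("close requested after failure: "
        + actions.isCloseRequested("p1"));
  }
}
```

The point of the sketch is that detection is driven by failures rather than by a separate directory check, which matches the conclusion that no extra handling is needed.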

> Datanodes do not close pipeline when pipeline directory is deleted.
> -------------------------------------------------------------------
>
>                 Key: HDDS-3277
>                 URL: https://issues.apache.org/jira/browse/HDDS-3277
>             Project: Apache Ozone
>          Issue Type: Bug
>          Components: Ozone Datanode
>    Affects Versions: 1.0.0
>            Reporter: Mukul Kumar Singh
>            Priority: Critical
>              Labels: MiniOzoneChaosCluster
>
> First the pipeline was deleted
> {code}
> 2020-03-25 19:44:22,669 [pool-22-thread-1] INFO  failure.Failures (FailureManager.java:fail(49)) - failing with, DeletePipelineFailure
> 2020-03-25 19:44:22,669 [pool-22-thread-1] INFO  failure.Failures (Failures.java:fail(118)) - deleteing pipeline directory /tmp/chaos-2020-03-25-19-42-52-IST/MiniOzoneClusterImpl-ef9b224f-a403-4e9b-a27a-ed38f46700c5/datanode-0/data/ratis/c4275846-2a44-4f53-b00d-c95a81785df9
> 2020-03-25 19:44:22,679 [pool-22-thread-1] INFO  failure.Failures (Failures.java:fail(118)) - deleteing pipeline directory /tmp/chaos-2020-03-25-19-42-52-IST/MiniOzoneClusterImpl-ef9b224f-a403-4e9b-a27a-ed38f46700c5/datanode-3/data/ratis/c4275846-2a44-4f53-b00d-c95a81785df9
> 2020-03-25 19:44:22,681 [pool-22-thread-1] INFO  failure.Failures (Failures.java:fail(118)) - deleteing pipeline directory /tmp/chaos-2020-03-25-19-42-52-IST/MiniOzoneClusterImpl-ef9b224f-a403-4e9b-a27a-ed38f46700c5/datanode-5/data/ratis/c4275846-2a44-4f53-b00d-c95a81785df9
> {code}
> However no pipeline failure handling was issued to SCM.
> {code}
> 2020-03-25 19:44:24,532 [b5d165bc-d2b3-497c-ae38-10f649674a3f@group-C95A81785DF9-StateMachineUpdater] ERROR ratis.ContainerStateMachine (ContainerStateMachine.java:takeSnapshot(302)) - group-C95A81785DF9: Failed to write snapshot at:(t:1, i:2037) file /tmp/chaos-2020-03-25-19-42-52-IST/MiniOzoneClusterImpl-ef9b224f-a403-4e9b-a27a-ed38f46700c5/datanode-3/data/ratis/c4275846-2a44-4f53-b00d-c95a81785df9/sm/snapshot.1_2037
> 2020-03-25 19:44:24,532 [b5d165bc-d2b3-497c-ae38-10f649674a3f@group-C95A81785DF9-StateMachineUpdater] ERROR impl.StateMachineUpdater (StateMachineUpdater.java:takeSnapshot(269)) - b5d165bc-d2b3-497c-ae38-10f649674a3f@group-C95A81785DF9-StateMachineUpdater: Failed to take snapshot
> java.io.FileNotFoundException: /tmp/chaos-2020-03-25-19-42-52-IST/MiniOzoneClusterImpl-ef9b224f-a403-4e9b-a27a-ed38f46700c5/datanode-3/data/ratis/c4275846-2a44-4f53-b00d-c95a81785df9/sm/snapshot.1_2037 (No such file or directory)
>         at java.io.FileOutputStream.open0(Native Method)
>         at java.io.FileOutputStream.open(FileOutputStream.java:270)
>         at java.io.FileOutputStream.<init>(FileOutputStream.java:213)
>         at java.io.FileOutputStream.<init>(FileOutputStream.java:162)
>         at org.apache.hadoop.ozone.container.common.transport.server.ratis.ContainerStateMachine.takeSnapshot(ContainerStateMachine.java:296)
>         at org.apache.ratis.server.impl.StateMachineUpdater.takeSnapshot(StateMachineUpdater.java:258)
>         at org.apache.ratis.server.impl.StateMachineUpdater.checkAndTakeSnapshot(StateMachineUpdater.java:250)
>         at org.apache.ratis.server.impl.StateMachineUpdater.run(StateMachineUpdater.java:169)
>         at java.lang.Thread.run(Thread.java:748)
> {code}


