Posted to issues@ozone.apache.org by "Shashikant Banerjee (Jira)" <ji...@apache.org> on 2021/06/09 07:58:00 UTC

[jira] [Resolved] (HDDS-5284) [SCM-HA] SCM start failed with PipelineNotFoundException

     [ https://issues.apache.org/jira/browse/HDDS-5284?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Shashikant Banerjee resolved HDDS-5284.
---------------------------------------
    Resolution: Fixed

> [SCM-HA] SCM start failed with PipelineNotFoundException
> --------------------------------------------------------
>
>                 Key: HDDS-5284
>                 URL: https://issues.apache.org/jira/browse/HDDS-5284
>             Project: Apache Ozone
>          Issue Type: Sub-task
>          Components: SCM HA
>            Reporter: Nilotpal Nandi
>            Assignee: Shashikant Banerjee
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 1.2.0
>
>
> {code:java}
> scm.log 
> 2021-05-27 09:55:42,189 INFO org.apache.hadoop.hdds.scm.pipeline.RatisPipelineProvider: Sending CreatePipelineCommand for pipeline:PipelineID=875b2073-4034-4374-bba6-39011294a280 to datanode:028fed4a-0087-4b70-b6e3-11f18d739094
> 2021-05-27 09:55:42,189 INFO org.apache.hadoop.hdds.scm.pipeline.RatisPipelineProvider: Sending CreatePipelineCommand for pipeline:PipelineID=875b2073-4034-4374-bba6-39011294a280 to datanode:a4b76016-dc24-47f2-a3ff-03c309fdcf9b
> 2021-05-27 09:55:42,189 INFO org.apache.hadoop.hdds.scm.pipeline.RatisPipelineProvider: Sending CreatePipelineCommand for pipeline:PipelineID=875b2073-4034-4374-bba6-39011294a280 to datanode:ed9d4872-166d-41c6-96ab-437a44e4168b
> 2021-05-27 09:55:42,199 INFO org.apache.hadoop.hdds.scm.pipeline.PipelineStateManager: Created pipeline Pipeline[ Id: 875b2073-4034-4374-bba6-39011294a280, Nodes: 028fed4a-0087-4b70-b6e3-11f18d739094{ip: 172.27.167.6, host: quasar-wudsvy-6.quasar-wudsvy.root.hwx.site, ports: [REPLICATION=9886, RATIS=9858, RATIS_ADMIN=9857, RATIS_SERVER=9856, STANDALONE=9859], networkLocation: /default, certSerialId: null, persistedOpState: IN_SERVICE, persistedOpStateExpiryEpochSec: 0}a4b76016-dc24-47f2-a3ff-03c309fdcf9b{ip: 172.27.12.201, host: quasar-wudsvy-4.quasar-wudsvy.root.hwx.site, ports: [REPLICATION=9886, RATIS=9858, RATIS_ADMIN=9857, RATIS_SERVER=9856, STANDALONE=9859], networkLocation: /default, certSerialId: null, persistedOpState: IN_SERVICE, persistedOpStateExpiryEpochSec: 0}ed9d4872-166d-41c6-96ab-437a44e4168b{ip: 172.27.74.4, host: quasar-wudsvy-1.quasar-wudsvy.root.hwx.site, ports: [REPLICATION=9886, RATIS=9858, RATIS_ADMIN=9857, RATIS_SERVER=9856, STANDALONE=9859], networkLocation: /default, certSerialId: null, persistedOpState: IN_SERVICE, persistedOpStateExpiryEpochSec: 0}, ReplicationConfig: RATIS/THREE, State:ALLOCATED, leaderId:, CreationTimestamp2021-05-27T09:55:42.189Z].
> 2021-05-27 09:55:54,426 INFO org.apache.hadoop.hdds.scm.pipeline.PipelineManagerV2Impl: Pipeline Pipeline[ Id: 875b2073-4034-4374-bba6-39011294a280, Nodes: 028fed4a-0087-4b70-b6e3-11f18d739094{ip: 172.27.167.6, host: quasar-wudsvy-6.quasar-wudsvy.root.hwx.site, ports: [REPLICATION=9886, RATIS=9858, RATIS_ADMIN=9857, RATIS_SERVER=9856, STANDALONE=9859], networkLocation: /default, certSerialId: null, persistedOpState: IN_SERVICE, persistedOpStateExpiryEpochSec: 0}a4b76016-dc24-47f2-a3ff-03c309fdcf9b{ip: 172.27.12.201, host: quasar-wudsvy-4.quasar-wudsvy.root.hwx.site, ports: [REPLICATION=9886, RATIS=9858, RATIS_ADMIN=9857, RATIS_SERVER=9856, STANDALONE=9859], networkLocation: /default, certSerialId: null, persistedOpState: IN_SERVICE, persistedOpStateExpiryEpochSec: 0}ed9d4872-166d-41c6-96ab-437a44e4168b{ip: 172.27.74.4, host: quasar-wudsvy-1.quasar-wudsvy.root.hwx.site, ports: [REPLICATION=9886, RATIS=9858, RATIS_ADMIN=9857, RATIS_SERVER=9856, STANDALONE=9859], networkLocation: /default, certSerialId: null, persistedOpState: IN_SERVICE, persistedOpStateExpiryEpochSec: 0}, ReplicationConfig: RATIS/THREE, State:ALLOCATED, leaderId:028fed4a-0087-4b70-b6e3-11f18d739094, CreationTimestamp2021-05-27T09:55:42.189Z] moved to OPEN state
> 2021-05-27 10:06:45,920 INFO org.apache.hadoop.hdds.scm.node.StaleNodeHandler: Datanode 028fed4a-0087-4b70-b6e3-11f18d739094{ip: 172.27.167.6, host: quasar-wudsvy-6.quasar-wudsvy.root.hwx.site, ports: [REPLICATION=9886, RATIS=9858, RATIS_ADMIN=9857, RATIS_SERVER=9856, STANDALONE=9859], networkLocation: /default, certSerialId: null, persistedOpState: IN_SERVICE, persistedOpStateExpiryEpochSec: 0} moved to stale state. Finalizing its pipelines [PipelineID=cd4a2a77-9715-4437-8d1d-3618a2c93103, PipelineID=ca6100b9-b42c-4b77-bef5-35a9b1e725f2, PipelineID=875b2073-4034-4374-bba6-39011294a280]
> 2021-05-27 10:06:45,932 INFO org.apache.hadoop.hdds.scm.pipeline.PipelineManagerV2Impl: Pipeline Pipeline[ Id: 875b2073-4034-4374-bba6-39011294a280, Nodes: 028fed4a-0087-4b70-b6e3-11f18d739094{ip: 172.27.167.6, host: quasar-wudsvy-6.quasar-wudsvy.root.hwx.site, ports: [REPLICATION=9886, RATIS=9858, RATIS_ADMIN=9857, RATIS_SERVER=9856, STANDALONE=9859], networkLocation: /default, certSerialId: null, persistedOpState: IN_SERVICE, persistedOpStateExpiryEpochSec: 0}a4b76016-dc24-47f2-a3ff-03c309fdcf9b{ip: 172.27.12.201, host: quasar-wudsvy-4.quasar-wudsvy.root.hwx.site, ports: [REPLICATION=9886, RATIS=9858, RATIS_ADMIN=9857, RATIS_SERVER=9856, STANDALONE=9859], networkLocation: /default, certSerialId: null, persistedOpState: IN_SERVICE, persistedOpStateExpiryEpochSec: 0}ed9d4872-166d-41c6-96ab-437a44e4168b{ip: 172.27.74.4, host: quasar-wudsvy-1.quasar-wudsvy.root.hwx.site, ports: [REPLICATION=9886, RATIS=9858, RATIS_ADMIN=9857, RATIS_SERVER=9856, STANDALONE=9859], networkLocation: /default, certSerialId: null, persistedOpState: IN_SERVICE, persistedOpStateExpiryEpochSec: 0}, ReplicationConfig: RATIS/THREE, State:DORMANT, leaderId:a4b76016-dc24-47f2-a3ff-03c309fdcf9b, CreationTimestamp2021-05-27T09:55:42.189Z] moved to CLOSED state
> 2021-05-27 10:06:57,921 INFO org.apache.hadoop.hdds.scm.node.StaleNodeHandler: Datanode a4b76016-dc24-47f2-a3ff-03c309fdcf9b{ip: 172.27.12.201, host: quasar-wudsvy-4.quasar-wudsvy.root.hwx.site, ports: [REPLICATION=9886, RATIS=9858, RATIS_ADMIN=9857, RATIS_SERVER=9856, STANDALONE=9859], networkLocation: /default, certSerialId: null, persistedOpState: IN_SERVICE, persistedOpStateExpiryEpochSec: 0} moved to stale state. Finalizing its pipelines [PipelineID=cd4a2a77-9715-4437-8d1d-3618a2c93103, PipelineID875b2073-4034-4374-bba6-39011294a280, PipelineID=2878c722-84dc-40f9-b1c1-46ed0f8bcdd7]
> 2021-05-27 10:07:41,073 INFO org.apache.hadoop.hdds.scm.pipeline.PipelineManagerV2Impl: Scrubbing pipeline: id: PipelineID=875b2073-4034-4374-bba6-39011294a280 since it stays at CLOSED stage.
> 2021-05-27 10:07:41,073 INFO org.apache.hadoop.hdds.scm.pipeline.RatisPipelineProvider: Send pipeline:PipelineID=875b2073-4034-4374-bba6-39011294a280 close command to datanode 028fed4a-0087-4b70-b6e3-11f18d739094
> 2021-05-27 10:07:41,073 INFO org.apache.hadoop.hdds.scm.pipeline.RatisPipelineProvider: Send pipeline:PipelineID=875b2073-4034-4374-bba6-39011294a280 close command to datanode a4b76016-dc24-47f2-a3ff-03c309fdcf9b
> 2021-05-27 10:07:41,073 INFO org.apache.hadoop.hdds.scm.pipeline.RatisPipelineProvider: Send pipeline:PipelineID=875b2073-4034-4374-bba6-39011294a280 close command to datanode ed9d4872-166d-41c6-96ab-437a44e4168b
> 2021-05-27 10:07:41,075 INFO org.apache.hadoop.hdds.scm.pipeline.PipelineStateManager: Pipeline Pipeline[ Id: 875b2073-4034-4374-bba6-39011294a280, Nodes: 028fed4a-0087-4b70-b6e3-11f18d739094{ip: 172.27.167.6, host: quasar-wudsvy-6.quasar-wudsvy.root.hwx.site, ports: [REPLICATION=9886, RATIS=9858, RATIS_ADMIN=9857, RATIS_SERVER=9856, STANDALONE=9859], networkLocation: /default, certSerialId: null, persistedOpState: IN_SERVICE, persistedOpStateExpiryEpochSec: 0}a4b76016-dc24-47f2-a3ff-03c309fdcf9b{ip: 172.27.12.201, host: quasar-wudsvy-4.quasar-wudsvy.root.hwx.site, ports: [REPLICATION=9886, RATIS=9858, RATIS_ADMIN=9857, RATIS_SERVER=9856, STANDALONE=9859], networkLocation: /default, certSerialId: null, persistedOpState: IN_SERVICE, persistedOpStateExpiryEpochSec: 0}ed9d4872-166d-41c6-96ab-437a44e4168b{ip: 172.27.74.4, host: quasar-wudsvy-1.quasar-wudsvy.root.hwx.site, ports: [REPLICATION=9886, RATIS=9858, RATIS_ADMIN=9857, RATIS_SERVER=9856, STANDALONE=9859], networkLocation: /default, certSerialId: null, persistedOpState: IN_SERVICE, persistedOpStateExpiryEpochSec: 0}, ReplicationConfig: RATIS/THREE, State:CLOSED, leaderId:a4b76016-dc24-47f2-a3ff-03c309fdcf9b, CreationTimestamp2021-05-27T09:55:42.189Z] removed.
> {code}
> The logs show that a pipeline was created and moved to OPEN state. One of its datanodes then went stale, so the pipeline was moved to CLOSED state. The pipeline scrubber later scrubbed the CLOSED pipeline and removed it.
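The lifecycle described above can be modeled with a minimal, hypothetical sketch. `PipelineTable`, `updateState` and `scrub` are illustrative names, not Ozone's actual API, and `PipelineNotFoundException` here is a simplified unchecked stand-in for the real exception. The sketch shows that once the scrubber has removed a CLOSED pipeline, a later state update for the same pipeline ID has nothing to act on and throws.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.UUID;

// Hypothetical, simplified model of the pipeline lifecycle seen in scm.log.
// These are NOT Ozone's real classes; all names are illustrative only.
enum PipelineState { ALLOCATED, OPEN, CLOSED }

// Simplified, unchecked stand-in for the real PipelineNotFoundException.
class PipelineNotFoundException extends RuntimeException {
  PipelineNotFoundException(UUID id) {
    super("Pipeline " + id + " not found");
  }
}

class PipelineTable {
  private final Map<UUID, PipelineState> pipelines = new HashMap<>();

  // Pipeline creation: a new pipeline starts in ALLOCATED state.
  UUID create() {
    UUID id = UUID.randomUUID();
    pipelines.put(id, PipelineState.ALLOCATED);
    return id;
  }

  // State transition; throws if the pipeline is no longer tracked.
  void updateState(UUID id, PipelineState newState) {
    if (pipelines.computeIfPresent(id, (key, old) -> newState) == null) {
      throw new PipelineNotFoundException(id);
    }
  }

  // Scrubber step: removes the pipeline only if it is currently CLOSED.
  boolean scrub(UUID id) {
    return pipelines.remove(id, PipelineState.CLOSED);
  }

  boolean contains(UUID id) {
    return pipelines.containsKey(id);
  }
}
```

For example, after `create()`, `updateState(id, PipelineState.OPEN)`, `updateState(id, PipelineState.CLOSED)` and `scrub(id)`, the table no longer holds the pipeline, so one more `updateState(id, PipelineState.CLOSED)` throws `PipelineNotFoundException`.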
> {code:java}
> 2021-05-27 10:07:41,073 INFO org.apache.hadoop.hdds.scm.pipeline.PipelineManagerV2Impl: Scrubbing pipeline: id: PipelineID=875b2073-4034-4374-bba6-39011294a280 since it stays at CLOSED stage.{code}
> The next attempt to move the pipeline to CLOSED state, triggered as part of a report from the other datanodes in the pipeline, fails because the pipeline has already been removed from SCM memory and the DB, and as a result SCM terminates.
> The solution would be to ignore PipelineNotFoundException in PipelineStateManagerV2Impl#updatePipelineState.
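The proposed fix can be illustrated with a minimal sketch, assuming a map-backed state manager guarded by a single monitor. `PipelineStateManagerSketch`, its methods, and the simplified unchecked `PipelineNotFoundException` are hypothetical stand-ins, not the code that was actually committed to `PipelineStateManagerV2Impl`. The point it shows is that a missing pipeline is logged and skipped instead of being propagated.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.UUID;
import java.util.logging.Logger;

// Hypothetical stand-ins; NOT Ozone's real types.
enum PipelineState { ALLOCATED, OPEN, CLOSED }

class PipelineNotFoundException extends RuntimeException {
  PipelineNotFoundException(UUID id) {
    super("Pipeline " + id + " not found");
  }
}

class PipelineStateManagerSketch {
  private static final Logger LOG =
      Logger.getLogger(PipelineStateManagerSketch.class.getName());
  private final Map<UUID, PipelineState> pipelineStateMap = new HashMap<>();

  synchronized void addPipeline(UUID id, PipelineState state) {
    pipelineStateMap.put(id, state);
  }

  // Called once the scrubber has cleaned up a CLOSED pipeline.
  synchronized void removePipeline(UUID id) {
    pipelineStateMap.remove(id);
  }

  synchronized PipelineState getPipelineState(UUID id) {
    PipelineState state = pipelineStateMap.get(id);
    if (state == null) {
      throw new PipelineNotFoundException(id);
    }
    return state;
  }

  // Applies a state transition. A missing pipeline is logged and ignored
  // instead of propagating, so a late report cannot bring SCM down.
  // Returns true if the update was applied, false if it was skipped.
  synchronized boolean updatePipelineState(UUID id, PipelineState newState) {
    try {
      getPipelineState(id); // throws if the pipeline was already removed
      pipelineStateMap.put(id, newState);
      return true;
    } catch (PipelineNotFoundException e) {
      LOG.warning("Pipeline " + id + " not found, it may already have been"
          + " removed; ignoring update to " + newState);
      return false;
    }
  }
}
```

Only `PipelineNotFoundException` is swallowed; any other exception raised during the update would still propagate to the caller.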


