Posted to issues@ozone.apache.org by "Ethan Rose (Jira)" <ji...@apache.org> on 2021/10/20 20:34:10 UTC

[jira] [Updated] (HDDS-3107) Pipelines may not be rack aware on cluster startup

     [ https://issues.apache.org/jira/browse/HDDS-3107?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ethan Rose updated HDDS-3107:
-----------------------------
    Target Version/s: 1.3.0  (was: 1.2.0)

I am managing the 1.2.0 release and we currently have more than 600 issues targeted for 1.2.0. I am moving the target field to 1.3.0.

If you are actively working on this jira and believe this should be targeted for the 1.2.0 release, please reach out to me via Apache email or Slack.

> Pipelines may not be rack aware on cluster startup
> --------------------------------------------------
>
>                 Key: HDDS-3107
>                 URL: https://issues.apache.org/jira/browse/HDDS-3107
>             Project: Apache Ozone
>          Issue Type: Sub-task
>          Components: SCM
>    Affects Versions: 1.0.0
>            Reporter: Stephen O'Donnell
>            Priority: Major
>         Attachments: docker-ozone-topology-ozone-topology-readdata-scm.log
>
>
> Given a 6-node cluster with 2 racks (3 nodes per rack), it is possible for a pipeline to be created in a non-rack-aware way on startup.
> Using a robot test like the one in HDDS-3084, I can intermittently see that if all nodes from one rack register first, a pipeline creation is triggered on them, resulting in a pipeline that is entirely on one rack. When the next 3 nodes register, no nodes are available on the other rack, so they too form a "one rack" pipeline.
> This log snippet shows this happening. I will attach the full docker-compose log:
> {code}
> egrep "Sending CreatePipelineCommand|Registered Data node|Created pipe" docker-ozone-topology-ozone-topology-readdata-scm.log
> scm_1         | 2020-02-28 12:27:57,826 [IPC Server handler 6 on 9861] INFO node.SCMNodeManager: Registered Data node : 74084fe6-60a9-45d6-b02c-a9fa7ed24e3a{ip: 10.5.0.6, host: ozone-topology_datanode_3_1.ozone-topology_net, networkLocation: /rack1, certSerialId: null}
> scm_1         | 2020-02-28 12:27:57,840 [IPC Server handler 9 on 9861] INFO node.SCMNodeManager: Registered Data node : 32be7fa9-1ff6-4bb3-8bed-8648d276ae07{ip: 10.5.0.5, host: ozone-topology_datanode_2_1.ozone-topology_net, networkLocation: /rack1, certSerialId: null}
> scm_1         | 2020-02-28 12:27:57,903 [RatisPipelineUtilsThread] INFO pipeline.RatisPipelineProvider: Sending CreatePipelineCommand for pipeline:PipelineID=16806a56-8e35-46b2-aefd-cb5232d6f5f7 to datanode:32be7fa9-1ff6-4bb3-8bed-8648d276ae07
> scm_1         | 2020-02-28 12:27:57,924 [RatisPipelineUtilsThread] INFO pipeline.PipelineStateManager: Created pipeline Pipeline[ Id: 16806a56-8e35-46b2-aefd-cb5232d6f5f7, Nodes: 32be7fa9-1ff6-4bb3-8bed-8648d276ae07{ip: 10.5.0.5, host: ozone-topology_datanode_2_1.ozone-topology_net, networkLocation: /rack1, certSerialId: null}, Type:RATIS, Factor:ONE, State:ALLOCATED, leaderId:null, CreationTimestamp2020-02-28T12:27:57.891553Z]
> scm_1         | 2020-02-28 12:27:57,932 [RatisPipelineUtilsThread] INFO pipeline.RatisPipelineProvider: Sending CreatePipelineCommand for pipeline:PipelineID=5a3edf1e-84f6-48ef-a333-6f3e924898a6 to datanode:74084fe6-60a9-45d6-b02c-a9fa7ed24e3a
> scm_1         | 2020-02-28 12:27:57,933 [RatisPipelineUtilsThread] INFO pipeline.PipelineStateManager: Created pipeline Pipeline[ Id: 5a3edf1e-84f6-48ef-a333-6f3e924898a6, Nodes: 74084fe6-60a9-45d6-b02c-a9fa7ed24e3a{ip: 10.5.0.6, host: ozone-topology_datanode_3_1.ozone-topology_net, networkLocation: /rack1, certSerialId: null}, Type:RATIS, Factor:ONE, State:ALLOCATED, leaderId:null, CreationTimestamp2020-02-28T12:27:57.932422Z]
> scm_1         | 2020-02-28 12:27:58,213 [IPC Server handler 8 on 9861] INFO node.SCMNodeManager: Registered Data node : 4ce489a3-e3da-4f2a-9ddc-b01b634a68b6{ip: 10.5.0.4, host: ozone-topology_datanode_1_1.ozone-topology_net, networkLocation: /rack1, certSerialId: null}
> scm_1         | 2020-02-28 12:27:58,216 [RatisPipelineUtilsThread] INFO pipeline.RatisPipelineProvider: Sending CreatePipelineCommand for pipeline:PipelineID=ba2034fc-cb11-482a-9843-435294862240 to datanode:4ce489a3-e3da-4f2a-9ddc-b01b634a68b6
> scm_1         | 2020-02-28 12:27:58,216 [RatisPipelineUtilsThread] INFO pipeline.PipelineStateManager: Created pipeline Pipeline[ Id: ba2034fc-cb11-482a-9843-435294862240, Nodes: 4ce489a3-e3da-4f2a-9ddc-b01b634a68b6{ip: 10.5.0.4, host: ozone-topology_datanode_1_1.ozone-topology_net, networkLocation: /rack1, certSerialId: null}, Type:RATIS, Factor:ONE, State:ALLOCATED, leaderId:null, CreationTimestamp2020-02-28T12:27:58.216275Z]
> scm_1         | 2020-02-28 12:27:58,218 [RatisPipelineUtilsThread] INFO pipeline.RatisPipelineProvider: Sending CreatePipelineCommand for pipeline:PipelineID=4f16913d-ec06-44b4-a577-6664a517e401 to datanode:4ce489a3-e3da-4f2a-9ddc-b01b634a68b6
> scm_1         | 2020-02-28 12:27:58,219 [RatisPipelineUtilsThread] INFO pipeline.RatisPipelineProvider: Sending CreatePipelineCommand for pipeline:PipelineID=4f16913d-ec06-44b4-a577-6664a517e401 to datanode:74084fe6-60a9-45d6-b02c-a9fa7ed24e3a
> scm_1         | 2020-02-28 12:27:58,220 [RatisPipelineUtilsThread] INFO pipeline.RatisPipelineProvider: Sending CreatePipelineCommand for pipeline:PipelineID=4f16913d-ec06-44b4-a577-6664a517e401 to datanode:32be7fa9-1ff6-4bb3-8bed-8648d276ae07
> scm_1         | 2020-02-28 12:27:58,221 [RatisPipelineUtilsThread] INFO pipeline.PipelineStateManager: Created pipeline Pipeline[ Id: 4f16913d-ec06-44b4-a577-6664a517e401, Nodes: 4ce489a3-e3da-4f2a-9ddc-b01b634a68b6{ip: 10.5.0.4, host: ozone-topology_datanode_1_1.ozone-topology_net, networkLocation: /rack1, certSerialId: null}74084fe6-60a9-45d6-b02c-a9fa7ed24e3a{ip: 10.5.0.6, host: ozone-topology_datanode_3_1.ozone-topology_net, networkLocation: /rack1, certSerialId: null}32be7fa9-1ff6-4bb3-8bed-8648d276ae07{ip: 10.5.0.5, host: ozone-topology_datanode_2_1.ozone-topology_net, networkLocation: /rack1, certSerialId: null}, Type:RATIS, Factor:THREE, State:ALLOCATED, leaderId:null, CreationTimestamp2020-02-28T12:27:58.218896Z]
> scm_1         | 2020-02-28 12:27:58,645 [IPC Server handler 7 on 9861] INFO node.SCMNodeManager: Registered Data node : 66ec72b2-4be5-453f-ac44-cc9857bad5f0{ip: 10.5.0.8, host: ozone-topology_datanode_5_1.ozone-topology_net, networkLocation: /rack2, certSerialId: null}
> scm_1         | 2020-02-28 12:27:58,645 [RatisPipelineUtilsThread] INFO pipeline.RatisPipelineProvider: Sending CreatePipelineCommand for pipeline:PipelineID=4739840f-8bb3-4742-ac5e-ac519b51e0fd to datanode:66ec72b2-4be5-453f-ac44-cc9857bad5f0
> scm_1         | 2020-02-28 12:27:58,647 [RatisPipelineUtilsThread] INFO pipeline.PipelineStateManager: Created pipeline Pipeline[ Id: 4739840f-8bb3-4742-ac5e-ac519b51e0fd, Nodes: 66ec72b2-4be5-453f-ac44-cc9857bad5f0{ip: 10.5.0.8, host: ozone-topology_datanode_5_1.ozone-topology_net, networkLocation: /rack2, certSerialId: null}, Type:RATIS, Factor:ONE, State:ALLOCATED, leaderId:null, CreationTimestamp2020-02-28T12:27:58.645455Z]
> scm_1         | 2020-02-28 12:27:59,339 [IPC Server handler 7 on 9861] INFO node.SCMNodeManager: Registered Data node : 9be38eea-bacc-434a-876d-50b105d4daa2{ip: 10.5.0.9, host: ozone-topology_datanode_6_1.ozone-topology_net, networkLocation: /rack2, certSerialId: null}
> scm_1         | 2020-02-28 12:27:59,340 [RatisPipelineUtilsThread] INFO pipeline.RatisPipelineProvider: Sending CreatePipelineCommand for pipeline:PipelineID=555b9a1d-1c4a-4d9f-b198-492da7005ccd to datanode:9be38eea-bacc-434a-876d-50b105d4daa2
> scm_1         | 2020-02-28 12:27:59,341 [RatisPipelineUtilsThread] INFO pipeline.PipelineStateManager: Created pipeline Pipeline[ Id: 555b9a1d-1c4a-4d9f-b198-492da7005ccd, Nodes: 9be38eea-bacc-434a-876d-50b105d4daa2{ip: 10.5.0.9, host: ozone-topology_datanode_6_1.ozone-topology_net, networkLocation: /rack2, certSerialId: null}, Type:RATIS, Factor:ONE, State:ALLOCATED, leaderId:null, CreationTimestamp2020-02-28T12:27:59.340193Z]
> scm_1         | 2020-02-28 12:27:59,672 [IPC Server handler 6 on 9861] INFO node.SCMNodeManager: Registered Data node : cc1827a2-e4d2-47b4-a13a-1d990c6e36e1{ip: 10.5.0.7, host: ozone-topology_datanode_4_1.ozone-topology_net, networkLocation: /rack2, certSerialId: null}
> scm_1         | 2020-02-28 12:27:59,673 [RatisPipelineUtilsThread] INFO pipeline.RatisPipelineProvider: Sending CreatePipelineCommand for pipeline:PipelineID=a6d77ef7-52c0-4f6a-8c22-f0b405da08a1 to datanode:cc1827a2-e4d2-47b4-a13a-1d990c6e36e1
> scm_1         | 2020-02-28 12:27:59,674 [RatisPipelineUtilsThread] INFO pipeline.PipelineStateManager: Created pipeline Pipeline[ Id: a6d77ef7-52c0-4f6a-8c22-f0b405da08a1, Nodes: cc1827a2-e4d2-47b4-a13a-1d990c6e36e1{ip: 10.5.0.7, host: ozone-topology_datanode_4_1.ozone-topology_net, networkLocation: /rack2, certSerialId: null}, Type:RATIS, Factor:ONE, State:ALLOCATED, leaderId:null, CreationTimestamp2020-02-28T12:27:59.673585Z]
> scm_1         | 2020-02-28 12:27:59,683 [RatisPipelineUtilsThread] INFO pipeline.RatisPipelineProvider: Sending CreatePipelineCommand for pipeline:PipelineID=70cfd35d-b778-42df-bcba-3ba14bd8ead0 to datanode:9be38eea-bacc-434a-876d-50b105d4daa2
> scm_1         | 2020-02-28 12:27:59,683 [RatisPipelineUtilsThread] INFO pipeline.RatisPipelineProvider: Sending CreatePipelineCommand for pipeline:PipelineID=70cfd35d-b778-42df-bcba-3ba14bd8ead0 to datanode:66ec72b2-4be5-453f-ac44-cc9857bad5f0
> scm_1         | 2020-02-28 12:27:59,683 [RatisPipelineUtilsThread] INFO pipeline.RatisPipelineProvider: Sending CreatePipelineCommand for pipeline:PipelineID=70cfd35d-b778-42df-bcba-3ba14bd8ead0 to datanode:cc1827a2-e4d2-47b4-a13a-1d990c6e36e1
> scm_1         | 2020-02-28 12:27:59,684 [RatisPipelineUtilsThread] INFO pipeline.PipelineStateManager: Created pipeline Pipeline[ Id: 70cfd35d-b778-42df-bcba-3ba14bd8ead0, Nodes: 9be38eea-bacc-434a-876d-50b105d4daa2{ip: 10.5.0.9, host: ozone-topology_datanode_6_1.ozone-topology_net, networkLocation: /rack2, certSerialId: null}66ec72b2-4be5-453f-ac44-cc9857bad
> {code}
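The pattern in the excerpt can be checked mechanically. The following Java sketch scans "Created pipeline" lines, collects the networkLocation of every member, and reports multi-node pipelines whose members all share one rack. The regular expressions are assumptions based only on the log format shown above, not on any Ozone API; the sample input is two lines copied from the excerpt with the log prefix trimmed.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Sketch: find multi-node pipelines whose members all report the same
 * networkLocation. The patterns assume the SCM log format shown in the
 * excerpt; they are not an Ozone API.
 */
public class SingleRackPipelineScan {
    private static final Pattern PIPELINE_ID =
        Pattern.compile("Created pipeline Pipeline\\[ Id: ([0-9a-f-]+)");
    private static final Pattern FACTOR = Pattern.compile("Factor:(\\w+)");
    private static final Pattern RACK = Pattern.compile("networkLocation: ([^,}]+)");

    /** Two lines from the excerpt, with the "scm_1 | <timestamp> ..." prefix trimmed. */
    static final List<String> SAMPLE = List.of(
        "Created pipeline Pipeline[ Id: 16806a56-8e35-46b2-aefd-cb5232d6f5f7, Nodes: "
            + "32be7fa9-1ff6-4bb3-8bed-8648d276ae07{ip: 10.5.0.5, host: ozone-topology_datanode_2_1.ozone-topology_net, "
            + "networkLocation: /rack1, certSerialId: null}, Type:RATIS, Factor:ONE, State:ALLOCATED, leaderId:null, "
            + "CreationTimestamp2020-02-28T12:27:57.891553Z]",
        "Created pipeline Pipeline[ Id: 4f16913d-ec06-44b4-a577-6664a517e401, Nodes: "
            + "4ce489a3-e3da-4f2a-9ddc-b01b634a68b6{ip: 10.5.0.4, host: ozone-topology_datanode_1_1.ozone-topology_net, "
            + "networkLocation: /rack1, certSerialId: null}"
            + "74084fe6-60a9-45d6-b02c-a9fa7ed24e3a{ip: 10.5.0.6, host: ozone-topology_datanode_3_1.ozone-topology_net, "
            + "networkLocation: /rack1, certSerialId: null}"
            + "32be7fa9-1ff6-4bb3-8bed-8648d276ae07{ip: 10.5.0.5, host: ozone-topology_datanode_2_1.ozone-topology_net, "
            + "networkLocation: /rack1, certSerialId: null}, Type:RATIS, Factor:THREE, State:ALLOCATED, leaderId:null, "
            + "CreationTimestamp2020-02-28T12:27:58.218896Z]");

    /** Maps pipeline ID to the set of racks of its members, for pipelines whose Factor is not ONE. */
    static Map<String, Set<String>> racksPerMultiNodePipeline(List<String> lines) {
        Map<String, Set<String>> result = new LinkedHashMap<>();
        for (String line : lines) {
            Matcher id = PIPELINE_ID.matcher(line);
            Matcher factor = FACTOR.matcher(line);
            if (!id.find() || !factor.find() || factor.group(1).equals("ONE")) {
                continue; // not a pipeline-creation line, or a single-node pipeline
            }
            Set<String> racks = new LinkedHashSet<>();
            Matcher rack = RACK.matcher(line);
            while (rack.find()) {
                racks.add(rack.group(1)); // one networkLocation per member node
            }
            result.put(id.group(1), racks);
        }
        return result;
    }

    /** IDs of multi-node pipelines whose members are all on exactly one rack. */
    static List<String> singleRackPipelines(List<String> lines) {
        List<String> ids = new ArrayList<>();
        racksPerMultiNodePipeline(lines).forEach((id, racks) -> {
            if (racks.size() == 1) {
                ids.add(id);
            }
        });
        return ids;
    }

    public static void main(String[] args) {
        System.out.println(singleRackPipelines(SAMPLE));
    }
}
```

On the sample, only pipeline 4f16913d-ec06-44b4-a577-6664a517e401 is reported; the Factor:ONE pipeline is skipped because a single node cannot span racks.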
> I believe there are a few things to consider here:
> 1) Do we need a better way to determine whether rack awareness is enabled? Currently we check the network topology for a count of rack nodes, but these are only created as nodes register. Should we use the cluster map to determine the intended number of racks on the cluster?
> 2) Should we fall back to non-rack-aware placement so easily? Pipelines are long-lived, and if they are created non-rack-aware, they will potentially stay that way forever. Maybe we need to delay pipeline creation on startup until the node count settles?
> 3) If a pipeline or new container is being placed non-rack-aware in a rack-aware cluster, should we complain loudly in the logs, in JMX, or in Recon?
> 4) Do we need something to check for non-rack-aware pipelines and fix them where possible? For example, if we have 2 racks and stop 1 rack, we must create a non-rack-aware pipeline to keep writing, but when the other rack is restarted, that pipeline should be destroyed and a new rack-aware one created.
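Points 1, 2 and 4 could be combined into a startup gate. The Java sketch below is purely illustrative: the class RackAwareStartupGate, its Node record, the host-to-rack map standing in for the cluster map, and the settle-time policy are hypothetical names and assumptions, not Ozone APIs. It derives the expected rack count from configuration rather than from registered nodes, holds off pipeline creation until either every expected rack has a registered node or the node count has stayed unchanged for a settle period, and flags single-rack multi-node pipelines for rebuilding once other racks have nodes. It needs Java 16+ for records.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.Collection;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.stream.Collectors;

/** Hypothetical sketch of points 1, 2 and 4; none of these names are Ozone APIs. */
public class RackAwareStartupGate {
    /** Hypothetical datanode view: ID plus the rack it registered with. */
    record Node(String id, String rack) {}

    private final int expectedRacks;   // point 1: from the configured map, not from registrations
    private final Duration settleTime; // point 2: how long the node count must stay stable
    private int lastNodeCount = -1;
    private Instant lastChange = Instant.EPOCH;

    RackAwareStartupGate(Map<String, String> hostToRack, Duration settleTime) {
        this.expectedRacks = new HashSet<>(hostToRack.values()).size();
        this.settleTime = settleTime;
    }

    static Set<String> racksOf(Collection<Node> nodes) {
        return nodes.stream().map(Node::rack).collect(Collectors.toSet());
    }

    /**
     * Point 2: allow pipeline creation once every expected rack has a registered
     * node, or once the registered node count has not changed for settleTime
     * (the fallback that keeps a cluster with a rack down writable).
     */
    boolean mayCreatePipelines(Collection<Node> registered, Instant now) {
        if (registered.size() != lastNodeCount) {
            lastNodeCount = registered.size();
            lastChange = now; // restart the settle clock on every change
        }
        boolean allRacksSeen = racksOf(registered).size() >= expectedRacks;
        boolean settled = Duration.between(lastChange, now).compareTo(settleTime) >= 0;
        return allRacksSeen || settled;
    }

    /** Point 4: a multi-node pipeline confined to one rack is a rebuild candidate once other racks have nodes. */
    static boolean shouldRebuild(List<Node> pipelineNodes, Collection<Node> registered) {
        return pipelineNodes.size() > 1
            && racksOf(pipelineNodes).size() == 1
            && racksOf(registered).size() > 1;
    }
}
```

With the 6-node, 2-rack layout from the description, the gate refuses pipeline creation while only the three /rack1 nodes are registered, and allows it either as soon as a /rack2 node appears or after the settle period if /rack2 never comes up; shouldRebuild then identifies the single-rack pipeline to replace once /rack2 returns.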



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
