Posted to issues@ozone.apache.org by "Stephen O'Donnell (Jira)" <ji...@apache.org> on 2020/02/28 12:56:00 UTC

[jira] [Assigned] (HDDS-3107) Pipelines may not be rack aware on cluster startup

     [ https://issues.apache.org/jira/browse/HDDS-3107?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Stephen O'Donnell reassigned HDDS-3107:
---------------------------------------

    Assignee:     (was: Stephen O'Donnell)

> Pipelines may not be rack aware on cluster startup
> --------------------------------------------------
>
>                 Key: HDDS-3107
>                 URL: https://issues.apache.org/jira/browse/HDDS-3107
>             Project: Hadoop Distributed Data Store
>          Issue Type: Sub-task
>          Components: SCM
>    Affects Versions: 0.6.0
>            Reporter: Stephen O'Donnell
>            Priority: Major
>         Attachments: docker-ozone-topology-ozone-topology-readdata-scm.log
>
>
> Given a 6-node cluster with 2 racks (3 nodes per rack), pipelines can be created in a non-rack-aware way on startup.
> Using a Robot test like the one in HDDS-3084, I intermittently see the following. If all nodes from one rack register first, pipeline creation is triggered on them, and the resulting pipeline sits entirely on one rack. The next 3 nodes then register and, because no nodes are available on the other rack, they too form a "one rack" pipeline.
> This log snippet shows this happening. I will attach the full docker-compose log:
> {code}
> egrep "Sending CreatePipelineCommand|Registered Data node|Created pipe" docker-ozone-topology-ozone-topology-readdata-scm.log
> scm_1         | 2020-02-28 12:27:57,826 [IPC Server handler 6 on 9861] INFO node.SCMNodeManager: Registered Data node : 74084fe6-60a9-45d6-b02c-a9fa7ed24e3a{ip: 10.5.0.6, host: ozone-topology_datanode_3_1.ozone-topology_net, networkLocation: /rack1, certSerialId: null}
> scm_1         | 2020-02-28 12:27:57,840 [IPC Server handler 9 on 9861] INFO node.SCMNodeManager: Registered Data node : 32be7fa9-1ff6-4bb3-8bed-8648d276ae07{ip: 10.5.0.5, host: ozone-topology_datanode_2_1.ozone-topology_net, networkLocation: /rack1, certSerialId: null}
> scm_1         | 2020-02-28 12:27:57,903 [RatisPipelineUtilsThread] INFO pipeline.RatisPipelineProvider: Sending CreatePipelineCommand for pipeline:PipelineID=16806a56-8e35-46b2-aefd-cb5232d6f5f7 to datanode:32be7fa9-1ff6-4bb3-8bed-8648d276ae07
> scm_1         | 2020-02-28 12:27:57,924 [RatisPipelineUtilsThread] INFO pipeline.PipelineStateManager: Created pipeline Pipeline[ Id: 16806a56-8e35-46b2-aefd-cb5232d6f5f7, Nodes: 32be7fa9-1ff6-4bb3-8bed-8648d276ae07{ip: 10.5.0.5, host: ozone-topology_datanode_2_1.ozone-topology_net, networkLocation: /rack1, certSerialId: null}, Type:RATIS, Factor:ONE, State:ALLOCATED, leaderId:null, CreationTimestamp2020-02-28T12:27:57.891553Z]
> scm_1         | 2020-02-28 12:27:57,932 [RatisPipelineUtilsThread] INFO pipeline.RatisPipelineProvider: Sending CreatePipelineCommand for pipeline:PipelineID=5a3edf1e-84f6-48ef-a333-6f3e924898a6 to datanode:74084fe6-60a9-45d6-b02c-a9fa7ed24e3a
> scm_1         | 2020-02-28 12:27:57,933 [RatisPipelineUtilsThread] INFO pipeline.PipelineStateManager: Created pipeline Pipeline[ Id: 5a3edf1e-84f6-48ef-a333-6f3e924898a6, Nodes: 74084fe6-60a9-45d6-b02c-a9fa7ed24e3a{ip: 10.5.0.6, host: ozone-topology_datanode_3_1.ozone-topology_net, networkLocation: /rack1, certSerialId: null}, Type:RATIS, Factor:ONE, State:ALLOCATED, leaderId:null, CreationTimestamp2020-02-28T12:27:57.932422Z]
> scm_1         | 2020-02-28 12:27:58,213 [IPC Server handler 8 on 9861] INFO node.SCMNodeManager: Registered Data node : 4ce489a3-e3da-4f2a-9ddc-b01b634a68b6{ip: 10.5.0.4, host: ozone-topology_datanode_1_1.ozone-topology_net, networkLocation: /rack1, certSerialId: null}
> scm_1         | 2020-02-28 12:27:58,216 [RatisPipelineUtilsThread] INFO pipeline.RatisPipelineProvider: Sending CreatePipelineCommand for pipeline:PipelineID=ba2034fc-cb11-482a-9843-435294862240 to datanode:4ce489a3-e3da-4f2a-9ddc-b01b634a68b6
> scm_1         | 2020-02-28 12:27:58,216 [RatisPipelineUtilsThread] INFO pipeline.PipelineStateManager: Created pipeline Pipeline[ Id: ba2034fc-cb11-482a-9843-435294862240, Nodes: 4ce489a3-e3da-4f2a-9ddc-b01b634a68b6{ip: 10.5.0.4, host: ozone-topology_datanode_1_1.ozone-topology_net, networkLocation: /rack1, certSerialId: null}, Type:RATIS, Factor:ONE, State:ALLOCATED, leaderId:null, CreationTimestamp2020-02-28T12:27:58.216275Z]
> scm_1         | 2020-02-28 12:27:58,218 [RatisPipelineUtilsThread] INFO pipeline.RatisPipelineProvider: Sending CreatePipelineCommand for pipeline:PipelineID=4f16913d-ec06-44b4-a577-6664a517e401 to datanode:4ce489a3-e3da-4f2a-9ddc-b01b634a68b6
> scm_1         | 2020-02-28 12:27:58,219 [RatisPipelineUtilsThread] INFO pipeline.RatisPipelineProvider: Sending CreatePipelineCommand for pipeline:PipelineID=4f16913d-ec06-44b4-a577-6664a517e401 to datanode:74084fe6-60a9-45d6-b02c-a9fa7ed24e3a
> scm_1         | 2020-02-28 12:27:58,220 [RatisPipelineUtilsThread] INFO pipeline.RatisPipelineProvider: Sending CreatePipelineCommand for pipeline:PipelineID=4f16913d-ec06-44b4-a577-6664a517e401 to datanode:32be7fa9-1ff6-4bb3-8bed-8648d276ae07
> scm_1         | 2020-02-28 12:27:58,221 [RatisPipelineUtilsThread] INFO pipeline.PipelineStateManager: Created pipeline Pipeline[ Id: 4f16913d-ec06-44b4-a577-6664a517e401, Nodes: 4ce489a3-e3da-4f2a-9ddc-b01b634a68b6{ip: 10.5.0.4, host: ozone-topology_datanode_1_1.ozone-topology_net, networkLocation: /rack1, certSerialId: null}74084fe6-60a9-45d6-b02c-a9fa7ed24e3a{ip: 10.5.0.6, host: ozone-topology_datanode_3_1.ozone-topology_net, networkLocation: /rack1, certSerialId: null}32be7fa9-1ff6-4bb3-8bed-8648d276ae07{ip: 10.5.0.5, host: ozone-topology_datanode_2_1.ozone-topology_net, networkLocation: /rack1, certSerialId: null}, Type:RATIS, Factor:THREE, State:ALLOCATED, leaderId:null, CreationTimestamp2020-02-28T12:27:58.218896Z]
> scm_1         | 2020-02-28 12:27:58,645 [IPC Server handler 7 on 9861] INFO node.SCMNodeManager: Registered Data node : 66ec72b2-4be5-453f-ac44-cc9857bad5f0{ip: 10.5.0.8, host: ozone-topology_datanode_5_1.ozone-topology_net, networkLocation: /rack2, certSerialId: null}
> scm_1         | 2020-02-28 12:27:58,645 [RatisPipelineUtilsThread] INFO pipeline.RatisPipelineProvider: Sending CreatePipelineCommand for pipeline:PipelineID=4739840f-8bb3-4742-ac5e-ac519b51e0fd to datanode:66ec72b2-4be5-453f-ac44-cc9857bad5f0
> scm_1         | 2020-02-28 12:27:58,647 [RatisPipelineUtilsThread] INFO pipeline.PipelineStateManager: Created pipeline Pipeline[ Id: 4739840f-8bb3-4742-ac5e-ac519b51e0fd, Nodes: 66ec72b2-4be5-453f-ac44-cc9857bad5f0{ip: 10.5.0.8, host: ozone-topology_datanode_5_1.ozone-topology_net, networkLocation: /rack2, certSerialId: null}, Type:RATIS, Factor:ONE, State:ALLOCATED, leaderId:null, CreationTimestamp2020-02-28T12:27:58.645455Z]
> scm_1         | 2020-02-28 12:27:59,339 [IPC Server handler 7 on 9861] INFO node.SCMNodeManager: Registered Data node : 9be38eea-bacc-434a-876d-50b105d4daa2{ip: 10.5.0.9, host: ozone-topology_datanode_6_1.ozone-topology_net, networkLocation: /rack2, certSerialId: null}
> scm_1         | 2020-02-28 12:27:59,340 [RatisPipelineUtilsThread] INFO pipeline.RatisPipelineProvider: Sending CreatePipelineCommand for pipeline:PipelineID=555b9a1d-1c4a-4d9f-b198-492da7005ccd to datanode:9be38eea-bacc-434a-876d-50b105d4daa2
> scm_1         | 2020-02-28 12:27:59,341 [RatisPipelineUtilsThread] INFO pipeline.PipelineStateManager: Created pipeline Pipeline[ Id: 555b9a1d-1c4a-4d9f-b198-492da7005ccd, Nodes: 9be38eea-bacc-434a-876d-50b105d4daa2{ip: 10.5.0.9, host: ozone-topology_datanode_6_1.ozone-topology_net, networkLocation: /rack2, certSerialId: null}, Type:RATIS, Factor:ONE, State:ALLOCATED, leaderId:null, CreationTimestamp2020-02-28T12:27:59.340193Z]
> scm_1         | 2020-02-28 12:27:59,672 [IPC Server handler 6 on 9861] INFO node.SCMNodeManager: Registered Data node : cc1827a2-e4d2-47b4-a13a-1d990c6e36e1{ip: 10.5.0.7, host: ozone-topology_datanode_4_1.ozone-topology_net, networkLocation: /rack2, certSerialId: null}
> scm_1         | 2020-02-28 12:27:59,673 [RatisPipelineUtilsThread] INFO pipeline.RatisPipelineProvider: Sending CreatePipelineCommand for pipeline:PipelineID=a6d77ef7-52c0-4f6a-8c22-f0b405da08a1 to datanode:cc1827a2-e4d2-47b4-a13a-1d990c6e36e1
> scm_1         | 2020-02-28 12:27:59,674 [RatisPipelineUtilsThread] INFO pipeline.PipelineStateManager: Created pipeline Pipeline[ Id: a6d77ef7-52c0-4f6a-8c22-f0b405da08a1, Nodes: cc1827a2-e4d2-47b4-a13a-1d990c6e36e1{ip: 10.5.0.7, host: ozone-topology_datanode_4_1.ozone-topology_net, networkLocation: /rack2, certSerialId: null}, Type:RATIS, Factor:ONE, State:ALLOCATED, leaderId:null, CreationTimestamp2020-02-28T12:27:59.673585Z]
> scm_1         | 2020-02-28 12:27:59,683 [RatisPipelineUtilsThread] INFO pipeline.RatisPipelineProvider: Sending CreatePipelineCommand for pipeline:PipelineID=70cfd35d-b778-42df-bcba-3ba14bd8ead0 to datanode:9be38eea-bacc-434a-876d-50b105d4daa2
> scm_1         | 2020-02-28 12:27:59,683 [RatisPipelineUtilsThread] INFO pipeline.RatisPipelineProvider: Sending CreatePipelineCommand for pipeline:PipelineID=70cfd35d-b778-42df-bcba-3ba14bd8ead0 to datanode:66ec72b2-4be5-453f-ac44-cc9857bad5f0
> scm_1         | 2020-02-28 12:27:59,683 [RatisPipelineUtilsThread] INFO pipeline.RatisPipelineProvider: Sending CreatePipelineCommand for pipeline:PipelineID=70cfd35d-b778-42df-bcba-3ba14bd8ead0 to datanode:cc1827a2-e4d2-47b4-a13a-1d990c6e36e1
> scm_1         | 2020-02-28 12:27:59,684 [RatisPipelineUtilsThread] INFO pipeline.PipelineStateManager: Created pipeline Pipeline[ Id: 70cfd35d-b778-42df-bcba-3ba14bd8ead0, Nodes: 9be38eea-bacc-434a-876d-50b105d4daa2{ip: 10.5.0.9, host: ozone-topology_datanode_6_1.ozone-topology_net, networkLocation: /rack2, certSerialId: null}66ec72b2-4be5-453f-ac44-cc9857bad
> {code}
> I believe there are a few things to consider here:
> 1) Do we need a better way to determine whether rack awareness is enabled? Currently we check the network topology for a count of rack nodes, but these are only created as datanodes register. Should we use the cluster map to determine the intended number of racks in the cluster?
> 2) Should we fall back to non-rack-aware placement so easily? Pipelines are long-lived, and if they are created non-rack-aware, they may stay that way forever. Maybe we need to delay pipeline creation on startup until the node count settles?
> 3) If a pipeline or new container is placed in a non-rack-aware way on a rack-aware cluster, should we complain loudly in the logs, via JMX, or in Recon?
> 4) Do we need something that checks for non-rack-aware pipelines and fixes them where it can? For example, if we have 2 racks and stop one, we must create a non-rack-aware pipeline to keep writing. When the stopped rack is restarted, that pipeline should be destroyed and a new rack-aware one created.
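
One way to spot the problem in an SCM log like the snippet above is to parse the "Created pipeline" lines and flag multi-node pipelines whose members all report the same networkLocation. This is a hypothetical helper, not an Ozone tool. It assumes the log format shown in the snippet: a 36-character datanode UUID followed by a {ip: ..., networkLocation: /rackN, ...} block. The names parse_created_pipelines, single_rack_pipelines and report are made up for illustration.

```python
import re
from dataclasses import dataclass
from typing import Iterable, List

# A "Created pipeline" line, capturing the pipeline id (36-character UUID).
PIPELINE_RE = re.compile(r"Created pipeline Pipeline\[ Id: ([0-9a-f-]{36}),")
# Each member datanode: "<uuid>{ip: ..., host: ..., networkLocation: /rackN, ...}".
NODE_RE = re.compile(r"([0-9a-f-]{36})\{[^}]*networkLocation: ([^,}]+)")
FACTOR_RE = re.compile(r"Factor:(\w+)")


@dataclass
class CreatedPipeline:
    pipeline_id: str
    factor: str
    racks: List[str]  # networkLocation of each member, in log order


def parse_created_pipelines(lines: Iterable[str]) -> List[CreatedPipeline]:
    """Extract id, replication factor and member racks from SCM log lines."""
    pipelines = []
    for line in lines:
        pid = PIPELINE_RE.search(line)
        factor = FACTOR_RE.search(line)
        if pid is None or factor is None:
            continue  # not a "Created pipeline" line
        racks = [loc.strip() for _, loc in NODE_RE.findall(line)]
        pipelines.append(CreatedPipeline(pid.group(1), factor.group(1), racks))
    return pipelines


def single_rack_pipelines(pipelines: List[CreatedPipeline]) -> List[CreatedPipeline]:
    """Multi-node pipelines whose members all share one networkLocation."""
    return [p for p in pipelines if len(p.racks) > 1 and len(set(p.racks)) == 1]


def report(lines: Iterable[str]) -> List[str]:
    """Human-readable warnings for every single-rack pipeline found."""
    return [
        f"pipeline {p.pipeline_id} (Factor:{p.factor}) has all "
        f"{len(p.racks)} members on {p.racks[0]}"
        for p in single_rack_pipelines(parse_created_pipelines(lines))
    ]
```

Run over the attached docker-ozone-topology-ozone-topology-readdata-scm.log, report(...) should flag the Factor:THREE pipeline 4f16913d-ec06-44b4-a577-6664a517e401, whose three members are all on /rack1. Factor:ONE pipelines are skipped because a single member cannot span racks.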
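
Items 1 and 2 can be explored with a toy model of the startup race. This is not SCM's placement code. It assumes each datanode joins at most one Factor THREE pipeline and ignores Factor ONE pipelines. It also uses a hypothetical expected_racks set to stand in for the cluster map. With expected_racks=None, placement spreads across racks only when the free nodes happen to allow it and otherwise silently falls back, which mirrors the behaviour in the log above. With the intended racks supplied, creation is deferred until a pipeline can span two racks.

```python
from typing import Dict, List, Optional, Set, Tuple

Member = Tuple[str, str]  # (datanode, rack)


def place(free: Dict[str, List[str]], factor: int,
          require_spread: bool) -> Optional[List[Member]]:
    """Pick `factor` free datanodes, round-robin across racks so the pipeline
    spans several racks whenever the free nodes allow it.

    If require_spread is True and fewer than two racks have free nodes,
    return None (defer) instead of building a single-rack pipeline.
    """
    racks = sorted(r for r, nodes in free.items() if nodes)
    if require_spread and len(racks) < 2:
        return None
    pools = {r: list(free[r]) for r in racks}
    chosen: List[Member] = []
    while len(chosen) < factor:
        progressed = False
        for r in racks:
            if pools[r] and len(chosen) < factor:
                chosen.append((pools[r].pop(0), r))
                progressed = True
        if not progressed:
            return None  # not enough free datanodes yet
    return chosen


def simulate(registrations: List[Member],
             expected_racks: Optional[Set[str]] = None,
             factor: int = 3) -> List[List[Member]]:
    """Replay datanode registrations, creating pipelines as soon as placement allows.

    expected_racks=None: no knowledge of the intended topology, so placement
    falls back to whatever nodes are free (the behaviour described in this issue).
    expected_racks given: the intended rack count comes from a cluster map, and
    creation waits until a pipeline can span two racks.
    """
    require_spread = expected_racks is not None and len(expected_racks) > 1
    free: Dict[str, List[str]] = {}
    pipelines: List[List[Member]] = []
    for node, rack in registrations:
        free.setdefault(rack, []).append(node)
        while True:
            chosen = place(free, factor, require_spread)
            if chosen is None:
                break
            for n, r in chosen:
                free[r].remove(n)  # each datanode joins one pipeline in this model
            pipelines.append(chosen)
    return pipelines
```

The test below replays the registration order from the log: datanodes 3, 2 and 1 on /rack1, then 5, 6 and 4 on /rack2. The first mode yields two single-rack pipelines, and the second yields two pipelines that each span both racks. Deferring with no time limit would block writes if a rack really is down at startup, which is the scenario in item 4. A real change would therefore presumably pair this with a settle period or a bounded fallback.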
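
Items 3 and 4 suggest a periodic check over existing pipelines. The following is a minimal sketch. It assumes pipeline membership and the set of racks with healthy datanodes are available as plain Python data, and the function and logger names are hypothetical. Single-rack pipelines are always reported with a warning. They are marked as replaceable only once at least two racks are live again, so writes can continue while a rack is down.

```python
import logging
from typing import Dict, List, Set, Tuple

LOG = logging.getLogger("pipeline-rack-check")  # hypothetical logger name


def check_pipelines(pipelines: Dict[str, List[Tuple[str, str]]],
                    live_racks: Set[str]) -> Tuple[List[str], List[str]]:
    """Classify pipelines by rack placement.

    pipelines maps a pipeline id to its members as (datanode, rack) pairs;
    live_racks is the set of racks that currently have healthy datanodes.
    Returns (not_rack_aware, replaceable):
      not_rack_aware: multi-node pipelines whose members share one rack,
                      each logged as a warning (item 3).
      replaceable:    the subset that could now be closed and rebuilt across
                      racks because at least two racks are live (item 4).
    """
    not_rack_aware: List[str] = []
    replaceable: List[str] = []
    for pid, members in pipelines.items():
        racks = {rack for _, rack in members}
        if len(members) < 2 or len(racks) > 1:
            continue  # Factor ONE, or already spread over racks
        not_rack_aware.append(pid)
        LOG.warning("Pipeline %s is not rack aware: all %d members on %s",
                    pid, len(members), next(iter(racks)))
        if len(live_racks) >= 2:
            replaceable.append(pid)
    return not_rack_aware, replaceable
```

While only /rack1 is up, the single-rack pipeline keeps serving writes but is still flagged. When /rack2 returns, the same pipeline becomes a replacement candidate.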



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
