Posted to issues@ozone.apache.org by "Arpit Agarwal (Jira)" <ji...@apache.org> on 2020/06/02 21:45:00 UTC
[jira] [Updated] (HDDS-3586) OM HA can be started with 3 isolated
LEADER instead of one OM ring
[ https://issues.apache.org/jira/browse/HDDS-3586?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Arpit Agarwal updated HDDS-3586:
--------------------------------
Labels: Triaged pull-request-available (was: pull-request-available)
> OM HA can be started with 3 isolated LEADER instead of one OM ring
> ------------------------------------------------------------------
>
> Key: HDDS-3586
> URL: https://issues.apache.org/jira/browse/HDDS-3586
> Project: Hadoop Distributed Data Store
> Issue Type: Improvement
> Reporter: Marton Elek
> Assignee: Hanisha Koneru
> Priority: Critical
> Labels: Triaged, pull-request-available
>
> Steps to reproduce:
> Imagine that I have 3 different OMs with the following DNS names:
> {code}
> ozone-om-0.ozone-om
> ozone-om-1.ozone-om
> ozone-om-2.ozone-om
> {code}
> I configured the three hosts as follows:
> {code}
> OZONE-SITE.XML_ozone.om.nodes.omservice: om1,om2,om3
> OZONE-SITE.XML_ozone.om.address.omservice.om1: ozone-om-0
> OZONE-SITE.XML_ozone.om.address.omservice.om2: ozone-om-1
> OZONE-SITE.XML_ozone.om.address.omservice.om3: ozone-om-2
> OZONE-SITE.XML_ozone.om.ratis.enable: "true"
> {code}
> Unfortunately, the DNS is not reliable: each host can resolve only its own (local) hostname.
> OMHANodeDetails.java silently ignores every configured address that cannot be resolved:
> {code}
> if (!addr.isUnresolved()) {
>   if (!isPeer && OmUtils.isAddressLocal(addr)) {
>     localRpcAddress = addr;
>     localOMServiceId = serviceId;
>     localOMNodeId = nodeId;
>     localRatisPort = ratisPort;
>     found++;
>   } else {
>     // This OMNode belongs to same OM service as the current OMNode.
>     // Add it to peerNodes list.
>     peerNodesList.add(getHAOMNodeDetails(conf, serviceId,
>         nodeId, addr, ratisPort));
>   }
> }
> {code}
> As a result, I end up with 3 running servers, each with its own one-node Ratis ring (peerNodesList is empty because only the local hostname can be resolved).
> The Group ID is the same for all of them, but they have separate databases and work as separate OMs, which is VERY dangerous.
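The failure mode described above can be reproduced in isolation with plain `java.net.InetSocketAddress`. This is a minimal sketch, not the Ozone code: the class `UnresolvedPeerDemo`, the helper `keepResolved`, and port 9862 are hypothetical names and values chosen for illustration, and `createUnresolved` stands in for a hostname that DNS could not resolve.

```java
import java.net.InetSocketAddress;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical demo class; it only mimics the shape of the isUnresolved() filter.
final class UnresolvedPeerDemo {

    // Keeps only the addresses that resolved, silently dropping the rest,
    // which is the behavior the issue describes.
    static List<InetSocketAddress> keepResolved(List<InetSocketAddress> configured) {
        List<InetSocketAddress> kept = new ArrayList<>();
        for (InetSocketAddress addr : configured) {
            if (!addr.isUnresolved()) {
                kept.add(addr);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<InetSocketAddress> configured = Arrays.asList(
            // An IP literal needs no DNS lookup; it stands in for the local OM.
            new InetSocketAddress("127.0.0.1", 9862),
            // createUnresolved() simulates peers whose hostnames failed to resolve.
            InetSocketAddress.createUnresolved("ozone-om-1", 9862),
            InetSocketAddress.createUnresolved("ozone-om-2", 9862));

        List<InetSocketAddress> kept = keepResolved(configured);
        // prints "configured=3 kept=1"
        System.out.println("configured=" + configured.size() + " kept=" + kept.size());
    }
}
```

Three nodes are configured, but only the local one survives the filter, so nothing signals that two peers were dropped.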
> 1. Option one: accept any unresolved address and retry when creating the connection if it cannot be established.
> 2. Option two: at least the error handling should be fixed. When I configure 3 OMs, there should be 3 OMs.
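Option two could look roughly like the following fail-fast check. This is only a sketch under stated assumptions: `OmPeerValidation`, `unresolvedHosts`, and `requireAllResolved` are hypothetical names that do not come from the Ozone code base, and the actual fix in the linked pull request may take a different approach.

```java
import java.net.InetSocketAddress;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper illustrating "configured N nodes, so expect N nodes".
final class OmPeerValidation {

    // Returns the host names of all configured addresses that did not resolve.
    static List<String> unresolvedHosts(List<InetSocketAddress> configured) {
        List<String> missing = new ArrayList<>();
        for (InetSocketAddress addr : configured) {
            if (addr.isUnresolved()) {
                // getHostString() returns the configured name without a reverse lookup.
                missing.add(addr.getHostString());
            }
        }
        return missing;
    }

    // Refuses to continue instead of silently starting a smaller ring.
    static void requireAllResolved(List<InetSocketAddress> configured) {
        List<String> missing = unresolvedHosts(configured);
        if (!missing.isEmpty()) {
            throw new IllegalStateException("Configured " + configured.size()
                + " OM nodes but could not resolve: " + missing);
        }
    }
}
```

Calling `requireAllResolved` before the peer list is built would turn the silent split into a startup error that names the hosts that could not be resolved.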
--
This message was sent by Atlassian Jira
(v8.3.4#803005)