Posted to issues@ozone.apache.org by "Arpit Agarwal (Jira)" <ji...@apache.org> on 2020/06/02 21:45:00 UTC

[jira] [Updated] (HDDS-3586) OM HA can be started with 3 isolated LEADER instead of one OM ring

     [ https://issues.apache.org/jira/browse/HDDS-3586?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Arpit Agarwal updated HDDS-3586:
--------------------------------
    Labels: Triaged pull-request-available  (was: pull-request-available)

> OM HA can be started with 3 isolated LEADER instead of one OM ring
> ------------------------------------------------------------------
>
>                 Key: HDDS-3586
>                 URL: https://issues.apache.org/jira/browse/HDDS-3586
>             Project: Hadoop Distributed Data Store
>          Issue Type: Improvement
>            Reporter: Marton Elek
>            Assignee: Hanisha Koneru
>            Priority: Critical
>              Labels: Triaged, pull-request-available
>
> Steps to reproduce:
> Imagine that I have three OM instances with the following DNS names:
> {code}
> ozone-om-0.ozone-om
> ozone-om-1.ozone-om
> ozone-om-2.ozone-om
> {code}
> I configured the three hosts as follows:
> {code}
>   OZONE-SITE.XML_ozone.om.nodes.omservice: om1,om2,om3
>   OZONE-SITE.XML_ozone.om.address.omservice.om1: ozone-om-0
>   OZONE-SITE.XML_ozone.om.address.omservice.om2: ozone-om-1
>   OZONE-SITE.XML_ozone.om.address.omservice.om3: ozone-om-2
>   OZONE-SITE.XML_ozone.om.ratis.enable: "true"
> {code}
> Unfortunately, DNS is not reliable: each host can resolve only its own (LOCAL) hostname.
> OMHANodeDetails.java ignores ALL configured addresses that cannot be resolved:
> {code}
>         if (!addr.isUnresolved()) {
>           if (!isPeer && OmUtils.isAddressLocal(addr)) {
>             localRpcAddress = addr;
>             localOMServiceId = serviceId;
>             localOMNodeId = nodeId;
>             localRatisPort = ratisPort;
>             found++;
>           } else {
>             // This OMNode belongs to same OM service as the current OMNode.
>             // Add it to peerNodes list.
>             peerNodesList.add(getHAOMNodeDetails(conf, serviceId,
>                 nodeId, addr, ratisPort));
>           }
>         }
> {code}
> As a result, I end up with three running servers, each forming its own one-node Ratis ring (peerNodesList is empty because only the local hostname can be resolved).
> The group ID is the same for all of them, but each has a separate database and works as an independent OM, which is VERY dangerous.
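
To make the failure mode concrete, here is a minimal, self-contained sketch of the same filtering logic. It is not the Ozone code: the class PeerFilterDemo, the method resolvablePeers, and the port 9862 are assumptions made for illustration, and unresolvable peers are simulated with InetSocketAddress.createUnresolved so the example does not depend on real DNS.

```java
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class PeerFilterDemo {

  /**
   * Hypothetical stand-in for the loop quoted above: nodes whose address is
   * unresolved are skipped, and every other non-local node becomes a peer.
   */
  static List<String> resolvablePeers(Map<String, InetSocketAddress> configured,
                                      String localNodeId) {
    List<String> peers = new ArrayList<>();
    for (Map.Entry<String, InetSocketAddress> entry : configured.entrySet()) {
      if (entry.getValue().isUnresolved()) {
        continue; // dropped here without raising any error
      }
      if (!entry.getKey().equals(localNodeId)) {
        peers.add(entry.getKey());
      }
    }
    return peers;
  }

  public static void main(String[] args) {
    // View from the first host: only its own address is resolved.
    Map<String, InetSocketAddress> configured = new LinkedHashMap<>();
    configured.put("om1",
        new InetSocketAddress(InetAddress.getLoopbackAddress(), 9862));
    configured.put("om2", InetSocketAddress.createUnresolved("ozone-om-1", 9862));
    configured.put("om3", InetSocketAddress.createUnresolved("ozone-om-2", 9862));

    // The peer list is empty, so this node would form a one-node ring.
    System.out.println("peers seen by om1: " + resolvablePeers(configured, "om1"));
  }
}
```

Running the same check on each of the three hosts gives an empty peer list every time, which matches the three isolated one-node rings described above.
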
> 1. Option one: accept unresolved addresses and retry when creating the connection if a peer cannot be reached.
> 2. Option two: at least fix the error handling. When three OMs are configured, there should be three OMs.
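
Option two could look roughly like the sketch below: a startup check that refuses to continue when any configured node cannot be resolved. This is only an illustration under stated assumptions, not the change made for HDDS-3586. The class OmPeerValidation, its methods, the exception type, and the port 9862 are all hypothetical, and the unresolved peers are again simulated with InetSocketAddress.createUnresolved.

```java
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class OmPeerValidation {

  /** Returns the IDs of configured nodes whose address is unresolved. */
  static List<String> unresolvedNodes(Map<String, InetSocketAddress> configured) {
    List<String> missing = new ArrayList<>();
    for (Map.Entry<String, InetSocketAddress> entry : configured.entrySet()) {
      if (entry.getValue().isUnresolved()) {
        missing.add(entry.getKey());
      }
    }
    return missing;
  }

  /** Fails fast instead of starting with fewer peers than configured. */
  static void requireAllResolved(Map<String, InetSocketAddress> configured) {
    List<String> missing = unresolvedNodes(configured);
    if (!missing.isEmpty()) {
      throw new IllegalStateException("Configured " + configured.size()
          + " OM nodes, but these could not be resolved: " + missing);
    }
  }

  public static void main(String[] args) {
    Map<String, InetSocketAddress> configured = new LinkedHashMap<>();
    configured.put("om1",
        new InetSocketAddress(InetAddress.getLoopbackAddress(), 9862));
    configured.put("om2", InetSocketAddress.createUnresolved("ozone-om-1", 9862));
    configured.put("om3", InetSocketAddress.createUnresolved("ozone-om-2", 9862));

    try {
      requireAllResolved(configured);
    } catch (IllegalStateException e) {
      // A real server would abort startup here rather than just print.
      System.out.println(e.getMessage());
    }
  }
}
```

The design choice is to treat a mismatch between the configured node count and the resolvable node count as a startup error, so an operator sees the DNS problem immediately instead of discovering three diverged databases later.
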


