Posted to issues@ozone.apache.org by "Attila Doroszlai (Jira)" <ji...@apache.org> on 2022/05/14 13:49:00 UTC

[jira] [Resolved] (HDDS-5919) In Kubernetes, OM HA has a circular dependency on service availability

     [ https://issues.apache.org/jira/browse/HDDS-5919?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Attila Doroszlai resolved HDDS-5919.
------------------------------------
    Fix Version/s: 1.3.0
       Resolution: Fixed

> In Kubernetes, OM HA has a circular dependency on service availability
> ----------------------------------------------------------------------
>
>                 Key: HDDS-5919
>                 URL: https://issues.apache.org/jira/browse/HDDS-5919
>             Project: Apache Ozone
>          Issue Type: Bug
>          Components: OM
>    Affects Versions: 1.1.0
>            Reporter: Shawn
>            Assignee: Shawn
>            Priority: Critical
>              Labels: kubernetes, pull-request-available
>             Fix For: 1.3.0
>
>
> In Kubernetes, OM HA requires the FQDN of each OM to be specified in the configuration. The OM address has the form <om_pod_name>.<om_service_name>, and during initialization each OM needs to resolve this FQDN. However, the FQDN is only resolvable once the OM pod is in the ready state, because the OM service only includes pods that are ready. This is a circular dependency: the OM cannot become ready until the name resolves, and the name does not resolve until the OM is ready.
>
> My current workaround is to replace the FQDN with the local host name (om-0 instead of om-0.omservice) in the ozone-site.xml config before OM initialization. The side effect is that the Recon component cannot be launched: when Recon looks up the list of OM peers, the returned list is something like om-0 (the leader), om-1.omservice, om-2.omservice, and the leader om-0 cannot be reached.
> Ozone currently seems to target bare-metal deployments, where IPs do not change. We should also take Kubernetes environments into account, where IPs can be dynamic (a node is rescheduled, or the whole app is redeployed for an upgrade).
> 2021-11-01 18:55:55 ERROR OzoneManagerServiceProviderImpl:315 - Unable to obtain Ozone Manager DB Snapshot.
> java.net.UnknownHostException: Error while authenticating with endpoint: [http://test-ozone-om-uat-0:9874/dbCheckpoint]
>     at java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>     at java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
>     at java.base/jdk.internal.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>     at java.base/java.lang.reflect.Constructor.newInstance(Constructor.java:490)
>     at org.apache.hadoop.security.authentication.client.KerberosAuthenticator.wrapExceptionWithMessage(KerberosAuthenticator.java:232)
>     at org.apache.hadoop.security.authentication.client.KerberosAuthenticator.authenticate(KerberosAuthenticator.java:216)
>     at org.apache.hadoop.security.authentication.client.AuthenticatedURL.openConnection(AuthenticatedURL.java:348)
>     at org.apache.hadoop.hdfs.web.URLConnectionFactory.openConnection(URLConnectionFactory.java:186)
>     at org.apache.hadoop.ozone.recon.ReconUtils.makeHttpCall(ReconUtils.java:237)
>     at org.apache.hadoop.ozone.recon.spi.impl.OzoneManagerServiceProviderImpl.lambda$getOzoneManagerDBSnapshot$1(OzoneManagerServiceProviderImpl.java:298)
>     at java.base/java.security.AccessController.doPrivileged(Native Method)
>     at java.base/javax.security.auth.Subject.doAs(Subject.java:423)
>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1762)
>     at org.apache.hadoop.security.SecurityUtil.doAsUser(SecurityUtil.java:535)
>     at org.apache.hadoop.security.SecurityUtil.doAsLoginUser(SecurityUtil.java:516)
>     at org.apache.hadoop.ozone.recon.spi.impl.OzoneManagerServiceProviderImpl.getOzoneManagerDBSnapshot(OzoneManagerServiceProviderImpl.java:297)
>     at org.apache.hadoop.ozone.recon.spi.impl.OzoneManagerServiceProviderImpl.updateReconOmDBWithNewSnapshot(OzoneManagerServiceProviderImpl.java:329)
>     at org.apache.hadoop.ozone.recon.spi.impl.OzoneManagerServiceProviderImpl.syncDataFromOM(OzoneManagerServiceProviderImpl.java:427)
>     at org.apache.hadoop.ozone.recon.spi.impl.OzoneManagerServiceProviderImpl.lambda$start$0(OzoneManagerServiceProviderImpl.java:233)
>     at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
>     at java.base/java.util.concurrent.FutureTask.runAndReset(FutureTask.java:305)
>     at java.base/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:305)
>     at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
>     at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
>     at java.base/java.lang.Thread.run(Thread.java:834)
> Caused by: java.net.UnknownHostException: test-ozone-om-uat-0
>     at java.base/java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:220)
>     at java.base/java.net.Socket.connect(Socket.java:609)
>     at java.base/sun.net.NetworkClient.doConnect(NetworkClient.java:177)
>     at java.base/sun.net.www.http.HttpClient.openServer(HttpClient.java:474)
>     at java.base/sun.net.www.http.HttpClient.openServer(HttpClient.java:569)
>     at java.base/sun.net.www.http.HttpClient.<init>(HttpClient.java:242)
>     at java.base/sun.net.www.http.HttpClient.New(HttpClient.java:341)
>     at java.base/sun.net.www.http.HttpClient.New(HttpClient.java:362)
>     at java.base/sun.net.www.protocol.http.HttpURLConnection.getNewHttpClient(HttpURLConnection.java:1253)
>     at java.base/sun.net.www.protocol.http.HttpURLConnection.plainConnect0(HttpURLConnection.java:1187)
>     at java.base/sun.net.www.protocol.http.HttpURLConnection.plainConnect(HttpURLConnection.java:1081)
>     at java.base/sun.net.www.protocol.http.HttpURLConnection.connect(HttpURLConnection.java:1015)
>     at org.apache.hadoop.security.authentication.client.KerberosAuthenticator.authenticate(KerberosAuthenticator.java:189)
>     ... 19 more
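
For illustration only, the workaround described in the report (swapping the local OM's FQDN for its short host name in ozone-site.xml before OM initialization) might look roughly like the sketch below. It is not the fix that went into 1.3.0. The function name use_local_host_name and the sample property names (ozone.om.address.omservice.om0 and so on) are assumptions made for this example; the report does not say which properties were edited.

```python
import xml.etree.ElementTree as ET

# Hypothetical ozone-site.xml fragment; the property names are assumed for
# illustration and are not taken from the report.
SAMPLE = """<configuration>
  <property>
    <name>ozone.om.address.omservice.om0</name>
    <value>om-0.omservice</value>
  </property>
  <property>
    <name>ozone.om.address.omservice.om1</name>
    <value>om-1.omservice</value>
  </property>
  <property>
    <name>ozone.om.address.omservice.om2</name>
    <value>om-2.omservice</value>
  </property>
</configuration>"""


def use_local_host_name(site_xml: str, fqdn: str) -> str:
    """Return site_xml with `fqdn` replaced by its first DNS label.

    Only <value> elements whose host part equals `fqdn` are changed; an
    optional ":port" suffix is preserved.
    """
    short_name = fqdn.split(".", 1)[0]
    root = ET.fromstring(site_xml)
    for value in root.iter("value"):
        text = (value.text or "").strip()
        host, sep, port = text.partition(":")
        if host == fqdn:
            value.text = short_name + sep + port
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    rewritten = use_local_host_name(SAMPLE, "om-0.omservice")
    for value in ET.fromstring(rewritten).iter("value"):
        print(value.text)
```

Run against the sample, the local OM's address becomes om-0 while the peers keep their service-qualified names. That is the same mixed list the report describes Recon receiving, which is why the leader cannot be reached from other pods.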



--
This message was sent by Atlassian Jira
(v8.20.7#820007)
