You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@ozone.apache.org by GitBox <gi...@apache.org> on 2022/05/08 21:14:05 UTC

[GitHub] [ozone] neils-dev opened a new pull request, #3389: HDDS-6433. Refactor OMFailoverProxyProvider to provide a base for OM and GrpcOM FailoverProxyProviders

neils-dev opened a new pull request, #3389:
URL: https://github.com/apache/ozone/pull/3389

   ## What changes were proposed in this pull request?
   
   OMFailoverProxyProvider is refactored into a base abstract class containing common properities and methods for ozone manager failover proxies and into two implemementing classes.  The base OMFailoverProxyProviderBase is extended and implemented for the standard ozone clients and for the Grpc ozone client, OMFailoverProxyProvider and GrpcOMFailoverProxyProvider respectively.
   
   This PR implements the refactoring as discussed in comments found in HDDS-5544 PR #2901.
   
   ## What is the link to the Apache JIRA
   
   https://issues.apache.org/jira/browse/HDDS-6433
   
   ## How was this patch tested?
   
   This patch was tested through unit tests, integration tests and manually with docker-compose HA cluster.
   
   Unit tests:
   ```
   hadoop-ozone/s3gateway/src/test/java/org/apache/hadoop/ozone/protocolPB/TestGrpcOmTransport.java
   hadoop-ozone/common/src/test/java/org/apache/hadoop/ozone/om/protocolPB/TestS3GrpcOmTransport.java
   hadoop-ozone/common/src/test/java/org/apache/hadoop/ozone/om/ha/TestOMFailoverProxyProvider.java
   hadoop-ozone/ozone-manager/src/test/java/org/apache/hadoop/ozone/om/failover/TestOMFailovers.java
   ```
   
   Integration tests: including
   `src/test/java/org/apache/hadoop/ozone/om/TestOzoneManagerHAMetadataOnly.java
   `
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] kerneltime merged pull request #3389: HDDS-6433. Refactor OMFailoverProxyProvider to provide a base for OM and GrpcOM FailoverProxyProviders

Posted by GitBox <gi...@apache.org>.
kerneltime merged PR #3389:
URL: https://github.com/apache/ozone/pull/3389


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] neils-dev commented on a diff in pull request #3389: HDDS-6433. Refactor OMFailoverProxyProvider to provide a base for OM and GrpcOM FailoverProxyProviders

Posted by GitBox <gi...@apache.org>.
neils-dev commented on code in PR #3389:
URL: https://github.com/apache/ozone/pull/3389#discussion_r882220558


##########
hadoop-ozone/common/src/main/java/org/apache/hadoop/ozone/om/ha/OMFailoverProxyProvider.java:
##########
@@ -68,70 +53,39 @@
  * multiple OMs to connect to. In case of OM failover, client can try
  * connecting to another OM node from the list of proxies.
  */
-public class OMFailoverProxyProvider<T> implements
-    FailoverProxyProvider<T>, Closeable {
+public class OMFailoverProxyProvider<T> extends

Review Comment:
   Sounds good.  We can rename it so we'll have the `OMFailoverProxyProviderBase` extended by `HadoopRpcOMFailoverProvider` and the `GrpcOMFailoverProvider`.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] kerneltime commented on a diff in pull request #3389: HDDS-6433. Refactor OMFailoverProxyProvider to provide a base for OM and GrpcOM FailoverProxyProviders

Posted by GitBox <gi...@apache.org>.
kerneltime commented on code in PR #3389:
URL: https://github.com/apache/ozone/pull/3389#discussion_r880801593


##########
hadoop-ozone/common/src/main/java/org/apache/hadoop/ozone/om/protocolPB/GrpcOmTransport.java:
##########
@@ -177,17 +181,22 @@ public void start() throws IOException {
   public OMResponse submitRequest(OMRequest payload) throws IOException {
     OMResponse resp = null;
     boolean tryOtherHost = true;
+    int expectedFailoverCount = 0;
     ResultCodes resultCode = ResultCodes.INTERNAL_ERROR;
     while (tryOtherHost) {
       tryOtherHost = false;
+      expectedFailoverCount = syncFailoverCount.get();

Review Comment:
   Ok, I need to go over this in detail, there seem to be 2 mechanisms in place, and would be nice to simplify the overall logic.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] neils-dev commented on a diff in pull request #3389: HDDS-6433. Refactor OMFailoverProxyProvider to provide a base for OM and GrpcOM FailoverProxyProviders

Posted by GitBox <gi...@apache.org>.
neils-dev commented on code in PR #3389:
URL: https://github.com/apache/ozone/pull/3389#discussion_r870552755


##########
hadoop-ozone/common/src/main/java/org/apache/hadoop/ozone/om/protocolPB/GrpcOmTransport.java:
##########
@@ -177,17 +181,22 @@ public void start() throws IOException {
   public OMResponse submitRequest(OMRequest payload) throws IOException {
     OMResponse resp = null;
     boolean tryOtherHost = true;
+    int expectedFailoverCount = 0;
     ResultCodes resultCode = ResultCodes.INTERNAL_ERROR;
     while (tryOtherHost) {
       tryOtherHost = false;
+      expectedFailoverCount = syncFailoverCount.get();

Review Comment:
   Thanks @kerneltime for reviewing this PR and for your comments.  The `expectedFailoverCount` that is re-synced to the `syncFailoverCount` prior to each` submitRequest` (over Grpc from s3g to om) and then checked against the `syncFailoverCount `in the `shouldRetry` is to guard against another thread also in failover incrementing the proxy index.
   
   This is to ensure `performFailover` (of OmFailoverProxyProviderBase) " _is supposed to called only once in a multithreaded enviornment.  This is where the failover occurs.  performFailOver updates the currentProxyOmNode.._."   
   Implemented here, the `performFailover` is only applied in failover through `shouldRety` iff `expectedFailoverCount` is in sync with the `syncFailoverCount` (`expectedFailoverCount` == `syncFailoverCount`).  Should the `expectedFailoverCount` in `shouldRetry` method not equal the `syncFailoverCount` at the point in failover, it implies that another thread is in failover and has incremented the proxy to retry with.  In this event the` performFailover` is **_NOT_** called, instead the proxy is updated to the current proxy index (which has been incremented by another thread in failover).
   
   Please do check if the behavior of the failover with `performFailover` (in `shouldRetry`) through the `expectedFailoverCount` and `syncFailoverCount` is as we want it to be under multi-threaded environment.  
    
   ```
             if (syncFailoverCount.get() == expectedFailoverCount) {
               omFailoverProxyProvider.performFailover(null);
               syncFailoverCount.getAndIncrement();
             } else {
               LOG.warn("A failover has occurred since the start of current" +
                   " thread retry, NOT failover using current proxy");
             }
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] neils-dev closed pull request #3389: HDDS-6433. Refactor OMFailoverProxyProvider to provide a base for OM and GrpcOM FailoverProxyProviders

Posted by GitBox <gi...@apache.org>.
neils-dev closed pull request #3389: HDDS-6433. Refactor OMFailoverProxyProvider to provide a base for OM and GrpcOM FailoverProxyProviders
URL: https://github.com/apache/ozone/pull/3389


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] kerneltime commented on a diff in pull request #3389: HDDS-6433. Refactor OMFailoverProxyProvider to provide a base for OM and GrpcOM FailoverProxyProviders

Posted by GitBox <gi...@apache.org>.
kerneltime commented on code in PR #3389:
URL: https://github.com/apache/ozone/pull/3389#discussion_r880437888


##########
hadoop-ozone/common/src/main/java/org/apache/hadoop/ozone/om/ha/GrpcOMFailoverProxyProvider.java:
##########
@@ -48,93 +54,105 @@
  * connecting to another OM node from the list of proxies.
  */
 public class GrpcOMFailoverProxyProvider<T> extends
-    OMFailoverProxyProvider<T> {
-
-  private Map<String, String> omAddresses;
+    OMFailoverProxyProviderBase<T> {
+  public static final Logger LOG =
+      LoggerFactory.getLogger(GrpcOMFailoverProxyProvider.class);
 
   public GrpcOMFailoverProxyProvider(ConfigurationSource configuration,
-                                     UserGroupInformation ugi,
                                      String omServiceId,
                                      Class<T> protocol) throws IOException {
-    super(configuration, ugi, omServiceId, protocol);
+    super(configuration, omServiceId, protocol);
   }
 
   @Override
   protected void loadOMClientConfigs(ConfigurationSource config, String omSvcId)
       throws IOException {
-    // to be used for base class omProxies,
-    // ProxyInfo not applicable for gRPC, just need key set
-    Map<String, ProxyInfo<T>> omProxiesNodeIdKeyset = new HashMap<>();
-    // to be used for base class omProxyInfos
-    // OMProxyInfo not applicable for gRPC, just need key set
-    Map<String, OMProxyInfo> omProxyInfosNodeIdKeyset = new HashMap<>();
-    List<String> omNodeIDList = new ArrayList<>();
-    omAddresses = new HashMap<>();
 
     Collection<String> omNodeIds = OmUtils.getActiveOMNodeIds(config, omSvcId);
+    Map<String, ProxyInfo<T>> omProxies = new HashMap<>();
+    List<String> omNodeIDList = new ArrayList<>();
 
     for (String nodeId : OmUtils.emptyAsSingletonNull(omNodeIds)) {
-
       String rpcAddrKey = ConfUtils.addKeySuffixes(OZONE_OM_ADDRESS_KEY,
           omSvcId, nodeId);
-
       Optional<String> hostaddr = getHostNameFromConfigKeys(config,
           rpcAddrKey);
-
       OptionalInt hostport = HddsUtils.getNumberFromConfigKeys(config,
           ConfUtils.addKeySuffixes(OMConfigKeys.OZONE_OM_GRPC_PORT_KEY,
               omSvcId, nodeId),
           OMConfigKeys.OZONE_OM_GRPC_PORT_KEY);
       if (nodeId == null) {
         nodeId = OzoneConsts.OM_DEFAULT_NODE_ID;
       }
-      omProxiesNodeIdKeyset.put(nodeId, null);
-      omProxyInfosNodeIdKeyset.put(nodeId, null);
       if (hostaddr.isPresent()) {
-        omAddresses.put(nodeId,
-            hostaddr.get() + ":"
-                + hostport.orElse(config
-                .getObject(GrpcOmTransport
-                    .GrpcOmTransportConfig.class)
-                .getPort()));
+        ProxyInfo<T> proxyInfo =
+            new ProxyInfo<>(createOMProxy(),
+                hostaddr.get() + ":"
+                    + hostport.orElse(config
+                    .getObject(GrpcOmTransport
+                        .GrpcOmTransportConfig.class)
+                    .getPort()));
+        omProxies.put(nodeId, proxyInfo);
       } else {
         LOG.error("expected host address not defined for: {}", rpcAddrKey);
         throw new ConfigurationException(rpcAddrKey + "is not defined");
       }
       omNodeIDList.add(nodeId);
     }
 
-    if (omProxiesNodeIdKeyset.isEmpty()) {
+    if (omProxies.isEmpty()) {
       throw new IllegalArgumentException("Could not find any configured " +
           "addresses for OM. Please configure the system with "
           + OZONE_OM_ADDRESS_KEY);
     }
+    setOmProxies(omProxies);
+    setOmNodeIDList(omNodeIDList);
+  }
 
-    // set base class omProxies, omProxyInfos, omNodeIDList
+  private T createOMProxy() throws IOException {
+    InetSocketAddress addr = new InetSocketAddress(0);
+    Configuration hadoopConf =
+        LegacyHadoopConfigurationSource.asHadoopConfiguration(getConf());
+    return (T) RPC.getProxy(getInterface(), 0, addr, hadoopConf);
+  }
 
-    // omProxies needed in base class
-    // omProxies.size == number of om nodes
-    // omProxies key needs to be valid nodeid
-    // omProxyInfos keyset needed in base class
-    setProxies(omProxiesNodeIdKeyset, omProxyInfosNodeIdKeyset, omNodeIDList);
+  /**
+   * Get the proxy object which should be used until the next failover event
+   * occurs. RPC proxy object is intialized lazily.
+   * @return the OM proxy object to invoke methods upon
+   */
+  @Override
+  public synchronized ProxyInfo<T> getProxy() {
+    return getOMProxyMap().get(getCurrentProxyOMNodeId());
   }
 
   @Override
-  protected Text computeDelegationTokenService() {
-    return new Text();
+  protected synchronized boolean shouldFailover(Exception ex) {
+    if (ex instanceof StatusRuntimeException) {
+      StatusRuntimeException srexp = (StatusRuntimeException)ex;
+      Status status = srexp.getStatus();
+      if (status.getCode() == Status.Code.RESOURCE_EXHAUSTED) {

Review Comment:
   is this the only exception for which we should not failover? I see `FAILED_PRECONDITION`, `INVALID_ARGUMENT` and a few others in the list of possible codes.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] kerneltime commented on a diff in pull request #3389: HDDS-6433. Refactor OMFailoverProxyProvider to provide a base for OM and GrpcOM FailoverProxyProviders

Posted by GitBox <gi...@apache.org>.
kerneltime commented on code in PR #3389:
URL: https://github.com/apache/ozone/pull/3389#discussion_r880439603


##########
hadoop-ozone/common/src/main/java/org/apache/hadoop/ozone/om/ha/OMFailoverProxyProvider.java:
##########
@@ -68,70 +53,39 @@
  * multiple OMs to connect to. In case of OM failover, client can try
  * connecting to another OM node from the list of proxies.
  */
-public class OMFailoverProxyProvider<T> implements
-    FailoverProxyProvider<T>, Closeable {
+public class OMFailoverProxyProvider<T> extends

Review Comment:
   We should probably rename this to `HadoopRpcOMFailoverProvider`



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] kerneltime commented on a diff in pull request #3389: HDDS-6433. Refactor OMFailoverProxyProvider to provide a base for OM and GrpcOM FailoverProxyProviders

Posted by GitBox <gi...@apache.org>.
kerneltime commented on code in PR #3389:
URL: https://github.com/apache/ozone/pull/3389#discussion_r868217813


##########
hadoop-ozone/common/src/main/java/org/apache/hadoop/ozone/om/protocolPB/GrpcOmTransport.java:
##########
@@ -177,17 +181,22 @@ public void start() throws IOException {
   public OMResponse submitRequest(OMRequest payload) throws IOException {
     OMResponse resp = null;
     boolean tryOtherHost = true;
+    int expectedFailoverCount = 0;
     ResultCodes resultCode = ResultCodes.INTERNAL_ERROR;
     while (tryOtherHost) {
       tryOtherHost = false;
+      expectedFailoverCount = syncFailoverCount.get();

Review Comment:
   A quick eye balling of this code is a bit confusing, `expectedFailoverCount` gets set to `syncFailoverCount` here and then passed to `shouldRetry` which then checks it against `syncFailoverCount`



##########
hadoop-ozone/common/src/main/java/org/apache/hadoop/ozone/om/protocolPB/GrpcOmTransport.java:
##########
@@ -177,17 +181,22 @@ public void start() throws IOException {
   public OMResponse submitRequest(OMRequest payload) throws IOException {
     OMResponse resp = null;
     boolean tryOtherHost = true;
+    int expectedFailoverCount = 0;
     ResultCodes resultCode = ResultCodes.INTERNAL_ERROR;
     while (tryOtherHost) {
       tryOtherHost = false;
+      expectedFailoverCount = syncFailoverCount.get();

Review Comment:
   A quick eyeballing of this code is a bit confusing, `expectedFailoverCount` gets set to `syncFailoverCount` here and then passed to `shouldRetry` which then checks it against `syncFailoverCount`



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] kerneltime commented on a diff in pull request #3389: HDDS-6433. Refactor OMFailoverProxyProvider to provide a base for OM and GrpcOM FailoverProxyProviders

Posted by GitBox <gi...@apache.org>.
kerneltime commented on code in PR #3389:
URL: https://github.com/apache/ozone/pull/3389#discussion_r868216194


##########
hadoop-ozone/common/src/main/java/org/apache/hadoop/ozone/om/protocolPB/GrpcOmTransport.java:
##########
@@ -177,17 +181,22 @@ public void start() throws IOException {
   public OMResponse submitRequest(OMRequest payload) throws IOException {
     OMResponse resp = null;
     boolean tryOtherHost = true;
+    int expectedFailoverCount = 0;
     ResultCodes resultCode = ResultCodes.INTERNAL_ERROR;
     while (tryOtherHost) {
       tryOtherHost = false;
+      expectedFailoverCount = syncFailoverCount.get();
       try {
         resp = clients.get(host.get()).submitRequest(payload);
       } catch (StatusRuntimeException e) {
         if (e.getStatus().getCode() == Status.Code.UNAVAILABLE) {
           resultCode = ResultCodes.TIMEOUT;
         }
         Exception exp = new Exception(e);
-        tryOtherHost = shouldRetry(unwrapException(exp));
+        /*e.getStatus().getCode() == Status.Code.

Review Comment:
   Should this be deleted?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] neils-dev commented on a diff in pull request #3389: HDDS-6433. Refactor OMFailoverProxyProvider to provide a base for OM and GrpcOM FailoverProxyProviders

Posted by GitBox <gi...@apache.org>.
neils-dev commented on code in PR #3389:
URL: https://github.com/apache/ozone/pull/3389#discussion_r886197608


##########
hadoop-ozone/common/src/main/java/org/apache/hadoop/ozone/om/ha/GrpcOMFailoverProxyProvider.java:
##########
@@ -48,93 +54,105 @@
  * connecting to another OM node from the list of proxies.
  */
 public class GrpcOMFailoverProxyProvider<T> extends
-    OMFailoverProxyProvider<T> {
-
-  private Map<String, String> omAddresses;
+    OMFailoverProxyProviderBase<T> {
+  public static final Logger LOG =
+      LoggerFactory.getLogger(GrpcOMFailoverProxyProvider.class);
 
   public GrpcOMFailoverProxyProvider(ConfigurationSource configuration,
-                                     UserGroupInformation ugi,
                                      String omServiceId,
                                      Class<T> protocol) throws IOException {
-    super(configuration, ugi, omServiceId, protocol);
+    super(configuration, omServiceId, protocol);
   }
 
   @Override
   protected void loadOMClientConfigs(ConfigurationSource config, String omSvcId)
       throws IOException {
-    // to be used for base class omProxies,
-    // ProxyInfo not applicable for gRPC, just need key set
-    Map<String, ProxyInfo<T>> omProxiesNodeIdKeyset = new HashMap<>();
-    // to be used for base class omProxyInfos
-    // OMProxyInfo not applicable for gRPC, just need key set
-    Map<String, OMProxyInfo> omProxyInfosNodeIdKeyset = new HashMap<>();
-    List<String> omNodeIDList = new ArrayList<>();
-    omAddresses = new HashMap<>();
 
     Collection<String> omNodeIds = OmUtils.getActiveOMNodeIds(config, omSvcId);
+    Map<String, ProxyInfo<T>> omProxies = new HashMap<>();
+    List<String> omNodeIDList = new ArrayList<>();
 
     for (String nodeId : OmUtils.emptyAsSingletonNull(omNodeIds)) {
-
       String rpcAddrKey = ConfUtils.addKeySuffixes(OZONE_OM_ADDRESS_KEY,
           omSvcId, nodeId);
-
       Optional<String> hostaddr = getHostNameFromConfigKeys(config,
           rpcAddrKey);
-
       OptionalInt hostport = HddsUtils.getNumberFromConfigKeys(config,
           ConfUtils.addKeySuffixes(OMConfigKeys.OZONE_OM_GRPC_PORT_KEY,
               omSvcId, nodeId),
           OMConfigKeys.OZONE_OM_GRPC_PORT_KEY);
       if (nodeId == null) {
         nodeId = OzoneConsts.OM_DEFAULT_NODE_ID;
       }
-      omProxiesNodeIdKeyset.put(nodeId, null);
-      omProxyInfosNodeIdKeyset.put(nodeId, null);
       if (hostaddr.isPresent()) {
-        omAddresses.put(nodeId,
-            hostaddr.get() + ":"
-                + hostport.orElse(config
-                .getObject(GrpcOmTransport
-                    .GrpcOmTransportConfig.class)
-                .getPort()));
+        ProxyInfo<T> proxyInfo =
+            new ProxyInfo<>(createOMProxy(),
+                hostaddr.get() + ":"
+                    + hostport.orElse(config
+                    .getObject(GrpcOmTransport
+                        .GrpcOmTransportConfig.class)
+                    .getPort()));
+        omProxies.put(nodeId, proxyInfo);
       } else {
         LOG.error("expected host address not defined for: {}", rpcAddrKey);
         throw new ConfigurationException(rpcAddrKey + "is not defined");
       }
       omNodeIDList.add(nodeId);
     }
 
-    if (omProxiesNodeIdKeyset.isEmpty()) {
+    if (omProxies.isEmpty()) {
       throw new IllegalArgumentException("Could not find any configured " +
           "addresses for OM. Please configure the system with "
           + OZONE_OM_ADDRESS_KEY);
     }
+    setOmProxies(omProxies);
+    setOmNodeIDList(omNodeIDList);
+  }
 
-    // set base class omProxies, omProxyInfos, omNodeIDList
+  private T createOMProxy() throws IOException {
+    InetSocketAddress addr = new InetSocketAddress(0);
+    Configuration hadoopConf =
+        LegacyHadoopConfigurationSource.asHadoopConfiguration(getConf());
+    return (T) RPC.getProxy(getInterface(), 0, addr, hadoopConf);
+  }
 
-    // omProxies needed in base class
-    // omProxies.size == number of om nodes
-    // omProxies key needs to be valid nodeid
-    // omProxyInfos keyset needed in base class
-    setProxies(omProxiesNodeIdKeyset, omProxyInfosNodeIdKeyset, omNodeIDList);
+  /**
+   * Get the proxy object which should be used until the next failover event
+   * occurs. RPC proxy object is intialized lazily.
+   * @return the OM proxy object to invoke methods upon
+   */
+  @Override
+  public synchronized ProxyInfo<T> getProxy() {
+    return getOMProxyMap().get(getCurrentProxyOMNodeId());
   }
 
   @Override
-  protected Text computeDelegationTokenService() {
-    return new Text();
+  protected synchronized boolean shouldFailover(Exception ex) {
+    if (ex instanceof StatusRuntimeException) {
+      StatusRuntimeException srexp = (StatusRuntimeException)ex;
+      Status status = srexp.getStatus();
+      if (status.getCode() == Status.Code.RESOURCE_EXHAUSTED) {

Review Comment:
   Thanks @kerneltime for bringing up gRPC errors which we should not failover and retry.  From the list of possible codes, including `FAILED_PRECONDITION` and `INVALID_ARGUMENT,` for the most part they are a result of application level errors that are thrown and handled by the business logic.  In our case, for errors at the application layer, these are thrown as OMExceptions that are sent back to the caller through OMResponses.  Because of this errors such as `FAILED_PRECONDITION` and `PERMISSION_ERRORS` are not thrown and handled by the transport, rather they are thrown and handled by the business logic through an error OMResponse.
   
   From the list of possible codes, I am adding `DATA_LOSS` in addition to `RESOURCE_EXHAUSTED` as errors we do not perform failover.  See list of possible codes and client handling for each possibility in shared doc https://docs.google.com/document/d/1EXepGLluPxQfX7dJE6nR6hSYk1wQz4YzVnwZ6xOe8Xo/edit?usp=sharing.
   
   Let me know if we should include additional codes that the client should NOT failover.
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] neils-dev commented on a diff in pull request #3389: HDDS-6433. Refactor OMFailoverProxyProvider to provide a base for OM and GrpcOM FailoverProxyProviders

Posted by GitBox <gi...@apache.org>.
neils-dev commented on code in PR #3389:
URL: https://github.com/apache/ozone/pull/3389#discussion_r886197608


##########
hadoop-ozone/common/src/main/java/org/apache/hadoop/ozone/om/ha/GrpcOMFailoverProxyProvider.java:
##########
@@ -48,93 +54,105 @@
  * connecting to another OM node from the list of proxies.
  */
 public class GrpcOMFailoverProxyProvider<T> extends
-    OMFailoverProxyProvider<T> {
-
-  private Map<String, String> omAddresses;
+    OMFailoverProxyProviderBase<T> {
+  public static final Logger LOG =
+      LoggerFactory.getLogger(GrpcOMFailoverProxyProvider.class);
 
   public GrpcOMFailoverProxyProvider(ConfigurationSource configuration,
-                                     UserGroupInformation ugi,
                                      String omServiceId,
                                      Class<T> protocol) throws IOException {
-    super(configuration, ugi, omServiceId, protocol);
+    super(configuration, omServiceId, protocol);
   }
 
   @Override
   protected void loadOMClientConfigs(ConfigurationSource config, String omSvcId)
       throws IOException {
-    // to be used for base class omProxies,
-    // ProxyInfo not applicable for gRPC, just need key set
-    Map<String, ProxyInfo<T>> omProxiesNodeIdKeyset = new HashMap<>();
-    // to be used for base class omProxyInfos
-    // OMProxyInfo not applicable for gRPC, just need key set
-    Map<String, OMProxyInfo> omProxyInfosNodeIdKeyset = new HashMap<>();
-    List<String> omNodeIDList = new ArrayList<>();
-    omAddresses = new HashMap<>();
 
     Collection<String> omNodeIds = OmUtils.getActiveOMNodeIds(config, omSvcId);
+    Map<String, ProxyInfo<T>> omProxies = new HashMap<>();
+    List<String> omNodeIDList = new ArrayList<>();
 
     for (String nodeId : OmUtils.emptyAsSingletonNull(omNodeIds)) {
-
       String rpcAddrKey = ConfUtils.addKeySuffixes(OZONE_OM_ADDRESS_KEY,
           omSvcId, nodeId);
-
       Optional<String> hostaddr = getHostNameFromConfigKeys(config,
           rpcAddrKey);
-
       OptionalInt hostport = HddsUtils.getNumberFromConfigKeys(config,
           ConfUtils.addKeySuffixes(OMConfigKeys.OZONE_OM_GRPC_PORT_KEY,
               omSvcId, nodeId),
           OMConfigKeys.OZONE_OM_GRPC_PORT_KEY);
       if (nodeId == null) {
         nodeId = OzoneConsts.OM_DEFAULT_NODE_ID;
       }
-      omProxiesNodeIdKeyset.put(nodeId, null);
-      omProxyInfosNodeIdKeyset.put(nodeId, null);
       if (hostaddr.isPresent()) {
-        omAddresses.put(nodeId,
-            hostaddr.get() + ":"
-                + hostport.orElse(config
-                .getObject(GrpcOmTransport
-                    .GrpcOmTransportConfig.class)
-                .getPort()));
+        ProxyInfo<T> proxyInfo =
+            new ProxyInfo<>(createOMProxy(),
+                hostaddr.get() + ":"
+                    + hostport.orElse(config
+                    .getObject(GrpcOmTransport
+                        .GrpcOmTransportConfig.class)
+                    .getPort()));
+        omProxies.put(nodeId, proxyInfo);
       } else {
         LOG.error("expected host address not defined for: {}", rpcAddrKey);
         throw new ConfigurationException(rpcAddrKey + "is not defined");
       }
       omNodeIDList.add(nodeId);
     }
 
-    if (omProxiesNodeIdKeyset.isEmpty()) {
+    if (omProxies.isEmpty()) {
       throw new IllegalArgumentException("Could not find any configured " +
           "addresses for OM. Please configure the system with "
           + OZONE_OM_ADDRESS_KEY);
     }
+    setOmProxies(omProxies);
+    setOmNodeIDList(omNodeIDList);
+  }
 
-    // set base class omProxies, omProxyInfos, omNodeIDList
+  private T createOMProxy() throws IOException {
+    InetSocketAddress addr = new InetSocketAddress(0);
+    Configuration hadoopConf =
+        LegacyHadoopConfigurationSource.asHadoopConfiguration(getConf());
+    return (T) RPC.getProxy(getInterface(), 0, addr, hadoopConf);
+  }
 
-    // omProxies needed in base class
-    // omProxies.size == number of om nodes
-    // omProxies key needs to be valid nodeid
-    // omProxyInfos keyset needed in base class
-    setProxies(omProxiesNodeIdKeyset, omProxyInfosNodeIdKeyset, omNodeIDList);
+  /**
+   * Get the proxy object which should be used until the next failover event
+   * occurs. RPC proxy object is intialized lazily.
+   * @return the OM proxy object to invoke methods upon
+   */
+  @Override
+  public synchronized ProxyInfo<T> getProxy() {
+    return getOMProxyMap().get(getCurrentProxyOMNodeId());
   }
 
   @Override
-  protected Text computeDelegationTokenService() {
-    return new Text();
+  protected synchronized boolean shouldFailover(Exception ex) {
+    if (ex instanceof StatusRuntimeException) {
+      StatusRuntimeException srexp = (StatusRuntimeException)ex;
+      Status status = srexp.getStatus();
+      if (status.getCode() == Status.Code.RESOURCE_EXHAUSTED) {

Review Comment:
   Thanks @kerneltime for bringing up gRPC errors which we should not failover and retry.  From the list of possible codes, including `FAILED_PRECONDITION` and `INVALID_ARGUMENT,` for the most part they are a result of application level errors that are thrown and handled by the business logic.  In our case, for errors at the application layer, these are thrown as OMExceptions that are sent back to the caller through OMResponses.  Because of this errors such as `FAILED_PRECONDITION` and `PERMISSION_ERRORS` are not thrown and handled by the transport, rather they are thrown and handled by the business logic through an error OMResponse.
   
   From the list of possible codes, I am adding `DATA_LOSS` in addition to `RESOURCE_EXHAUSTED` as errors we do not perform failover.  See list of possible codes and client handling for each possibility in shared doc 
   https://docs.google.com/document/d/1EXepGLluPxQfX7dJE6nR6hSYk1wQz4YzVnwZ6xOe8Xo/edit?usp=sharing.
   
   Let me know if we should include additional codes that the client should NOT failover.
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] kerneltime commented on pull request #3389: HDDS-6433. Refactor OMFailoverProxyProvider to provide a base for OM and GrpcOM FailoverProxyProviders

Posted by GitBox <gi...@apache.org>.
kerneltime commented on PR #3389:
URL: https://github.com/apache/ozone/pull/3389#issuecomment-1148217832

   @neils-dev I did go over this today partially, the google doc attached should be part of the code documentation. I also want to go over the failover logic locally once. Will complete this tomorrow.
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] neils-dev commented on a diff in pull request #3389: HDDS-6433. Refactor OMFailoverProxyProvider to provide a base for OM and GrpcOM FailoverProxyProviders

Posted by GitBox <gi...@apache.org>.
neils-dev commented on code in PR #3389:
URL: https://github.com/apache/ozone/pull/3389#discussion_r870553784


##########
hadoop-ozone/common/src/main/java/org/apache/hadoop/ozone/om/protocolPB/GrpcOmTransport.java:
##########
@@ -177,17 +181,22 @@ public void start() throws IOException {
   public OMResponse submitRequest(OMRequest payload) throws IOException {
     OMResponse resp = null;
     boolean tryOtherHost = true;
+    int expectedFailoverCount = 0;
     ResultCodes resultCode = ResultCodes.INTERNAL_ERROR;
     while (tryOtherHost) {
       tryOtherHost = false;
+      expectedFailoverCount = syncFailoverCount.get();
       try {
         resp = clients.get(host.get()).submitRequest(payload);
       } catch (StatusRuntimeException e) {
         if (e.getStatus().getCode() == Status.Code.UNAVAILABLE) {
           resultCode = ResultCodes.TIMEOUT;
         }
         Exception exp = new Exception(e);
-        tryOtherHost = shouldRetry(unwrapException(exp));
+        /*e.getStatus().getCode() == Status.Code.

Review Comment:
   Yes.  Left-over thought thus commented out.  I'll remove it.  Thanks.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org