You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@ozone.apache.org by GitBox <gi...@apache.org> on 2020/09/02 06:04:58 UTC

[GitHub] [hadoop-ozone] timmylicheng opened a new pull request #1340: HDDS-3188 Enable SCM group with failover proxy for SCM block location.

timmylicheng opened a new pull request #1340:
URL: https://github.com/apache/hadoop-ozone/pull/1340


   ## What changes were proposed in this pull request?
   
   (Please fill in changes proposed in this fix)
   
   ## What is the link to the Apache JIRA
   
   (Please create an issue in ASF JIRA before opening a pull request,
   and you need to set the title of the pull request which starts with
   the corresponding JIRA issue number. (e.g. HDDS-XXXX. Fix a typo in YYY.)
   
   Please replace this section with the link to the Apache JIRA)
   
   ## How was this patch tested?
   
   (Please explain how this patch was tested. Ex: unit tests, manual tests)
   (If this patch involves UI changes, please attach a screen-shot; otherwise, remove this)
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: ozone-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: ozone-issues-help@hadoop.apache.org


[GitHub] [hadoop-ozone] timmylicheng commented on pull request #1340: HDDS-3188 Add failover proxy for SCM block location.

Posted by GitBox <gi...@apache.org>.
timmylicheng commented on pull request #1340:
URL: https://github.com/apache/hadoop-ozone/pull/1340#issuecomment-690835337


   > The client failover logic is based on the suggested leader sent by SCM. The `String` value of suggested leader sent by SCM Server is `RaftPeer#getAddress`, but at client side this value is compare with `SCM_DUMMY_NODEID_PREFIX + <id>` which will never match. So suggested leader is never valued and we are always failing over to the next proxy in round robin.
   
   I have removed suggestedLeader related parts in this PR until we think it thru. Thanks for the head out.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: ozone-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: ozone-issues-help@hadoop.apache.org


[GitHub] [hadoop-ozone] timmylicheng commented on a change in pull request #1340: HDDS-3188 Add failover proxy for SCM block location.

Posted by GitBox <gi...@apache.org>.
timmylicheng commented on a change in pull request #1340:
URL: https://github.com/apache/hadoop-ozone/pull/1340#discussion_r486736129



##########
File path: hadoop-hdds/framework/src/main/java/org/apache/hadoop/hdds/scm/proxy/SCMBlockLocationFailoverProxyProvider.java
##########
@@ -0,0 +1,281 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.hdds.scm.proxy;
+
+import com.google.common.annotations.VisibleForTesting;
+import org.apache.hadoop.conf.Configuration;
+import org.apache.hadoop.hdds.conf.ConfigurationSource;
+import org.apache.hadoop.hdds.scm.ScmConfigKeys;
+import org.apache.hadoop.hdds.scm.protocol.ScmBlockLocationProtocol;
+import org.apache.hadoop.hdds.scm.protocolPB.ScmBlockLocationProtocolPB;
+import org.apache.hadoop.hdds.utils.LegacyHadoopConfigurationSource;
+import org.apache.hadoop.io.retry.FailoverProxyProvider;
+import org.apache.hadoop.io.retry.RetryPolicy;
+import org.apache.hadoop.io.retry.RetryPolicy.RetryAction;
+import org.apache.hadoop.ipc.ProtobufRpcEngine;
+import org.apache.hadoop.ipc.RPC;
+import org.apache.hadoop.net.NetUtils;
+import org.apache.hadoop.security.UserGroupInformation;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import java.io.Closeable;
+import java.io.IOException;
+import java.net.InetSocketAddress;
+import java.util.ArrayList;
+import java.util.Collection;
+import java.util.HashMap;
+import java.util.List;
+import java.util.Map;
+import java.util.Optional;
+
+import static org.apache.hadoop.hdds.scm.ScmConfigKeys.OZONE_SCM_NAMES;
+import static org.apache.hadoop.hdds.scm.ScmConfigKeys.OZONE_SCM_SERVICE_IDS_KEY;
+import static org.apache.hadoop.hdds.HddsUtils.getScmAddressForBlockClients;
+import static org.apache.hadoop.hdds.HddsUtils.getPortNumberFromConfigKeys;
+import static org.apache.hadoop.hdds.HddsUtils.getHostName;
+
+/**
+ * Failover proxy provider for SCM.
+ */
+public class SCMBlockLocationFailoverProxyProvider implements
+    FailoverProxyProvider<ScmBlockLocationProtocolPB>, Closeable {
+  public static final Logger LOG =
+      LoggerFactory.getLogger(SCMBlockLocationFailoverProxyProvider.class);
+
+  private Map<String, ProxyInfo<ScmBlockLocationProtocolPB>> scmProxies;
+  private Map<String, SCMProxyInfo> scmProxyInfoMap;
+  private List<String> scmNodeIDList;
+
+  private String currentProxySCMNodeId;
+  private int currentProxyIndex;
+
+  private final ConfigurationSource conf;
+  private final long scmVersion;
+
+  private final String scmServiceId;
+
+  private String lastAttemptedLeader;
+
+  private final int maxRetryCount;
+  private final long retryInterval;
+
+  public static final String SCM_DUMMY_NODEID_PREFIX = "scm";
+
+  public SCMBlockLocationFailoverProxyProvider(ConfigurationSource conf) {
+    this.conf = conf;
+    this.scmVersion = RPC.getProtocolVersion(ScmBlockLocationProtocol.class);
+    this.scmServiceId = conf.getTrimmed(OZONE_SCM_SERVICE_IDS_KEY);
+    this.scmProxies = new HashMap<>();
+    this.scmProxyInfoMap = new HashMap<>();
+    this.scmNodeIDList = new ArrayList<>();
+    loadConfigs();
+
+    this.currentProxyIndex = 0;
+    currentProxySCMNodeId = scmNodeIDList.get(currentProxyIndex);
+
+    this.maxRetryCount = conf.getObject(SCMBlockClientConfig.class)
+        .getRetryCount();
+    this.retryInterval = conf.getObject(SCMBlockClientConfig.class)

Review comment:
       Updated




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: ozone-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: ozone-issues-help@hadoop.apache.org


[GitHub] [hadoop-ozone] nandakumar131 commented on a change in pull request #1340: HDDS-3188 Add failover proxy for SCM block location.

Posted by GitBox <gi...@apache.org>.
nandakumar131 commented on a change in pull request #1340:
URL: https://github.com/apache/hadoop-ozone/pull/1340#discussion_r485108126



##########
File path: hadoop-hdds/framework/src/main/java/org/apache/hadoop/hdds/scm/proxy/SCMBlockLocationFailoverProxyProvider.java
##########
@@ -0,0 +1,281 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.hdds.scm.proxy;
+
+import com.google.common.annotations.VisibleForTesting;
+import org.apache.hadoop.conf.Configuration;
+import org.apache.hadoop.hdds.conf.ConfigurationSource;
+import org.apache.hadoop.hdds.scm.ScmConfigKeys;
+import org.apache.hadoop.hdds.scm.protocol.ScmBlockLocationProtocol;
+import org.apache.hadoop.hdds.scm.protocolPB.ScmBlockLocationProtocolPB;
+import org.apache.hadoop.hdds.utils.LegacyHadoopConfigurationSource;
+import org.apache.hadoop.io.retry.FailoverProxyProvider;
+import org.apache.hadoop.io.retry.RetryPolicy;
+import org.apache.hadoop.io.retry.RetryPolicy.RetryAction;
+import org.apache.hadoop.ipc.ProtobufRpcEngine;
+import org.apache.hadoop.ipc.RPC;
+import org.apache.hadoop.net.NetUtils;
+import org.apache.hadoop.security.UserGroupInformation;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import java.io.Closeable;
+import java.io.IOException;
+import java.net.InetSocketAddress;
+import java.util.ArrayList;
+import java.util.Collection;
+import java.util.HashMap;
+import java.util.List;
+import java.util.Map;
+import java.util.Optional;
+
+import static org.apache.hadoop.hdds.scm.ScmConfigKeys.OZONE_SCM_NAMES;
+import static org.apache.hadoop.hdds.scm.ScmConfigKeys.OZONE_SCM_SERVICE_IDS_KEY;
+import static org.apache.hadoop.hdds.HddsUtils.getScmAddressForBlockClients;
+import static org.apache.hadoop.hdds.HddsUtils.getPortNumberFromConfigKeys;
+import static org.apache.hadoop.hdds.HddsUtils.getHostName;
+
+/**
+ * Failover proxy provider for SCM.
+ */
+public class SCMBlockLocationFailoverProxyProvider implements
+    FailoverProxyProvider<ScmBlockLocationProtocolPB>, Closeable {
+  public static final Logger LOG =
+      LoggerFactory.getLogger(SCMBlockLocationFailoverProxyProvider.class);
+
+  private Map<String, ProxyInfo<ScmBlockLocationProtocolPB>> scmProxies;
+  private Map<String, SCMProxyInfo> scmProxyInfoMap;
+  private List<String> scmNodeIDList;
+
+  private String currentProxySCMNodeId;
+  private int currentProxyIndex;
+
+  private final ConfigurationSource conf;
+  private final long scmVersion;
+
+  private final String scmServiceId;
+
+  private String lastAttemptedLeader;
+
+  private final int maxRetryCount;
+  private final long retryInterval;
+
+  public static final String SCM_DUMMY_NODEID_PREFIX = "scm";
+
+  public SCMBlockLocationFailoverProxyProvider(ConfigurationSource conf) {
+    this.conf = conf;
+    this.scmVersion = RPC.getProtocolVersion(ScmBlockLocationProtocol.class);
+    this.scmServiceId = conf.getTrimmed(OZONE_SCM_SERVICE_IDS_KEY);
+    this.scmProxies = new HashMap<>();
+    this.scmProxyInfoMap = new HashMap<>();
+    this.scmNodeIDList = new ArrayList<>();
+    loadConfigs();
+
+    this.currentProxyIndex = 0;
+    currentProxySCMNodeId = scmNodeIDList.get(currentProxyIndex);
+
+    this.maxRetryCount = conf.getObject(SCMBlockClientConfig.class)
+        .getRetryCount();
+    this.retryInterval = conf.getObject(SCMBlockClientConfig.class)

Review comment:
       `conf#getObject` is a heavy call, we can avoid calling it twice.

##########
File path: hadoop-hdds/framework/src/main/java/org/apache/hadoop/hdds/scm/proxy/SCMBlockLocationFailoverProxyProvider.java
##########
@@ -0,0 +1,281 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.hdds.scm.proxy;
+
+import com.google.common.annotations.VisibleForTesting;
+import org.apache.hadoop.conf.Configuration;
+import org.apache.hadoop.hdds.conf.ConfigurationSource;
+import org.apache.hadoop.hdds.scm.ScmConfigKeys;
+import org.apache.hadoop.hdds.scm.protocol.ScmBlockLocationProtocol;
+import org.apache.hadoop.hdds.scm.protocolPB.ScmBlockLocationProtocolPB;
+import org.apache.hadoop.hdds.utils.LegacyHadoopConfigurationSource;
+import org.apache.hadoop.io.retry.FailoverProxyProvider;
+import org.apache.hadoop.io.retry.RetryPolicy;
+import org.apache.hadoop.io.retry.RetryPolicy.RetryAction;
+import org.apache.hadoop.ipc.ProtobufRpcEngine;
+import org.apache.hadoop.ipc.RPC;
+import org.apache.hadoop.net.NetUtils;
+import org.apache.hadoop.security.UserGroupInformation;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import java.io.Closeable;
+import java.io.IOException;
+import java.net.InetSocketAddress;
+import java.util.ArrayList;
+import java.util.Collection;
+import java.util.HashMap;
+import java.util.List;
+import java.util.Map;
+import java.util.Optional;
+
+import static org.apache.hadoop.hdds.scm.ScmConfigKeys.OZONE_SCM_NAMES;
+import static org.apache.hadoop.hdds.scm.ScmConfigKeys.OZONE_SCM_SERVICE_IDS_KEY;
+import static org.apache.hadoop.hdds.HddsUtils.getScmAddressForBlockClients;
+import static org.apache.hadoop.hdds.HddsUtils.getPortNumberFromConfigKeys;
+import static org.apache.hadoop.hdds.HddsUtils.getHostName;
+
+/**
+ * Failover proxy provider for SCM.
+ */
+public class SCMBlockLocationFailoverProxyProvider implements
+    FailoverProxyProvider<ScmBlockLocationProtocolPB>, Closeable {
+  public static final Logger LOG =
+      LoggerFactory.getLogger(SCMBlockLocationFailoverProxyProvider.class);
+
+  private Map<String, ProxyInfo<ScmBlockLocationProtocolPB>> scmProxies;
+  private Map<String, SCMProxyInfo> scmProxyInfoMap;
+  private List<String> scmNodeIDList;
+
+  private String currentProxySCMNodeId;
+  private int currentProxyIndex;
+
+  private final ConfigurationSource conf;
+  private final long scmVersion;
+
+  private final String scmServiceId;
+
+  private String lastAttemptedLeader;
+
+  private final int maxRetryCount;
+  private final long retryInterval;
+
+  public static final String SCM_DUMMY_NODEID_PREFIX = "scm";
+
+  public SCMBlockLocationFailoverProxyProvider(ConfigurationSource conf) {
+    this.conf = conf;
+    this.scmVersion = RPC.getProtocolVersion(ScmBlockLocationProtocol.class);
+    this.scmServiceId = conf.getTrimmed(OZONE_SCM_SERVICE_IDS_KEY);
+    this.scmProxies = new HashMap<>();
+    this.scmProxyInfoMap = new HashMap<>();
+    this.scmNodeIDList = new ArrayList<>();
+    loadConfigs();
+
+    this.currentProxyIndex = 0;
+    currentProxySCMNodeId = scmNodeIDList.get(currentProxyIndex);
+
+    this.maxRetryCount = conf.getObject(SCMBlockClientConfig.class)
+        .getRetryCount();
+    this.retryInterval = conf.getObject(SCMBlockClientConfig.class)
+        .getRetryInterval();
+  }
+
+  @VisibleForTesting
+  protected Collection<InetSocketAddress> getSCMAddressList() {
+    Collection<String> scmAddressList =
+        conf.getTrimmedStringCollection(OZONE_SCM_NAMES);
+    Collection<InetSocketAddress> resultList = new ArrayList<>();
+    if (!scmAddressList.isEmpty()) {
+      final int port = getPortNumberFromConfigKeys(conf,
+          ScmConfigKeys.OZONE_SCM_BLOCK_CLIENT_ADDRESS_KEY)
+          .orElse(ScmConfigKeys.OZONE_SCM_BLOCK_CLIENT_PORT_DEFAULT);
+      for (String scmAddress : scmAddressList) {
+        LOG.info("SCM Address for proxy is {}", scmAddress);
+
+        Optional<String> hostname = getHostName(scmAddress);
+        if (hostname.isPresent()) {
+          resultList.add(NetUtils.createSocketAddr(
+              hostname.get() + ":" + port));
+        }
+      }
+    }
+    if (resultList.isEmpty()) {
+      // fall back
+      resultList.add(getScmAddressForBlockClients(conf));
+    }
+    return resultList;
+  }
+
+  private void loadConfigs() {
+    Collection<InetSocketAddress> scmAddressList = getSCMAddressList();
+    int scmNodeIndex = 1;
+    for (InetSocketAddress scmAddress : scmAddressList) {
+      String nodeId = SCM_DUMMY_NODEID_PREFIX + scmNodeIndex;
+      if (scmAddress == null) {
+        LOG.error("Failed to create SCM proxy for {}.", nodeId);
+        continue;
+      }
+      scmNodeIndex++;
+      SCMProxyInfo scmProxyInfo = new SCMProxyInfo(
+          scmServiceId, nodeId, scmAddress);
+      ProxyInfo<ScmBlockLocationProtocolPB> proxy = new ProxyInfo<>(
+          null, scmProxyInfo.toString());
+      scmProxies.put(nodeId, proxy);

Review comment:
       Can we avoid putting `null` proxy?
   While reading, instead of checking if `proxyInfo.proxy == null`, we can do `proxyInfo#contains`.

##########
File path: hadoop-hdds/framework/src/main/java/org/apache/hadoop/hdds/scm/proxy/SCMBlockLocationFailoverProxyProvider.java
##########
@@ -0,0 +1,281 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.hdds.scm.proxy;
+
+import com.google.common.annotations.VisibleForTesting;
+import org.apache.hadoop.conf.Configuration;
+import org.apache.hadoop.hdds.conf.ConfigurationSource;
+import org.apache.hadoop.hdds.scm.ScmConfigKeys;
+import org.apache.hadoop.hdds.scm.protocol.ScmBlockLocationProtocol;
+import org.apache.hadoop.hdds.scm.protocolPB.ScmBlockLocationProtocolPB;
+import org.apache.hadoop.hdds.utils.LegacyHadoopConfigurationSource;
+import org.apache.hadoop.io.retry.FailoverProxyProvider;
+import org.apache.hadoop.io.retry.RetryPolicy;
+import org.apache.hadoop.io.retry.RetryPolicy.RetryAction;
+import org.apache.hadoop.ipc.ProtobufRpcEngine;
+import org.apache.hadoop.ipc.RPC;
+import org.apache.hadoop.net.NetUtils;
+import org.apache.hadoop.security.UserGroupInformation;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import java.io.Closeable;
+import java.io.IOException;
+import java.net.InetSocketAddress;
+import java.util.ArrayList;
+import java.util.Collection;
+import java.util.HashMap;
+import java.util.List;
+import java.util.Map;
+import java.util.Optional;
+
+import static org.apache.hadoop.hdds.scm.ScmConfigKeys.OZONE_SCM_NAMES;
+import static org.apache.hadoop.hdds.scm.ScmConfigKeys.OZONE_SCM_SERVICE_IDS_KEY;
+import static org.apache.hadoop.hdds.HddsUtils.getScmAddressForBlockClients;
+import static org.apache.hadoop.hdds.HddsUtils.getPortNumberFromConfigKeys;
+import static org.apache.hadoop.hdds.HddsUtils.getHostName;
+
+/**
+ * Failover proxy provider for SCM.
+ */
+public class SCMBlockLocationFailoverProxyProvider implements
+    FailoverProxyProvider<ScmBlockLocationProtocolPB>, Closeable {
+  public static final Logger LOG =
+      LoggerFactory.getLogger(SCMBlockLocationFailoverProxyProvider.class);
+
+  private Map<String, ProxyInfo<ScmBlockLocationProtocolPB>> scmProxies;
+  private Map<String, SCMProxyInfo> scmProxyInfoMap;
+  private List<String> scmNodeIDList;
+
+  private String currentProxySCMNodeId;
+  private int currentProxyIndex;
+
+  private final ConfigurationSource conf;
+  private final long scmVersion;
+
+  private final String scmServiceId;
+
+  private String lastAttemptedLeader;
+
+  private final int maxRetryCount;
+  private final long retryInterval;
+
+  public static final String SCM_DUMMY_NODEID_PREFIX = "scm";
+
+  public SCMBlockLocationFailoverProxyProvider(ConfigurationSource conf) {
+    this.conf = conf;
+    this.scmVersion = RPC.getProtocolVersion(ScmBlockLocationProtocol.class);
+    this.scmServiceId = conf.getTrimmed(OZONE_SCM_SERVICE_IDS_KEY);
+    this.scmProxies = new HashMap<>();
+    this.scmProxyInfoMap = new HashMap<>();
+    this.scmNodeIDList = new ArrayList<>();
+    loadConfigs();
+
+    this.currentProxyIndex = 0;
+    currentProxySCMNodeId = scmNodeIDList.get(currentProxyIndex);
+
+    this.maxRetryCount = conf.getObject(SCMBlockClientConfig.class)
+        .getRetryCount();
+    this.retryInterval = conf.getObject(SCMBlockClientConfig.class)
+        .getRetryInterval();
+  }
+
+  @VisibleForTesting
+  protected Collection<InetSocketAddress> getSCMAddressList() {
+    Collection<String> scmAddressList =
+        conf.getTrimmedStringCollection(OZONE_SCM_NAMES);
+    Collection<InetSocketAddress> resultList = new ArrayList<>();
+    if (!scmAddressList.isEmpty()) {
+      final int port = getPortNumberFromConfigKeys(conf,
+          ScmConfigKeys.OZONE_SCM_BLOCK_CLIENT_ADDRESS_KEY)
+          .orElse(ScmConfigKeys.OZONE_SCM_BLOCK_CLIENT_PORT_DEFAULT);
+      for (String scmAddress : scmAddressList) {
+        LOG.info("SCM Address for proxy is {}", scmAddress);
+
+        Optional<String> hostname = getHostName(scmAddress);
+        if (hostname.isPresent()) {
+          resultList.add(NetUtils.createSocketAddr(
+              hostname.get() + ":" + port));
+        }
+      }
+    }
+    if (resultList.isEmpty()) {
+      // fall back
+      resultList.add(getScmAddressForBlockClients(conf));
+    }
+    return resultList;
+  }
+
+  private void loadConfigs() {
+    Collection<InetSocketAddress> scmAddressList = getSCMAddressList();
+    int scmNodeIndex = 1;
+    for (InetSocketAddress scmAddress : scmAddressList) {
+      String nodeId = SCM_DUMMY_NODEID_PREFIX + scmNodeIndex;
+      if (scmAddress == null) {
+        LOG.error("Failed to create SCM proxy for {}.", nodeId);
+        continue;
+      }
+      scmNodeIndex++;
+      SCMProxyInfo scmProxyInfo = new SCMProxyInfo(
+          scmServiceId, nodeId, scmAddress);
+      ProxyInfo<ScmBlockLocationProtocolPB> proxy = new ProxyInfo<>(
+          null, scmProxyInfo.toString());
+      scmProxies.put(nodeId, proxy);
+      scmProxyInfoMap.put(nodeId, scmProxyInfo);
+      scmNodeIDList.add(nodeId);
+    }
+
+    if (scmProxies.isEmpty()) {
+      throw new IllegalArgumentException("Could not find any configured " +
+          "addresses for SCM. Please configure the system with "
+          + OZONE_SCM_NAMES);
+    }
+  }
+
+  @VisibleForTesting
+  public synchronized String getCurrentProxyOMNodeId() {
+    return currentProxySCMNodeId;
+  }
+
+  @Override
+  public synchronized ProxyInfo getProxy() {
+    ProxyInfo currentProxyInfo = scmProxies.get(currentProxySCMNodeId);
+    createSCMProxyIfNeeded(currentProxyInfo, currentProxySCMNodeId);
+    return currentProxyInfo;
+  }
+
+  @Override
+  public void performFailover(ScmBlockLocationProtocolPB newLeader) {
+    // Should do nothing here.

Review comment:
       What is the reason for not doing anything here?

##########
File path: hadoop-hdds/framework/src/main/java/org/apache/hadoop/hdds/scm/proxy/SCMBlockLocationFailoverProxyProvider.java
##########
@@ -0,0 +1,281 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.hdds.scm.proxy;
+
+import com.google.common.annotations.VisibleForTesting;
+import org.apache.hadoop.conf.Configuration;
+import org.apache.hadoop.hdds.conf.ConfigurationSource;
+import org.apache.hadoop.hdds.scm.ScmConfigKeys;
+import org.apache.hadoop.hdds.scm.protocol.ScmBlockLocationProtocol;
+import org.apache.hadoop.hdds.scm.protocolPB.ScmBlockLocationProtocolPB;
+import org.apache.hadoop.hdds.utils.LegacyHadoopConfigurationSource;
+import org.apache.hadoop.io.retry.FailoverProxyProvider;
+import org.apache.hadoop.io.retry.RetryPolicy;
+import org.apache.hadoop.io.retry.RetryPolicy.RetryAction;
+import org.apache.hadoop.ipc.ProtobufRpcEngine;
+import org.apache.hadoop.ipc.RPC;
+import org.apache.hadoop.net.NetUtils;
+import org.apache.hadoop.security.UserGroupInformation;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import java.io.Closeable;
+import java.io.IOException;
+import java.net.InetSocketAddress;
+import java.util.ArrayList;
+import java.util.Collection;
+import java.util.HashMap;
+import java.util.List;
+import java.util.Map;
+import java.util.Optional;
+
+import static org.apache.hadoop.hdds.scm.ScmConfigKeys.OZONE_SCM_NAMES;
+import static org.apache.hadoop.hdds.scm.ScmConfigKeys.OZONE_SCM_SERVICE_IDS_KEY;
+import static org.apache.hadoop.hdds.HddsUtils.getScmAddressForBlockClients;
+import static org.apache.hadoop.hdds.HddsUtils.getPortNumberFromConfigKeys;
+import static org.apache.hadoop.hdds.HddsUtils.getHostName;
+
+/**
+ * Failover proxy provider for SCM.
+ */
+public class SCMBlockLocationFailoverProxyProvider implements
+    FailoverProxyProvider<ScmBlockLocationProtocolPB>, Closeable {
+  public static final Logger LOG =
+      LoggerFactory.getLogger(SCMBlockLocationFailoverProxyProvider.class);
+
+  private Map<String, ProxyInfo<ScmBlockLocationProtocolPB>> scmProxies;
+  private Map<String, SCMProxyInfo> scmProxyInfoMap;
+  private List<String> scmNodeIDList;
+
+  private String currentProxySCMNodeId;
+  private int currentProxyIndex;
+
+  private final ConfigurationSource conf;
+  private final long scmVersion;
+
+  private final String scmServiceId;
+
+  private String lastAttemptedLeader;
+
+  private final int maxRetryCount;
+  private final long retryInterval;
+
+  public static final String SCM_DUMMY_NODEID_PREFIX = "scm";
+
+  public SCMBlockLocationFailoverProxyProvider(ConfigurationSource conf) {
+    this.conf = conf;
+    this.scmVersion = RPC.getProtocolVersion(ScmBlockLocationProtocol.class);
+    this.scmServiceId = conf.getTrimmed(OZONE_SCM_SERVICE_IDS_KEY);
+    this.scmProxies = new HashMap<>();
+    this.scmProxyInfoMap = new HashMap<>();
+    this.scmNodeIDList = new ArrayList<>();
+    loadConfigs();
+
+    this.currentProxyIndex = 0;
+    currentProxySCMNodeId = scmNodeIDList.get(currentProxyIndex);
+
+    this.maxRetryCount = conf.getObject(SCMBlockClientConfig.class)
+        .getRetryCount();
+    this.retryInterval = conf.getObject(SCMBlockClientConfig.class)
+        .getRetryInterval();
+  }
+
+  @VisibleForTesting
+  protected Collection<InetSocketAddress> getSCMAddressList() {
+    Collection<String> scmAddressList =
+        conf.getTrimmedStringCollection(OZONE_SCM_NAMES);
+    Collection<InetSocketAddress> resultList = new ArrayList<>();
+    if (!scmAddressList.isEmpty()) {
+      final int port = getPortNumberFromConfigKeys(conf,
+          ScmConfigKeys.OZONE_SCM_BLOCK_CLIENT_ADDRESS_KEY)
+          .orElse(ScmConfigKeys.OZONE_SCM_BLOCK_CLIENT_PORT_DEFAULT);
+      for (String scmAddress : scmAddressList) {
+        LOG.info("SCM Address for proxy is {}", scmAddress);
+
+        Optional<String> hostname = getHostName(scmAddress);
+        if (hostname.isPresent()) {
+          resultList.add(NetUtils.createSocketAddr(
+              hostname.get() + ":" + port));
+        }
+      }
+    }
+    if (resultList.isEmpty()) {
+      // fall back
+      resultList.add(getScmAddressForBlockClients(conf));
+    }
+    return resultList;
+  }
+
+  private void loadConfigs() {
+    Collection<InetSocketAddress> scmAddressList = getSCMAddressList();
+    int scmNodeIndex = 1;
+    for (InetSocketAddress scmAddress : scmAddressList) {
+      String nodeId = SCM_DUMMY_NODEID_PREFIX + scmNodeIndex;
+      if (scmAddress == null) {
+        LOG.error("Failed to create SCM proxy for {}.", nodeId);
+        continue;
+      }
+      scmNodeIndex++;
+      SCMProxyInfo scmProxyInfo = new SCMProxyInfo(
+          scmServiceId, nodeId, scmAddress);
+      ProxyInfo<ScmBlockLocationProtocolPB> proxy = new ProxyInfo<>(
+          null, scmProxyInfo.toString());
+      scmProxies.put(nodeId, proxy);
+      scmProxyInfoMap.put(nodeId, scmProxyInfo);
+      scmNodeIDList.add(nodeId);
+    }
+
+    if (scmProxies.isEmpty()) {
+      throw new IllegalArgumentException("Could not find any configured " +
+          "addresses for SCM. Please configure the system with "
+          + OZONE_SCM_NAMES);
+    }
+  }
+
+  @VisibleForTesting
+  public synchronized String getCurrentProxyOMNodeId() {
+    return currentProxySCMNodeId;
+  }
+
+  @Override
+  public synchronized ProxyInfo getProxy() {
+    ProxyInfo currentProxyInfo = scmProxies.get(currentProxySCMNodeId);
+    createSCMProxyIfNeeded(currentProxyInfo, currentProxySCMNodeId);
+    return currentProxyInfo;
+  }
+
+  @Override
+  public void performFailover(ScmBlockLocationProtocolPB newLeader) {
+    // Should do nothing here.
+    LOG.debug("Failing over to next proxy. {}", getCurrentProxyOMNodeId());
+  }
+
+  public void performFailoverToAssignedLeader(String newLeader) {
+    if (newLeader == null) {
+      // If newLeader is not assigned, it will fail over to next proxy.
+      nextProxyIndex();
+    } else {
+      if (!assignLeaderToNode(newLeader)) {
+        LOG.debug("Failing over OM proxy to nodeId: {}", newLeader);
+        nextProxyIndex();
+      }
+    }
+  }
+
+  @Override
+  public Class<ScmBlockLocationProtocolPB> getInterface() {
+    return ScmBlockLocationProtocolPB.class;
+  }
+
+  @Override
+  public synchronized void close() throws IOException {
+    for (ProxyInfo<ScmBlockLocationProtocolPB> proxy : scmProxies.values()) {
+      ScmBlockLocationProtocolPB scmProxy = proxy.proxy;
+      if (scmProxy != null) {
+        RPC.stopProxy(scmProxy);
+      }
+    }
+  }
+
+  public RetryAction getRetryAction(int failovers) {
+    if (failovers < maxRetryCount) {
+      return new RetryAction(RetryAction.RetryDecision.FAILOVER_AND_RETRY,
+          getRetryInterval());
+    } else {
+      return RetryAction.FAIL;
+    }
+  }
+
+  private synchronized long getRetryInterval() {
+    // TODO add exponential backup
+    return retryInterval;
+  }
+
+  private synchronized int nextProxyIndex() {
+    lastAttemptedLeader = currentProxySCMNodeId;
+
+    // round robin the next proxy
+    currentProxyIndex = (currentProxyIndex + 1) % scmProxies.size();
+    currentProxySCMNodeId =  scmNodeIDList.get(currentProxyIndex);
+    return currentProxyIndex;
+  }
+
+  synchronized boolean assignLeaderToNode(String newLeaderNodeId) {

Review comment:
       This can be made `private`.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: ozone-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: ozone-issues-help@hadoop.apache.org


[GitHub] [hadoop-ozone] amaliujia commented on a change in pull request #1340: HDDS-3188 Enable SCM group with failover proxy for SCM block location.

Posted by GitBox <gi...@apache.org>.
amaliujia commented on a change in pull request #1340:
URL: https://github.com/apache/hadoop-ozone/pull/1340#discussion_r480720338



##########
File path: hadoop-hdds/framework/src/main/java/org/apache/hadoop/hdds/scm/proxy/SCMBlockLocationFailoverProxyProvider.java
##########
@@ -0,0 +1,281 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.hdds.scm.proxy;
+
+import com.google.common.annotations.VisibleForTesting;
+import org.apache.hadoop.conf.Configuration;
+import org.apache.hadoop.hdds.conf.ConfigurationSource;
+import org.apache.hadoop.hdds.scm.ScmConfigKeys;
+import org.apache.hadoop.hdds.scm.protocol.ScmBlockLocationProtocol;
+import org.apache.hadoop.hdds.scm.protocolPB.ScmBlockLocationProtocolPB;
+import org.apache.hadoop.hdds.utils.LegacyHadoopConfigurationSource;
+import org.apache.hadoop.io.retry.FailoverProxyProvider;
+import org.apache.hadoop.io.retry.RetryPolicy;
+import org.apache.hadoop.io.retry.RetryPolicy.RetryAction;
+import org.apache.hadoop.ipc.ProtobufRpcEngine;
+import org.apache.hadoop.ipc.RPC;
+import org.apache.hadoop.net.NetUtils;
+import org.apache.hadoop.security.UserGroupInformation;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import java.io.Closeable;
+import java.io.IOException;
+import java.net.InetSocketAddress;
+import java.util.ArrayList;
+import java.util.Collection;
+import java.util.HashMap;
+import java.util.List;
+import java.util.Map;
+
+import static org.apache.hadoop.hdds.scm.ScmConfigKeys.OZONE_SCM_NAMES;
+import static org.apache.hadoop.hdds.scm.ScmConfigKeys.OZONE_SCM_SERVICE_IDS_KEY;
+import static org.apache.hadoop.hdds.HddsUtils.getScmAddressForBlockClients;
+import static org.apache.hadoop.hdds.HddsUtils.getPortNumberFromConfigKeys;
+
+/**
+ * Failover proxy provider for SCM.
+ */
+public class SCMBlockLocationFailoverProxyProvider implements
+    FailoverProxyProvider<ScmBlockLocationProtocolPB>, Closeable {
+  public static final Logger LOG =
+      LoggerFactory.getLogger(SCMBlockLocationFailoverProxyProvider.class);
+
+  private Map<String, ProxyInfo<ScmBlockLocationProtocolPB>> scmProxies;
+  private Map<String, SCMProxyInfo> scmProxyInfoMap;
+  private List<String> scmNodeIDList;
+
+  private String currentProxySCMNodeId;
+  private int currentProxyIndex;
+
+  private final ConfigurationSource conf;
+  private final long scmVersion;
+
+  private final String scmServiceId;
+
+  private String lastAttemptedLeader;
+
+  private final int maxRetryCount;
+  private final long retryInterval;
+
+  public static final String SCM_DUMMY_NODEID_PREFIX = "scm";
+
+  public SCMBlockLocationFailoverProxyProvider(ConfigurationSource conf) {
+    this.conf = conf;
+    this.scmVersion = RPC.getProtocolVersion(ScmBlockLocationProtocol.class);
+    this.scmServiceId = conf.getTrimmed(OZONE_SCM_SERVICE_IDS_KEY);
+    this.scmProxies = new HashMap<>();
+    this.scmProxyInfoMap = new HashMap<>();
+    this.scmNodeIDList = new ArrayList<>();
+    loadConfigs();
+
+    this.currentProxyIndex = 0;
+    currentProxySCMNodeId = scmNodeIDList.get(currentProxyIndex);
+
+    this.maxRetryCount = conf.getObject(SCMBlockClientConfig.class)
+        .getRetryCount();
+    this.retryInterval = conf.getObject(SCMBlockClientConfig.class)
+        .getRetryInterval();
+  }
+
+  @VisibleForTesting
+  protected Collection<InetSocketAddress> getSCMAddressList() {
+    Collection<String> scmAddressList =
+        conf.getTrimmedStringCollection(OZONE_SCM_NAMES);
+    Collection<InetSocketAddress> resultList = new ArrayList<>();
+    if (scmAddressList.isEmpty()) {
+      // fall back
+      resultList.add(getScmAddressForBlockClients(conf));
+    } else {
+      for (String scmAddress : scmAddressList) {
+        LOG.info("SCM Address for proxy is {}", scmAddress);
+        int indexOfComma = scmAddress.lastIndexOf(":");

Review comment:
       The default implementation of `getTrimmedStringCollection` seems does not validate the format of ip address in string: https://github.com/apache/hadoop-ozone/blob/34ee8311b0d0a37878fe1fd2e5d8c1b91aa8cc8f/hadoop-hdds/config/src/main/java/org/apache/hadoop/hdds/conf/ConfigurationSource.java#L99
   
   Is the format is already validated somewhere else? if not this implementation will be fragile.  




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: ozone-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: ozone-issues-help@hadoop.apache.org


[GitHub] [hadoop-ozone] timmylicheng commented on a change in pull request #1340: HDDS-3188 Add failover proxy for SCM block location.

Posted by GitBox <gi...@apache.org>.
timmylicheng commented on a change in pull request #1340:
URL: https://github.com/apache/hadoop-ozone/pull/1340#discussion_r486736129



##########
File path: hadoop-hdds/framework/src/main/java/org/apache/hadoop/hdds/scm/proxy/SCMBlockLocationFailoverProxyProvider.java
##########
@@ -0,0 +1,281 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.hdds.scm.proxy;
+
+import com.google.common.annotations.VisibleForTesting;
+import org.apache.hadoop.conf.Configuration;
+import org.apache.hadoop.hdds.conf.ConfigurationSource;
+import org.apache.hadoop.hdds.scm.ScmConfigKeys;
+import org.apache.hadoop.hdds.scm.protocol.ScmBlockLocationProtocol;
+import org.apache.hadoop.hdds.scm.protocolPB.ScmBlockLocationProtocolPB;
+import org.apache.hadoop.hdds.utils.LegacyHadoopConfigurationSource;
+import org.apache.hadoop.io.retry.FailoverProxyProvider;
+import org.apache.hadoop.io.retry.RetryPolicy;
+import org.apache.hadoop.io.retry.RetryPolicy.RetryAction;
+import org.apache.hadoop.ipc.ProtobufRpcEngine;
+import org.apache.hadoop.ipc.RPC;
+import org.apache.hadoop.net.NetUtils;
+import org.apache.hadoop.security.UserGroupInformation;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import java.io.Closeable;
+import java.io.IOException;
+import java.net.InetSocketAddress;
+import java.util.ArrayList;
+import java.util.Collection;
+import java.util.HashMap;
+import java.util.List;
+import java.util.Map;
+import java.util.Optional;
+
+import static org.apache.hadoop.hdds.scm.ScmConfigKeys.OZONE_SCM_NAMES;
+import static org.apache.hadoop.hdds.scm.ScmConfigKeys.OZONE_SCM_SERVICE_IDS_KEY;
+import static org.apache.hadoop.hdds.HddsUtils.getScmAddressForBlockClients;
+import static org.apache.hadoop.hdds.HddsUtils.getPortNumberFromConfigKeys;
+import static org.apache.hadoop.hdds.HddsUtils.getHostName;
+
+/**
+ * Failover proxy provider for SCM.
+ */
+public class SCMBlockLocationFailoverProxyProvider implements
+    FailoverProxyProvider<ScmBlockLocationProtocolPB>, Closeable {
+  public static final Logger LOG =
+      LoggerFactory.getLogger(SCMBlockLocationFailoverProxyProvider.class);
+
+  private Map<String, ProxyInfo<ScmBlockLocationProtocolPB>> scmProxies;
+  private Map<String, SCMProxyInfo> scmProxyInfoMap;
+  private List<String> scmNodeIDList;
+
+  private String currentProxySCMNodeId;
+  private int currentProxyIndex;
+
+  private final ConfigurationSource conf;
+  private final long scmVersion;
+
+  private final String scmServiceId;
+
+  private String lastAttemptedLeader;
+
+  private final int maxRetryCount;
+  private final long retryInterval;
+
+  public static final String SCM_DUMMY_NODEID_PREFIX = "scm";
+
+  public SCMBlockLocationFailoverProxyProvider(ConfigurationSource conf) {
+    this.conf = conf;
+    this.scmVersion = RPC.getProtocolVersion(ScmBlockLocationProtocol.class);
+    this.scmServiceId = conf.getTrimmed(OZONE_SCM_SERVICE_IDS_KEY);
+    this.scmProxies = new HashMap<>();
+    this.scmProxyInfoMap = new HashMap<>();
+    this.scmNodeIDList = new ArrayList<>();
+    loadConfigs();
+
+    this.currentProxyIndex = 0;
+    currentProxySCMNodeId = scmNodeIDList.get(currentProxyIndex);
+
+    this.maxRetryCount = conf.getObject(SCMBlockClientConfig.class)
+        .getRetryCount();
+    this.retryInterval = conf.getObject(SCMBlockClientConfig.class)

Review comment:
       Updated




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: ozone-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: ozone-issues-help@hadoop.apache.org


[GitHub] [hadoop-ozone] timmylicheng commented on pull request #1340: HDDS-3188 Add failover proxy for SCM block location.

Posted by GitBox <gi...@apache.org>.
timmylicheng commented on pull request #1340:
URL: https://github.com/apache/hadoop-ozone/pull/1340#issuecomment-690835337


   > The client failover logic is based on the suggested leader sent by SCM. The `String` value of suggested leader sent by SCM Server is `RaftPeer#getAddress`, but at client side this value is compare with `SCM_DUMMY_NODEID_PREFIX + <id>` which will never match. So suggested leader is never valued and we are always failing over to the next proxy in round robin.
   
   I have removed suggestedLeader related parts in this PR until we think it thru. Thanks for the head out.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: ozone-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: ozone-issues-help@hadoop.apache.org


[GitHub] [hadoop-ozone] timmylicheng commented on pull request #1340: HDDS-3188 Add failover proxy for SCM block location.

Posted by GitBox <gi...@apache.org>.
timmylicheng commented on pull request #1340:
URL: https://github.com/apache/hadoop-ozone/pull/1340#issuecomment-699594942


   > Thanks @timmylicheng for working on this. The patch LGTM overall, a few comments added inline.
   
   @xiaoyuyao Thanks for the reviews! I've updated the PR. Please take another look.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: ozone-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: ozone-issues-help@hadoop.apache.org


[GitHub] [hadoop-ozone] timmylicheng commented on a change in pull request #1340: HDDS-3188 Add failover proxy for SCM block location.

Posted by GitBox <gi...@apache.org>.
timmylicheng commented on a change in pull request #1340:
URL: https://github.com/apache/hadoop-ozone/pull/1340#discussion_r486199846



##########
File path: hadoop-hdds/framework/src/main/java/org/apache/hadoop/hdds/scm/proxy/SCMBlockLocationFailoverProxyProvider.java
##########
@@ -0,0 +1,281 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.hdds.scm.proxy;
+
+import com.google.common.annotations.VisibleForTesting;
+import org.apache.hadoop.conf.Configuration;
+import org.apache.hadoop.hdds.conf.ConfigurationSource;
+import org.apache.hadoop.hdds.scm.ScmConfigKeys;
+import org.apache.hadoop.hdds.scm.protocol.ScmBlockLocationProtocol;
+import org.apache.hadoop.hdds.scm.protocolPB.ScmBlockLocationProtocolPB;
+import org.apache.hadoop.hdds.utils.LegacyHadoopConfigurationSource;
+import org.apache.hadoop.io.retry.FailoverProxyProvider;
+import org.apache.hadoop.io.retry.RetryPolicy;
+import org.apache.hadoop.io.retry.RetryPolicy.RetryAction;
+import org.apache.hadoop.ipc.ProtobufRpcEngine;
+import org.apache.hadoop.ipc.RPC;
+import org.apache.hadoop.net.NetUtils;
+import org.apache.hadoop.security.UserGroupInformation;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import java.io.Closeable;
+import java.io.IOException;
+import java.net.InetSocketAddress;
+import java.util.ArrayList;
+import java.util.Collection;
+import java.util.HashMap;
+import java.util.List;
+import java.util.Map;
+import java.util.Optional;
+
+import static org.apache.hadoop.hdds.scm.ScmConfigKeys.OZONE_SCM_NAMES;
+import static org.apache.hadoop.hdds.scm.ScmConfigKeys.OZONE_SCM_SERVICE_IDS_KEY;
+import static org.apache.hadoop.hdds.HddsUtils.getScmAddressForBlockClients;
+import static org.apache.hadoop.hdds.HddsUtils.getPortNumberFromConfigKeys;
+import static org.apache.hadoop.hdds.HddsUtils.getHostName;
+
+/**
+ * Failover proxy provider for SCM.
+ */
+public class SCMBlockLocationFailoverProxyProvider implements
+    FailoverProxyProvider<ScmBlockLocationProtocolPB>, Closeable {
+  public static final Logger LOG =
+      LoggerFactory.getLogger(SCMBlockLocationFailoverProxyProvider.class);
+
+  private Map<String, ProxyInfo<ScmBlockLocationProtocolPB>> scmProxies;
+  private Map<String, SCMProxyInfo> scmProxyInfoMap;
+  private List<String> scmNodeIDList;
+
+  private String currentProxySCMNodeId;
+  private int currentProxyIndex;
+
+  private final ConfigurationSource conf;
+  private final long scmVersion;
+
+  private final String scmServiceId;
+
+  private String lastAttemptedLeader;
+
+  private final int maxRetryCount;
+  private final long retryInterval;
+
+  public static final String SCM_DUMMY_NODEID_PREFIX = "scm";
+
+  public SCMBlockLocationFailoverProxyProvider(ConfigurationSource conf) {
+    this.conf = conf;
+    this.scmVersion = RPC.getProtocolVersion(ScmBlockLocationProtocol.class);
+    this.scmServiceId = conf.getTrimmed(OZONE_SCM_SERVICE_IDS_KEY);
+    this.scmProxies = new HashMap<>();
+    this.scmProxyInfoMap = new HashMap<>();
+    this.scmNodeIDList = new ArrayList<>();
+    loadConfigs();
+
+    this.currentProxyIndex = 0;
+    currentProxySCMNodeId = scmNodeIDList.get(currentProxyIndex);
+
+    this.maxRetryCount = conf.getObject(SCMBlockClientConfig.class)
+        .getRetryCount();
+    this.retryInterval = conf.getObject(SCMBlockClientConfig.class)
+        .getRetryInterval();
+  }
+
+  @VisibleForTesting
+  protected Collection<InetSocketAddress> getSCMAddressList() {
+    Collection<String> scmAddressList =
+        conf.getTrimmedStringCollection(OZONE_SCM_NAMES);
+    Collection<InetSocketAddress> resultList = new ArrayList<>();
+    if (!scmAddressList.isEmpty()) {
+      final int port = getPortNumberFromConfigKeys(conf,
+          ScmConfigKeys.OZONE_SCM_BLOCK_CLIENT_ADDRESS_KEY)
+          .orElse(ScmConfigKeys.OZONE_SCM_BLOCK_CLIENT_PORT_DEFAULT);
+      for (String scmAddress : scmAddressList) {
+        LOG.info("SCM Address for proxy is {}", scmAddress);
+
+        Optional<String> hostname = getHostName(scmAddress);
+        if (hostname.isPresent()) {
+          resultList.add(NetUtils.createSocketAddr(
+              hostname.get() + ":" + port));
+        }
+      }
+    }
+    if (resultList.isEmpty()) {
+      // fall back
+      resultList.add(getScmAddressForBlockClients(conf));
+    }
+    return resultList;
+  }
+
+  private void loadConfigs() {
+    Collection<InetSocketAddress> scmAddressList = getSCMAddressList();
+    int scmNodeIndex = 1;
+    for (InetSocketAddress scmAddress : scmAddressList) {
+      String nodeId = SCM_DUMMY_NODEID_PREFIX + scmNodeIndex;
+      if (scmAddress == null) {
+        LOG.error("Failed to create SCM proxy for {}.", nodeId);
+        continue;
+      }
+      scmNodeIndex++;
+      SCMProxyInfo scmProxyInfo = new SCMProxyInfo(
+          scmServiceId, nodeId, scmAddress);
+      ProxyInfo<ScmBlockLocationProtocolPB> proxy = new ProxyInfo<>(
+          null, scmProxyInfo.toString());
+      scmProxies.put(nodeId, proxy);
+      scmProxyInfoMap.put(nodeId, scmProxyInfo);
+      scmNodeIDList.add(nodeId);
+    }
+
+    if (scmProxies.isEmpty()) {
+      throw new IllegalArgumentException("Could not find any configured " +
+          "addresses for SCM. Please configure the system with "
+          + OZONE_SCM_NAMES);
+    }
+  }
+
+  @VisibleForTesting
+  public synchronized String getCurrentProxyOMNodeId() {
+    return currentProxySCMNodeId;
+  }
+
+  @Override
+  public synchronized ProxyInfo getProxy() {
+    ProxyInfo currentProxyInfo = scmProxies.get(currentProxySCMNodeId);
+    createSCMProxyIfNeeded(currentProxyInfo, currentProxySCMNodeId);
+    return currentProxyInfo;
+  }
+
+  @Override
+  public void performFailover(ScmBlockLocationProtocolPB newLeader) {
+    // Should do nothing here.
+    LOG.debug("Failing over to next proxy. {}", getCurrentProxyOMNodeId());
+  }
+
+  public void performFailoverToAssignedLeader(String newLeader) {
+    if (newLeader == null) {
+      // If newLeader is not assigned, it will fail over to next proxy.
+      nextProxyIndex();
+    } else {
+      if (!assignLeaderToNode(newLeader)) {
+        LOG.debug("Failing over OM proxy to nodeId: {}", newLeader);
+        nextProxyIndex();
+      }
+    }
+  }
+
+  @Override
+  public Class<ScmBlockLocationProtocolPB> getInterface() {
+    return ScmBlockLocationProtocolPB.class;
+  }
+
+  @Override
+  public synchronized void close() throws IOException {
+    for (ProxyInfo<ScmBlockLocationProtocolPB> proxy : scmProxies.values()) {
+      ScmBlockLocationProtocolPB scmProxy = proxy.proxy;
+      if (scmProxy != null) {
+        RPC.stopProxy(scmProxy);
+      }
+    }
+  }
+
+  public RetryAction getRetryAction(int failovers) {
+    if (failovers < maxRetryCount) {
+      return new RetryAction(RetryAction.RetryDecision.FAILOVER_AND_RETRY,
+          getRetryInterval());
+    } else {
+      return RetryAction.FAIL;
+    }
+  }
+
+  private synchronized long getRetryInterval() {
+    // TODO add exponential backup
+    return retryInterval;
+  }
+
+  private synchronized int nextProxyIndex() {
+    lastAttemptedLeader = currentProxySCMNodeId;
+
+    // round robin the next proxy
+    currentProxyIndex = (currentProxyIndex + 1) % scmProxies.size();
+    currentProxySCMNodeId =  scmNodeIDList.get(currentProxyIndex);
+    return currentProxyIndex;
+  }
+
+  synchronized boolean assignLeaderToNode(String newLeaderNodeId) {

Review comment:
       Updated




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: ozone-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: ozone-issues-help@hadoop.apache.org


[GitHub] [hadoop-ozone] timmylicheng closed pull request #1340: HDDS-3188 Add failover proxy for SCM block location.

Posted by GitBox <gi...@apache.org>.
timmylicheng closed pull request #1340:
URL: https://github.com/apache/hadoop-ozone/pull/1340


   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: ozone-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: ozone-issues-help@hadoop.apache.org


[GitHub] [hadoop-ozone] timmylicheng commented on pull request #1340: HDDS-3188 Enable SCM group with failover proxy for SCM block location.

Posted by GitBox <gi...@apache.org>.
timmylicheng commented on pull request #1340:
URL: https://github.com/apache/hadoop-ozone/pull/1340#issuecomment-681728622


   Thanks @GlenGeng 's review. I've addressed all comments.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: ozone-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: ozone-issues-help@hadoop.apache.org


[GitHub] [hadoop-ozone] xiaoyuyao commented on a change in pull request #1340: HDDS-3188 Add failover proxy for SCM block location.

Posted by GitBox <gi...@apache.org>.
xiaoyuyao commented on a change in pull request #1340:
URL: https://github.com/apache/hadoop-ozone/pull/1340#discussion_r492295789



##########
File path: hadoop-hdds/framework/src/main/java/org/apache/hadoop/hdds/scm/proxy/SCMClientConfig.java
##########
@@ -0,0 +1,107 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ *  with the License.  You may obtain a copy of the License at
+ *
+ *      http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ *
+ */
+
+package org.apache.hadoop.hdds.scm.proxy;
+
+import org.apache.hadoop.hdds.conf.Config;
+import org.apache.hadoop.hdds.conf.ConfigGroup;
+import org.apache.hadoop.hdds.conf.ConfigType;
+
+import java.util.concurrent.TimeUnit;
+
+import static org.apache.hadoop.hdds.conf.ConfigTag.CLIENT;
+import static org.apache.hadoop.hdds.conf.ConfigTag.OZONE;
+import static org.apache.hadoop.hdds.conf.ConfigTag.SCM;
+
+/**
+ * Config for SCM Block Client.
+ */
+@ConfigGroup(prefix = "ozone.scmclient")
+public class SCMClientConfig {
+  public static final String SCM_CLIENT_RPC_TIME_OUT = "rpc.timeout";
+  public static final String SCM_CLIENT_FAILOVER_MAX_RETRY =
+      "failover.max.retry";
+  public static final String SCM_CLIENT_RETRY_INTERVAL =
+      "failover.retry.interval";
+
+  @Config(key = SCM_CLIENT_RPC_TIME_OUT,
+      defaultValue = "15m",
+      type = ConfigType.TIME,
+      tags = {OZONE, SCM, CLIENT},
+      timeUnit = TimeUnit.MILLISECONDS,
+      description = "RpcClient timeout on waiting for the response from " +
+          "OzoneManager. The default value is set to 15 minutes. " +
+          "If ipc.client.ping is set to true and this rpc-timeout " +
+          "is greater than the value of ipc.ping.interval, the effective " +
+          "value of the rpc-timeout is rounded up to multiple of " +
+          "ipc.ping.interval."
+  )
+  private long rpcTimeOut = 15 * 60 * 1000;
+
+  @Config(key = SCM_CLIENT_FAILOVER_MAX_RETRY,
+      defaultValue = "15",
+      type = ConfigType.INT,
+      tags = {OZONE, SCM, CLIENT},
+      description = "Max retry count for SCM Client when failover happens."
+  )
+  private int retryCount = 15;
+
+  @Config(key = SCM_CLIENT_RETRY_INTERVAL,
+      defaultValue = "2s",
+      type = ConfigType.TIME,
+      tags = {OZONE, SCM, CLIENT},
+      timeUnit = TimeUnit.MILLISECONDS,
+      description = "RpcClient timeout on waiting for the response from " +
+          "OzoneManager. The default value is set to 15 minutes. " +

Review comment:
       Also the description does not match with the configuration key and needs to be updated. 




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: ozone-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: ozone-issues-help@hadoop.apache.org


[GitHub] [hadoop-ozone] xiaoyuyao commented on a change in pull request #1340: HDDS-3188 Add failover proxy for SCM block location.

Posted by GitBox <gi...@apache.org>.
xiaoyuyao commented on a change in pull request #1340:
URL: https://github.com/apache/hadoop-ozone/pull/1340#discussion_r492295311



##########
File path: hadoop-hdds/framework/src/main/java/org/apache/hadoop/hdds/scm/proxy/SCMClientConfig.java
##########
@@ -0,0 +1,107 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ *  with the License.  You may obtain a copy of the License at
+ *
+ *      http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ *
+ */
+
+package org.apache.hadoop.hdds.scm.proxy;
+
+import org.apache.hadoop.hdds.conf.Config;
+import org.apache.hadoop.hdds.conf.ConfigGroup;
+import org.apache.hadoop.hdds.conf.ConfigType;
+
+import java.util.concurrent.TimeUnit;
+
+import static org.apache.hadoop.hdds.conf.ConfigTag.CLIENT;
+import static org.apache.hadoop.hdds.conf.ConfigTag.OZONE;
+import static org.apache.hadoop.hdds.conf.ConfigTag.SCM;
+
+/**
+ * Config for SCM Block Client.
+ */
+@ConfigGroup(prefix = "ozone.scmclient")
+public class SCMClientConfig {
+  public static final String SCM_CLIENT_RPC_TIME_OUT = "rpc.timeout";
+  public static final String SCM_CLIENT_FAILOVER_MAX_RETRY =
+      "failover.max.retry";
+  public static final String SCM_CLIENT_RETRY_INTERVAL =
+      "failover.retry.interval";
+
+  @Config(key = SCM_CLIENT_RPC_TIME_OUT,
+      defaultValue = "15m",
+      type = ConfigType.TIME,
+      tags = {OZONE, SCM, CLIENT},
+      timeUnit = TimeUnit.MILLISECONDS,
+      description = "RpcClient timeout on waiting for the response from " +
+          "OzoneManager. The default value is set to 15 minutes. " +
+          "If ipc.client.ping is set to true and this rpc-timeout " +
+          "is greater than the value of ipc.ping.interval, the effective " +
+          "value of the rpc-timeout is rounded up to multiple of " +
+          "ipc.ping.interval."
+  )
+  private long rpcTimeOut = 15 * 60 * 1000;
+
+  @Config(key = SCM_CLIENT_FAILOVER_MAX_RETRY,
+      defaultValue = "15",
+      type = ConfigType.INT,
+      tags = {OZONE, SCM, CLIENT},
+      description = "Max retry count for SCM Client when failover happens."
+  )
+  private int retryCount = 15;
+
+  @Config(key = SCM_CLIENT_RETRY_INTERVAL,
+      defaultValue = "2s",
+      type = ConfigType.TIME,
+      tags = {OZONE, SCM, CLIENT},
+      timeUnit = TimeUnit.MILLISECONDS,
+      description = "RpcClient timeout on waiting for the response from " +
+          "OzoneManager. The default value is set to 15 minutes. " +

Review comment:
       OzoneManager=>StorageContainerManger
   
   




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: ozone-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: ozone-issues-help@hadoop.apache.org


[GitHub] [hadoop-ozone] timmylicheng closed pull request #1340: HDDS-3188 Add failover proxy for SCM block location.

Posted by GitBox <gi...@apache.org>.
timmylicheng closed pull request #1340:
URL: https://github.com/apache/hadoop-ozone/pull/1340


   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: ozone-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: ozone-issues-help@hadoop.apache.org


[GitHub] [hadoop-ozone] timmylicheng commented on a change in pull request #1340: HDDS-3188 Add failover proxy for SCM block location.

Posted by GitBox <gi...@apache.org>.
timmylicheng commented on a change in pull request #1340:
URL: https://github.com/apache/hadoop-ozone/pull/1340#discussion_r495537521



##########
File path: hadoop-hdds/framework/src/main/java/org/apache/hadoop/hdds/scm/proxy/SCMProxyInfo.java
##########
@@ -0,0 +1,65 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.hdds.scm.proxy;
+
+import com.google.common.base.Preconditions;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import java.net.InetSocketAddress;
+
+/**
+ * Class to store SCM proxy info.
+ */
+public class SCMProxyInfo {
+  private String serviceId;
+  private String nodeId;
+  private String rpcAddrStr;
+  private InetSocketAddress rpcAddr;
+
+  private static final Logger LOG =
+      LoggerFactory.getLogger(SCMProxyInfo.class);
+
+  public SCMProxyInfo(String serviceID, String nodeID,
+                      InetSocketAddress rpcAddress) {
+    Preconditions.checkNotNull(rpcAddress);
+    this.serviceId = serviceID;
+    this.nodeId = nodeID;
+    this.rpcAddrStr = rpcAddress.toString();
+    this.rpcAddr = rpcAddress;
+    if (rpcAddr.isUnresolved()) {
+      LOG.warn("SCM address {} for serviceID {} remains unresolved " +
+              "for node ID {} Check your ozone-site.xml file to ensure scm " +
+              "addresses are configured properly.",
+          rpcAddress, serviceId, nodeId);
+    }
+  }
+
+  public String toString() {
+    return new StringBuilder()
+        .append("nodeId=")
+        .append(nodeId)
+        .append(",nodeAddress=")
+        .append(rpcAddrStr).toString();
+  }
+
+  public InetSocketAddress getAddress() {

Review comment:
       Updated.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: ozone-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: ozone-issues-help@hadoop.apache.org


[GitHub] [hadoop-ozone] timmylicheng commented on a change in pull request #1340: HDDS-3188 Add failover proxy for SCM block location.

Posted by GitBox <gi...@apache.org>.
timmylicheng commented on a change in pull request #1340:
URL: https://github.com/apache/hadoop-ozone/pull/1340#discussion_r495537270



##########
File path: hadoop-hdds/framework/src/main/java/org/apache/hadoop/hdds/scm/proxy/SCMClientConfig.java
##########
@@ -0,0 +1,107 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ *  with the License.  You may obtain a copy of the License at
+ *
+ *      http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ *
+ */
+
+package org.apache.hadoop.hdds.scm.proxy;
+
+import org.apache.hadoop.hdds.conf.Config;
+import org.apache.hadoop.hdds.conf.ConfigGroup;
+import org.apache.hadoop.hdds.conf.ConfigType;
+
+import java.util.concurrent.TimeUnit;
+
+import static org.apache.hadoop.hdds.conf.ConfigTag.CLIENT;
+import static org.apache.hadoop.hdds.conf.ConfigTag.OZONE;
+import static org.apache.hadoop.hdds.conf.ConfigTag.SCM;
+
+/**
+ * Config for SCM Block Client.
+ */
+@ConfigGroup(prefix = "ozone.scmclient")
+public class SCMClientConfig {
+  public static final String SCM_CLIENT_RPC_TIME_OUT = "rpc.timeout";
+  public static final String SCM_CLIENT_FAILOVER_MAX_RETRY =
+      "failover.max.retry";
+  public static final String SCM_CLIENT_RETRY_INTERVAL =
+      "failover.retry.interval";
+
+  @Config(key = SCM_CLIENT_RPC_TIME_OUT,
+      defaultValue = "15m",
+      type = ConfigType.TIME,
+      tags = {OZONE, SCM, CLIENT},
+      timeUnit = TimeUnit.MILLISECONDS,
+      description = "RpcClient timeout on waiting for the response from " +
+          "OzoneManager. The default value is set to 15 minutes. " +
+          "If ipc.client.ping is set to true and this rpc-timeout " +
+          "is greater than the value of ipc.ping.interval, the effective " +
+          "value of the rpc-timeout is rounded up to multiple of " +
+          "ipc.ping.interval."
+  )
+  private long rpcTimeOut = 15 * 60 * 1000;
+
+  @Config(key = SCM_CLIENT_FAILOVER_MAX_RETRY,
+      defaultValue = "15",
+      type = ConfigType.INT,
+      tags = {OZONE, SCM, CLIENT},
+      description = "Max retry count for SCM Client when failover happens."
+  )
+  private int retryCount = 15;
+
+  @Config(key = SCM_CLIENT_RETRY_INTERVAL,
+      defaultValue = "2s",
+      type = ConfigType.TIME,
+      tags = {OZONE, SCM, CLIENT},
+      timeUnit = TimeUnit.MILLISECONDS,
+      description = "RpcClient timeout on waiting for the response from " +
+          "OzoneManager. The default value is set to 15 minutes. " +

Review comment:
       Updated

##########
File path: hadoop-hdds/framework/src/main/java/org/apache/hadoop/hdds/scm/proxy/SCMClientConfig.java
##########
@@ -0,0 +1,107 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ *  with the License.  You may obtain a copy of the License at
+ *
+ *      http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ *
+ */
+
+package org.apache.hadoop.hdds.scm.proxy;
+
+import org.apache.hadoop.hdds.conf.Config;
+import org.apache.hadoop.hdds.conf.ConfigGroup;
+import org.apache.hadoop.hdds.conf.ConfigType;
+
+import java.util.concurrent.TimeUnit;
+
+import static org.apache.hadoop.hdds.conf.ConfigTag.CLIENT;
+import static org.apache.hadoop.hdds.conf.ConfigTag.OZONE;
+import static org.apache.hadoop.hdds.conf.ConfigTag.SCM;
+
+/**
+ * Config for SCM Block Client.
+ */
+@ConfigGroup(prefix = "ozone.scmclient")
+public class SCMClientConfig {
+  public static final String SCM_CLIENT_RPC_TIME_OUT = "rpc.timeout";
+  public static final String SCM_CLIENT_FAILOVER_MAX_RETRY =
+      "failover.max.retry";
+  public static final String SCM_CLIENT_RETRY_INTERVAL =
+      "failover.retry.interval";
+
+  @Config(key = SCM_CLIENT_RPC_TIME_OUT,
+      defaultValue = "15m",
+      type = ConfigType.TIME,
+      tags = {OZONE, SCM, CLIENT},
+      timeUnit = TimeUnit.MILLISECONDS,
+      description = "RpcClient timeout on waiting for the response from " +
+          "OzoneManager. The default value is set to 15 minutes. " +

Review comment:
       Updated




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: ozone-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: ozone-issues-help@hadoop.apache.org


[GitHub] [hadoop-ozone] timmylicheng commented on a change in pull request #1340: HDDS-3188 Add failover proxy for SCM block location.

Posted by GitBox <gi...@apache.org>.
timmylicheng commented on a change in pull request #1340:
URL: https://github.com/apache/hadoop-ozone/pull/1340#discussion_r486199021



##########
File path: hadoop-hdds/framework/src/main/java/org/apache/hadoop/hdds/scm/proxy/SCMBlockLocationFailoverProxyProvider.java
##########
@@ -0,0 +1,281 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.hdds.scm.proxy;
+
+import com.google.common.annotations.VisibleForTesting;
+import org.apache.hadoop.conf.Configuration;
+import org.apache.hadoop.hdds.conf.ConfigurationSource;
+import org.apache.hadoop.hdds.scm.ScmConfigKeys;
+import org.apache.hadoop.hdds.scm.protocol.ScmBlockLocationProtocol;
+import org.apache.hadoop.hdds.scm.protocolPB.ScmBlockLocationProtocolPB;
+import org.apache.hadoop.hdds.utils.LegacyHadoopConfigurationSource;
+import org.apache.hadoop.io.retry.FailoverProxyProvider;
+import org.apache.hadoop.io.retry.RetryPolicy;
+import org.apache.hadoop.io.retry.RetryPolicy.RetryAction;
+import org.apache.hadoop.ipc.ProtobufRpcEngine;
+import org.apache.hadoop.ipc.RPC;
+import org.apache.hadoop.net.NetUtils;
+import org.apache.hadoop.security.UserGroupInformation;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import java.io.Closeable;
+import java.io.IOException;
+import java.net.InetSocketAddress;
+import java.util.ArrayList;
+import java.util.Collection;
+import java.util.HashMap;
+import java.util.List;
+import java.util.Map;
+import java.util.Optional;
+
+import static org.apache.hadoop.hdds.scm.ScmConfigKeys.OZONE_SCM_NAMES;
+import static org.apache.hadoop.hdds.scm.ScmConfigKeys.OZONE_SCM_SERVICE_IDS_KEY;
+import static org.apache.hadoop.hdds.HddsUtils.getScmAddressForBlockClients;
+import static org.apache.hadoop.hdds.HddsUtils.getPortNumberFromConfigKeys;
+import static org.apache.hadoop.hdds.HddsUtils.getHostName;
+
+/**
+ * Failover proxy provider for SCM.
+ */
+public class SCMBlockLocationFailoverProxyProvider implements
+    FailoverProxyProvider<ScmBlockLocationProtocolPB>, Closeable {
+  public static final Logger LOG =
+      LoggerFactory.getLogger(SCMBlockLocationFailoverProxyProvider.class);
+
+  private Map<String, ProxyInfo<ScmBlockLocationProtocolPB>> scmProxies;
+  private Map<String, SCMProxyInfo> scmProxyInfoMap;
+  private List<String> scmNodeIDList;
+
+  private String currentProxySCMNodeId;
+  private int currentProxyIndex;
+
+  private final ConfigurationSource conf;
+  private final long scmVersion;
+
+  private final String scmServiceId;
+
+  private String lastAttemptedLeader;
+
+  private final int maxRetryCount;
+  private final long retryInterval;
+
+  public static final String SCM_DUMMY_NODEID_PREFIX = "scm";
+
+  public SCMBlockLocationFailoverProxyProvider(ConfigurationSource conf) {
+    this.conf = conf;
+    this.scmVersion = RPC.getProtocolVersion(ScmBlockLocationProtocol.class);
+    this.scmServiceId = conf.getTrimmed(OZONE_SCM_SERVICE_IDS_KEY);
+    this.scmProxies = new HashMap<>();
+    this.scmProxyInfoMap = new HashMap<>();
+    this.scmNodeIDList = new ArrayList<>();
+    loadConfigs();
+
+    this.currentProxyIndex = 0;
+    currentProxySCMNodeId = scmNodeIDList.get(currentProxyIndex);
+
+    this.maxRetryCount = conf.getObject(SCMBlockClientConfig.class)
+        .getRetryCount();
+    this.retryInterval = conf.getObject(SCMBlockClientConfig.class)
+        .getRetryInterval();
+  }
+
+  @VisibleForTesting
+  protected Collection<InetSocketAddress> getSCMAddressList() {
+    Collection<String> scmAddressList =
+        conf.getTrimmedStringCollection(OZONE_SCM_NAMES);
+    Collection<InetSocketAddress> resultList = new ArrayList<>();
+    if (!scmAddressList.isEmpty()) {
+      final int port = getPortNumberFromConfigKeys(conf,
+          ScmConfigKeys.OZONE_SCM_BLOCK_CLIENT_ADDRESS_KEY)
+          .orElse(ScmConfigKeys.OZONE_SCM_BLOCK_CLIENT_PORT_DEFAULT);
+      for (String scmAddress : scmAddressList) {
+        LOG.info("SCM Address for proxy is {}", scmAddress);
+
+        Optional<String> hostname = getHostName(scmAddress);
+        if (hostname.isPresent()) {
+          resultList.add(NetUtils.createSocketAddr(
+              hostname.get() + ":" + port));
+        }
+      }
+    }
+    if (resultList.isEmpty()) {
+      // fall back
+      resultList.add(getScmAddressForBlockClients(conf));
+    }
+    return resultList;
+  }
+
+  private void loadConfigs() {
+    Collection<InetSocketAddress> scmAddressList = getSCMAddressList();
+    int scmNodeIndex = 1;
+    for (InetSocketAddress scmAddress : scmAddressList) {
+      String nodeId = SCM_DUMMY_NODEID_PREFIX + scmNodeIndex;
+      if (scmAddress == null) {
+        LOG.error("Failed to create SCM proxy for {}.", nodeId);
+        continue;
+      }
+      scmNodeIndex++;
+      SCMProxyInfo scmProxyInfo = new SCMProxyInfo(
+          scmServiceId, nodeId, scmAddress);
+      ProxyInfo<ScmBlockLocationProtocolPB> proxy = new ProxyInfo<>(
+          null, scmProxyInfo.toString());
+      scmProxies.put(nodeId, proxy);

Review comment:
       ProxyInfo is defined in FailoverProxyProvider. Interface is defined already.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: ozone-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: ozone-issues-help@hadoop.apache.org


[GitHub] [hadoop-ozone] timmylicheng commented on a change in pull request #1340: HDDS-3188 Add failover proxy for SCM block location.

Posted by GitBox <gi...@apache.org>.
timmylicheng commented on a change in pull request #1340:
URL: https://github.com/apache/hadoop-ozone/pull/1340#discussion_r495537306



##########
File path: hadoop-hdds/framework/src/main/java/org/apache/hadoop/hdds/scm/proxy/SCMClientConfig.java
##########
@@ -0,0 +1,107 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ *  with the License.  You may obtain a copy of the License at
+ *
+ *      http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ *
+ */
+
+package org.apache.hadoop.hdds.scm.proxy;
+
+import org.apache.hadoop.hdds.conf.Config;
+import org.apache.hadoop.hdds.conf.ConfigGroup;
+import org.apache.hadoop.hdds.conf.ConfigType;
+
+import java.util.concurrent.TimeUnit;
+
+import static org.apache.hadoop.hdds.conf.ConfigTag.CLIENT;
+import static org.apache.hadoop.hdds.conf.ConfigTag.OZONE;
+import static org.apache.hadoop.hdds.conf.ConfigTag.SCM;
+
+/**
+ * Config for SCM Block Client.
+ */
+@ConfigGroup(prefix = "ozone.scmclient")

Review comment:
       Updated




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: ozone-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: ozone-issues-help@hadoop.apache.org


[GitHub] [hadoop-ozone] GlenGeng commented on a change in pull request #1340: HDDS-3188 Enable SCM group with failover proxy for SCM block location.

Posted by GitBox <gi...@apache.org>.
GlenGeng commented on a change in pull request #1340:
URL: https://github.com/apache/hadoop-ozone/pull/1340#discussion_r478287118



##########
File path: hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/server/StorageContainerManager.java
##########
@@ -1027,6 +1027,25 @@ public ReplicationManager getReplicationManager() {
     return replicationManager;
   }
 
+  /**
+   * Check if the current scm is the leader.
+   * @return - if the current scm is the leader.
+   */
+  public boolean checkLeader() {
+    return scmHAManager.isLeader();
+  }
+
+  /**
+   * Get suggested leader from Raft.
+   * @return - suggested leader address.
+   */
+  public String getSuggestedLeader() {
+    if (scmHAManager.getSuggestedLeader() != null) {

Review comment:
       I guess it will be a NPE ?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: ozone-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: ozone-issues-help@hadoop.apache.org


[GitHub] [hadoop-ozone] timmylicheng commented on pull request #1340: HDDS-3188 Add failover proxy for SCM block location.

Posted by GitBox <gi...@apache.org>.
timmylicheng commented on pull request #1340:
URL: https://github.com/apache/hadoop-ozone/pull/1340#issuecomment-690835337


   > The client failover logic is based on the suggested leader sent by SCM. The `String` value of suggested leader sent by SCM Server is `RaftPeer#getAddress`, but at client side this value is compare with `SCM_DUMMY_NODEID_PREFIX + <id>` which will never match. So suggested leader is never valued and we are always failing over to the next proxy in round robin.
   
   I have removed suggestedLeader related parts in this PR until we think it thru. Thanks for the head out.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: ozone-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: ozone-issues-help@hadoop.apache.org


[GitHub] [hadoop-ozone] xiaoyuyao commented on a change in pull request #1340: HDDS-3188 Add failover proxy for SCM block location.

Posted by GitBox <gi...@apache.org>.
xiaoyuyao commented on a change in pull request #1340:
URL: https://github.com/apache/hadoop-ozone/pull/1340#discussion_r492298148



##########
File path: hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/protocol/ScmBlockLocationProtocolServerSideTranslatorPB.java
##########
@@ -94,9 +95,26 @@ public ScmBlockLocationProtocolServerSideTranslatorPB(
         .setTraceID(traceID);
   }
 
+  private boolean isLeader() throws ServiceException {
+    if (!(impl instanceof SCMBlockProtocolServer)) {
+      throw new ServiceException("Should be SCMBlockProtocolServer");
+    } else {
+      return ((SCMBlockProtocolServer) impl).getScm().checkLeader();
+    }
+  }
+
   @Override
   public SCMBlockLocationResponse send(RpcController controller,
       SCMBlockLocationRequest request) throws ServiceException {
+    if (!isLeader()) {
+      SCMBlockLocationResponse.Builder response = createSCMBlockResponse(
+          request.getCmdType(),
+          request.getTraceID());
+      response.setSuccess(false);
+      response.setStatus(Status.SCM_NOT_LEADER);
+      response.setLeaderSCMNodeId(null);

Review comment:
       Should we send back the leader known by this SCM follower instead of null to avoid unnecessary client retry?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: ozone-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: ozone-issues-help@hadoop.apache.org


[GitHub] [hadoop-ozone] xiaoyuyao commented on a change in pull request #1340: HDDS-3188 Add failover proxy for SCM block location.

Posted by GitBox <gi...@apache.org>.
xiaoyuyao commented on a change in pull request #1340:
URL: https://github.com/apache/hadoop-ozone/pull/1340#discussion_r492294696



##########
File path: hadoop-hdds/framework/src/main/java/org/apache/hadoop/hdds/scm/proxy/SCMClientConfig.java
##########
@@ -0,0 +1,107 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ *  with the License.  You may obtain a copy of the License at
+ *
+ *      http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ *
+ */
+
+package org.apache.hadoop.hdds.scm.proxy;
+
+import org.apache.hadoop.hdds.conf.Config;
+import org.apache.hadoop.hdds.conf.ConfigGroup;
+import org.apache.hadoop.hdds.conf.ConfigType;
+
+import java.util.concurrent.TimeUnit;
+
+import static org.apache.hadoop.hdds.conf.ConfigTag.CLIENT;
+import static org.apache.hadoop.hdds.conf.ConfigTag.OZONE;
+import static org.apache.hadoop.hdds.conf.ConfigTag.SCM;
+
+/**
+ * Config for SCM Block Client.
+ */
+@ConfigGroup(prefix = "ozone.scmclient")
+public class SCMClientConfig {
+  public static final String SCM_CLIENT_RPC_TIME_OUT = "rpc.timeout";
+  public static final String SCM_CLIENT_FAILOVER_MAX_RETRY =
+      "failover.max.retry";
+  public static final String SCM_CLIENT_RETRY_INTERVAL =
+      "failover.retry.interval";
+
+  @Config(key = SCM_CLIENT_RPC_TIME_OUT,
+      defaultValue = "15m",
+      type = ConfigType.TIME,
+      tags = {OZONE, SCM, CLIENT},
+      timeUnit = TimeUnit.MILLISECONDS,
+      description = "RpcClient timeout on waiting for the response from " +
+          "OzoneManager. The default value is set to 15 minutes. " +

Review comment:
       OzoneManager=>StorageContainerManger




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: ozone-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: ozone-issues-help@hadoop.apache.org


[GitHub] [hadoop-ozone] xiaoyuyao commented on pull request #1340: HDDS-3188 Add failover proxy for SCM block location.

Posted by GitBox <gi...@apache.org>.
xiaoyuyao commented on pull request #1340:
URL: https://github.com/apache/hadoop-ozone/pull/1340#issuecomment-701500271


   Thanks @timmylicheng  for the update. LGTM, +1. I will merge it shortly. 


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: ozone-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: ozone-issues-help@hadoop.apache.org


[GitHub] [hadoop-ozone] GlenGeng commented on a change in pull request #1340: HDDS-3188 Enable SCM group with failover proxy for SCM block location.

Posted by GitBox <gi...@apache.org>.
GlenGeng commented on a change in pull request #1340:
URL: https://github.com/apache/hadoop-ozone/pull/1340#discussion_r477137491



##########
File path: hadoop-hdds/framework/src/main/java/org/apache/hadoop/hdds/scm/proxy/SCMProxyInfo.java
##########
@@ -0,0 +1,64 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.hdds.scm.proxy;
+
+import com.google.common.base.Preconditions;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import java.net.InetSocketAddress;
+
+/**
+ * Class to store SCM proxy info.
+ */
+public class SCMProxyInfo {
+  private String serviceId;
+  private String nodeId;
+  private String rpcAddrStr;
+  private InetSocketAddress rpcAddr;
+
+  private static final Logger LOG =
+      LoggerFactory.getLogger(SCMProxyInfo.class);
+
+  SCMProxyInfo(String serviceID, String nodeID, InetSocketAddress rpcAddress) {

Review comment:
       miss access modifier, such as public




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: ozone-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: ozone-issues-help@hadoop.apache.org


[GitHub] [hadoop-ozone] xiaoyuyao commented on a change in pull request #1340: HDDS-3188 Add failover proxy for SCM block location.

Posted by GitBox <gi...@apache.org>.
xiaoyuyao commented on a change in pull request #1340:
URL: https://github.com/apache/hadoop-ozone/pull/1340#discussion_r492294434



##########
File path: hadoop-hdds/framework/src/main/java/org/apache/hadoop/hdds/scm/proxy/SCMClientConfig.java
##########
@@ -0,0 +1,107 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ *  with the License.  You may obtain a copy of the License at
+ *
+ *      http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ *
+ */
+
+package org.apache.hadoop.hdds.scm.proxy;
+
+import org.apache.hadoop.hdds.conf.Config;
+import org.apache.hadoop.hdds.conf.ConfigGroup;
+import org.apache.hadoop.hdds.conf.ConfigType;
+
+import java.util.concurrent.TimeUnit;
+
+import static org.apache.hadoop.hdds.conf.ConfigTag.CLIENT;
+import static org.apache.hadoop.hdds.conf.ConfigTag.OZONE;
+import static org.apache.hadoop.hdds.conf.ConfigTag.SCM;
+
+/**
+ * Config for SCM Block Client.
+ */
+@ConfigGroup(prefix = "ozone.scmclient")

Review comment:
       should this configuration group prefix with hdds.scmclient to match with the package name?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: ozone-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: ozone-issues-help@hadoop.apache.org


[GitHub] [hadoop-ozone] timmylicheng commented on a change in pull request #1340: HDDS-3188 Enable SCM group with failover proxy for SCM block location.

Posted by GitBox <gi...@apache.org>.
timmylicheng commented on a change in pull request #1340:
URL: https://github.com/apache/hadoop-ozone/pull/1340#discussion_r481759992



##########
File path: hadoop-hdds/framework/src/main/java/org/apache/hadoop/hdds/scm/proxy/SCMBlockLocationFailoverProxyProvider.java
##########
@@ -0,0 +1,281 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.hdds.scm.proxy;
+
+import com.google.common.annotations.VisibleForTesting;
+import org.apache.hadoop.conf.Configuration;
+import org.apache.hadoop.hdds.conf.ConfigurationSource;
+import org.apache.hadoop.hdds.scm.ScmConfigKeys;
+import org.apache.hadoop.hdds.scm.protocol.ScmBlockLocationProtocol;
+import org.apache.hadoop.hdds.scm.protocolPB.ScmBlockLocationProtocolPB;
+import org.apache.hadoop.hdds.utils.LegacyHadoopConfigurationSource;
+import org.apache.hadoop.io.retry.FailoverProxyProvider;
+import org.apache.hadoop.io.retry.RetryPolicy;
+import org.apache.hadoop.io.retry.RetryPolicy.RetryAction;
+import org.apache.hadoop.ipc.ProtobufRpcEngine;
+import org.apache.hadoop.ipc.RPC;
+import org.apache.hadoop.net.NetUtils;
+import org.apache.hadoop.security.UserGroupInformation;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import java.io.Closeable;
+import java.io.IOException;
+import java.net.InetSocketAddress;
+import java.util.ArrayList;
+import java.util.Collection;
+import java.util.HashMap;
+import java.util.List;
+import java.util.Map;
+
+import static org.apache.hadoop.hdds.scm.ScmConfigKeys.OZONE_SCM_NAMES;
+import static org.apache.hadoop.hdds.scm.ScmConfigKeys.OZONE_SCM_SERVICE_IDS_KEY;
+import static org.apache.hadoop.hdds.HddsUtils.getScmAddressForBlockClients;
+import static org.apache.hadoop.hdds.HddsUtils.getPortNumberFromConfigKeys;
+
+/**
+ * Failover proxy provider for SCM.
+ */
+public class SCMBlockLocationFailoverProxyProvider implements
+    FailoverProxyProvider<ScmBlockLocationProtocolPB>, Closeable {
+  public static final Logger LOG =
+      LoggerFactory.getLogger(SCMBlockLocationFailoverProxyProvider.class);
+
+  private Map<String, ProxyInfo<ScmBlockLocationProtocolPB>> scmProxies;
+  private Map<String, SCMProxyInfo> scmProxyInfoMap;
+  private List<String> scmNodeIDList;
+
+  private String currentProxySCMNodeId;
+  private int currentProxyIndex;
+
+  private final ConfigurationSource conf;
+  private final long scmVersion;
+
+  private final String scmServiceId;
+
+  private String lastAttemptedLeader;
+
+  private final int maxRetryCount;
+  private final long retryInterval;
+
+  public static final String SCM_DUMMY_NODEID_PREFIX = "scm";
+
+  public SCMBlockLocationFailoverProxyProvider(ConfigurationSource conf) {
+    this.conf = conf;
+    this.scmVersion = RPC.getProtocolVersion(ScmBlockLocationProtocol.class);
+    this.scmServiceId = conf.getTrimmed(OZONE_SCM_SERVICE_IDS_KEY);
+    this.scmProxies = new HashMap<>();
+    this.scmProxyInfoMap = new HashMap<>();
+    this.scmNodeIDList = new ArrayList<>();
+    loadConfigs();
+
+    this.currentProxyIndex = 0;
+    currentProxySCMNodeId = scmNodeIDList.get(currentProxyIndex);
+
+    this.maxRetryCount = conf.getObject(SCMBlockClientConfig.class)
+        .getRetryCount();
+    this.retryInterval = conf.getObject(SCMBlockClientConfig.class)
+        .getRetryInterval();
+  }
+
+  @VisibleForTesting
+  protected Collection<InetSocketAddress> getSCMAddressList() {
+    Collection<String> scmAddressList =
+        conf.getTrimmedStringCollection(OZONE_SCM_NAMES);
+    Collection<InetSocketAddress> resultList = new ArrayList<>();
+    if (scmAddressList.isEmpty()) {
+      // fall back
+      resultList.add(getScmAddressForBlockClients(conf));
+    } else {
+      for (String scmAddress : scmAddressList) {
+        LOG.info("SCM Address for proxy is {}", scmAddress);
+        int indexOfComma = scmAddress.lastIndexOf(":");

Review comment:
       Add a validator




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: ozone-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: ozone-issues-help@hadoop.apache.org


[GitHub] [hadoop-ozone] timmylicheng commented on a change in pull request #1340: HDDS-3188 Enable SCM group with failover proxy for SCM block location.

Posted by GitBox <gi...@apache.org>.
timmylicheng commented on a change in pull request #1340:
URL: https://github.com/apache/hadoop-ozone/pull/1340#discussion_r481759798



##########
File path: hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/protocol/ScmBlockLocationProtocolServerSideTranslatorPB.java
##########
@@ -94,9 +95,33 @@ public ScmBlockLocationProtocolServerSideTranslatorPB(
         .setTraceID(traceID);
   }
 
+  private boolean isLeader() throws ServiceException {
+    if (!(impl instanceof SCMBlockProtocolServer)) {
+      throw new ServiceException("Should be SCMBlockProtocolServer");

Review comment:
       Updated




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: ozone-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: ozone-issues-help@hadoop.apache.org


[GitHub] [hadoop-ozone] timmylicheng closed pull request #1340: HDDS-3188 Add failover proxy for SCM block location.

Posted by GitBox <gi...@apache.org>.
timmylicheng closed pull request #1340:
URL: https://github.com/apache/hadoop-ozone/pull/1340


   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: ozone-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: ozone-issues-help@hadoop.apache.org


[GitHub] [hadoop-ozone] timmylicheng commented on a change in pull request #1340: HDDS-3188 Enable SCM group with failover proxy for SCM block location.

Posted by GitBox <gi...@apache.org>.
timmylicheng commented on a change in pull request #1340:
URL: https://github.com/apache/hadoop-ozone/pull/1340#discussion_r478289422



##########
File path: hadoop-hdds/framework/src/main/java/org/apache/hadoop/hdds/scm/proxy/SCMBlockLocationFailoverProxyProvider.java
##########
@@ -0,0 +1,278 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.hdds.scm.proxy;
+
+import com.google.common.annotations.VisibleForTesting;
+import org.apache.hadoop.conf.Configuration;
+import org.apache.hadoop.hdds.conf.ConfigurationSource;
+import org.apache.hadoop.hdds.scm.ScmConfigKeys;
+import org.apache.hadoop.hdds.scm.protocol.ScmBlockLocationProtocol;
+import org.apache.hadoop.hdds.scm.protocolPB.ScmBlockLocationProtocolPB;
+import org.apache.hadoop.hdds.utils.LegacyHadoopConfigurationSource;
+import org.apache.hadoop.io.retry.FailoverProxyProvider;
+import org.apache.hadoop.io.retry.RetryPolicy;
+import org.apache.hadoop.io.retry.RetryPolicy.RetryAction;
+import org.apache.hadoop.ipc.ProtobufRpcEngine;
+import org.apache.hadoop.ipc.RPC;
+import org.apache.hadoop.net.NetUtils;
+import org.apache.hadoop.security.UserGroupInformation;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import java.io.Closeable;
+import java.io.IOException;
+import java.net.InetSocketAddress;
+import java.util.ArrayList;
+import java.util.Collection;
+import java.util.HashMap;
+import java.util.List;
+import java.util.Map;
+
+import static org.apache.hadoop.hdds.scm.ScmConfigKeys.OZONE_SCM_NAMES;
+import static org.apache.hadoop.hdds.scm.ScmConfigKeys.OZONE_SCM_SERVICE_IDS_KEY;
+import static org.apache.hadoop.hdds.HddsUtils.getScmAddressForBlockClients;
+import static org.apache.hadoop.hdds.HddsUtils.getPortNumberFromConfigKeys;
+
+/**
+ * Failover proxy provider for SCM.
+ */
+public class SCMBlockLocationFailoverProxyProvider implements
+    FailoverProxyProvider<ScmBlockLocationProtocolPB>, Closeable {
+  public static final Logger LOG =
+      LoggerFactory.getLogger(SCMBlockLocationFailoverProxyProvider.class);
+
+  private Map<String, ProxyInfo<ScmBlockLocationProtocolPB>> scmProxies;
+  private Map<String, SCMProxyInfo> scmProxyInfoMap;
+  private List<String> scmNodeIDList;
+
+  private String currentProxySCMNodeId;
+  private int currentProxyIndex;
+
+  private final ConfigurationSource conf;
+  private final long scmVersion;
+
+  private final String scmServiceId;
+
+  private String lastAttemptedLeader;
+
+  private final int maxRetryCount;
+  private final long retryInterval;
+
+  public static final String SCM_DUMMY_NODEID_PREFIX = "scm";
+
+  public SCMBlockLocationFailoverProxyProvider(ConfigurationSource conf) {
+    this.conf = conf;
+    this.scmVersion = RPC.getProtocolVersion(ScmBlockLocationProtocol.class);
+    this.scmServiceId = conf.getTrimmed(OZONE_SCM_SERVICE_IDS_KEY);
+    this.scmProxies = new HashMap<>();
+    this.scmProxyInfoMap = new HashMap<>();
+    this.scmNodeIDList = new ArrayList<>();
+    loadConfigs();
+
+    this.currentProxyIndex = 0;
+    currentProxySCMNodeId = scmNodeIDList.get(currentProxyIndex);
+
+    this.maxRetryCount = conf.getObject(SCMBlockClientConfig.class)
+        .getRetryCount();
+    this.retryInterval = conf.getObject(SCMBlockClientConfig.class)
+        .getRetryInterval();
+  }
+
+  @VisibleForTesting
+  protected Collection<InetSocketAddress> getSCMAddressList() {
+    Collection<String> scmAddressList =
+        conf.getTrimmedStringCollection(OZONE_SCM_NAMES);
+    Collection<InetSocketAddress> resultList = new ArrayList<>();
+    if (scmAddressList.isEmpty()) {
+      // fall back
+      resultList.add(getScmAddressForBlockClients(conf));
+    } else {
+      for (String scmAddress : scmAddressList) {
+        LOG.info("SCM Address for proxy is {}", scmAddress);
+        if (scmAddress.contains(":")) {
+          resultList.add(NetUtils.createSocketAddr(scmAddress));
+        } else {
+          final int port = getPortNumberFromConfigKeys(conf,
+              ScmConfigKeys.OZONE_SCM_BLOCK_CLIENT_ADDRESS_KEY)
+              .orElse(ScmConfigKeys.OZONE_SCM_BLOCK_CLIENT_PORT_DEFAULT);
+          resultList.add(NetUtils.createSocketAddr(scmAddress + ":" + port));
+        }
+      }
+    }
+    return resultList;
+  }
+
+  private void loadConfigs() {
+    Collection<InetSocketAddress> scmAddressList = getSCMAddressList();
+    int scmNodeIndex = 1;
+    for (InetSocketAddress scmAddress : scmAddressList) {
+      String nodeId = SCM_DUMMY_NODEID_PREFIX + scmNodeIndex;
+      if (scmAddress == null) {
+        LOG.error("Failed to create SCM proxy for {}.", nodeId);
+        continue;
+      }
+      scmNodeIndex++;
+      SCMProxyInfo scmProxyInfo = new SCMProxyInfo(
+          scmServiceId, nodeId, scmAddress);
+      ProxyInfo<ScmBlockLocationProtocolPB> proxy = new ProxyInfo<>(
+          null, scmProxyInfo.toString());
+      scmProxies.put(nodeId, proxy);
+      scmProxyInfoMap.put(nodeId, scmProxyInfo);
+      scmNodeIDList.add(nodeId);
+    }
+
+    if (scmProxies.isEmpty()) {
+      throw new IllegalArgumentException("Could not find any configured " +
+          "addresses for SCM. Please configure the system with "
+          + OZONE_SCM_NAMES);
+    }
+  }
+
+  @VisibleForTesting
+  public synchronized String getCurrentProxyOMNodeId() {
+    return currentProxySCMNodeId;
+  }
+
+  @Override
+  public synchronized ProxyInfo getProxy() {
+    ProxyInfo currentProxyInfo = scmProxies.get(currentProxySCMNodeId);
+    createSCMProxyIfNeeded(currentProxyInfo, currentProxySCMNodeId);
+    return currentProxyInfo;
+  }
+
+  @Override
+  public void performFailover(ScmBlockLocationProtocolPB newLeader) {
+    // By default, will round robin to next proxy.
+    nextProxyIndex();

Review comment:
       Sure. Thanks for pointing out.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: ozone-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: ozone-issues-help@hadoop.apache.org


[GitHub] [hadoop-ozone] timmylicheng commented on a change in pull request #1340: HDDS-3188 Add failover proxy for SCM block location.

Posted by GitBox <gi...@apache.org>.
timmylicheng commented on a change in pull request #1340:
URL: https://github.com/apache/hadoop-ozone/pull/1340#discussion_r486736129



##########
File path: hadoop-hdds/framework/src/main/java/org/apache/hadoop/hdds/scm/proxy/SCMBlockLocationFailoverProxyProvider.java
##########
@@ -0,0 +1,281 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.hdds.scm.proxy;
+
+import com.google.common.annotations.VisibleForTesting;
+import org.apache.hadoop.conf.Configuration;
+import org.apache.hadoop.hdds.conf.ConfigurationSource;
+import org.apache.hadoop.hdds.scm.ScmConfigKeys;
+import org.apache.hadoop.hdds.scm.protocol.ScmBlockLocationProtocol;
+import org.apache.hadoop.hdds.scm.protocolPB.ScmBlockLocationProtocolPB;
+import org.apache.hadoop.hdds.utils.LegacyHadoopConfigurationSource;
+import org.apache.hadoop.io.retry.FailoverProxyProvider;
+import org.apache.hadoop.io.retry.RetryPolicy;
+import org.apache.hadoop.io.retry.RetryPolicy.RetryAction;
+import org.apache.hadoop.ipc.ProtobufRpcEngine;
+import org.apache.hadoop.ipc.RPC;
+import org.apache.hadoop.net.NetUtils;
+import org.apache.hadoop.security.UserGroupInformation;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import java.io.Closeable;
+import java.io.IOException;
+import java.net.InetSocketAddress;
+import java.util.ArrayList;
+import java.util.Collection;
+import java.util.HashMap;
+import java.util.List;
+import java.util.Map;
+import java.util.Optional;
+
+import static org.apache.hadoop.hdds.scm.ScmConfigKeys.OZONE_SCM_NAMES;
+import static org.apache.hadoop.hdds.scm.ScmConfigKeys.OZONE_SCM_SERVICE_IDS_KEY;
+import static org.apache.hadoop.hdds.HddsUtils.getScmAddressForBlockClients;
+import static org.apache.hadoop.hdds.HddsUtils.getPortNumberFromConfigKeys;
+import static org.apache.hadoop.hdds.HddsUtils.getHostName;
+
+/**
+ * Failover proxy provider for SCM.
+ */
+public class SCMBlockLocationFailoverProxyProvider implements
+    FailoverProxyProvider<ScmBlockLocationProtocolPB>, Closeable {
+  public static final Logger LOG =
+      LoggerFactory.getLogger(SCMBlockLocationFailoverProxyProvider.class);
+
+  private Map<String, ProxyInfo<ScmBlockLocationProtocolPB>> scmProxies;
+  private Map<String, SCMProxyInfo> scmProxyInfoMap;
+  private List<String> scmNodeIDList;
+
+  private String currentProxySCMNodeId;
+  private int currentProxyIndex;
+
+  private final ConfigurationSource conf;
+  private final long scmVersion;
+
+  private final String scmServiceId;
+
+  private String lastAttemptedLeader;
+
+  private final int maxRetryCount;
+  private final long retryInterval;
+
+  public static final String SCM_DUMMY_NODEID_PREFIX = "scm";
+
+  public SCMBlockLocationFailoverProxyProvider(ConfigurationSource conf) {
+    this.conf = conf;
+    this.scmVersion = RPC.getProtocolVersion(ScmBlockLocationProtocol.class);
+    this.scmServiceId = conf.getTrimmed(OZONE_SCM_SERVICE_IDS_KEY);
+    this.scmProxies = new HashMap<>();
+    this.scmProxyInfoMap = new HashMap<>();
+    this.scmNodeIDList = new ArrayList<>();
+    loadConfigs();
+
+    this.currentProxyIndex = 0;
+    currentProxySCMNodeId = scmNodeIDList.get(currentProxyIndex);
+
+    this.maxRetryCount = conf.getObject(SCMBlockClientConfig.class)
+        .getRetryCount();
+    this.retryInterval = conf.getObject(SCMBlockClientConfig.class)

Review comment:
       Updated




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: ozone-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: ozone-issues-help@hadoop.apache.org


[GitHub] [hadoop-ozone] timmylicheng closed pull request #1340: HDDS-3188 Add failover proxy for SCM block location.

Posted by GitBox <gi...@apache.org>.
timmylicheng closed pull request #1340:
URL: https://github.com/apache/hadoop-ozone/pull/1340






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: ozone-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: ozone-issues-help@hadoop.apache.org


[GitHub] [hadoop-ozone] timmylicheng commented on a change in pull request #1340: HDDS-3188 Add failover proxy for SCM block location.

Posted by GitBox <gi...@apache.org>.
timmylicheng commented on a change in pull request #1340:
URL: https://github.com/apache/hadoop-ozone/pull/1340#discussion_r485337010



##########
File path: hadoop-hdds/framework/src/main/java/org/apache/hadoop/hdds/scm/proxy/SCMBlockLocationFailoverProxyProvider.java
##########
@@ -0,0 +1,281 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.hdds.scm.proxy;
+
+import com.google.common.annotations.VisibleForTesting;
+import org.apache.hadoop.conf.Configuration;
+import org.apache.hadoop.hdds.conf.ConfigurationSource;
+import org.apache.hadoop.hdds.scm.ScmConfigKeys;
+import org.apache.hadoop.hdds.scm.protocol.ScmBlockLocationProtocol;
+import org.apache.hadoop.hdds.scm.protocolPB.ScmBlockLocationProtocolPB;
+import org.apache.hadoop.hdds.utils.LegacyHadoopConfigurationSource;
+import org.apache.hadoop.io.retry.FailoverProxyProvider;
+import org.apache.hadoop.io.retry.RetryPolicy;
+import org.apache.hadoop.io.retry.RetryPolicy.RetryAction;
+import org.apache.hadoop.ipc.ProtobufRpcEngine;
+import org.apache.hadoop.ipc.RPC;
+import org.apache.hadoop.net.NetUtils;
+import org.apache.hadoop.security.UserGroupInformation;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import java.io.Closeable;
+import java.io.IOException;
+import java.net.InetSocketAddress;
+import java.util.ArrayList;
+import java.util.Collection;
+import java.util.HashMap;
+import java.util.List;
+import java.util.Map;
+import java.util.Optional;
+
+import static org.apache.hadoop.hdds.scm.ScmConfigKeys.OZONE_SCM_NAMES;
+import static org.apache.hadoop.hdds.scm.ScmConfigKeys.OZONE_SCM_SERVICE_IDS_KEY;
+import static org.apache.hadoop.hdds.HddsUtils.getScmAddressForBlockClients;
+import static org.apache.hadoop.hdds.HddsUtils.getPortNumberFromConfigKeys;
+import static org.apache.hadoop.hdds.HddsUtils.getHostName;
+
+/**
+ * Failover proxy provider for SCM.
+ */
+public class SCMBlockLocationFailoverProxyProvider implements
+    FailoverProxyProvider<ScmBlockLocationProtocolPB>, Closeable {
+  public static final Logger LOG =
+      LoggerFactory.getLogger(SCMBlockLocationFailoverProxyProvider.class);
+
+  private Map<String, ProxyInfo<ScmBlockLocationProtocolPB>> scmProxies;
+  private Map<String, SCMProxyInfo> scmProxyInfoMap;
+  private List<String> scmNodeIDList;
+
+  private String currentProxySCMNodeId;
+  private int currentProxyIndex;
+
+  private final ConfigurationSource conf;
+  private final long scmVersion;
+
+  private final String scmServiceId;
+
+  private String lastAttemptedLeader;
+
+  private final int maxRetryCount;
+  private final long retryInterval;
+
+  public static final String SCM_DUMMY_NODEID_PREFIX = "scm";
+
+  public SCMBlockLocationFailoverProxyProvider(ConfigurationSource conf) {
+    this.conf = conf;
+    this.scmVersion = RPC.getProtocolVersion(ScmBlockLocationProtocol.class);
+    this.scmServiceId = conf.getTrimmed(OZONE_SCM_SERVICE_IDS_KEY);
+    this.scmProxies = new HashMap<>();
+    this.scmProxyInfoMap = new HashMap<>();
+    this.scmNodeIDList = new ArrayList<>();
+    loadConfigs();
+
+    this.currentProxyIndex = 0;
+    currentProxySCMNodeId = scmNodeIDList.get(currentProxyIndex);
+
+    this.maxRetryCount = conf.getObject(SCMBlockClientConfig.class)
+        .getRetryCount();
+    this.retryInterval = conf.getObject(SCMBlockClientConfig.class)
+        .getRetryInterval();
+  }
+
+  @VisibleForTesting
+  protected Collection<InetSocketAddress> getSCMAddressList() {
+    Collection<String> scmAddressList =
+        conf.getTrimmedStringCollection(OZONE_SCM_NAMES);
+    Collection<InetSocketAddress> resultList = new ArrayList<>();
+    if (!scmAddressList.isEmpty()) {
+      final int port = getPortNumberFromConfigKeys(conf,
+          ScmConfigKeys.OZONE_SCM_BLOCK_CLIENT_ADDRESS_KEY)
+          .orElse(ScmConfigKeys.OZONE_SCM_BLOCK_CLIENT_PORT_DEFAULT);
+      for (String scmAddress : scmAddressList) {
+        LOG.info("SCM Address for proxy is {}", scmAddress);
+
+        Optional<String> hostname = getHostName(scmAddress);
+        if (hostname.isPresent()) {
+          resultList.add(NetUtils.createSocketAddr(
+              hostname.get() + ":" + port));
+        }
+      }
+    }
+    if (resultList.isEmpty()) {
+      // fall back
+      resultList.add(getScmAddressForBlockClients(conf));
+    }
+    return resultList;
+  }
+
+  private void loadConfigs() {
+    Collection<InetSocketAddress> scmAddressList = getSCMAddressList();
+    int scmNodeIndex = 1;
+    for (InetSocketAddress scmAddress : scmAddressList) {
+      String nodeId = SCM_DUMMY_NODEID_PREFIX + scmNodeIndex;
+      if (scmAddress == null) {
+        LOG.error("Failed to create SCM proxy for {}.", nodeId);
+        continue;
+      }
+      scmNodeIndex++;
+      SCMProxyInfo scmProxyInfo = new SCMProxyInfo(
+          scmServiceId, nodeId, scmAddress);
+      ProxyInfo<ScmBlockLocationProtocolPB> proxy = new ProxyInfo<>(
+          null, scmProxyInfo.toString());
+      scmProxies.put(nodeId, proxy);
+      scmProxyInfoMap.put(nodeId, scmProxyInfo);
+      scmNodeIDList.add(nodeId);
+    }
+
+    if (scmProxies.isEmpty()) {
+      throw new IllegalArgumentException("Could not find any configured " +
+          "addresses for SCM. Please configure the system with "
+          + OZONE_SCM_NAMES);
+    }
+  }
+
+  @VisibleForTesting
+  public synchronized String getCurrentProxyOMNodeId() {
+    return currentProxySCMNodeId;
+  }
+
+  @Override
+  public synchronized ProxyInfo getProxy() {
+    ProxyInfo currentProxyInfo = scmProxies.get(currentProxySCMNodeId);
+    createSCMProxyIfNeeded(currentProxyInfo, currentProxySCMNodeId);
+    return currentProxyInfo;
+  }
+
+  @Override
+  public void performFailover(ScmBlockLocationProtocolPB newLeader) {
+    // Should do nothing here.

Review comment:
       The input protocol ScmBlockLocationProtocolPB is difficult to use here. We mainly use performFailoverToAssignedLeader




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: ozone-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: ozone-issues-help@hadoop.apache.org


[GitHub] [hadoop-ozone] xiaoyuyao commented on a change in pull request #1340: HDDS-3188 Add failover proxy for SCM block location.

Posted by GitBox <gi...@apache.org>.
xiaoyuyao commented on a change in pull request #1340:
URL: https://github.com/apache/hadoop-ozone/pull/1340#discussion_r492296581



##########
File path: hadoop-hdds/framework/src/main/java/org/apache/hadoop/hdds/scm/proxy/SCMProxyInfo.java
##########
@@ -0,0 +1,65 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.hdds.scm.proxy;
+
+import com.google.common.base.Preconditions;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import java.net.InetSocketAddress;
+
+/**
+ * Class to store SCM proxy info.
+ */
+public class SCMProxyInfo {
+  private String serviceId;
+  private String nodeId;
+  private String rpcAddrStr;
+  private InetSocketAddress rpcAddr;
+
+  private static final Logger LOG =
+      LoggerFactory.getLogger(SCMProxyInfo.class);
+
+  public SCMProxyInfo(String serviceID, String nodeID,
+                      InetSocketAddress rpcAddress) {
+    Preconditions.checkNotNull(rpcAddress);
+    this.serviceId = serviceID;
+    this.nodeId = nodeID;
+    this.rpcAddrStr = rpcAddress.toString();
+    this.rpcAddr = rpcAddress;
+    if (rpcAddr.isUnresolved()) {
+      LOG.warn("SCM address {} for serviceID {} remains unresolved " +
+              "for node ID {} Check your ozone-site.xml file to ensure scm " +
+              "addresses are configured properly.",
+          rpcAddress, serviceId, nodeId);
+    }
+  }
+
+  public String toString() {
+    return new StringBuilder()
+        .append("nodeId=")
+        .append(nodeId)
+        .append(",nodeAddress=")
+        .append(rpcAddrStr).toString();
+  }
+
+  public InetSocketAddress getAddress() {

Review comment:
       missing getter for nodeid and serviceid?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: ozone-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: ozone-issues-help@hadoop.apache.org


[GitHub] [hadoop-ozone] timmylicheng closed pull request #1340: HDDS-3188 Enable SCM group with failover proxy for SCM block location.

Posted by GitBox <gi...@apache.org>.
timmylicheng closed pull request #1340:
URL: https://github.com/apache/hadoop-ozone/pull/1340


   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: ozone-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: ozone-issues-help@hadoop.apache.org


[GitHub] [hadoop-ozone] GlenGeng commented on a change in pull request #1340: HDDS-3188 Enable SCM group with failover proxy for SCM block location.

Posted by GitBox <gi...@apache.org>.
GlenGeng commented on a change in pull request #1340:
URL: https://github.com/apache/hadoop-ozone/pull/1340#discussion_r478286088



##########
File path: hadoop-hdds/framework/src/main/java/org/apache/hadoop/hdds/scm/proxy/SCMBlockLocationFailoverProxyProvider.java
##########
@@ -0,0 +1,278 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.hdds.scm.proxy;
+
+import com.google.common.annotations.VisibleForTesting;
+import org.apache.hadoop.conf.Configuration;
+import org.apache.hadoop.hdds.conf.ConfigurationSource;
+import org.apache.hadoop.hdds.scm.ScmConfigKeys;
+import org.apache.hadoop.hdds.scm.protocol.ScmBlockLocationProtocol;
+import org.apache.hadoop.hdds.scm.protocolPB.ScmBlockLocationProtocolPB;
+import org.apache.hadoop.hdds.utils.LegacyHadoopConfigurationSource;
+import org.apache.hadoop.io.retry.FailoverProxyProvider;
+import org.apache.hadoop.io.retry.RetryPolicy;
+import org.apache.hadoop.io.retry.RetryPolicy.RetryAction;
+import org.apache.hadoop.ipc.ProtobufRpcEngine;
+import org.apache.hadoop.ipc.RPC;
+import org.apache.hadoop.net.NetUtils;
+import org.apache.hadoop.security.UserGroupInformation;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import java.io.Closeable;
+import java.io.IOException;
+import java.net.InetSocketAddress;
+import java.util.ArrayList;
+import java.util.Collection;
+import java.util.HashMap;
+import java.util.List;
+import java.util.Map;
+
+import static org.apache.hadoop.hdds.scm.ScmConfigKeys.OZONE_SCM_NAMES;
+import static org.apache.hadoop.hdds.scm.ScmConfigKeys.OZONE_SCM_SERVICE_IDS_KEY;
+import static org.apache.hadoop.hdds.HddsUtils.getScmAddressForBlockClients;
+import static org.apache.hadoop.hdds.HddsUtils.getPortNumberFromConfigKeys;
+
+/**
+ * Failover proxy provider for SCM.
+ */
+public class SCMBlockLocationFailoverProxyProvider implements
+    FailoverProxyProvider<ScmBlockLocationProtocolPB>, Closeable {
+  public static final Logger LOG =
+      LoggerFactory.getLogger(SCMBlockLocationFailoverProxyProvider.class);
+
+  private Map<String, ProxyInfo<ScmBlockLocationProtocolPB>> scmProxies;
+  private Map<String, SCMProxyInfo> scmProxyInfoMap;
+  private List<String> scmNodeIDList;
+
+  private String currentProxySCMNodeId;
+  private int currentProxyIndex;
+
+  private final ConfigurationSource conf;
+  private final long scmVersion;
+
+  private final String scmServiceId;
+
+  private String lastAttemptedLeader;
+
+  private final int maxRetryCount;
+  private final long retryInterval;
+
+  public static final String SCM_DUMMY_NODEID_PREFIX = "scm";
+
+  public SCMBlockLocationFailoverProxyProvider(ConfigurationSource conf) {
+    this.conf = conf;
+    this.scmVersion = RPC.getProtocolVersion(ScmBlockLocationProtocol.class);
+    this.scmServiceId = conf.getTrimmed(OZONE_SCM_SERVICE_IDS_KEY);
+    this.scmProxies = new HashMap<>();
+    this.scmProxyInfoMap = new HashMap<>();
+    this.scmNodeIDList = new ArrayList<>();
+    loadConfigs();
+
+    this.currentProxyIndex = 0;
+    currentProxySCMNodeId = scmNodeIDList.get(currentProxyIndex);
+
+    this.maxRetryCount = conf.getObject(SCMBlockClientConfig.class)
+        .getRetryCount();
+    this.retryInterval = conf.getObject(SCMBlockClientConfig.class)
+        .getRetryInterval();
+  }
+
+  @VisibleForTesting
+  protected Collection<InetSocketAddress> getSCMAddressList() {
+    Collection<String> scmAddressList =
+        conf.getTrimmedStringCollection(OZONE_SCM_NAMES);
+    Collection<InetSocketAddress> resultList = new ArrayList<>();
+    if (scmAddressList.isEmpty()) {
+      // fall back
+      resultList.add(getScmAddressForBlockClients(conf));
+    } else {
+      for (String scmAddress : scmAddressList) {
+        LOG.info("SCM Address for proxy is {}", scmAddress);
+        if (scmAddress.contains(":")) {
+          resultList.add(NetUtils.createSocketAddr(scmAddress));
+        } else {
+          final int port = getPortNumberFromConfigKeys(conf,
+              ScmConfigKeys.OZONE_SCM_BLOCK_CLIENT_ADDRESS_KEY)
+              .orElse(ScmConfigKeys.OZONE_SCM_BLOCK_CLIENT_PORT_DEFAULT);
+          resultList.add(NetUtils.createSocketAddr(scmAddress + ":" + port));
+        }
+      }
+    }
+    return resultList;
+  }
+
+  private void loadConfigs() {
+    Collection<InetSocketAddress> scmAddressList = getSCMAddressList();
+    int scmNodeIndex = 1;
+    for (InetSocketAddress scmAddress : scmAddressList) {
+      String nodeId = SCM_DUMMY_NODEID_PREFIX + scmNodeIndex;
+      if (scmAddress == null) {
+        LOG.error("Failed to create SCM proxy for {}.", nodeId);
+        continue;
+      }
+      scmNodeIndex++;
+      SCMProxyInfo scmProxyInfo = new SCMProxyInfo(
+          scmServiceId, nodeId, scmAddress);
+      ProxyInfo<ScmBlockLocationProtocolPB> proxy = new ProxyInfo<>(
+          null, scmProxyInfo.toString());
+      scmProxies.put(nodeId, proxy);
+      scmProxyInfoMap.put(nodeId, scmProxyInfo);
+      scmNodeIDList.add(nodeId);
+    }
+
+    if (scmProxies.isEmpty()) {
+      throw new IllegalArgumentException("Could not find any configured " +
+          "addresses for SCM. Please configure the system with "
+          + OZONE_SCM_NAMES);
+    }
+  }
+
+  @VisibleForTesting
+  public synchronized String getCurrentProxyOMNodeId() {
+    return currentProxySCMNodeId;
+  }
+
+  @Override
+  public synchronized ProxyInfo getProxy() {
+    ProxyInfo currentProxyInfo = scmProxies.get(currentProxySCMNodeId);
+    createSCMProxyIfNeeded(currentProxyInfo, currentProxySCMNodeId);
+    return currentProxyInfo;
+  }
+
+  @Override
+  public void performFailover(ScmBlockLocationProtocolPB newLeader) {
+    // By default, will round robin to next proxy.
+    nextProxyIndex();

Review comment:
       no need to call `nextProxyIndex()`.
   Since the real failover is done by `performFailoverToAssignedLeader()`, it is called by `RetryPolicy.shouldRetry()` and `submitRequest()`




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: ozone-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: ozone-issues-help@hadoop.apache.org


[GitHub] [hadoop-ozone] amaliujia commented on pull request #1340: HDDS-3188 Enable SCM group with failover proxy for SCM block location.

Posted by GitBox <gi...@apache.org>.
amaliujia commented on pull request #1340:
URL: https://github.com/apache/hadoop-ozone/pull/1340#issuecomment-685271344


   This PR overall LGTM


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: ozone-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: ozone-issues-help@hadoop.apache.org


[GitHub] [hadoop-ozone] GlenGeng commented on a change in pull request #1340: HDDS-3188 Enable SCM group with failover proxy for SCM block location.

Posted by GitBox <gi...@apache.org>.
GlenGeng commented on a change in pull request #1340:
URL: https://github.com/apache/hadoop-ozone/pull/1340#discussion_r477189151



##########
File path: hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/server/StorageContainerManager.java
##########
@@ -1053,6 +1053,22 @@ public ReplicationManager getReplicationManager() {
     return replicationManager;
   }
 
+  /**
+   * Check if the current scm is the leader.
+   * @return - if the current scm is the leader.
+   */
+  public boolean checkLeader() {
+    return scmHAManager.isLeader();
+  }
+
+  /**
+   * Get suggested leader from Raft.
+   * @return - suggested leader address.
+   */
+  public String getSuggestedLeader() {
+    return scmHAManager.getSuggestedLeader().getAddress();

Review comment:
       may be NPE ?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: ozone-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: ozone-issues-help@hadoop.apache.org


[GitHub] [hadoop-ozone] timmylicheng closed pull request #1340: HDDS-3188 Add failover proxy for SCM block location.

Posted by GitBox <gi...@apache.org>.
timmylicheng closed pull request #1340:
URL: https://github.com/apache/hadoop-ozone/pull/1340


   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: ozone-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: ozone-issues-help@hadoop.apache.org


[GitHub] [hadoop-ozone] timmylicheng commented on a change in pull request #1340: HDDS-3188 Enable SCM group with failover proxy for SCM block location.

Posted by GitBox <gi...@apache.org>.
timmylicheng commented on a change in pull request #1340:
URL: https://github.com/apache/hadoop-ozone/pull/1340#discussion_r478289278



##########
File path: hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/server/StorageContainerManager.java
##########
@@ -1027,6 +1027,25 @@ public ReplicationManager getReplicationManager() {
     return replicationManager;
   }
 
+  /**
+   * Check if the current scm is the leader.
+   * @return - if the current scm is the leader.
+   */
+  public boolean checkLeader() {
+    return scmHAManager.isLeader();
+  }
+
+  /**
+   * Get suggested leader from Raft.
+   * @return - suggested leader address.
+   */
+  public String getSuggestedLeader() {
+    if (scmHAManager.getSuggestedLeader() != null) {

Review comment:
       Updated.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: ozone-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: ozone-issues-help@hadoop.apache.org


[GitHub] [hadoop-ozone] GlenGeng commented on a change in pull request #1340: HDDS-3188 Enable SCM group with failover proxy for SCM block location.

Posted by GitBox <gi...@apache.org>.
GlenGeng commented on a change in pull request #1340:
URL: https://github.com/apache/hadoop-ozone/pull/1340#discussion_r476376037



##########
File path: hadoop-hdds/framework/src/main/java/org/apache/hadoop/hdds/scm/proxy/SCMBlockLocationFailoverProxyProvider.java
##########
@@ -0,0 +1,279 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.hdds.scm.proxy;
+
+import com.google.common.annotations.VisibleForTesting;
+import com.google.common.base.Preconditions;
+import org.apache.hadoop.conf.Configuration;
+import org.apache.hadoop.hdds.conf.ConfigurationSource;
+import org.apache.hadoop.hdds.scm.ScmConfigKeys;
+import org.apache.hadoop.hdds.scm.protocol.ScmBlockLocationProtocol;
+import org.apache.hadoop.hdds.scm.protocolPB.ScmBlockLocationProtocolPB;
+import org.apache.hadoop.hdds.utils.LegacyHadoopConfigurationSource;
+import org.apache.hadoop.io.retry.FailoverProxyProvider;
+import org.apache.hadoop.io.retry.RetryPolicy;
+import org.apache.hadoop.io.retry.RetryPolicy.RetryAction;
+import org.apache.hadoop.ipc.ProtobufRpcEngine;
+import org.apache.hadoop.ipc.RPC;
+import org.apache.hadoop.net.NetUtils;
+import org.apache.hadoop.security.UserGroupInformation;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import java.io.Closeable;
+import java.io.IOException;
+import java.net.InetSocketAddress;
+import java.util.ArrayList;
+import java.util.Collection;
+import java.util.HashMap;
+import java.util.List;
+import java.util.Map;
+
+import static org.apache.hadoop.hdds.scm.ScmConfigKeys.OZONE_SCM_NAMES;
+import static org.apache.hadoop.hdds.scm.ScmConfigKeys.OZONE_SCM_SERVICE_IDS_KEY;
+import static org.apache.hadoop.hdds.HddsUtils.getScmAddressForBlockClients;
+import static org.apache.hadoop.hdds.HddsUtils.getPortNumberFromConfigKeys;
+
+/**
+ * Failover proxy provider for SCM.
+ */
+public class SCMBlockLocationFailoverProxyProvider implements
+    FailoverProxyProvider<ScmBlockLocationProtocolPB>, Closeable {
+  public static final Logger LOG =
+      LoggerFactory.getLogger(SCMBlockLocationFailoverProxyProvider.class);
+
+  private Map<String, ProxyInfo<ScmBlockLocationProtocolPB>> scmProxies;
+  private Map<String, SCMProxyInfo> scmProxyInfoMap;
+  private List<String> scmNodeIDList;
+
+  private String currentProxySCMNodeId;
+  private int currentProxyIndex;
+  private ScmBlockLocationProtocol currentProxy;
+
+  private final ConfigurationSource conf;
+  private final long scmVersion;
+
+  private final String scmServiceId;
+
+  private String lastAttemptedLeader;
+
+  private final int maxRetryCount;
+  private final long retryInterval;
+
+  public static final String SCM_DUMMY_NODEID_PREFIX = "scm";
+
+  public SCMBlockLocationFailoverProxyProvider(ConfigurationSource conf) {
+    this.conf = conf;
+    this.scmVersion = RPC.getProtocolVersion(ScmBlockLocationProtocol.class);
+    this.scmServiceId = conf.getTrimmed(OZONE_SCM_SERVICE_IDS_KEY);
+    this.scmProxies = new HashMap<>();
+    this.scmProxyInfoMap = new HashMap<>();
+    this.scmNodeIDList = new ArrayList<>();
+    loadConfigs();
+
+    this.currentProxyIndex = 0;
+    currentProxySCMNodeId = scmNodeIDList.get(currentProxyIndex);
+
+    this.maxRetryCount = conf.getObject(SCMBlockClientConfig.class)
+        .getRetryCount();
+    this.retryInterval = conf.getObject(SCMBlockClientConfig.class)
+        .getRetryInterval();
+  }
+
+  private Collection<InetSocketAddress> getSCMAddressList() {
+    Collection<String> scmAddressList =
+        conf.getTrimmedStringCollection(OZONE_SCM_NAMES);
+    Collection<InetSocketAddress> resultList = new ArrayList<>();
+    if (scmAddressList.isEmpty()) {
+      // fall back
+      resultList.add(getScmAddressForBlockClients(conf));
+    } else {
+      for (String scmAddress : scmAddressList) {
+        resultList.add(NetUtils.createSocketAddr(scmAddress));
+      }
+    }
+    return resultList;
+  }
+
+  private void loadConfigs() {
+    Collection<InetSocketAddress> scmAddressList = getSCMAddressList();
+    int scmNodeIndex = 1;
+    for (InetSocketAddress scmAddress : scmAddressList) {
+      String nodeId = SCM_DUMMY_NODEID_PREFIX + scmNodeIndex;
+      if (scmAddress == null) {
+        LOG.error("Failed to create SCM proxy for {}.", nodeId);
+        continue;
+      }
+      scmNodeIndex++;
+      SCMProxyInfo scmProxyInfo = new SCMProxyInfo(
+          scmServiceId, nodeId, scmAddress);
+      ProxyInfo<ScmBlockLocationProtocolPB> proxy = new ProxyInfo<>(
+          null, scmProxyInfo.toString());
+      scmProxies.put(nodeId, proxy);
+      scmProxyInfoMap.put(nodeId, scmProxyInfo);
+      scmNodeIDList.add(nodeId);
+    }
+
+    if (scmProxies.isEmpty()) {
+      throw new IllegalArgumentException("Could not find any configured " +
+          "addresses for SCM. Please configure the system with "
+          + OZONE_SCM_NAMES);
+    }
+  }
+
+  @VisibleForTesting
+  public synchronized String getCurrentProxyOMNodeId() {
+    return currentProxySCMNodeId;
+  }
+
+  @Override
+  public synchronized ProxyInfo getProxy() {
+    ProxyInfo currentProxyInfo = scmProxies.get(currentProxySCMNodeId);
+    createSCMProxyIfNeeded(currentProxyInfo, currentProxySCMNodeId);
+    return currentProxyInfo;
+  }
+
+  public ScmBlockLocationProtocolPB getCurrentProxy() {

Review comment:
       I guess this method is not needed.

##########
File path: hadoop-hdds/framework/src/main/java/org/apache/hadoop/hdds/scm/protocolPB/ScmBlockLocationProtocolClientSideTranslatorPB.java
##########
@@ -73,15 +75,19 @@
   private static final RpcController NULL_RPC_CONTROLLER = null;
 
   private final ScmBlockLocationProtocolPB rpcProxy;
+  private SCMBlockLocationFailoverProxyProvider failoverProxyProvider;
 
   /**
    * Creates a new StorageContainerLocationProtocolClientSideTranslatorPB.
    *
-   * @param rpcProxy {@link StorageContainerLocationProtocolPB} RPC proxy
+   * @param proxyProvider {@link SCMBlockLocationFailoverProxyProvider}
+   * failover proxy provider.
    */
   public ScmBlockLocationProtocolClientSideTranslatorPB(
-      ScmBlockLocationProtocolPB rpcProxy) {
-    this.rpcProxy = rpcProxy;
+      SCMBlockLocationFailoverProxyProvider proxyProvider) {
+    Preconditions.checkState(proxyProvider != null);
+    this.failoverProxyProvider = proxyProvider;
+    this.rpcProxy = proxyProvider.getCurrentProxy();

Review comment:
       refer to OM-HA, the `rpcProxy` should be a result of `Proxy.newProxyInstance(...`. which relies on `FailoverProxyProvider` to switch between different `ScmBlockLocationProtocolService`.
   
   ```
       OzoneManagerProtocolPB proxy = (OzoneManagerProtocolPB) RetryProxy.create(
           OzoneManagerProtocolPB.class, failoverProxyProvider, retryPolicy);
       return proxy;
   ```

##########
File path: hadoop-hdds/framework/src/main/java/org/apache/hadoop/hdds/scm/proxy/SCMBlockLocationFailoverProxyProvider.java
##########
@@ -0,0 +1,279 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.hdds.scm.proxy;
+
+import com.google.common.annotations.VisibleForTesting;
+import com.google.common.base.Preconditions;
+import org.apache.hadoop.conf.Configuration;
+import org.apache.hadoop.hdds.conf.ConfigurationSource;
+import org.apache.hadoop.hdds.scm.ScmConfigKeys;
+import org.apache.hadoop.hdds.scm.protocol.ScmBlockLocationProtocol;
+import org.apache.hadoop.hdds.scm.protocolPB.ScmBlockLocationProtocolPB;
+import org.apache.hadoop.hdds.utils.LegacyHadoopConfigurationSource;
+import org.apache.hadoop.io.retry.FailoverProxyProvider;
+import org.apache.hadoop.io.retry.RetryPolicy;
+import org.apache.hadoop.io.retry.RetryPolicy.RetryAction;
+import org.apache.hadoop.ipc.ProtobufRpcEngine;
+import org.apache.hadoop.ipc.RPC;
+import org.apache.hadoop.net.NetUtils;
+import org.apache.hadoop.security.UserGroupInformation;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import java.io.Closeable;
+import java.io.IOException;
+import java.net.InetSocketAddress;
+import java.util.ArrayList;
+import java.util.Collection;
+import java.util.HashMap;
+import java.util.List;
+import java.util.Map;
+
+import static org.apache.hadoop.hdds.scm.ScmConfigKeys.OZONE_SCM_NAMES;
+import static org.apache.hadoop.hdds.scm.ScmConfigKeys.OZONE_SCM_SERVICE_IDS_KEY;
+import static org.apache.hadoop.hdds.HddsUtils.getScmAddressForBlockClients;
+import static org.apache.hadoop.hdds.HddsUtils.getPortNumberFromConfigKeys;
+
+/**
+ * Failover proxy provider for SCM.
+ */
+public class SCMBlockLocationFailoverProxyProvider implements
+    FailoverProxyProvider<ScmBlockLocationProtocolPB>, Closeable {
+  public static final Logger LOG =
+      LoggerFactory.getLogger(SCMBlockLocationFailoverProxyProvider.class);
+
+  private Map<String, ProxyInfo<ScmBlockLocationProtocolPB>> scmProxies;
+  private Map<String, SCMProxyInfo> scmProxyInfoMap;
+  private List<String> scmNodeIDList;
+
+  private String currentProxySCMNodeId;
+  private int currentProxyIndex;
+  private ScmBlockLocationProtocol currentProxy;
+
+  private final ConfigurationSource conf;
+  private final long scmVersion;
+
+  private final String scmServiceId;
+
+  private String lastAttemptedLeader;
+
+  private final int maxRetryCount;
+  private final long retryInterval;
+
+  public static final String SCM_DUMMY_NODEID_PREFIX = "scm";
+
+  public SCMBlockLocationFailoverProxyProvider(ConfigurationSource conf) {
+    this.conf = conf;
+    this.scmVersion = RPC.getProtocolVersion(ScmBlockLocationProtocol.class);
+    this.scmServiceId = conf.getTrimmed(OZONE_SCM_SERVICE_IDS_KEY);
+    this.scmProxies = new HashMap<>();
+    this.scmProxyInfoMap = new HashMap<>();
+    this.scmNodeIDList = new ArrayList<>();
+    loadConfigs();
+
+    this.currentProxyIndex = 0;
+    currentProxySCMNodeId = scmNodeIDList.get(currentProxyIndex);
+
+    this.maxRetryCount = conf.getObject(SCMBlockClientConfig.class)
+        .getRetryCount();
+    this.retryInterval = conf.getObject(SCMBlockClientConfig.class)
+        .getRetryInterval();
+  }
+
+  private Collection<InetSocketAddress> getSCMAddressList() {
+    Collection<String> scmAddressList =
+        conf.getTrimmedStringCollection(OZONE_SCM_NAMES);
+    Collection<InetSocketAddress> resultList = new ArrayList<>();
+    if (scmAddressList.isEmpty()) {
+      // fall back
+      resultList.add(getScmAddressForBlockClients(conf));
+    } else {
+      for (String scmAddress : scmAddressList) {
+        resultList.add(NetUtils.createSocketAddr(scmAddress));
+      }
+    }
+    return resultList;
+  }
+
+  private void loadConfigs() {
+    Collection<InetSocketAddress> scmAddressList = getSCMAddressList();
+    int scmNodeIndex = 1;
+    for (InetSocketAddress scmAddress : scmAddressList) {
+      String nodeId = SCM_DUMMY_NODEID_PREFIX + scmNodeIndex;
+      if (scmAddress == null) {
+        LOG.error("Failed to create SCM proxy for {}.", nodeId);
+        continue;
+      }
+      scmNodeIndex++;
+      SCMProxyInfo scmProxyInfo = new SCMProxyInfo(
+          scmServiceId, nodeId, scmAddress);
+      ProxyInfo<ScmBlockLocationProtocolPB> proxy = new ProxyInfo<>(
+          null, scmProxyInfo.toString());
+      scmProxies.put(nodeId, proxy);
+      scmProxyInfoMap.put(nodeId, scmProxyInfo);
+      scmNodeIDList.add(nodeId);
+    }
+
+    if (scmProxies.isEmpty()) {
+      throw new IllegalArgumentException("Could not find any configured " +
+          "addresses for SCM. Please configure the system with "
+          + OZONE_SCM_NAMES);
+    }
+  }
+
+  @VisibleForTesting
+  public synchronized String getCurrentProxyOMNodeId() {
+    return currentProxySCMNodeId;
+  }
+
+  @Override
+  public synchronized ProxyInfo getProxy() {
+    ProxyInfo currentProxyInfo = scmProxies.get(currentProxySCMNodeId);
+    createSCMProxyIfNeeded(currentProxyInfo, currentProxySCMNodeId);
+    return currentProxyInfo;
+  }
+
+  public ScmBlockLocationProtocolPB getCurrentProxy() {
+    return (ScmBlockLocationProtocolPB) getProxy().proxy;
+  }
+
+  @Override
+  public void performFailover(ScmBlockLocationProtocolPB newLeader) {
+    // By default, will round robin to next proxy.
+    nextProxyIndex();
+    LOG.debug("Failing over to next proxy. {}", getCurrentProxyOMNodeId());
+  }
+
+  public void performFailoverToAssignedLeader(String newLeader) {
+    if (newLeader == null) {
+      // If newLeader is not assigned, it will fail over to next proxy.
+      nextProxyIndex();
+    } else {
+      if (!assignLeaderToNode(newLeader)) {
+        LOG.debug("Failing over OM proxy to nodeId: {}", newLeader);
+        nextProxyIndex();
+      }
+    }
+  }
+
+  @Override
+  public Class<ScmBlockLocationProtocolPB> getInterface() {
+    return ScmBlockLocationProtocolPB.class;
+  }
+
+  @Override
+  public synchronized void close() throws IOException {
+    for (ProxyInfo<ScmBlockLocationProtocolPB> proxy : scmProxies.values()) {
+      ScmBlockLocationProtocolPB scmProxy = proxy.proxy;
+      if (scmProxy != null) {
+        RPC.stopProxy(scmProxy);
+      }
+    }
+  }
+
+  public RetryAction getRetryAction(int failovers) {
+    if (failovers < maxRetryCount) {
+      return new RetryAction(RetryAction.RetryDecision.FAILOVER_AND_RETRY,
+          getRetryInterval());
+    } else {
+      return RetryAction.FAIL;
+    }
+  }
+
+  private synchronized long getRetryInterval() {
+    // TODO add exponential backup
+    return retryInterval;
+  }
+
+  private synchronized int nextProxyIndex() {
+    lastAttemptedLeader = currentProxySCMNodeId;
+
+    // round robin the next proxy
+    currentProxyIndex = (currentProxyIndex + 1) % scmProxies.size();
+    currentProxySCMNodeId =  scmNodeIDList.get(currentProxyIndex);
+    return currentProxyIndex;
+  }
+
+  synchronized boolean assignLeaderToNode(String newLeaderNodeId) {
+    if (!currentProxySCMNodeId.equals(newLeaderNodeId)) {
+      if (scmProxies.containsKey(newLeaderNodeId)) {
+        lastAttemptedLeader = currentProxySCMNodeId;
+        currentProxySCMNodeId = newLeaderNodeId;
+        currentProxyIndex = scmNodeIDList.indexOf(currentProxySCMNodeId);
+        return true;
+      }
+    } else {
+      lastAttemptedLeader = currentProxySCMNodeId;
+    }
+    return false;
+  }
+
+  /**
+   * Creates proxy object if it does not already exist.
+   */
+  private void createSCMProxyIfNeeded(ProxyInfo proxyInfo,
+                                     String nodeId) {
+    if (proxyInfo.proxy == null) {
+      InetSocketAddress address = scmProxyInfoMap.get(nodeId).getAddress();
+      try {
+        ScmBlockLocationProtocolPB proxy = createSCMProxy(address);
+        try {
+          proxyInfo.proxy = proxy;
+        } catch (IllegalAccessError iae) {
+          scmProxies.put(nodeId,
+              new ProxyInfo<>(proxy, proxyInfo.proxyInfo));
+        }
+      } catch (IOException ioe) {
+        LOG.error("{} Failed to create RPC proxy to SCM at {}",
+            this.getClass().getSimpleName(), address, ioe);
+        throw new RuntimeException(ioe);
+      }
+    }
+  }
+
+  private ScmBlockLocationProtocolPB createSCMProxy(
+      InetSocketAddress scmAddress) throws IOException {
+    Configuration hadoopConf =
+        LegacyHadoopConfigurationSource.asHadoopConfiguration(conf);
+    RPC.setProtocolEngine(hadoopConf, ScmBlockLocationProtocol.class,
+        ProtobufRpcEngine.class);
+    return RPC.getProxy(ScmBlockLocationProtocolPB.class, scmVersion,
+        scmAddress, UserGroupInformation.getCurrentUser(), hadoopConf,
+        NetUtils.getDefaultSocketFactory(hadoopConf),
+        (int)conf.getObject(SCMBlockClientConfig.class).getRpcTimeOut());
+  }
+
+  public RetryPolicy getSCMBlockLocationRetryPolicy(
+      String suggestedLeader) {
+    RetryPolicy retryPolicy = new RetryPolicy() {
+      @Override
+      public RetryAction shouldRetry(Exception e, int retry,
+                                     int failover, boolean b) {
+        if (suggestedLeader == null) {
+          performFailover(null);

Review comment:
       Should not call `performFailover()` in `FailoverProxyProvider`, it is job of `RetryInvocationHandler` to call `performFailover()` in its `RetryInvocationHandler.ProxyDescriptor.failover()`

##########
File path: hadoop-hdds/framework/src/main/java/org/apache/hadoop/hdds/scm/proxy/SCMBlockLocationFailoverProxyProvider.java
##########
@@ -0,0 +1,279 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.hdds.scm.proxy;
+
+import com.google.common.annotations.VisibleForTesting;
+import com.google.common.base.Preconditions;
+import org.apache.hadoop.conf.Configuration;
+import org.apache.hadoop.hdds.conf.ConfigurationSource;
+import org.apache.hadoop.hdds.scm.ScmConfigKeys;
+import org.apache.hadoop.hdds.scm.protocol.ScmBlockLocationProtocol;
+import org.apache.hadoop.hdds.scm.protocolPB.ScmBlockLocationProtocolPB;
+import org.apache.hadoop.hdds.utils.LegacyHadoopConfigurationSource;
+import org.apache.hadoop.io.retry.FailoverProxyProvider;
+import org.apache.hadoop.io.retry.RetryPolicy;
+import org.apache.hadoop.io.retry.RetryPolicy.RetryAction;
+import org.apache.hadoop.ipc.ProtobufRpcEngine;
+import org.apache.hadoop.ipc.RPC;
+import org.apache.hadoop.net.NetUtils;
+import org.apache.hadoop.security.UserGroupInformation;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import java.io.Closeable;
+import java.io.IOException;
+import java.net.InetSocketAddress;
+import java.util.ArrayList;
+import java.util.Collection;
+import java.util.HashMap;
+import java.util.List;
+import java.util.Map;
+
+import static org.apache.hadoop.hdds.scm.ScmConfigKeys.OZONE_SCM_NAMES;
+import static org.apache.hadoop.hdds.scm.ScmConfigKeys.OZONE_SCM_SERVICE_IDS_KEY;
+import static org.apache.hadoop.hdds.HddsUtils.getScmAddressForBlockClients;
+import static org.apache.hadoop.hdds.HddsUtils.getPortNumberFromConfigKeys;
+
+/**
+ * Failover proxy provider for SCM.
+ */
+public class SCMBlockLocationFailoverProxyProvider implements
+    FailoverProxyProvider<ScmBlockLocationProtocolPB>, Closeable {
+  public static final Logger LOG =
+      LoggerFactory.getLogger(SCMBlockLocationFailoverProxyProvider.class);
+
+  private Map<String, ProxyInfo<ScmBlockLocationProtocolPB>> scmProxies;
+  private Map<String, SCMProxyInfo> scmProxyInfoMap;
+  private List<String> scmNodeIDList;
+
+  private String currentProxySCMNodeId;
+  private int currentProxyIndex;
+  private ScmBlockLocationProtocol currentProxy;

Review comment:
       This is not used.

##########
File path: hadoop-hdds/framework/src/main/java/org/apache/hadoop/hdds/scm/protocolPB/ScmBlockLocationProtocolClientSideTranslatorPB.java
##########
@@ -105,6 +111,11 @@ private SCMBlockLocationResponse submitRequest(
     try {
       SCMBlockLocationResponse response =
           rpcProxy.send(NULL_RPC_CONTROLLER, req);
+      if (response.getStatus() ==

Review comment:
       We need consider the case that current follower SCM does not know current leader.
   
   check this
   ```
     public OMResponse submitRequest(OMRequest payload) throws IOException {
       try {
         OMResponse omResponse =
             rpcProxy.submitRequest(NULL_RPC_CONTROLLER, payload);
   
         if (omResponse.hasLeaderOMNodeId() && omFailoverProxyProvider != null) {
           String leaderOmId = omResponse.getLeaderOMNodeId();
   
           // Failover to the OM node returned by OMResponse leaderOMNodeId if
           // current proxy is not pointing to that node.
           omFailoverProxyProvider.performFailoverIfRequired(leaderOmId);
         }
   ```




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: ozone-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: ozone-issues-help@hadoop.apache.org


[GitHub] [hadoop-ozone] xiaoyuyao merged pull request #1340: HDDS-3188 Add failover proxy for SCM block location.

Posted by GitBox <gi...@apache.org>.
xiaoyuyao merged pull request #1340:
URL: https://github.com/apache/hadoop-ozone/pull/1340


   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: ozone-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: ozone-issues-help@hadoop.apache.org


[GitHub] [hadoop-ozone] xiaoyuyao commented on a change in pull request #1340: HDDS-3188 Add failover proxy for SCM block location.

Posted by GitBox <gi...@apache.org>.
xiaoyuyao commented on a change in pull request #1340:
URL: https://github.com/apache/hadoop-ozone/pull/1340#discussion_r492294434



##########
File path: hadoop-hdds/framework/src/main/java/org/apache/hadoop/hdds/scm/proxy/SCMClientConfig.java
##########
@@ -0,0 +1,107 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ *  with the License.  You may obtain a copy of the License at
+ *
+ *      http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ *
+ */
+
+package org.apache.hadoop.hdds.scm.proxy;
+
+import org.apache.hadoop.hdds.conf.Config;
+import org.apache.hadoop.hdds.conf.ConfigGroup;
+import org.apache.hadoop.hdds.conf.ConfigType;
+
+import java.util.concurrent.TimeUnit;
+
+import static org.apache.hadoop.hdds.conf.ConfigTag.CLIENT;
+import static org.apache.hadoop.hdds.conf.ConfigTag.OZONE;
+import static org.apache.hadoop.hdds.conf.ConfigTag.SCM;
+
+/**
+ * Config for SCM Block Client.
+ */
+@ConfigGroup(prefix = "ozone.scmclient")

Review comment:
       should this configuration group prefix with hdds.scmclient to match with the package name?

##########
File path: hadoop-hdds/framework/src/main/java/org/apache/hadoop/hdds/scm/proxy/SCMClientConfig.java
##########
@@ -0,0 +1,107 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ *  with the License.  You may obtain a copy of the License at
+ *
+ *      http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ *
+ */
+
+package org.apache.hadoop.hdds.scm.proxy;
+
+import org.apache.hadoop.hdds.conf.Config;
+import org.apache.hadoop.hdds.conf.ConfigGroup;
+import org.apache.hadoop.hdds.conf.ConfigType;
+
+import java.util.concurrent.TimeUnit;
+
+import static org.apache.hadoop.hdds.conf.ConfigTag.CLIENT;
+import static org.apache.hadoop.hdds.conf.ConfigTag.OZONE;
+import static org.apache.hadoop.hdds.conf.ConfigTag.SCM;
+
+/**
+ * Config for SCM Block Client.
+ */
+@ConfigGroup(prefix = "ozone.scmclient")
+public class SCMClientConfig {
+  public static final String SCM_CLIENT_RPC_TIME_OUT = "rpc.timeout";
+  public static final String SCM_CLIENT_FAILOVER_MAX_RETRY =
+      "failover.max.retry";
+  public static final String SCM_CLIENT_RETRY_INTERVAL =
+      "failover.retry.interval";
+
+  @Config(key = SCM_CLIENT_RPC_TIME_OUT,
+      defaultValue = "15m",
+      type = ConfigType.TIME,
+      tags = {OZONE, SCM, CLIENT},
+      timeUnit = TimeUnit.MILLISECONDS,
+      description = "RpcClient timeout on waiting for the response from " +
+          "OzoneManager. The default value is set to 15 minutes. " +

Review comment:
       OzoneManager=>StorageContainerManger

##########
File path: hadoop-hdds/framework/src/main/java/org/apache/hadoop/hdds/scm/proxy/SCMClientConfig.java
##########
@@ -0,0 +1,107 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ *  with the License.  You may obtain a copy of the License at
+ *
+ *      http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ *
+ */
+
+package org.apache.hadoop.hdds.scm.proxy;
+
+import org.apache.hadoop.hdds.conf.Config;
+import org.apache.hadoop.hdds.conf.ConfigGroup;
+import org.apache.hadoop.hdds.conf.ConfigType;
+
+import java.util.concurrent.TimeUnit;
+
+import static org.apache.hadoop.hdds.conf.ConfigTag.CLIENT;
+import static org.apache.hadoop.hdds.conf.ConfigTag.OZONE;
+import static org.apache.hadoop.hdds.conf.ConfigTag.SCM;
+
+/**
+ * Config for SCM Block Client.
+ */
+@ConfigGroup(prefix = "ozone.scmclient")
+public class SCMClientConfig {
+  public static final String SCM_CLIENT_RPC_TIME_OUT = "rpc.timeout";
+  public static final String SCM_CLIENT_FAILOVER_MAX_RETRY =
+      "failover.max.retry";
+  public static final String SCM_CLIENT_RETRY_INTERVAL =
+      "failover.retry.interval";
+
+  @Config(key = SCM_CLIENT_RPC_TIME_OUT,
+      defaultValue = "15m",
+      type = ConfigType.TIME,
+      tags = {OZONE, SCM, CLIENT},
+      timeUnit = TimeUnit.MILLISECONDS,
+      description = "RpcClient timeout on waiting for the response from " +
+          "OzoneManager. The default value is set to 15 minutes. " +
+          "If ipc.client.ping is set to true and this rpc-timeout " +
+          "is greater than the value of ipc.ping.interval, the effective " +
+          "value of the rpc-timeout is rounded up to multiple of " +
+          "ipc.ping.interval."
+  )
+  private long rpcTimeOut = 15 * 60 * 1000;
+
+  @Config(key = SCM_CLIENT_FAILOVER_MAX_RETRY,
+      defaultValue = "15",
+      type = ConfigType.INT,
+      tags = {OZONE, SCM, CLIENT},
+      description = "Max retry count for SCM Client when failover happens."
+  )
+  private int retryCount = 15;
+
+  @Config(key = SCM_CLIENT_RETRY_INTERVAL,
+      defaultValue = "2s",
+      type = ConfigType.TIME,
+      tags = {OZONE, SCM, CLIENT},
+      timeUnit = TimeUnit.MILLISECONDS,
+      description = "RpcClient timeout on waiting for the response from " +
+          "OzoneManager. The default value is set to 15 minutes. " +

Review comment:
       OzoneManager=>StorageContainerManger
   
   

##########
File path: hadoop-hdds/framework/src/main/java/org/apache/hadoop/hdds/scm/proxy/SCMClientConfig.java
##########
@@ -0,0 +1,107 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ *  with the License.  You may obtain a copy of the License at
+ *
+ *      http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ *
+ */
+
+package org.apache.hadoop.hdds.scm.proxy;
+
+import org.apache.hadoop.hdds.conf.Config;
+import org.apache.hadoop.hdds.conf.ConfigGroup;
+import org.apache.hadoop.hdds.conf.ConfigType;
+
+import java.util.concurrent.TimeUnit;
+
+import static org.apache.hadoop.hdds.conf.ConfigTag.CLIENT;
+import static org.apache.hadoop.hdds.conf.ConfigTag.OZONE;
+import static org.apache.hadoop.hdds.conf.ConfigTag.SCM;
+
+/**
+ * Config for SCM Block Client.
+ */
+@ConfigGroup(prefix = "ozone.scmclient")
+public class SCMClientConfig {
+  public static final String SCM_CLIENT_RPC_TIME_OUT = "rpc.timeout";
+  public static final String SCM_CLIENT_FAILOVER_MAX_RETRY =
+      "failover.max.retry";
+  public static final String SCM_CLIENT_RETRY_INTERVAL =
+      "failover.retry.interval";
+
+  @Config(key = SCM_CLIENT_RPC_TIME_OUT,
+      defaultValue = "15m",
+      type = ConfigType.TIME,
+      tags = {OZONE, SCM, CLIENT},
+      timeUnit = TimeUnit.MILLISECONDS,
+      description = "RpcClient timeout on waiting for the response from " +
+          "OzoneManager. The default value is set to 15 minutes. " +
+          "If ipc.client.ping is set to true and this rpc-timeout " +
+          "is greater than the value of ipc.ping.interval, the effective " +
+          "value of the rpc-timeout is rounded up to multiple of " +
+          "ipc.ping.interval."
+  )
+  private long rpcTimeOut = 15 * 60 * 1000;
+
+  @Config(key = SCM_CLIENT_FAILOVER_MAX_RETRY,
+      defaultValue = "15",
+      type = ConfigType.INT,
+      tags = {OZONE, SCM, CLIENT},
+      description = "Max retry count for SCM Client when failover happens."
+  )
+  private int retryCount = 15;
+
+  @Config(key = SCM_CLIENT_RETRY_INTERVAL,
+      defaultValue = "2s",
+      type = ConfigType.TIME,
+      tags = {OZONE, SCM, CLIENT},
+      timeUnit = TimeUnit.MILLISECONDS,
+      description = "RpcClient timeout on waiting for the response from " +
+          "OzoneManager. The default value is set to 15 minutes. " +

Review comment:
       Also the description does not match with the configuration key and needs to be updated. 

##########
File path: hadoop-hdds/framework/src/main/java/org/apache/hadoop/hdds/scm/proxy/SCMProxyInfo.java
##########
@@ -0,0 +1,65 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.hdds.scm.proxy;
+
+import com.google.common.base.Preconditions;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import java.net.InetSocketAddress;
+
+/**
+ * Class to store SCM proxy info.
+ */
+public class SCMProxyInfo {
+  private String serviceId;
+  private String nodeId;
+  private String rpcAddrStr;
+  private InetSocketAddress rpcAddr;
+
+  private static final Logger LOG =
+      LoggerFactory.getLogger(SCMProxyInfo.class);
+
+  public SCMProxyInfo(String serviceID, String nodeID,
+                      InetSocketAddress rpcAddress) {
+    Preconditions.checkNotNull(rpcAddress);
+    this.serviceId = serviceID;
+    this.nodeId = nodeID;
+    this.rpcAddrStr = rpcAddress.toString();
+    this.rpcAddr = rpcAddress;
+    if (rpcAddr.isUnresolved()) {
+      LOG.warn("SCM address {} for serviceID {} remains unresolved " +
+              "for node ID {} Check your ozone-site.xml file to ensure scm " +
+              "addresses are configured properly.",
+          rpcAddress, serviceId, nodeId);
+    }
+  }
+
+  public String toString() {
+    return new StringBuilder()
+        .append("nodeId=")
+        .append(nodeId)
+        .append(",nodeAddress=")
+        .append(rpcAddrStr).toString();
+  }
+
+  public InetSocketAddress getAddress() {

Review comment:
       missing getter for nodeid and serviceid?

##########
File path: hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/protocol/ScmBlockLocationProtocolServerSideTranslatorPB.java
##########
@@ -94,9 +95,26 @@ public ScmBlockLocationProtocolServerSideTranslatorPB(
         .setTraceID(traceID);
   }
 
+  private boolean isLeader() throws ServiceException {
+    if (!(impl instanceof SCMBlockProtocolServer)) {
+      throw new ServiceException("Should be SCMBlockProtocolServer");
+    } else {
+      return ((SCMBlockProtocolServer) impl).getScm().checkLeader();
+    }
+  }
+
   @Override
   public SCMBlockLocationResponse send(RpcController controller,
       SCMBlockLocationRequest request) throws ServiceException {
+    if (!isLeader()) {
+      SCMBlockLocationResponse.Builder response = createSCMBlockResponse(
+          request.getCmdType(),
+          request.getTraceID());
+      response.setSuccess(false);
+      response.setStatus(Status.SCM_NOT_LEADER);
+      response.setLeaderSCMNodeId(null);

Review comment:
       Should we send back the leader known by this SCM follower instead of null to avoid unnecessary client retry?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: ozone-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: ozone-issues-help@hadoop.apache.org


[GitHub] [hadoop-ozone] amaliujia commented on a change in pull request #1340: HDDS-3188 Enable SCM group with failover proxy for SCM block location.

Posted by GitBox <gi...@apache.org>.
amaliujia commented on a change in pull request #1340:
URL: https://github.com/apache/hadoop-ozone/pull/1340#discussion_r480712834



##########
File path: hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/protocol/ScmBlockLocationProtocolServerSideTranslatorPB.java
##########
@@ -94,9 +95,33 @@ public ScmBlockLocationProtocolServerSideTranslatorPB(
         .setTraceID(traceID);
   }
 
+  private boolean isLeader() throws ServiceException {
+    if (!(impl instanceof SCMBlockProtocolServer)) {
+      throw new ServiceException("Should be SCMBlockProtocolServer");

Review comment:
       nit: might be useful to print which class current `impl` is.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: ozone-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: ozone-issues-help@hadoop.apache.org