You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@ozone.apache.org by GitBox <gi...@apache.org> on 2021/04/14 05:09:22 UTC

[GitHub] [ozone] bshashikant opened a new pull request #2155: HDDS-5103. Fix Install Snapshot Mechanism in SCMStateMachine.

bshashikant opened a new pull request #2155:
URL: https://github.com/apache/ozone/pull/2155


   ## What changes were proposed in this pull request?
   Fixed install snapshot behaviour
   
   ## What is the link to the Apache JIRA
   https://issues.apache.org/jira/browse/HDDS-5103
   
   ## How was this patch tested?
   
   Enabled Unit tests
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] GlenGeng commented on a change in pull request #2155: HDDS-5103. Fix Install Snapshot Mechanism in SCMStateMachine.

Posted by GitBox <gi...@apache.org>.
GlenGeng commented on a change in pull request #2155:
URL: https://github.com/apache/ozone/pull/2155#discussion_r616396459



##########
File path: pom.xml
##########
@@ -79,7 +79,7 @@ xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xs
     <declared.ozone.version>${ozone.version}</declared.ozone.version>
 
     <!-- Apache Ratis version -->
-    <ratis.version>2.0.0</ratis.version>
+    <ratis.version>2.1.0-80f6e4b-SNAPSHOT</ratis.version>

Review comment:
       Thanks @bshashikant and @bharatviswa504 for the effect in RATIS-1326 and RATIS-1356.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] GlenGeng commented on a change in pull request #2155: HDDS-5103. Fix Install Snapshot Mechanism in SCMStateMachine.

Posted by GitBox <gi...@apache.org>.
GlenGeng commented on a change in pull request #2155:
URL: https://github.com/apache/ozone/pull/2155#discussion_r616422608



##########
File path: hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/ha/SCMSnapshotProvider.java
##########
@@ -103,18 +101,19 @@ public DBCheckpoint getSCMDBSnapshot(String leaderSCMNodeID)
             .getAbsolutePath();
     File targetFile = new File(snapshotFilePath + ".tar.gz");
 
-    // the client instance will be initialized only when first install snapshot
-    // notification from ratis leader will be received.
-    if (client == null) {
-      client = new InterSCMGrpcClient(
-          peerNodesMap.get(leaderSCMNodeID).getInetAddress().getHostAddress(),
-          conf);
-    }
+    // the downloadClient instance will be craeted as and when install snapshot
+    // request is received. No caching of the client as it should be a very rare
+    int port = peerNodesMap.get(leaderSCMNodeID).getGrpcPort();
+    SCMSnapshotDownloader downloadClient = new InterSCMGrpcClient(
+        peerNodesMap.get(leaderSCMNodeID).getInetAddress().getHostAddress(),
+        port, conf);
     try {
-      client.download(targetFile.toPath()).get();
-    } catch (InterruptedException | ExecutionException e) {
+      downloadClient.download(targetFile.toPath()).get();

Review comment:
       Use try with resource here ?

##########
File path: hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/ha/InterSCMGrpcClient.java
##########
@@ -52,13 +52,15 @@
   private final InterSCMProtocolServiceGrpc.InterSCMProtocolServiceStub
       client;
 
-  private final long timeout;
-
-  public InterSCMGrpcClient(final String host, final ConfigurationSource conf) {
-    Preconditions.checkNotNull(conf);
-    int port = conf.getInt(ScmConfigKeys.OZONE_SCM_GRPC_PORT_KEY,
-        ScmConfigKeys.OZONE_SCM_GRPC_PORT_DEFAULT);
-    timeout =
+  public InterSCMGrpcClient(final String host, final int leaderPort,
+      final ConfigurationSource conf) {
+    // if the leader grpc port details are not setup in the peer Map,
+    // fall back to default grpc port.
+    final int port = leaderPort == 0 ?

Review comment:
       The port here follows the HA style config, since it is in `OZONE_SCM_GRPC_PORT_KEY` of `nodeSpecificConfigKeys`, thus it can be handled in MiniOzoneCluster, right ?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] bshashikant commented on a change in pull request #2155: HDDS-5103. Fix Install Snapshot Mechanism in SCMStateMachine.

Posted by GitBox <gi...@apache.org>.
bshashikant commented on a change in pull request #2155:
URL: https://github.com/apache/ozone/pull/2155#discussion_r616417617



##########
File path: hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/ha/InterSCMGrpcClient.java
##########
@@ -52,13 +52,15 @@
   private final InterSCMProtocolServiceGrpc.InterSCMProtocolServiceStub
       client;
 
-  private final long timeout;
-
-  public InterSCMGrpcClient(final String host, final ConfigurationSource conf) {
-    Preconditions.checkNotNull(conf);
-    int port = conf.getInt(ScmConfigKeys.OZONE_SCM_GRPC_PORT_KEY,
-        ScmConfigKeys.OZONE_SCM_GRPC_PORT_DEFAULT);
-    timeout =
+  public InterSCMGrpcClient(final String host, final int leaderPort,
+      final ConfigurationSource conf) {
+    // if the leader grpc port details are not setup in the peer Map,
+    // fall back to default grpc port.
+    final int port = leaderPort == 0 ?

Review comment:
       This is required for tests. With Multiple SCM setups, grpc ports for individual SCMs will be different. But, grpc client needs to connect leader grpc port which it does not know from the leaderInfo from Ratis. In actual production clusters, all grpc servers on all nodes will bind to the same grpc port which is not possible in MiniOzoneHAcluster.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] bharatviswa504 commented on a change in pull request #2155: HDDS-5103. Fix Install Snapshot Mechanism in SCMStateMachine.

Posted by GitBox <gi...@apache.org>.
bharatviswa504 commented on a change in pull request #2155:
URL: https://github.com/apache/ozone/pull/2155#discussion_r616367260



##########
File path: hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/ha/SCMSnapshotProvider.java
##########
@@ -103,18 +101,19 @@ public DBCheckpoint getSCMDBSnapshot(String leaderSCMNodeID)
             .getAbsolutePath();
     File targetFile = new File(snapshotFilePath + ".tar.gz");
 
-    // the client instance will be initialized only when first install snapshot
-    // notification from ratis leader will be received.
-    if (client == null) {
-      client = new InterSCMGrpcClient(
-          peerNodesMap.get(leaderSCMNodeID).getInetAddress().getHostAddress(),
-          conf);
-    }
+    // the downloadClient instance will be craeted as and when install snapshot

Review comment:
       Minor: craeted -> created

##########
File path: hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/ha/InterSCMGrpcClient.java
##########
@@ -52,13 +52,15 @@
   private final InterSCMProtocolServiceGrpc.InterSCMProtocolServiceStub
       client;
 
-  private final long timeout;
-
-  public InterSCMGrpcClient(final String host, final ConfigurationSource conf) {
-    Preconditions.checkNotNull(conf);
-    int port = conf.getInt(ScmConfigKeys.OZONE_SCM_GRPC_PORT_KEY,
-        ScmConfigKeys.OZONE_SCM_GRPC_PORT_DEFAULT);
-    timeout =
+  public InterSCMGrpcClient(final String host, final int leaderPort,
+      final ConfigurationSource conf) {
+    // if the leader grpc port details are not setup in the peer Map,
+    // fall back to default grpc port.
+    final int port = leaderPort == 0 ?

Review comment:
       Do we need this, because grpcPort is set in SCMNodeDetails during build?

##########
File path: hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/ha/SCMStateMachine.java
##########
@@ -184,11 +186,20 @@ public void notifyNotLeader(Collection<TransactionContext> pendingEntries) {
   @Override
   public CompletableFuture<TermIndex> notifyInstallSnapshotFromLeader(
       RaftProtos.RoleInfoProto roleInfoProto, TermIndex firstTermIndexInLog) {
-
-    String leaderNodeId = RaftPeerId.valueOf(roleInfoProto.getFollowerInfo()
-        .getLeaderInfo().getId().getId()).toString();
+    if (!roleInfoProto.getFollowerInfo().hasLeaderInfo()) {
+      return JavaUtils.completeExceptionally(new IOException("Failed " +
+          "notifyInstallSnapshotFromLeader due to missing leader info"));
+    }
+    String leaderAddress = roleInfoProto.getFollowerInfo()
+        .getLeaderInfo().getId().getAddress();
+    Optional<SCMNodeDetails> leaderDetails =
+        scm.getSCMHANodeDetails().getPeerNodeDetails().stream().filter(
+            p -> p.getRatisHostPortStr().equals(leaderAddress))
+            .findFirst();
+    Preconditions.checkNotNull(leaderDetails);

Review comment:
       Minor: Here we need to check for checkState(leaderDetails.isPresent())
   As findFirst returns Optional.empty for empty stream.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] bharatviswa504 commented on a change in pull request #2155: HDDS-5103. Fix Install Snapshot Mechanism in SCMStateMachine.

Posted by GitBox <gi...@apache.org>.
bharatviswa504 commented on a change in pull request #2155:
URL: https://github.com/apache/ozone/pull/2155#discussion_r612945645



##########
File path: hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/ha/SCMHAManagerImpl.java
##########
@@ -101,14 +101,19 @@ public void start() throws IOException {
     if (ratisServer.getDivision().getGroup().getPeers().isEmpty()) {
       // this is a bootstrapped node
       // It will first try to add itself to existing ring
-      boolean success = HAUtils.addSCM(OzoneConfiguration.of(conf),
+      final SCMNodeDetails details =
+          scm.getSCMHANodeDetails().getLocalNodeDetails();
+      final boolean success = HAUtils.addSCM(OzoneConfiguration.of(conf),
           new AddSCMRequest.Builder().setClusterId(scm.getClusterId())
               .setScmId(scm.getScmId())
               .setRatisAddr(scm.getSCMHANodeDetails().getLocalNodeDetails()

Review comment:
       Minor Nit:
   We can use details.getRatisHostPortStr() instead of scm.getSCMHANodeDetails().getLocalNodeDetails().getRatisHostPortStr()
   
   And also can we rename details -> nodeDetails

##########
File path: hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/ha/InterSCMGrpcClient.java
##########
@@ -52,14 +52,8 @@
   private final InterSCMProtocolServiceGrpc.InterSCMProtocolServiceStub
       client;
 
-  private final long timeout;
-
-  public InterSCMGrpcClient(final String host, final ConfigurationSource conf) {
-    Preconditions.checkNotNull(conf);
-    int port = conf.getInt(ScmConfigKeys.OZONE_SCM_GRPC_PORT_KEY,
-        ScmConfigKeys.OZONE_SCM_GRPC_PORT_DEFAULT);
-    timeout =
-        conf.getObject(SCMHAConfiguration.class).getGrpcDeadlineInterval();
+  public InterSCMGrpcClient(final String host, final int port,
+      final long timeout) {

Review comment:
       Can we retain passing in conf also, as it will be required for HDDS-5060, as it needs to create SecurityConfig from conf and enable sslContext.

##########
File path: hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/ha/SCMStateMachine.java
##########
@@ -185,10 +186,16 @@ public void notifyNotLeader(Collection<TransactionContext> pendingEntries) {
   public CompletableFuture<TermIndex> notifyInstallSnapshotFromLeader(
       RaftProtos.RoleInfoProto roleInfoProto, TermIndex firstTermIndexInLog) {
 
-    String leaderNodeId = RaftPeerId.valueOf(roleInfoProto.getFollowerInfo()
-        .getLeaderInfo().getId().getId()).toString();
+    String leaderNode = roleInfoProto.getFollowerInfo()
+        .getLeaderInfo().getId().getAddress();
+    Optional<SCMNodeDetails> leaderDetails =
+        scm.getSCMHANodeDetails().getPeerNodeDetails().stream().filter(
+            p -> p.getRatisHostPortStr().equals(p.getRatisHostPortStr()))

Review comment:
       NIT: Can we rename that to leaderAddress?

##########
File path: hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/ha/SCMStateMachine.java
##########
@@ -185,10 +186,16 @@ public void notifyNotLeader(Collection<TransactionContext> pendingEntries) {
   public CompletableFuture<TermIndex> notifyInstallSnapshotFromLeader(
       RaftProtos.RoleInfoProto roleInfoProto, TermIndex firstTermIndexInLog) {
 
-    String leaderNodeId = RaftPeerId.valueOf(roleInfoProto.getFollowerInfo()
-        .getLeaderInfo().getId().getId()).toString();
+    String leaderNode = roleInfoProto.getFollowerInfo()
+        .getLeaderInfo().getId().getAddress();
+    Optional<SCMNodeDetails> leaderDetails =
+        scm.getSCMHANodeDetails().getPeerNodeDetails().stream().filter(
+            p -> p.getRatisHostPortStr().equals(p.getRatisHostPortStr()))

Review comment:
       Should this be here compared it with leaderNode above where we get leaderAddress?
   

##########
File path: hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/ha/SCMStateMachine.java
##########
@@ -185,10 +186,16 @@ public void notifyNotLeader(Collection<TransactionContext> pendingEntries) {
   public CompletableFuture<TermIndex> notifyInstallSnapshotFromLeader(
       RaftProtos.RoleInfoProto roleInfoProto, TermIndex firstTermIndexInLog) {
 
-    String leaderNodeId = RaftPeerId.valueOf(roleInfoProto.getFollowerInfo()
-        .getLeaderInfo().getId().getId()).toString();
+    String leaderNode = roleInfoProto.getFollowerInfo()

Review comment:
       In tests I see leaderNode coming ""
   Received install snapshot notification from SCM leader:  with term index: (t:2, i:107)




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] bharatviswa504 commented on a change in pull request #2155: HDDS-5103. Fix Install Snapshot Mechanism in SCMStateMachine.

Posted by GitBox <gi...@apache.org>.
bharatviswa504 commented on a change in pull request #2155:
URL: https://github.com/apache/ozone/pull/2155#discussion_r616419544



##########
File path: hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/ha/InterSCMGrpcClient.java
##########
@@ -52,13 +52,15 @@
   private final InterSCMProtocolServiceGrpc.InterSCMProtocolServiceStub
       client;
 
-  private final long timeout;
-
-  public InterSCMGrpcClient(final String host, final ConfigurationSource conf) {
-    Preconditions.checkNotNull(conf);
-    int port = conf.getInt(ScmConfigKeys.OZONE_SCM_GRPC_PORT_KEY,
-        ScmConfigKeys.OZONE_SCM_GRPC_PORT_DEFAULT);
-    timeout =
+  public InterSCMGrpcClient(final String host, final int leaderPort,
+      final ConfigurationSource conf) {
+    // if the leader grpc port details are not setup in the peer Map,
+    // fall back to default grpc port.
+    final int port = leaderPort == 0 ?

Review comment:
       If each SCM is having different Grpc port, it can get leader grpc port from nodeDetails Right?
   If we have already set grpc port in tests.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] bshashikant commented on a change in pull request #2155: HDDS-5103. Fix Install Snapshot Mechanism in SCMStateMachine.

Posted by GitBox <gi...@apache.org>.
bshashikant commented on a change in pull request #2155:
URL: https://github.com/apache/ozone/pull/2155#discussion_r615763628



##########
File path: hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/ha/SCMStateMachine.java
##########
@@ -185,10 +186,16 @@ public void notifyNotLeader(Collection<TransactionContext> pendingEntries) {
   public CompletableFuture<TermIndex> notifyInstallSnapshotFromLeader(
       RaftProtos.RoleInfoProto roleInfoProto, TermIndex firstTermIndexInLog) {
 
-    String leaderNodeId = RaftPeerId.valueOf(roleInfoProto.getFollowerInfo()
-        .getLeaderInfo().getId().getId()).toString();
+    String leaderNode = roleInfoProto.getFollowerInfo()

Review comment:
       Addressed with latest ratis version bump




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] bshashikant commented on pull request #2155: HDDS-5103. Fix Install Snapshot Mechanism in SCMStateMachine.

Posted by GitBox <gi...@apache.org>.
bshashikant commented on pull request #2155:
URL: https://github.com/apache/ozone/pull/2155#issuecomment-826735040


   > Hi, this patch modified Ozone to use a -SNAPSHOT dependency. Can we avoid using it? Is it absolutely necessary?
   > 
   > I would prefer to depend on released Ratis artifacts only as our last release was delayed with 4 months when we were blocked by the Ratis release.
   > 
   > Can we have a ratis 2.0.1 and depend on it?
   
   This is necessary as only the snapshot has the required fixes to make snapshot mechanism work for SCM HA.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] bshashikant commented on pull request #2155: HDDS-5103. Fix Install Snapshot Mechanism in SCMStateMachine.

Posted by GitBox <gi...@apache.org>.
bshashikant commented on pull request #2155:
URL: https://github.com/apache/ozone/pull/2155#issuecomment-825375684


   Thanks @bharatviswa504 / @GlenGeng for the review.  Committing this.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] bshashikant merged pull request #2155: HDDS-5103. Fix Install Snapshot Mechanism in SCMStateMachine.

Posted by GitBox <gi...@apache.org>.
bshashikant merged pull request #2155:
URL: https://github.com/apache/ozone/pull/2155


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] bshashikant commented on a change in pull request #2155: HDDS-5103. Fix Install Snapshot Mechanism in SCMStateMachine.

Posted by GitBox <gi...@apache.org>.
bshashikant commented on a change in pull request #2155:
URL: https://github.com/apache/ozone/pull/2155#discussion_r616419898



##########
File path: hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/ha/SCMStateMachine.java
##########
@@ -184,11 +186,20 @@ public void notifyNotLeader(Collection<TransactionContext> pendingEntries) {
   @Override
   public CompletableFuture<TermIndex> notifyInstallSnapshotFromLeader(
       RaftProtos.RoleInfoProto roleInfoProto, TermIndex firstTermIndexInLog) {
-
-    String leaderNodeId = RaftPeerId.valueOf(roleInfoProto.getFollowerInfo()
-        .getLeaderInfo().getId().getId()).toString();
+    if (!roleInfoProto.getFollowerInfo().hasLeaderInfo()) {
+      return JavaUtils.completeExceptionally(new IOException("Failed " +
+          "notifyInstallSnapshotFromLeader due to missing leader info"));
+    }
+    String leaderAddress = roleInfoProto.getFollowerInfo()
+        .getLeaderInfo().getId().getAddress();
+    Optional<SCMNodeDetails> leaderDetails =
+        scm.getSCMHANodeDetails().getPeerNodeDetails().stream().filter(
+            p -> p.getRatisHostPortStr().equals(leaderAddress))
+            .findFirst();
+    Preconditions.checkNotNull(leaderDetails);

Review comment:
       Done

##########
File path: hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/ha/SCMStateMachine.java
##########
@@ -184,11 +186,20 @@ public void notifyNotLeader(Collection<TransactionContext> pendingEntries) {
   @Override
   public CompletableFuture<TermIndex> notifyInstallSnapshotFromLeader(
       RaftProtos.RoleInfoProto roleInfoProto, TermIndex firstTermIndexInLog) {
-
-    String leaderNodeId = RaftPeerId.valueOf(roleInfoProto.getFollowerInfo()
-        .getLeaderInfo().getId().getId()).toString();
+    if (!roleInfoProto.getFollowerInfo().hasLeaderInfo()) {
+      return JavaUtils.completeExceptionally(new IOException("Failed " +

Review comment:
       Done




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] bshashikant commented on a change in pull request #2155: HDDS-5103. Fix Install Snapshot Mechanism in SCMStateMachine.

Posted by GitBox <gi...@apache.org>.
bshashikant commented on a change in pull request #2155:
URL: https://github.com/apache/ozone/pull/2155#discussion_r616425960



##########
File path: hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/ha/InterSCMGrpcClient.java
##########
@@ -52,13 +52,15 @@
   private final InterSCMProtocolServiceGrpc.InterSCMProtocolServiceStub
       client;
 
-  private final long timeout;
-
-  public InterSCMGrpcClient(final String host, final ConfigurationSource conf) {
-    Preconditions.checkNotNull(conf);
-    int port = conf.getInt(ScmConfigKeys.OZONE_SCM_GRPC_PORT_KEY,
-        ScmConfigKeys.OZONE_SCM_GRPC_PORT_DEFAULT);
-    timeout =
+  public InterSCMGrpcClient(final String host, final int leaderPort,
+      final ConfigurationSource conf) {
+    // if the leader grpc port details are not setup in the peer Map,
+    // fall back to default grpc port.
+    final int port = leaderPort == 0 ?

Review comment:
       fixed the tests. Thanks.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] GlenGeng commented on pull request #2155: HDDS-5103. Fix Install Snapshot Mechanism in SCMStateMachine.

Posted by GitBox <gi...@apache.org>.
GlenGeng commented on pull request #2155:
URL: https://github.com/apache/ozone/pull/2155#issuecomment-819304772


   Thanks @bshashikant. LGTM. 
   BTW, does the ratis version in master contains the fix RATIS-1326 ?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] bshashikant commented on a change in pull request #2155: HDDS-5103. Fix Install Snapshot Mechanism in SCMStateMachine.

Posted by GitBox <gi...@apache.org>.
bshashikant commented on a change in pull request #2155:
URL: https://github.com/apache/ozone/pull/2155#discussion_r618234181



##########
File path: hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/ha/SCMSnapshotProvider.java
##########
@@ -103,18 +101,19 @@ public DBCheckpoint getSCMDBSnapshot(String leaderSCMNodeID)
             .getAbsolutePath();
     File targetFile = new File(snapshotFilePath + ".tar.gz");
 
-    // the client instance will be initialized only when first install snapshot
-    // notification from ratis leader will be received.
-    if (client == null) {
-      client = new InterSCMGrpcClient(
-          peerNodesMap.get(leaderSCMNodeID).getInetAddress().getHostAddress(),
-          conf);
-    }
+    // the downloadClient instance will be craeted as and when install snapshot
+    // request is received. No caching of the client as it should be a very rare
+    int port = peerNodesMap.get(leaderSCMNodeID).getGrpcPort();
+    SCMSnapshotDownloader downloadClient = new InterSCMGrpcClient(
+        peerNodesMap.get(leaderSCMNodeID).getInetAddress().getHostAddress(),
+        port, conf);
     try {
-      client.download(targetFile.toPath()).get();
-    } catch (InterruptedException | ExecutionException e) {
+      downloadClient.download(targetFile.toPath()).get();

Review comment:
       For now, SCMSnapshotDownloader does not implement the AutoCloseable interface. Will take it up in refactoring in a subsequent patch. 




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] bharatviswa504 commented on a change in pull request #2155: HDDS-5103. Fix Install Snapshot Mechanism in SCMStateMachine.

Posted by GitBox <gi...@apache.org>.
bharatviswa504 commented on a change in pull request #2155:
URL: https://github.com/apache/ozone/pull/2155#discussion_r616419544



##########
File path: hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/ha/InterSCMGrpcClient.java
##########
@@ -52,13 +52,15 @@
   private final InterSCMProtocolServiceGrpc.InterSCMProtocolServiceStub
       client;
 
-  private final long timeout;
-
-  public InterSCMGrpcClient(final String host, final ConfigurationSource conf) {
-    Preconditions.checkNotNull(conf);
-    int port = conf.getInt(ScmConfigKeys.OZONE_SCM_GRPC_PORT_KEY,
-        ScmConfigKeys.OZONE_SCM_GRPC_PORT_DEFAULT);
-    timeout =
+  public InterSCMGrpcClient(final String host, final int leaderPort,
+      final ConfigurationSource conf) {
+    // if the leader grpc port details are not setup in the peer Map,
+    // fall back to default grpc port.
+    final int port = leaderPort == 0 ?

Review comment:
       If each SCM is having different Grpc port, it can get leader grpc port from nodeDetails Right?(It does not need to know this from leader info from ratis, we can get this from nodeDetails. Not sure If I am missing something here.
   If we have already set grpc port in tests.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] bharatviswa504 commented on a change in pull request #2155: HDDS-5103. Fix Install Snapshot Mechanism in SCMStateMachine.

Posted by GitBox <gi...@apache.org>.
bharatviswa504 commented on a change in pull request #2155:
URL: https://github.com/apache/ozone/pull/2155#discussion_r616372639



##########
File path: hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/ha/SCMStateMachine.java
##########
@@ -184,11 +186,20 @@ public void notifyNotLeader(Collection<TransactionContext> pendingEntries) {
   @Override
   public CompletableFuture<TermIndex> notifyInstallSnapshotFromLeader(
       RaftProtos.RoleInfoProto roleInfoProto, TermIndex firstTermIndexInLog) {
-
-    String leaderNodeId = RaftPeerId.valueOf(roleInfoProto.getFollowerInfo()
-        .getLeaderInfo().getId().getId()).toString();
+    if (!roleInfoProto.getFollowerInfo().hasLeaderInfo()) {
+      return JavaUtils.completeExceptionally(new IOException("Failed " +

Review comment:
       Minor: Failed -> Failed to




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] bshashikant commented on pull request #2155: HDDS-5103. Fix Install Snapshot Mechanism in SCMStateMachine.

Posted by GitBox <gi...@apache.org>.
bshashikant commented on pull request #2155:
URL: https://github.com/apache/ozone/pull/2155#issuecomment-822214852


   > Thanks @bshashikant. LGTM.
   > BTW, does the ratis version in master contains the fix RATIS-1326 ?
   
   This needs https://issues.apache.org/jira/browse/RATIS-1356 as well. Once this gets resolved, will update the ratis version.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] elek commented on pull request #2155: HDDS-5103. Fix Install Snapshot Mechanism in SCMStateMachine.

Posted by GitBox <gi...@apache.org>.
elek commented on pull request #2155:
URL: https://github.com/apache/ozone/pull/2155#issuecomment-826751401


   > This is necessary as only the snapshot has the required fixes to make snapshot mechanism work for SCM HA.
   
   Thanks for the answer. Still I would prefer to depend only on released artifacts from other projects (which is suggested originally by Nanda) I would ask for other's opinion on the community meeting.
   
   Can you please help me with the required commit from Ratis side? Is it apache/ratis#460? If it's just one commit I will start a ratis discussion to release 2.0.1, and we can switch back to a released version...


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] elek commented on pull request #2155: HDDS-5103. Fix Install Snapshot Mechanism in SCMStateMachine.

Posted by GitBox <gi...@apache.org>.
elek commented on pull request #2155:
URL: https://github.com/apache/ozone/pull/2155#issuecomment-826696375


   Hi, this patch modified Ozone to use a -SNAPSHOT dependency. Can we avoid using it? Is it absolutely necessary?
   
   I would prefer to depend on released Ratis artifacts only as our last release was delayed with 4 months when we were blocked by the Ratis release.
   
   Can we have a ratis 2.0.1 and depend on it?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org