You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@ratis.apache.org by GitBox <gi...@apache.org> on 2020/08/27 07:36:15 UTC

[GitHub] [incubator-ratis] lokeshj1703 opened a new pull request #185: Watch for commit calls are blocked for a long if no other message

lokeshj1703 opened a new pull request #185:
URL: https://github.com/apache/incubator-ratis/pull/185


   Co-authored-by: Elek Márton <el...@apache.org>
   
   ## What changes were proposed in this pull request?
   
   The PR initiates heartbeat if follower's commitIndex lags from leader's commitIndex. This helps in faster resolution of any pending watch requests in the leader. 
   
   It also changes the waitTimes for the appender thread so that the thread does not sleep for a long time while there are pending watch requests in leader.
   
   ## What is the link to the Apache JIRA
   
   https://issues.apache.org/jira/browse/RATIS-1042
   
   ## How was this patch tested?
   
   Existing UT and by running teragen for Ozone


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-ratis] runzhiwang commented on a change in pull request #185: RATIS-1042. Watch for commit calls are blocked for a long if no other message

Posted by GitBox <gi...@apache.org>.
runzhiwang commented on a change in pull request #185:
URL: https://github.com/apache/incubator-ratis/pull/185#discussion_r481856045



##########
File path: ratis-server/src/main/java/org/apache/ratis/server/impl/LogAppender.java
##########
@@ -221,11 +221,12 @@ private TermIndex getPrevious(long nextIndex) {
     return null;
   }
 
-  protected AppendEntriesRequestProto createRequest(long callId) throws RaftLogIOException {
+  protected AppendEntriesRequestProto createRequest(long callId,
+      boolean heartbeat) throws RaftLogIOException {

Review comment:
       Can we rename `heartbeat ` to `excludeLogEntries `?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-ratis] amaliujia edited a comment on pull request #185: RATIS-1042. Watch for commit calls are blocked for a long if no other message

Posted by GitBox <gi...@apache.org>.
amaliujia edited a comment on pull request #185:
URL: https://github.com/apache/incubator-ratis/pull/185#issuecomment-685991632


   I am wondering whether this change is related to https://issues.apache.org/jira/browse/RATIS-624.
   
   In RATIS-624, what I am trying to do is 
   1) add pause()/resume() to RatisServerImpl 
   2) design something to detect whether a follower is lagging too far.
   3) design a way to allow stop the lagging follower, install a snapshot to make the follower catch up.
   
   
   This PR seems to be relevant to 2) by comparing commitIndex lags and allow follower initialize a faster heartbeat?


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-ratis] elek commented on a change in pull request #185: RATIS-1042. Watch for commit calls are blocked for a long if no other message

Posted by GitBox <gi...@apache.org>.
elek commented on a change in pull request #185:
URL: https://github.com/apache/incubator-ratis/pull/185#discussion_r482868452



##########
File path: ratis-grpc/src/main/java/org/apache/ratis/grpc/server/GrpcLogAppender.java
##########
@@ -101,41 +101,44 @@ private synchronized void resetClient(AppendEntriesRequest request) {
 
   @Override
   protected void runAppenderImpl() throws IOException {
-    boolean shouldAppendLog;
+    boolean installSnapshotRequired;

Review comment:
       Because that code is not used, I guess.
   
   But it's a good question. I think `LogAppender#runAppenderImpl` either should be empty and abstract or implementation of `GrpcLogAppender#runAppanderImpl` should be moved to there (if possible).  (A different, refactoring PR, maybe?)
   
   




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-ratis] elek commented on pull request #185: RATIS-1042. Watch for commit calls are blocked for a long if no other message

Posted by GitBox <gi...@apache.org>.
elek commented on pull request #185:
URL: https://github.com/apache/incubator-ratis/pull/185#issuecomment-686394507


   > This PR seems to be relevant to 2) by comparing commitIndex lags and allow follower initialize a faster heartbeat?
   
   Can be relevant, but based on my measurement we didn't have a real lag. Usually the follower didn't inform the leader about the last comitted index. The datanode was paused for 2.5 seconds, but the lag between the follower and leader was not significant (1-3 indexes) 


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-ratis] lokeshj1703 commented on a change in pull request #185: RATIS-1042. Watch for commit calls are blocked for a long if no other message

Posted by GitBox <gi...@apache.org>.
lokeshj1703 commented on a change in pull request #185:
URL: https://github.com/apache/incubator-ratis/pull/185#discussion_r484376756



##########
File path: ratis-grpc/src/main/java/org/apache/ratis/grpc/server/GrpcLogAppender.java
##########
@@ -101,41 +101,44 @@ private synchronized void resetClient(AppendEntriesRequest request) {
 
   @Override
   protected void runAppenderImpl() throws IOException {
-    boolean shouldAppendLog;
+    boolean installSnapshotRequired;

Review comment:
       I agree, it would be a good idea to do it in a separate PR. There are many changes in GrpcLogAppender which would be good to have in LogAppender.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-ratis] runzhiwang commented on pull request #185: RATIS-1042. Watch for commit calls are blocked for a long if no other message

Posted by GitBox <gi...@apache.org>.
runzhiwang commented on pull request #185:
URL: https://github.com/apache/incubator-ratis/pull/185#issuecomment-697144507


   +1


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-ratis] amaliujia commented on pull request #185: RATIS-1042. Watch for commit calls are blocked for a long if no other message

Posted by GitBox <gi...@apache.org>.
amaliujia commented on pull request #185:
URL: https://github.com/apache/incubator-ratis/pull/185#issuecomment-685991632


   I am wondering whether this change is related to https://issues.apache.org/jira/browse/RATIS-624.
   
   In RATIS-624, what I am trying to do is 
   1) add pause()/resume() to RatisServerImpl 
   2) design something to detect whether a follower is lagging too far.
   3) design a way to allow stop the lagging follower, install a snapshot to make the follower catch up.
   
   
   This PR seems to be relevant to 2)? Sort of?   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-ratis] lokeshj1703 commented on a change in pull request #185: RATIS-1042. Watch for commit calls are blocked for a long if no other message

Posted by GitBox <gi...@apache.org>.
lokeshj1703 commented on a change in pull request #185:
URL: https://github.com/apache/incubator-ratis/pull/185#discussion_r484376861



##########
File path: ratis-server/src/main/java/org/apache/ratis/server/impl/LogAppender.java
##########
@@ -221,11 +221,12 @@ private TermIndex getPrevious(long nextIndex) {
     return null;
   }
 
-  protected AppendEntriesRequestProto createRequest(long callId) throws RaftLogIOException {
+  protected AppendEntriesRequestProto createRequest(long callId,
+      boolean heartbeat) throws RaftLogIOException {

Review comment:
       addressed it in the latest commit.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-ratis] runzhiwang commented on pull request #185: RATIS-1042. Watch for commit calls are blocked for a long if no other message

Posted by GitBox <gi...@apache.org>.
runzhiwang commented on pull request #185:
URL: https://github.com/apache/incubator-ratis/pull/185#issuecomment-697146441


   @lokeshj1703 @elek Thanks the great improvement, @bshashikant @bharatviswa504 @amaliujia Thanks for review. I merged it.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-ratis] elek commented on a change in pull request #185: RATIS-1042. Watch for commit calls are blocked for a long if no other message

Posted by GitBox <gi...@apache.org>.
elek commented on a change in pull request #185:
URL: https://github.com/apache/incubator-ratis/pull/185#discussion_r482866266



##########
File path: ratis-grpc/src/main/java/org/apache/ratis/grpc/server/GrpcLogAppender.java
##########
@@ -101,41 +101,44 @@ private synchronized void resetClient(AppendEntriesRequest request) {
 
   @Override
   protected void runAppenderImpl() throws IOException {
-    boolean shouldAppendLog;
+    boolean installSnapshotRequired;
     for(; isAppenderRunning(); mayWait()) {
-      shouldAppendLog = true;
-      if (shouldSendRequest()) {
+      installSnapshotRequired = false;
+
+      //HB period is expired OR we have messages OR follower is behind with commit index
+      if (haveLogEntriesToSendOut() || heartbeatTimeout() || isFollowerCommitBehindLastCommitIndex()) {
+
         if (installSnapshotEnabled) {
           SnapshotInfo snapshot = shouldInstallSnapshot();
           if (snapshot != null) {
             installSnapshot(snapshot);
-            shouldAppendLog = false;
+            installSnapshotRequired = true;
           }
         } else {
           TermIndex installSnapshotNotificationTermIndex = shouldNotifyToInstallSnapshot();
           if (installSnapshotNotificationTermIndex != null) {
             installSnapshot(installSnapshotNotificationTermIndex);
-            shouldAppendLog = false;
+            installSnapshotRequired = true;
           }
         }
-        if (shouldHeartbeat() || (shouldAppendLog && !shouldWait())) {
-          // keep appending log entries or sending heartbeats
-          appendLog();
-        }
+
+        appendLog(installSnapshotRequired || haveTooManyPendingRequests());
+
       }
       checkSlowness();
+
     }
 
     Optional.ofNullable(appendLogRequestObserver).ifPresent(StreamObserver::onCompleted);
   }
 
   private long getWaitTimeMs() {
-    if (!shouldSendRequest()) {
-      return getHeartbeatRemainingTime(); // No requests, wait until heartbeat
-    } else if (shouldWait()) {
-      return getHalfMinTimeoutMs(); // Should wait for a short time
+    if (haveTooManyPendingRequests()) {
+      return getHeartbeatRemainingTime(); // Should wait for a short time
+    } else if (haveLogEntriesToSendOut() || heartbeatTimeout()) {
+      return 0L;
     }
-    return 0L;
+    return Math.min(10L, getHeartbeatRemainingTime());

Review comment:
       `getHeartbeatRemainingTime()` can be a longer time, for example in Ozone it's 2.5 seconds. With using 10L (or less) the thread itself will check the other conditions in the `runAppenderImpl` loop. 
   
   As there is no easy way to notify the waiting thread in case of watch for commit requests, the thread is waking up at every 10ms and checks if the HB should be sent out or not.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-ratis] amaliujia edited a comment on pull request #185: RATIS-1042. Watch for commit calls are blocked for a long if no other message

Posted by GitBox <gi...@apache.org>.
amaliujia edited a comment on pull request #185:
URL: https://github.com/apache/incubator-ratis/pull/185#issuecomment-685991632


   I am wondering whether this change is related to https://issues.apache.org/jira/browse/RATIS-624.
   
   In RATIS-624, what I am trying to do is 
   1) add pause()/resume() to RatisServerImpl 
   2) design something to detect whether a follower is lagging too far.
   3) design a way to allow pause the lagging follower, install a snapshot to make the follower catch up, then resume the follower.
   
   
   This PR seems to be relevant to 2) by comparing commitIndex lags and allow follower initialize a faster heartbeat?


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-ratis] lokeshj1703 commented on a change in pull request #185: RATIS-1042. Watch for commit calls are blocked for a long if no other message

Posted by GitBox <gi...@apache.org>.
lokeshj1703 commented on a change in pull request #185:
URL: https://github.com/apache/incubator-ratis/pull/185#discussion_r484376055



##########
File path: ratis-grpc/src/main/java/org/apache/ratis/grpc/server/GrpcLogAppender.java
##########
@@ -101,41 +101,44 @@ private synchronized void resetClient(AppendEntriesRequest request) {
 
   @Override
   protected void runAppenderImpl() throws IOException {
-    boolean shouldAppendLog;
+    boolean installSnapshotRequired;
     for(; isAppenderRunning(); mayWait()) {
-      shouldAppendLog = true;
-      if (shouldSendRequest()) {
+      installSnapshotRequired = false;
+
+      //HB period is expired OR we have messages OR follower is behind with commit index
+      if (haveLogEntriesToSendOut() || heartbeatTimeout() || isFollowerCommitBehindLastCommitIndex()) {
+
         if (installSnapshotEnabled) {
           SnapshotInfo snapshot = shouldInstallSnapshot();
           if (snapshot != null) {
             installSnapshot(snapshot);
-            shouldAppendLog = false;
+            installSnapshotRequired = true;
           }
         } else {
           TermIndex installSnapshotNotificationTermIndex = shouldNotifyToInstallSnapshot();
           if (installSnapshotNotificationTermIndex != null) {
             installSnapshot(installSnapshotNotificationTermIndex);
-            shouldAppendLog = false;
+            installSnapshotRequired = true;
           }
         }
-        if (shouldHeartbeat() || (shouldAppendLog && !shouldWait())) {
-          // keep appending log entries or sending heartbeats
-          appendLog();
-        }
+
+        appendLog(installSnapshotRequired || haveTooManyPendingRequests());
+
       }
       checkSlowness();
+
     }
 
     Optional.ofNullable(appendLogRequestObserver).ifPresent(StreamObserver::onCompleted);
   }
 
   private long getWaitTimeMs() {
-    if (!shouldSendRequest()) {
-      return getHeartbeatRemainingTime(); // No requests, wait until heartbeat
-    } else if (shouldWait()) {
-      return getHalfMinTimeoutMs(); // Should wait for a short time
+    if (haveTooManyPendingRequests()) {
+      return getHeartbeatRemainingTime(); // Should wait for a short time
+    } else if (haveLogEntriesToSendOut() || heartbeatTimeout()) {
+      return 0L;
     }
-    return 0L;
+    return Math.min(10L, getHeartbeatRemainingTime());

Review comment:
       I have added another commit where leader notifies appenders if there is a commit index change. It would still be good idea to cap the sleep time for appender to 10ms.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-ratis] bharatviswa504 commented on a change in pull request #185: RATIS-1042. Watch for commit calls are blocked for a long if no other message

Posted by GitBox <gi...@apache.org>.
bharatviswa504 commented on a change in pull request #185:
URL: https://github.com/apache/incubator-ratis/pull/185#discussion_r479548926



##########
File path: ratis-grpc/src/main/java/org/apache/ratis/grpc/server/GrpcLogAppender.java
##########
@@ -101,41 +101,44 @@ private synchronized void resetClient(AppendEntriesRequest request) {
 
   @Override
   protected void runAppenderImpl() throws IOException {
-    boolean shouldAppendLog;
+    boolean installSnapshotRequired;

Review comment:
       Question: Why similar code changes are not done in LogAppender#runAppenderImpl

##########
File path: ratis-grpc/src/main/java/org/apache/ratis/grpc/server/GrpcLogAppender.java
##########
@@ -101,41 +101,44 @@ private synchronized void resetClient(AppendEntriesRequest request) {
 
   @Override
   protected void runAppenderImpl() throws IOException {
-    boolean shouldAppendLog;
+    boolean installSnapshotRequired;
     for(; isAppenderRunning(); mayWait()) {
-      shouldAppendLog = true;
-      if (shouldSendRequest()) {
+      installSnapshotRequired = false;
+
+      //HB period is expired OR we have messages OR follower is behind with commit index
+      if (haveLogEntriesToSendOut() || heartbeatTimeout() || isFollowerCommitBehindLastCommitIndex()) {
+
         if (installSnapshotEnabled) {
           SnapshotInfo snapshot = shouldInstallSnapshot();
           if (snapshot != null) {
             installSnapshot(snapshot);
-            shouldAppendLog = false;
+            installSnapshotRequired = true;
           }
         } else {
           TermIndex installSnapshotNotificationTermIndex = shouldNotifyToInstallSnapshot();
           if (installSnapshotNotificationTermIndex != null) {
             installSnapshot(installSnapshotNotificationTermIndex);
-            shouldAppendLog = false;
+            installSnapshotRequired = true;
           }
         }
-        if (shouldHeartbeat() || (shouldAppendLog && !shouldWait())) {
-          // keep appending log entries or sending heartbeats
-          appendLog();
-        }
+
+        appendLog(installSnapshotRequired || haveTooManyPendingRequests());
+
       }
       checkSlowness();
+
     }
 
     Optional.ofNullable(appendLogRequestObserver).ifPresent(StreamObserver::onCompleted);
   }
 
   private long getWaitTimeMs() {
-    if (!shouldSendRequest()) {
-      return getHeartbeatRemainingTime(); // No requests, wait until heartbeat
-    } else if (shouldWait()) {
-      return getHalfMinTimeoutMs(); // Should wait for a short time
+    if (haveTooManyPendingRequests()) {
+      return getHeartbeatRemainingTime(); // Should wait for a short time
+    } else if (haveLogEntriesToSendOut() || heartbeatTimeout()) {
+      return 0L;
     }
-    return 0L;
+    return Math.min(10L, getHeartbeatRemainingTime());

Review comment:
       Any reason for using 10L with Min, and not returning directly getHeartbeatRemainingTime()

##########
File path: ratis-grpc/src/main/java/org/apache/ratis/grpc/server/GrpcLogAppender.java
##########
@@ -101,41 +101,44 @@ private synchronized void resetClient(AppendEntriesRequest request) {
 
   @Override
   protected void runAppenderImpl() throws IOException {
-    boolean shouldAppendLog;
+    boolean installSnapshotRequired;
     for(; isAppenderRunning(); mayWait()) {
-      shouldAppendLog = true;
-      if (shouldSendRequest()) {
+      installSnapshotRequired = false;
+
+      //HB period is expired OR we have messages OR follower is behind with commit index
+      if (haveLogEntriesToSendOut() || heartbeatTimeout() || isFollowerCommitBehindLastCommitIndex()) {
+
         if (installSnapshotEnabled) {
           SnapshotInfo snapshot = shouldInstallSnapshot();
           if (snapshot != null) {
             installSnapshot(snapshot);
-            shouldAppendLog = false;
+            installSnapshotRequired = true;
           }
         } else {
           TermIndex installSnapshotNotificationTermIndex = shouldNotifyToInstallSnapshot();
           if (installSnapshotNotificationTermIndex != null) {
             installSnapshot(installSnapshotNotificationTermIndex);
-            shouldAppendLog = false;
+            installSnapshotRequired = true;
           }
         }
-        if (shouldHeartbeat() || (shouldAppendLog && !shouldWait())) {
-          // keep appending log entries or sending heartbeats
-          appendLog();
-        }
+
+        appendLog(installSnapshotRequired || haveTooManyPendingRequests());
+
       }
       checkSlowness();
+
     }
 
     Optional.ofNullable(appendLogRequestObserver).ifPresent(StreamObserver::onCompleted);
   }
 
   private long getWaitTimeMs() {
-    if (!shouldSendRequest()) {
-      return getHeartbeatRemainingTime(); // No requests, wait until heartbeat
-    } else if (shouldWait()) {
-      return getHalfMinTimeoutMs(); // Should wait for a short time
+    if (haveTooManyPendingRequests()) {
+      return getHeartbeatRemainingTime(); // Should wait for a short time
+    } else if (haveLogEntriesToSendOut() || heartbeatTimeout()) {
+      return 0L;
     }
-    return 0L;
+    return Math.min(10L, getHeartbeatRemainingTime());

Review comment:
       And also in Jira it is mentioned 
   " that the thread does not sleep for a long time while there are pending watch requests in leader"
   
   We have checked if followerIndex < raftLog.getIndex() where we have checked pending watch requests.
   
   




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-ratis] runzhiwang commented on a change in pull request #185: RATIS-1042. Watch for commit calls are blocked for a long if no other message

Posted by GitBox <gi...@apache.org>.
runzhiwang commented on a change in pull request #185:
URL: https://github.com/apache/incubator-ratis/pull/185#discussion_r481853565



##########
File path: ratis-grpc/src/main/java/org/apache/ratis/grpc/server/GrpcLogAppender.java
##########
@@ -166,24 +169,26 @@ protected boolean shouldSendRequest() {
     return appendLogRequestObserver == null || super.shouldSendRequest();
   }
 
-  /** @return true iff not received first response or queue is full. */
-  private boolean shouldWait() {
+  /**
+   * @return true iff not received first response or queue is full.
+   */
+  private boolean haveTooManyPendingRequests() {
     final int size = pendingRequests.logRequestsSize();
     if (size == 0) {
       return false;
     }
     return !firstResponseReceived || size >= maxPendingRequestsNum;
   }
 
-  private void appendLog() throws IOException {
+  private void appendLog(boolean excludeLogEntries) throws IOException {

Review comment:
       Can we rename excludeLogEntries to heartbeat ? 




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-ratis] runzhiwang merged pull request #185: RATIS-1042. Watch for commit calls are blocked for a long if no other message

Posted by GitBox <gi...@apache.org>.
runzhiwang merged pull request #185:
URL: https://github.com/apache/incubator-ratis/pull/185


   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org