You are viewing a plain text version of this content. The canonical link for it is here.

Posted to issues@hbase.apache.org by GitBox <gi...@apache.org> on 2020/07/29 17:37:57 UTC

[GitHub] [hbase] anoopsjohn commented on a change in pull request #2168: HBASE-24695 FSHLog - close the current WAL file in a background thread.

anoopsjohn commented on a change in pull request #2168:
URL: https://github.com/apache/hbase/pull/2168#discussion_r462215252



##########
File path: hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/wal/FSHLog.java
##########
@@ -412,6 +422,24 @@ protected void doReplaceWriter(Path oldPath, Path newPath, Writer nextWriter) th
     }
   }
 
+  private void closeWriter(Path path, boolean syncCloseCall) throws IOException {
+    try {
+      TraceUtil.addTimelineAnnotation("closing writer");
+      writer.close();
+      TraceUtil.addTimelineAnnotation("writer closed");
+    } catch (IOException ioe) {
+      int errors = closeErrorCount.incrementAndGet();
+      boolean hasUnflushedEntries = isUnflushedEntries();
+      if (syncCloseCall && (hasUnflushedEntries || (errors > this.closeErrorsTolerated))) {

Review comment:
       Ideally when we writer close, there wont be any unflushed entries. We have synced all writes happend to this wal writer.  But as the code was having it and to be on safe side, continue to use that.  Ya in that case will go with sync way of call so that we can act against and IOE in close call.  Same with that errors count. If consecutively those many failures, we will have to throw IOE and RS will abort.  But here one catch is there.  This #errors based abort wont be that strict from now on.  Say if the errors tolerated is 1, we need to throw back IOE if errors count is 2.  But it is possible that one close req came and given to a thread and before that is done another close req and given to another thread and then a 3rd close req came by then 1st thread is done and it was failed. So the 3rd req will be tried in sync way of close. By the time it finishes, the 2nd close thread also failed to close. So the errors already became 2 and 3d  (sync way call) also failed which makes t
 he count to 3.  So instead of throw back IOE at errors=2, it will do now on errors=3.  But we are sure all edits are synced and no data loss. So I believe this is ok.. Asycn wal is not having any such logic at all.

##########
File path: hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/wal/FSHLog.java
##########
@@ -436,6 +464,19 @@ protected void doShutdown() throws IOException {
       this.writer.close();
       this.writer = null;
     }
+    closeExecutor.shutdown();
+    try {
+      if (!closeExecutor.awaitTermination(waitOnShutdownInSeconds, TimeUnit.SECONDS)) {
+        LOG.error("We have waited " + waitOnShutdownInSeconds + " seconds but"

Review comment:
       done




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org