You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@hbase.apache.org by GitBox <gi...@apache.org> on 2021/08/24 17:13:19 UTC

[GitHub] [hbase] gjacoby126 commented on a change in pull request #3611: [branch-1] HBASE-26195 Abort RS if wal sync fails or times out

gjacoby126 commented on a change in pull request #3611:
URL: https://github.com/apache/hbase/pull/3611#discussion_r695033974



##########
File path: hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java
##########
@@ -8120,27 +8097,29 @@ public Result append(Append mutate, long nonceGroup, long nonce) throws IOExcept
         rowLock = null;
       }
       // sync the transaction log outside the rowlock
+      walSyncSuccess = false;
       if(txid != 0){
         syncOrDefer(txid, durability);
       }
+      walSyncSuccess = true;
       if (rsServices != null && rsServices.getMetrics() != null) {
         rsServices.getMetrics().updateWriteQueryMeter(this.htableDescriptor.
           getTableName());
       }
-      doRollBackMemstore = false;
+    } catch (Throwable t) {
+      // If wal sync fails, then abort the Region server
+      if (!walSyncSuccess) {
+        rsServices.abort("Wal sync failed", t);

Review comment:
       nit: since you're doing very similar catch logic in multiple methods, you could consider extracting the verify-and-abort logic into a private helper method to save some duplication

##########
File path: hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java
##########
@@ -3776,14 +3776,16 @@ private long doMiniBatchMutation(BatchOperationInProgress<?> batchOp) throws IOE
 
       success = true;
       return addedSize;
+    } catch (Throwable t) {
+      // If wal sync fails, then abort the Region server
+      if (!walSyncSuccess) {
+        rsServices.abort("Wal sync failed", t);

Review comment:
       nit: Would be nice to have a little more description on the error about why a WAL sync failure should cause an abort. "WAL sync failed. Aborting to avoid a mismatch between the memstore, WAL, and any replicated clusters." maybe?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@hbase.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org