You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@hbase.apache.org by GitBox <gi...@apache.org> on 2021/07/09 15:16:20 UTC

[GitHub] [hbase] bharathv commented on a change in pull request #3467: [branch-1] HBASE-26075 Replication is stuck due to zero length wal file in oldWA…

bharathv commented on a change in pull request #3467:
URL: https://github.com/apache/hbase/pull/3467#discussion_r667026199



##########
File path: hbase-server/src/test/java/org/apache/hadoop/hbase/replication/regionserver/TestWALEntryStream.java
##########
@@ -917,4 +918,52 @@ public void testCleanClosedWALs() throws Exception {
       assertEquals(0, logQueue.getMetrics().getUncleanlyClosedWALs());
     }
   }
+
+  /**
+   * Tests that we handle EOFException properly if the wal has moved to oldWALs directory.
+   * @throws Exception exception
+   */
+  @Test
+  public void testEOFExceptionInOldWALsDirectory() throws Exception {
+    assertEquals(1, logQueue.getQueueSize(fakeWalGroupId));
+    FSHLog fsLog = (FSHLog)log;
+    Path emptyLogFile = fsLog.getCurrentFileName();
+    log.rollWriter(true);
+    // There will 2 logs in the queue.
+    assertEquals(2, logQueue.getQueueSize(fakeWalGroupId));
+
+    Configuration localConf = new Configuration(conf);
+    localConf.setInt("replication.source.maxretriesmultiplier", 1);
+    localConf.setBoolean("replication.source.eof.autorecovery", true);
+
+    try (WALEntryStream entryStream =
+      new WALEntryStream(logQueue, fs, localConf, logQueue.getMetrics(), fakeWalGroupId)) {
+      // Get the archived dir path for the first wal.
+      Path archivePath = entryStream.getArchivedLog(emptyLogFile);
+      // Make sure that the wal path is not the same as archived Dir path.
+      assertTrue(!emptyLogFile.toString().equals(archivePath.toString()));

Review comment:
       assertEquals() is better (will include the log with failures)

##########
File path: hbase-server/src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSourceWALReaderThread.java
##########
@@ -309,9 +309,14 @@ private boolean handleEofException(Exception e, WALEntryStream entryStream,
     if (e.getCause() instanceof EOFException && (isRecoveredSource || queue.size() > 1)
         && conf.getBoolean("replication.source.eof.autorecovery", false)) {
       try {
-        if (fs.getFileStatus(queue.peek()).getLen() == 0) {
-          LOG.warn("Forcing removal of 0 length log in queue: " + queue.peek());
-          lastReadPath = queue.peek();
+        Path path = queue.peek();
+        if (!fs.exists(path)) {
+          // There is a chance that wal has moved to oldWALs directory, so look there also.
+          path = entryStream.getArchivedLog(path);
+        }
+        if (fs.getFileStatus(path).getLen() == 0) {
+          LOG.warn("Forcing removal of 0 length log in queue: " + path);
+          lastReadPath = path;

Review comment:
       Not able to comment in L333, but that should not be queue.peek()? Also mind including the ioe stack in LOG.warn() while you are there.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@hbase.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org