Posted to issues@hbase.apache.org by "Sun Xin (Jira)" <ji...@apache.org> on 2022/09/01 11:18:00 UTC

[jira] [Created] (HBASE-27354) EOF thrown by WALEntryStream causes replication blocking

Sun Xin created HBASE-27354:
-------------------------------

             Summary: EOF thrown by WALEntryStream causes replication blocking
                 Key: HBASE-27354
                 URL: https://issues.apache.org/jira/browse/HBASE-27354
             Project: HBase
          Issue Type: Bug
          Components: Replication
    Affects Versions: 2.4.14, 3.0.0-alpha-3, 2.5.0, 2.6.0
            Reporter: Sun Xin
            Assignee: Sun Xin


In [WALEntryStream#readNextEntryAndRecordReaderPosition|https://github.com/apache/hbase/blob/308cd729d23329e6d8d4b9c17a645180374b5962/hbase-server/src/main/java/org/apache/hadoop/hbase/replication/regionserver/WALEntryStream.java#L257], it is possible that we read uncommitted data. If we read beyond the committed file length, we reopen the inputStream and seek back.

In our usage, we found that the position we seek back to may be exactly the length of the file being written, which may cause an EOF.
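
As a rough illustration of why seeking to an offset equal to the file length is risky, here is a minimal sketch using a local file and java.io.RandomAccessFile. This is only an analogy: it does not use HDFS or the WAL reader, and the class and method names (SeekToEndDemo, readHitsEof, demo) are made up for this example.

```java
import java.io.EOFException;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical demo class; not part of HBase.
class SeekToEndDemo {
  /** Returns true if reading a 4-byte int after seeking to position hits EOF. */
  static boolean readHitsEof(Path file, long position) throws IOException {
    try (RandomAccessFile in = new RandomAccessFile(file.toFile(), "r")) {
      in.seek(position); // seeking to the file length itself succeeds
      in.readInt();      // the following read is what fails
      return false;
    } catch (EOFException e) {
      return true;
    }
  }

  /** Returns {EOF when reading at offset 4, EOF when reading at offset == length}. */
  static boolean[] demo() {
    try {
      Path file = Files.createTempFile("wal-demo", ".bin");
      try {
        // Two 4-byte "entries" standing in for WAL content.
        Files.write(file, new byte[] {0, 0, 0, 1, 0, 0, 0, 2});
        long length = Files.size(file);
        return new boolean[] {readHitsEof(file, 4), readHitsEof(file, length)};
      } finally {
        Files.deleteIfExists(file);
      }
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  public static void main(String[] args) {
    boolean[] result = demo();
    System.out.println("EOF at offset 4: " + result[0]);
    System.out.println("EOF at offset == length: " + result[1]);
  }
}
```

In this sketch the seek itself does not fail; the EOF only surfaces on the next read, which is consistent with the behaviour described above.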

The thrown EOF is eventually caught in [ReplicationSourceWALReader.run|https://github.com/apache/hbase/blob/308cd729d23329e6d8d4b9c17a645180374b5962/hbase-server/src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSourceWALReader.java#L158], but [totalBufferUsed|https://github.com/apache/hbase/blob/308cd729d23329e6d8d4b9c17a645180374b5962/hbase-server/src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSourceWALReader.java#L78] is not cleaned up.
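
The accounting problem can be sketched in isolation as follows. This is a simplified model, not the HBase code: the class BufferQuotaLeakDemo, the methods readBatch and usedAfter, and the 1024-byte entry size are all assumptions made for illustration. Only the counter name totalBufferUsed is borrowed from the source. The releaseOnFailure branch shows one possible remedy and is not necessarily the fix adopted for this issue.

```java
import java.io.EOFException;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical model of the leak; not part of HBase.
class BufferQuotaLeakDemo {
  // Stand-in for the shared usage counter named in the report.
  static final AtomicLong totalBufferUsed = new AtomicLong();

  /** Reserves entrySize bytes, then hits a simulated EOF before the batch is handed off. */
  static void readBatch(long entrySize, boolean releaseOnFailure) {
    long reserved = 0;
    try {
      totalBufferUsed.addAndGet(entrySize); // usage is recorded as the entry is read
      reserved += entrySize;
      throw new EOFException("simulated EOF after seeking back");
    } catch (EOFException e) {
      // The exception is caught, but unless the reservation is rolled back
      // here, the recorded usage stays in the counter.
      if (releaseOnFailure) {
        totalBufferUsed.addAndGet(-reserved);
      }
    }
  }

  /** Returns the counter value after the given number of failed reads. */
  static long usedAfter(int failures, boolean releaseOnFailure) {
    totalBufferUsed.set(0);
    for (int i = 0; i < failures; i++) {
      readBatch(1024, releaseOnFailure);
    }
    return totalBufferUsed.get();
  }

  public static void main(String[] args) {
    System.out.println("leaked without cleanup: " + usedAfter(3, false));
    System.out.println("leaked with cleanup: " + usedAfter(3, true));
  }
}
```

Without the rollback, each failed read leaves its reservation behind, so the counter only grows. With the rollback, the counter returns to zero after every failure.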

As the leaked usage accumulates over a long run, all peers slow down and eventually block completely.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)