Posted to issues@hbase.apache.org by "Sun Xin (Jira)" <ji...@apache.org> on 2022/09/01 11:18:00 UTC
[jira] [Created] (HBASE-27354) EOF thrown by WALEntryStream causes replication blocking
Sun Xin created HBASE-27354:
-------------------------------
Summary: EOF thrown by WALEntryStream causes replication blocking
Key: HBASE-27354
URL: https://issues.apache.org/jira/browse/HBASE-27354
Project: HBase
Issue Type: Bug
Components: Replication
Affects Versions: 2.4.14, 3.0.0-alpha-3, 2.5.0, 2.6.0
Reporter: Sun Xin
Assignee: Sun Xin
In [WALEntryStream#readNextEntryAndRecordReaderPosition|https://github.com/apache/hbase/blob/308cd729d23329e6d8d4b9c17a645180374b5962/hbase-server/src/main/java/org/apache/hadoop/hbase/replication/regionserver/WALEntryStream.java#L257], it is possible to read uncommitted data. If we read beyond the committed file length, we reopen the input stream and seek back.
In our deployment, we found that the position we seek back to can be exactly the length of the file being written, which can cause an EOF.
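To illustrate the failure mode, here is a minimal sketch of a read that starts exactly at the file length. It is an assumption-laden stand-in: it uses a local temporary file and java.io.RandomAccessFile rather than the HDFS input stream the WAL reader actually uses, and the class and method names (SeekToEndSketch, readHitsEof, writeTempFile) are hypothetical.

```java
import java.io.EOFException;
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;

public class SeekToEndSketch {
  /** Returns true if reading a 4-byte int at the given offset hits end of file. */
  static boolean readHitsEof(File file, long position) throws IOException {
    try (RandomAccessFile in = new RandomAccessFile(file, "r")) {
      in.seek(position); // seeking to the file length itself is allowed
      in.readInt();      // the read is what fails when no bytes remain
      return false;
    } catch (EOFException e) {
      return true;
    }
  }

  /** Creates a temporary file of numBytes zero bytes (hypothetical stand-in for a WAL file). */
  static File writeTempFile(int numBytes) throws IOException {
    File f = File.createTempFile("wal-sketch", ".bin");
    f.deleteOnExit();
    try (RandomAccessFile out = new RandomAccessFile(f, "rw")) {
      out.write(new byte[numBytes]);
    }
    return f;
  }

  public static void main(String[] args) throws IOException {
    File f = writeTempFile(16);
    System.out.println(readHitsEof(f, 8));          // prints "false"
    System.out.println(readHitsEof(f, f.length())); // prints "true"
  }
}
```

The seek itself succeeds in this sketch. Only the subsequent read fails, which matches the case described above where the seek-back position equals the current file length.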
The thrown EOF is eventually caught in [ReplicationSourceWALReader.run|https://github.com/apache/hbase/blob/308cd729d23329e6d8d4b9c17a645180374b5962/hbase-server/src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSourceWALReader.java#L158], but [totalBufferUsed|https://github.com/apache/hbase/blob/308cd729d23329e6d8d4b9c17a645180374b5962/hbase-server/src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSourceWALReader.java#L78] is not cleaned up.
Because the leaked usage is never released, it accumulates: after a long run, all peers slow down and eventually block completely.
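The leak can be sketched with a simple counter. This is not the HBase code and not necessarily the fix that was applied: BufferQuotaSketch, readBatch, and usageAfterFailures are hypothetical names, the AtomicLong only stands in for totalBufferUsed, and releasing on success is a simplification of whatever releases the buffer after a batch is shipped.

```java
import java.io.EOFException;
import java.util.concurrent.atomic.AtomicLong;

public class BufferQuotaSketch {
  /**
   * Simulates reading one batch: reserves batchBytes in the counter, then
   * optionally fails with an EOF before the batch can be handed off.
   */
  static void readBatch(AtomicLong bufferUsed, long batchBytes,
                        boolean failWithEof, boolean releaseOnFailure) {
    long reserved = 0;
    try {
      bufferUsed.addAndGet(batchBytes);
      reserved = batchBytes;
      if (failWithEof) {
        throw new EOFException("simulated EOF at file length");
      }
      // Success path: the batch is consumed and its reservation released.
      bufferUsed.addAndGet(-reserved);
    } catch (EOFException e) {
      // Without this release, every EOF permanently leaks `reserved` bytes.
      if (releaseOnFailure) {
        bufferUsed.addAndGet(-reserved);
      }
    }
  }

  /** Returns the counter value after the given number of failed batch reads. */
  static long usageAfterFailures(int failures, long batchBytes, boolean releaseOnFailure) {
    AtomicLong bufferUsed = new AtomicLong();
    for (int i = 0; i < failures; i++) {
      readBatch(bufferUsed, batchBytes, true, releaseOnFailure);
    }
    return bufferUsed.get();
  }

  public static void main(String[] args) {
    System.out.println(usageAfterFailures(3, 100, false)); // prints "300"
    System.out.println(usageAfterFailures(3, 100, true));  // prints "0"
  }
}
```

With no release on the failure path, the counter only grows with each EOF, which is consistent with the gradual slowdown and eventual blocking reported above.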