Posted to dev@hbase.apache.org by "tianhang tang (Jira)" <ji...@apache.org> on 2022/03/16 07:41:00 UTC

[jira] [Created] (HBASE-26849) NPE caused by WAL Compression and Replication

tianhang tang created HBASE-26849:
-------------------------------------

Summary: NPE caused by WAL Compression and Replication
Key: HBASE-26849
URL: https://issues.apache.org/jira/browse/HBASE-26849
Project: HBase
Issue Type: Bug
Components: Replication, wal
Affects Versions: 2.4.11, 1.7.1
Reporter: tianhang tang
Assignee: tianhang tang
Attachments: image-2022-03-16-14-25-49-276.png, image-2022-03-16-14-30-15-247.png

My cluster runs HBase 1.4.12 with WAL compression and replication enabled.

I noticed a growing backlog in the replication sizeOfLogQueue metric and, after some debugging, found this NPE:

!image-2022-03-16-14-25-49-276.png!

The root cause for this problem is:
WALEntryStream#checkAllBytesParsed:

!image-2022-03-16-14-30-15-247.png!

resetReader does not create a new reader, so the original CompressionContext and the dict in it are retained.
However, the position is reset to 0 at this point, which means the HLog is read again from the beginning while the dict, which was never cleared, is still in use. The dict state then no longer matches the position in the file, and decoding fails.
Creating a new reader here instead solves the problem.
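
To illustrate the failure mode, here is a minimal, self-contained sketch. It is not HBase code: RingDictionary, the "L:"/"I:" token format and the read/write helpers are hypothetical stand-ins for the CompressionContext dict and the WAL format, assuming only that writer and reader must apply the same dictionary updates in the same order. In this toy the stale dictionary yields wrong values rather than an NPE, but the cause is the same.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class StaleDictionaryDemo {

    /** Toy fixed-capacity dictionary that overwrites slots round-robin (hypothetical). */
    static final class RingDictionary {
        private final String[] slots;
        private final Map<String, Integer> index = new HashMap<>();
        private int next = 0;

        RingDictionary(int capacity) { slots = new String[capacity]; }

        /** Returns the slot holding value, or -1 if it is not in the dictionary. */
        int find(String value) {
            Integer slot = index.get(value);
            return slot == null ? -1 : slot;
        }

        /** Stores value in the next slot, evicting whatever was there. */
        void add(String value) {
            if (slots[next] != null) index.remove(slots[next], next);
            slots[next] = value;
            index.put(value, next);
            next = (next + 1) % slots.length;
        }

        String get(int slot) { return slots[slot]; }
    }

    /** Writer: emits "I:<slot>" for known values, "L:<value>" (and a dictionary add) otherwise. */
    static List<String> write(List<String> values, int capacity) {
        RingDictionary dict = new RingDictionary(capacity);
        List<String> log = new ArrayList<>();
        for (String v : values) {
            int slot = dict.find(v);
            if (slot >= 0) {
                log.add("I:" + slot);
            } else {
                log.add("L:" + v);
                dict.add(v);
            }
        }
        return log;
    }

    /** Reader: decodes the whole log from position 0 using the given dictionary. */
    static List<String> read(List<String> log, RingDictionary dict) {
        List<String> out = new ArrayList<>();
        for (String token : log) {
            String body = token.substring(2);
            if (token.startsWith("L:")) {
                dict.add(body);
                out.add(body);
            } else {
                out.add(dict.get(Integer.parseInt(body)));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> log = write(Arrays.asList("a", "b", "a", "c", "b"), 2);

        RingDictionary reused = new RingDictionary(2);
        System.out.println(read(log, reused));                // [a, b, a, c, b]  first pass
        System.out.println(read(log, reused));                // [a, b, b, c, c]  reset to 0, stale dict
        System.out.println(read(log, new RingDictionary(2))); // [a, b, a, c, b]  reset to 0, new dict
    }
}
```

The second pass corresponds to resetting the position without recreating the reader; the third corresponds to recreating it.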

I will open a PR later. However, there are other places in the current code that call resetReader or seekOnFs, and I suspect they do not take WAL compression into account at all...

In theory, whenever the file is read again, the LRUCache should be rolled back as well; otherwise the read and write paths behave inconsistently.
The difficulty is that the position can be rolled back to any intermediate offset at will, but the LRUCache cannot...
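
A small sketch of why an intermediate position is harder than position 0. Again this is a toy, not HBase code: ReaderState, the "L:"/"I:" tokens and both read helpers are hypothetical. It assumes that the dictionary state at a given offset depends on every entry before it, so one possible (and costly) way to reach a consistent state is to replay the prefix with a fresh dictionary. This is only an illustration of the constraint, not the fix proposed for this issue.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class SeekWithReplayDemo {

    /** Reader-side state: a toy round-robin dictionary (hypothetical, not HBase's LRUDictionary). */
    static final class ReaderState {
        final String[] slots;
        int next = 0;

        ReaderState(int capacity) { slots = new String[capacity]; }
    }

    /** Decodes one token, applying the same dictionary update the writer made for it. */
    static String decode(String token, ReaderState state) {
        String body = token.substring(2);
        if (token.startsWith("L:")) {
            state.slots[state.next] = body;
            state.next = (state.next + 1) % state.slots.length;
            return body;
        }
        return state.slots[Integer.parseInt(body)];
    }

    /** Unsafe: jump straight to position with an empty dictionary. */
    static List<String> readFromNaive(List<String> log, int position, int capacity) {
        ReaderState state = new ReaderState(capacity);
        List<String> out = new ArrayList<>();
        for (int i = position; i < log.size(); i++) {
            out.add(decode(log.get(i), state));
        }
        return out;
    }

    /** Consistent: replay [0, position) to rebuild the dictionary, then emit from position. */
    static List<String> readFromWithReplay(List<String> log, int position, int capacity) {
        ReaderState state = new ReaderState(capacity);
        List<String> out = new ArrayList<>();
        for (int i = 0; i < log.size(); i++) {
            String value = decode(log.get(i), state);
            if (i >= position) out.add(value);
        }
        return out;
    }

    public static void main(String[] args) {
        // Log for the values a, b, a, c, b written with a two-slot dictionary.
        List<String> log = Arrays.asList("L:a", "L:b", "I:0", "L:c", "I:1");
        System.out.println(readFromNaive(log, 2, 2));      // [null, c, null]
        System.out.println(readFromWithReplay(log, 2, 2)); // [a, c, b]
    }
}
```

The null lookups in the naive read are the toy equivalent of dictionary misses; replaying costs time proportional to the target position.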

--
This message was sent by Atlassian Jira
(v8.20.1#820001)