Posted to issues@hbase.apache.org by "Bryan Beaudreault (Jira)" <ji...@apache.org> on 2021/12/14 19:33:00 UTC

[jira] [Created] (HBASE-26575) StoreHotnessProtector may block Replication

Bryan Beaudreault created HBASE-26575:
-----------------------------------------

             Summary: StoreHotnessProtector may block Replication
                 Key: HBASE-26575
                 URL: https://issues.apache.org/jira/browse/HBASE-26575
             Project: HBase
          Issue Type: Bug
            Reporter: Bryan Beaudreault


I'm upgrading from hbase1 to hbase2, and I'm still in my QA environment where load is very low. Even still, I've noticed some bad interaction between Replication and the StoreHotnessProtector.

The ReplicationSink collects edits from the WAL and executes them in batches via the normal HTable interface. The max batch size comes from "hbase.rpc.rows.warning.threshold", which despite its name (it sounds like it should only trigger a warning) caps the batch and defaults to 5000.
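
For reference, a minimal sketch of where that limit comes from (this just reads the property by name; it is not the ReplicationSink source):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;

    Configuration conf = HBaseConfiguration.create();
    // Despite the "warning" in the name, this value ends up capping the
    // replication sink's batch size (default 5000 rows per batch).
    int batchRowLimit = conf.getInt("hbase.rpc.rows.warning.threshold", 5000);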

The StoreHotnessProtector defaults to allowing 10 concurrent writes (of 100 columns or more) to a Store, or 20 concurrent "prepares" of said writes. The prepare step is what causes issues here. When a batch mutate comes in, the RS first takes a lock on all rows in the batch. This happens in HRegion#lockRowsAndBuildMiniBatch, and the writes are recorded as "preparing" in StoreHotnessProtector before acquiring the lock. The recording basically increments a counter and throws an exception if that counter goes over 20.
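
A minimal sketch of that accounting, assuming a per-Store counter and the defaults above (10 concurrent writes times a prepare multiplier of 2; this is a simplified stand-in, not the actual StoreHotnessProtector source):

    import java.util.concurrent.atomic.AtomicInteger;
    import org.apache.hadoop.hbase.RegionTooBusyException;

    /** Simplified stand-in for the per-Store "prepare" gate. */
    class PrepareGate {
      private static final int COLUMN_THRESHOLD = 100; // narrower writes bypass the gate
      private static final int PREPARE_LIMIT = 20;     // 10 writes * multiplier of 2
      private final AtomicInteger preparing = new AtomicInteger(0);

      void start(int columnCount) throws RegionTooBusyException {
        if (columnCount <= COLUMN_THRESHOLD) {
          return;
        }
        if (preparing.incrementAndGet() > PREPARE_LIMIT) {
          preparing.decrementAndGet();
          throw new RegionTooBusyException("over parallelPreparePutCount limit");
        }
      }

      void finish(int columnCount) {
        if (columnCount > COLUMN_THRESHOLD) {
          preparing.decrementAndGet();
        }
      }
    }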

Back in HRegion#lockRowsAndBuildMiniBatch, the exception is caught and recorded in the results for any items that failed. Any items that succeed continue on to the write, unless the batch is atomic, in which case the exception is immediately rethrown.
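
Roughly, the handling looks like this (a simplified sketch, not the actual HRegion code; acquireRowLock and columnCount are hypothetical helpers, and prepareGate is the gate sketched above):

    for (int i = 0; i < mutations.length; i++) {
      try {
        prepareGate.start(columnCount(mutations[i]));
        acquireRowLock(mutations[i]);
      } catch (RegionTooBusyException e) {
        if (atomic) {
          throw e;            // an atomic batch fails immediately
        }
        results[i] = e;       // recorded per operation; the client retries these
      }
    }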

This response gets back to the client, which automatically handles retries. With enough retries, the batch call will eventually succeed because each retry contains fewer and fewer writes to handle. Assuming you have enough retries, this effectively enforces automatic chunking of a batch write into sub-batches of 20. Again, this only affects writes that hit more than 100 columns (by default).

At this point I'll say that this in general seems overly aggressive, especially since the StoreHotnessProtector doesn't do any checks against actual load on the RS. You could have a totally idle RegionServer and submit a single batch of 100 Puts with 101 columns each; if you don't have at least 5 retries configured, the batch will fail.
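
A hedged repro sketch of that scenario (table, family, and qualifier names are hypothetical; 101 columns puts each Put just over the tracking threshold):

    import java.util.ArrayList;
    import java.util.List;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.TableName;
    import org.apache.hadoop.hbase.client.Connection;
    import org.apache.hadoop.hbase.client.ConnectionFactory;
    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.client.Table;
    import org.apache.hadoop.hbase.util.Bytes;

    Configuration conf = HBaseConfiguration.create();
    try (Connection conn = ConnectionFactory.createConnection(conf);
         Table table = conn.getTable(TableName.valueOf("wide_table"))) {
      List<Put> puts = new ArrayList<>();
      for (int r = 0; r < 100; r++) {
        Put p = new Put(Bytes.toBytes("row-" + r));
        for (int c = 0; c < 101; c++) {
          p.addColumn(Bytes.toBytes("f"), Bytes.toBytes("q" + c), Bytes.toBytes("v"));
        }
        puts.add(p);
      }
      // Only ~20 of these clear the prepare gate per attempt, so with fewer
      // than 5 attempts configured this fails even on an idle RegionServer.
      table.batch(puts, new Object[puts.size()]);
    }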

Back to ReplicationSink: the default batch size is 5000 Puts and the default number of retries is 4. For a table with wide rows (which might cause replication to try to sink Puts with more than 100 columns), it becomes basically impossible to replicate, because the number of retries is nowhere near enough to move through a batch of up to 5000, 20 at a time.
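
The arithmetic makes the mismatch plain (defaults as described above):

    int batchSize = 5000;   // hbase.rpc.rows.warning.threshold
    int perAttempt = 20;    // writes that clear the prepare gate per attempt
    int attemptsNeeded = (batchSize + perAttempt - 1) / perAttempt;  // = 250
    // ...versus a default of 4 retries, so the sink can never drain the batch.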



--
This message was sent by Atlassian Jira
(v8.20.1#820001)