Posted to issues@hbase.apache.org by "chenglei (Jira)" <ji...@apache.org> on 2022/04/01 03:55:00 UTC
[jira] [Updated] (HBASE-26811) Secondary replica may be disabled for read incorrectly forever
[ https://issues.apache.org/jira/browse/HBASE-26811?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
chenglei updated HBASE-26811:
-----------------------------
Fix Version/s: 2.5.0
2.6.0
3.0.0-alpha-3
2.4.12
Resolution: Fixed
Status: Resolved (was: Patch Available)
> Secondary replica may be disabled for read incorrectly forever
> --------------------------------------------------------------
>
> Key: HBASE-26811
> URL: https://issues.apache.org/jira/browse/HBASE-26811
> Project: HBase
> Issue Type: Bug
> Components: read replicas
> Affects Versions: 3.0.0-alpha-2, 2.4.10
> Reporter: chenglei
> Assignee: chenglei
> Priority: Major
> Fix For: 2.5.0, 2.6.0, 3.0.0-alpha-3, 2.4.12
>
>
> For read replicas, when I set {{hbase.region.replica.wait.for.primary.flush}} to false and explicitly set {{TableDescriptorBuilder.setRegionMemStoreReplication}} to true at table level, the secondary replica is disabled for reads, and reading from this replica region throws:
> {code:java}
> java.io.IOException: The region's reads are disabled. Cannot serve the request
> at org.apache.hadoop.hbase.regionserver.HRegion.checkReadsEnabled(HRegion.java:5187)
> at org.apache.hadoop.hbase.regionserver.HRegion.startRegionOperation(HRegion.java:8279)
> {code}
> Strangely, if I don't set {{TableDescriptorBuilder.setRegionMemStoreReplication}} to true explicitly (its default value is already true), the secondary replica works normally.
> The cause is that when {{hbase.region.replica.wait.for.primary.flush}} is set to false, {{HRegionServer.startServices}} does not create the {{ExecutorType.RS_REGION_REPLICA_FLUSH_OPS}} executor for {{RegionReplicaFlushHandler}} at the HRegionServer level:
> {code:java}
> if (ServerRegionReplicaUtil.isRegionReplicaWaitForPrimaryFlushEnabled(conf)) {
>   final int regionReplicaFlushThreads = conf.getInt(
>     "hbase.regionserver.region.replica.flusher.threads", conf.getInt(
>       "hbase.regionserver.executor.openregion.threads", 3));
>   executorService.startExecutorService(executorService.new ExecutorConfig().setExecutorType(
>     ExecutorType.RS_REGION_REPLICA_FLUSH_OPS).setCorePoolSize(regionReplicaFlushThreads));
> }
> {code}
> But when I set {{TableDescriptorBuilder.setRegionMemStoreReplication}} to true explicitly, it also sets {{hbase.region.replica.wait.for.primary.flush}} to true at table level (there is no public table-level {{hbase.region.replica.wait.for.primary.flush}} config for HBase users):
> {code:java}
> public ModifyableTableDescriptor setRegionMemStoreReplication(boolean memstoreReplication) {
>   setValue(REGION_MEMSTORE_REPLICATION_KEY, Boolean.toString(memstoreReplication));
>   // If the memstore replication is setup, we do not have to wait for observing a flush event
>   // from primary before starting to serve reads, because gaps from replication is not applicable
>   return setValue(REGION_REPLICA_WAIT_FOR_PRIMARY_FLUSH_CONF_KEY,
>     Boolean.toString(memstoreReplication));
> }
> {code}
> So when the secondary replica region is opened, {{HRegionServer.triggerFlushInPrimaryRegion}} is invoked for this region. Because {{hbase.region.replica.wait.for.primary.flush}} is true at table level, line 2234 is skipped and the secondary replica is disabled for reads at line 2238. But there is no {{ExecutorType.RS_REGION_REPLICA_FLUSH_OPS}} executor for {{RegionReplicaFlushHandler}} at the HRegionServer level, so line 2243 does not schedule {{RegionReplicaFlushHandler}}, and the secondary replica stays disabled for reads.
> {code:java}
> 2227 private void triggerFlushInPrimaryRegion(final HRegion region) {
> ...
> 2232   if (!ServerRegionReplicaUtil.isRegionReplicaReplicationEnabled(region.conf, tn) ||
> 2233       !ServerRegionReplicaUtil.isRegionReplicaWaitForPrimaryFlushEnabled(region.conf)) {
> 2234     region.setReadsEnabled(true);
> 2235     return;
> 2236   }
> 2237
> 2238   region.setReadsEnabled(false); // disable reads before marking the region as opened.
> 2239   // RegionReplicaFlushHandler might reset this.
> 2240
> 2241   // Submit it to be handled by one of the handlers so that we do not block OpenRegionHandler
> 2242   if (this.executorService != null) {
> 2243     this.executorService.submit(new RegionReplicaFlushHandler(this, region));
> 2244   } else {
> ...
> {code}
> I think that when {{ModifyableTableDescriptor.setRegionMemStoreReplication}} above is set to true, there is no reason to also set {{hbase.region.replica.wait.for.primary.flush}} to true at table level.
> This problem may be more serious on master, because the new replication framework (HBASE-26233) does not enable reads on the secondary replica when it receives the flush marker; that is, the secondary replica would stay disabled for reads forever.
> On branch-2, however, the secondary replica is re-enabled for reads when it receives the flush marker.
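
The interaction described in the report can be illustrated with a small standalone model. This is only a sketch and not HBase code: the class ReplicaReadModel and its methods are hypothetical stand-ins, the two maps stand in for the server-level configuration and the table descriptor values, and the model assumes that table-level values shadow server-level ones in the region's configuration, that the key defaults to true, and that region replica replication is enabled. It also simplifies the flush handler to "reads come back once the handler is scheduled" and does not model the branch-2 flush marker path.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical, simplified model of the reported behavior; not HBase code. */
class ReplicaReadModel {
  static final String WAIT_FOR_PRIMARY_FLUSH = "hbase.region.replica.wait.for.primary.flush";

  /** Region-level view: a table-level value, if present, shadows the server-level one (assumption). */
  static boolean waitForFlushAtRegionLevel(Map<String, String> serverConf,
      Map<String, String> tableValues) {
    String value = tableValues.containsKey(WAIT_FOR_PRIMARY_FLUSH)
        ? tableValues.get(WAIT_FOR_PRIMARY_FLUSH)
        : serverConf.getOrDefault(WAIT_FOR_PRIMARY_FLUSH, "true"); // default assumed to be true
    return Boolean.parseBoolean(value);
  }

  /** Models startServices: the flush executor is created only from the server-level flag. */
  static boolean flushExecutorStarted(Map<String, String> serverConf) {
    return Boolean.parseBoolean(serverConf.getOrDefault(WAIT_FOR_PRIMARY_FLUSH, "true"));
  }

  /** Models triggerFlushInPrimaryRegion; returns whether reads end up enabled. */
  static boolean readsEnabledAfterOpen(Map<String, String> serverConf,
      Map<String, String> tableValues) {
    if (!waitForFlushAtRegionLevel(serverConf, tableValues)) {
      return true; // corresponds to line 2234: reads enabled immediately
    }
    boolean readsEnabled = false; // corresponds to line 2238: reads disabled
    if (flushExecutorStarted(serverConf)) {
      // Simplification: once the handler is scheduled, it eventually re-enables reads.
      readsEnabled = true;
    }
    // Without the executor nothing re-enables reads in this model.
    return readsEnabled;
  }

  public static void main(String[] args) {
    Map<String, String> serverConf = new HashMap<>();
    serverConf.put(WAIT_FOR_PRIMARY_FLUSH, "false");

    // Table created without calling setRegionMemStoreReplication: no table-level value.
    Map<String, String> defaultTable = new HashMap<>();

    // Table created with setRegionMemStoreReplication(true): the setter also stores the
    // wait-for-primary-flush key as true at table level, as shown in the report.
    Map<String, String> explicitTable = new HashMap<>();
    explicitTable.put(WAIT_FOR_PRIMARY_FLUSH, "true");

    System.out.println("default table, reads enabled: "
        + readsEnabledAfterOpen(serverConf, defaultTable)); // prints true
    System.out.println("explicit setter, reads enabled: "
        + readsEnabledAfterOpen(serverConf, explicitTable)); // prints false
  }
}
```

In this model the two checks read different sources: executor creation consults only the server-level flag, while the open path consults the region-level view that includes the table-level value. The mismatch between the two is what leaves reads disabled.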
--
This message was sent by Atlassian Jira
(v8.20.1#820001)