Posted to issues@hbase.apache.org by "chenglei (Jira)" <ji...@apache.org> on 2022/04/01 03:55:00 UTC

[jira] [Updated] (HBASE-26811) Secondary replica may be disabled for read incorrectly forever

     [ https://issues.apache.org/jira/browse/HBASE-26811?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

chenglei updated HBASE-26811:
-----------------------------
    Fix Version/s: 2.5.0
                   2.6.0
                   3.0.0-alpha-3
                   2.4.12
       Resolution: Fixed
           Status: Resolved  (was: Patch Available)

> Secondary replica may be disabled for read incorrectly forever
> --------------------------------------------------------------
>
>                 Key: HBASE-26811
>                 URL: https://issues.apache.org/jira/browse/HBASE-26811
>             Project: HBase
>          Issue Type: Bug
>          Components: read replicas
>    Affects Versions: 3.0.0-alpha-2, 2.4.10
>            Reporter: chenglei
>            Assignee: chenglei
>            Priority: Major
>             Fix For: 2.5.0, 2.6.0, 3.0.0-alpha-3, 2.4.12
>
>
> For read replicas, when I set {{hbase.region.replica.wait.for.primary.flush}} to false and also explicitly set {{TableDescriptorBuilder.setRegionMemStoreReplication}} to true at the table level, the secondary replica is disabled for reads, and reading from this replica region throws:
> {code:java}
> java.io.IOException:  The region's reads are disabled. Cannot serve the request
> 	at org.apache.hadoop.hbase.regionserver.HRegion.checkReadsEnabled(HRegion.java:5187)
> 	at org.apache.hadoop.hbase.regionserver.HRegion.startRegionOperation(HRegion.java:8279)
> {code}
> Strangely, if I don't explicitly set {{TableDescriptorBuilder.setRegionMemStoreReplication}} to true (its default value is already true), the secondary replica works normally.
> The cause is that when {{hbase.region.replica.wait.for.primary.flush}} is set to false, {{HRegionServer.startServices}} does not create the {{ExecutorType.RS_REGION_REPLICA_FLUSH_OPS}} executor for {{RegionReplicaFlushHandler}} at the HRegionServer level:
> {code:java}
>     // The flush executor is started only when the server-level flag is true.
>     if (ServerRegionReplicaUtil.isRegionReplicaWaitForPrimaryFlushEnabled(conf)) {
>       // Thread count falls back to the open-region handler count (default 3).
>       final int regionReplicaFlushThreads = conf.getInt(
>           "hbase.regionserver.region.replica.flusher.threads", conf.getInt(
>               "hbase.regionserver.executor.openregion.threads", 3));
>       executorService.startExecutorService(executorService.new ExecutorConfig().setExecutorType(
>           ExecutorType.RS_REGION_REPLICA_FLUSH_OPS).setCorePoolSize(regionReplicaFlushThreads));
>     }
> {code}
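To make the server-level half of the problem concrete, here is a minimal toy model in plain Java. It is not HBase code: ToyExecutorRegistry, startServices, submit and pending are hypothetical names, and a per-type queue stands in for a thread pool, with a missing queue meaning the executor was never started. It assumes, as the report describes, that a handler submitted for an executor type that was never started simply does not run.

```java
import java.util.ArrayDeque;
import java.util.EnumMap;
import java.util.Map;
import java.util.Queue;

/**
 * Hypothetical toy model (not HBase code) of the region server's executor registry.
 * A queue per executor type stands in for a thread pool; no queue means "never started".
 */
final class ToyExecutorRegistry {
  enum ExecutorType { RS_REGION_REPLICA_FLUSH_OPS }

  private final Map<ExecutorType, Queue<Runnable>> executors = new EnumMap<>(ExecutorType.class);

  /** Mirrors the guarded block above: the flush executor exists only if the server-level flag is true. */
  static ToyExecutorRegistry startServices(boolean serverWaitForPrimaryFlush) {
    ToyExecutorRegistry registry = new ToyExecutorRegistry();
    if (serverWaitForPrimaryFlush) {
      registry.executors.put(ExecutorType.RS_REGION_REPLICA_FLUSH_OPS, new ArrayDeque<>());
    }
    return registry;
  }

  /** Queues the task and returns true, or returns false (task never runs) if the executor is missing. */
  boolean submit(ExecutorType type, Runnable task) {
    Queue<Runnable> queue = executors.get(type);
    if (queue == null) {
      return false;
    }
    return queue.add(task);
  }

  /** Number of tasks waiting on the given executor type (0 if it was never started). */
  int pending(ExecutorType type) {
    Queue<Runnable> queue = executors.get(type);
    return queue == null ? 0 : queue.size();
  }
}
```

In this model, a registry built with startServices(false) returns false from submit for RS_REGION_REPLICA_FLUSH_OPS, which corresponds to the situation a RegionReplicaFlushHandler ends up in when the server-level flag is false.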
> However, when I explicitly set {{TableDescriptorBuilder.setRegionMemStoreReplication}} to true, it also sets {{hbase.region.replica.wait.for.primary.flush}} to true at the table level (there is no public table-level {{hbase.region.replica.wait.for.primary.flush}} configuration for HBase users):
> {code:java}
> public ModifyableTableDescriptor setRegionMemStoreReplication(boolean memstoreReplication) {
>   setValue(REGION_MEMSTORE_REPLICATION_KEY, Boolean.toString(memstoreReplication));
>   // If the memstore replication is setup, we do not have to wait for observing a flush event
>   // from primary before starting to serve reads, because gaps from replication is not applicable
>   // Note: despite the comment above, the wait flag is set to the same value as
>   // memstoreReplication, i.e. to true whenever memstore replication is enabled.
>   return setValue(REGION_REPLICA_WAIT_FOR_PRIMARY_FLUSH_CONF_KEY,
>       Boolean.toString(memstoreReplication));
> }
> {code}
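The table-level half can be sketched the same way. This is a hypothetical toy model, not the HBase implementation: ToyTableDescriptor and effectiveWaitForPrimaryFlush are illustrative names, the REGION_MEMSTORE_REPLICATION key string and the default of true for the wait flag are assumptions, and the lookup assumes that the region's effective configuration lets table-level values shadow the server-level configuration, which is what makes the table-level key matter here.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical toy model (not HBase code) of the table descriptor side effect and of a
 * region-level configuration lookup in which table values shadow server values.
 */
final class ToyTableDescriptor {
  // Assumed key names; the wait key matches the property discussed in this issue.
  static final String REGION_MEMSTORE_REPLICATION_KEY = "REGION_MEMSTORE_REPLICATION";
  static final String WAIT_FOR_PRIMARY_FLUSH_KEY = "hbase.region.replica.wait.for.primary.flush";

  final Map<String, String> values = new HashMap<>();

  /** Mirrors the quoted setter, including its side effect on the wait-for-flush key. */
  ToyTableDescriptor setRegionMemStoreReplication(boolean memstoreReplication) {
    values.put(REGION_MEMSTORE_REPLICATION_KEY, Boolean.toString(memstoreReplication));
    values.put(WAIT_FOR_PRIMARY_FLUSH_KEY, Boolean.toString(memstoreReplication));
    return this;
  }

  /** Table-level value wins if present; otherwise use the server value (assumed default: true). */
  static boolean effectiveWaitForPrimaryFlush(Map<String, String> serverConf,
      ToyTableDescriptor table) {
    String value = table.values.get(WAIT_FOR_PRIMARY_FLUSH_KEY);
    if (value == null) {
      value = serverConf.getOrDefault(WAIT_FOR_PRIMARY_FLUSH_KEY, "true");
    }
    return Boolean.parseBoolean(value);
  }

  public static void main(String[] args) {
    Map<String, String> serverConf = new HashMap<>();
    serverConf.put(WAIT_FOR_PRIMARY_FLUSH_KEY, "false");
    // Setter never called (default memstore replication): the server value applies.
    System.out.println(effectiveWaitForPrimaryFlush(serverConf, new ToyTableDescriptor())); // prints false
    // Setter called explicitly with true: the table-level value overrides the server value.
    System.out.println(effectiveWaitForPrimaryFlush(serverConf,
        new ToyTableDescriptor().setRegionMemStoreReplication(true))); // prints true
  }
}
```

The two calls in main reproduce the "very strange" asymmetry from the report: only the explicit setter call flips the effective flag to true.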
> So when the secondary replica region is opened, {{HRegionServer.triggerFlushInPrimaryRegion}} is invoked for it. Because {{hbase.region.replica.wait.for.primary.flush}} is true at the table level, line 2234 is skipped and reads on the secondary replica are disabled at line 2238. However, since no {{ExecutorType.RS_REGION_REPLICA_FLUSH_OPS}} executor for {{RegionReplicaFlushHandler}} exists at the HRegionServer level, line 2243 does not actually schedule {{RegionReplicaFlushHandler}}, so the secondary replica remains disabled for reads.
> {code:java}
> 2227  private void triggerFlushInPrimaryRegion(final HRegion region) {
>         ...
> 2232    if (!ServerRegionReplicaUtil.isRegionReplicaReplicationEnabled(region.conf, tn) ||
> 2233        !ServerRegionReplicaUtil.isRegionReplicaWaitForPrimaryFlushEnabled(region.conf)) {
> 2234      region.setReadsEnabled(true);
> 2235      return;
> 2236    }
> 2237
> 2238    region.setReadsEnabled(false); // disable reads before marking the region as opened.
> 2239    // RegionReplicaFlushHandler might reset this.
> 2240
> 2241    // Submit it to be handled by one of the handlers so that we do not block OpenRegionHandler
> 2242    if (this.executorService != null) {
> 2243      this.executorService.submit(new RegionReplicaFlushHandler(this, region));
> 2244    } else {
>           ...
> {code}
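Putting the two halves together, the toy replay below walks the same branches as lines 2232 to 2243 for a secondary replica. It is a hypothetical sketch, not HBase code: plain booleans replace the real configuration and executor lookups, regionWaitForPrimaryFlush stands for the effective flag after the table-level override, flushExecutorStarted stands for the server-level flag, and it assumes region replica replication is enabled so that only the wait flag decides which branch is taken.

```java
/**
 * Hypothetical toy replay (not HBase code) of triggerFlushInPrimaryRegion for a secondary
 * replica, using plain booleans in place of the real configuration and executor lookups.
 */
final class ToyReplicaOpen {
  static final class Region {
    boolean readsEnabled = true;
  }

  /**
   * Returns true if a flush handler would be scheduled (and could later re-enable reads).
   *
   * @param regionWaitForPrimaryFlush effective flag after any table-level override
   * @param flushExecutorStarted whether startServices created the flush executor
   *     (i.e. the server-level flag)
   */
  static boolean triggerFlushInPrimaryRegion(Region region, boolean replicationEnabled,
      boolean regionWaitForPrimaryFlush, boolean flushExecutorStarted) {
    // Lines 2232-2236: nothing to wait for, so serve reads immediately.
    if (!replicationEnabled || !regionWaitForPrimaryFlush) {
      region.readsEnabled = true;
      return false;
    }
    // Line 2238: reads stay off until a flush handler turns them back on.
    region.readsEnabled = false;
    // Line 2243: the handler only runs if the server started the flush executor.
    return flushExecutorStarted;
  }

  public static void main(String[] args) {
    Region replica = new Region();
    // Server-level flag false (no executor), table-level flag true (set by the setter).
    boolean scheduled = triggerFlushInPrimaryRegion(replica, true, true, false);
    System.out.println("readsEnabled=" + replica.readsEnabled + ", scheduled=" + scheduled);
    // prints "readsEnabled=false, scheduled=false"
  }
}
```

With the table-level flag true and no executor, reads end up disabled and nothing is scheduled to turn them back on; with the effective flag false, the early return keeps reads enabled.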
> I think that for {{ModifyableTableDescriptor.setRegionMemStoreReplication}} above, there is no reason to also set {{hbase.region.replica.wait.for.primary.flush}} to true at the table level when memstore replication is set to true.
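A minimal sketch of the change suggested above, again as a hypothetical toy model rather than the committed patch: the setter records only the memstore replication flag, so the wait-for-primary-flush value falls back to the server-level setting. The class and method names are illustrative, and the key strings and the default of true are assumptions.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical toy model (not the committed patch) of the suggested setter behaviour:
 * only the memstore replication flag is recorded at table level.
 */
final class ToyFixedTableDescriptor {
  static final String REGION_MEMSTORE_REPLICATION_KEY = "REGION_MEMSTORE_REPLICATION";
  static final String WAIT_FOR_PRIMARY_FLUSH_KEY = "hbase.region.replica.wait.for.primary.flush";

  final Map<String, String> values = new HashMap<>();

  ToyFixedTableDescriptor setRegionMemStoreReplication(boolean memstoreReplication) {
    // Suggested behaviour: WAIT_FOR_PRIMARY_FLUSH_KEY is no longer written here.
    values.put(REGION_MEMSTORE_REPLICATION_KEY, Boolean.toString(memstoreReplication));
    return this;
  }

  /** Table-level value wins if present; otherwise use the server value (assumed default: true). */
  boolean effectiveWaitForPrimaryFlush(Map<String, String> serverConf) {
    String serverValue = serverConf.getOrDefault(WAIT_FOR_PRIMARY_FLUSH_KEY, "true");
    return Boolean.parseBoolean(values.getOrDefault(WAIT_FOR_PRIMARY_FLUSH_KEY, serverValue));
  }
}
```

With the server-level flag set to false, the effective value then stays false even after an explicit setRegionMemStoreReplication(true), so the replica takes the early setReadsEnabled(true) path instead of waiting for a handler that is never scheduled.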
> This problem may be more serious on master, because the new replication framework (HBASE-26233) does not enable reads on the secondary replica when it receives the flush marker; that is, the secondary replica would be disabled for reads forever.
> On branch-2, by contrast, the secondary replica is enabled for reads once it receives the flush marker.
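The difference between the two lines of development can be summarized in one more toy model. It only encodes the behaviour stated in this report (branch-2 enables reads when the flush marker arrives, master does not); the names are hypothetical and nothing here reflects the real replication code paths.

```java
/**
 * Hypothetical toy model (not HBase code) of how a disabled secondary replica reacts to a
 * flush marker from the primary, per the behaviour described in HBASE-26811.
 */
final class ToyFlushMarker {
  enum Branch { BRANCH_2, MASTER }

  static final class Replica {
    boolean readsEnabled = false; // left disabled by triggerFlushInPrimaryRegion
  }

  static void onFlushMarker(Replica replica, Branch branch) {
    if (branch == Branch.BRANCH_2) {
      // branch-2: receiving the flush marker enables reads.
      replica.readsEnabled = true;
    }
    // MASTER (new framework from HBASE-26233): nothing here enables reads again.
  }

  public static void main(String[] args) {
    for (Branch branch : Branch.values()) {
      Replica replica = new Replica();
      onFlushMarker(replica, branch);
      System.out.println(branch + ": readsEnabled=" + replica.readsEnabled);
    }
    // prints "BRANCH_2: readsEnabled=true" then "MASTER: readsEnabled=false"
  }
}
```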



--
This message was sent by Atlassian Jira
(v8.20.1#820001)