Posted to issues@hbase.apache.org by "chenglei (Jira)" <ji...@apache.org> on 2022/04/30 03:40:00 UTC
[jira] [Commented] (HBASE-26960) Another case for unnecessary replication suspending in RegionReplicationSink

    [ https://issues.apache.org/jira/browse/HBASE-26960?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17530316#comment-17530316 ] 

chenglei commented on HBASE-26960:
----------------------------------

Pushed to master. [~zhangduo], thank you very much for the code review.

> Another case for unnecessary replication suspending in RegionReplicationSink
> ----------------------------------------------------------------------------
>
>                 Key: HBASE-26960
>                 URL: https://issues.apache.org/jira/browse/HBASE-26960
>             Project: HBase
>          Issue Type: Bug
>          Components: read replicas
>    Affects Versions: 3.0.0-alpha-2
>            Reporter: chenglei
>            Assignee: chenglei
>            Priority: Major
>
> Besides HBASE-26768, there is another case in which replication in {{RegionReplicationSink}} is suspended unnecessarily:
> When a replication error occurs, {{RegionReplicationSink}} invokes {{MemStoreFlusher#requestFlush}} to request a flush, and it resumes replication after receiving the {{FlushAction#START_FLUSH}} or {{FlushAction#CANNOT_FLUSH}} flush marker. But when {{MemStoreFlusher}} flushes, it invokes the following method, {{HRegion.flushcache}}, with {{writeFlushRequestWalMarker}} set to false:
> {code:java}
>   public FlushResultImpl flushcache(List<byte[]> families,
>       boolean writeFlushRequestWalMarker, FlushLifeCycleTracker tracker) throws IOException {
>   }
> {code}
> When {{writeFlushRequestWalMarker}} is false, {{HRegion.flushcache}} does not write the {{FlushAction#CANNOT_FLUSH}} flush marker to the {{WAL}} when the memstore is empty, as the following {{HRegion.writeFlushRequestMarkerToWAL}} shows:
> {code:java}
>   private boolean writeFlushRequestMarkerToWAL(WAL wal, boolean writeFlushWalMarker) {
>     if (writeFlushWalMarker && wal != null && !writestate.readOnly) {
>       FlushDescriptor desc = ProtobufUtil.toFlushDescriptor(FlushAction.CANNOT_FLUSH,
>         getRegionInfo(), -1, new TreeMap<>(Bytes.BYTES_COMPARATOR));
>       try {
>         WALUtil.writeFlushMarker(wal, this.getReplicationScope(), getRegionInfo(), desc, true, mvcc,
>           regionReplicationSink.orElse(null));
>         return true;
>       } catch (IOException e) {
>         LOG.warn(getRegionInfo().getEncodedName() + " : " +
>           "Received exception while trying to write the flush request to wal", e);
>       }
>     }
>     return false;
>   }
> {code}
> So if a replication error occurs while the memstore is empty (e.g. when replicating the {{FlushAction#START_FLUSH}} or {{FlushAction#COMMIT_FLUSH}} marker), replication may stay suspended until the next memstore flush, even though later user writes could be replicated normally.
> I simulate this problem in the PR. The {{writeFlushRequestWalMarker}} parameter was introduced by HBASE-11580 and only determines whether the {{FlushAction#CANNOT_FLUSH}} flush marker is written to the WAL when the memstore is empty, so I think for simplicity we could always set it to true for {{MemStoreFlusher}}.
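
For readers who want to see the interaction in isolation, here is a minimal toy model of the scenario described above. It is only a sketch and not HBase code: {{ToySink}}, {{requestFlush}} and {{suspendedAfterErrorOnEmptyMemstore}} are hypothetical stand-ins, and the model assumes nothing beyond what the description states (the sink resumes on START_FLUSH or CANNOT_FLUSH, and CANNOT_FLUSH is written for an empty memstore only when the flag is true).

```java
import java.util.Optional;

/** Toy model of the issue; every name here is a hypothetical stand-in, not HBase API. */
public class FlushMarkerSketch {

  enum FlushAction { START_FLUSH, COMMIT_FLUSH, CANNOT_FLUSH }

  /** Stand-in for the sink: suspended on error, resumed by specific flush markers. */
  static final class ToySink {
    private boolean suspended;

    void onReplicationError() {
      suspended = true;
    }

    void onFlushMarker(FlushAction action) {
      // Per the description, only these two markers resume replication.
      if (action == FlushAction.START_FLUSH || action == FlushAction.CANNOT_FLUSH) {
        suspended = false;
      }
    }

    boolean isSuspended() {
      return suspended;
    }
  }

  /** Stand-in for a flush request: returns the marker written to the WAL, if any. */
  static Optional<FlushAction> requestFlush(boolean memstoreEmpty,
      boolean writeFlushRequestWalMarker) {
    if (!memstoreEmpty) {
      // Simplification: a real flush starts and writes START_FLUSH.
      return Optional.of(FlushAction.START_FLUSH);
    }
    // Empty memstore: CANNOT_FLUSH is written only when the flag is true.
    return writeFlushRequestWalMarker
        ? Optional.of(FlushAction.CANNOT_FLUSH)
        : Optional.empty();
  }

  /** Replays: replication error, then a flush request against an empty memstore. */
  static boolean suspendedAfterErrorOnEmptyMemstore(boolean writeFlushRequestWalMarker) {
    ToySink sink = new ToySink();
    sink.onReplicationError();
    requestFlush(true, writeFlushRequestWalMarker).ifPresent(sink::onFlushMarker);
    return sink.isSuspended();
  }

  public static void main(String[] args) {
    System.out.println("flag=false -> suspended: " + suspendedAfterErrorOnEmptyMemstore(false));
    System.out.println("flag=true  -> suspended: " + suspendedAfterErrorOnEmptyMemstore(true));
  }
}
```

In this model the sink stays suspended when the flag is false and resumes when it is true, which is the behavior the proposed change relies on.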


