Posted to issues@hbase.apache.org by "Bryan Beaudreault (Jira)" <ji...@apache.org> on 2022/04/12 21:19:00 UTC

[jira] [Commented] (HBASE-26950) Use AsyncConnection in ReplicationSink

    [ https://issues.apache.org/jira/browse/HBASE-26950?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17521341#comment-17521341 ] 

Bryan Beaudreault commented on HBASE-26950:
-------------------------------------------

I'm not planning to work on this right now, but I took a quick look. Converting ReplicationSink itself is straightforward: just {{.join()}} on the futures where necessary. The complication is replicated bulk loads, which depend heavily on the blocking Table and Connection in LoadIncrementalHFiles. The easiest approach might be to maintain two separate connections: an async one for batch calls and a sync one for HFiles.
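
To illustrate the {{.join()}} idea, here is a minimal sketch in plain Java. It is not ReplicationSink code: AsyncBatcher is a hypothetical stand-in for an async batch API (something shaped like AsyncTable's batchAll), and batchBlocking is an illustrative helper name. The point it shows is that join() wraps failures in CompletionException, so a blocking caller that expects IOException probably needs to unwrap the cause.

```java
import java.io.IOException;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionException;

class BlockingBatchSketch {
  /** Hypothetical stand-in for an async batch API; not an HBase interface. */
  interface AsyncBatcher<T> {
    CompletableFuture<List<T>> batchAll(List<String> actions);
  }

  /** Blocks on the future and rethrows the underlying failure as an IOException. */
  static <T> List<T> batchBlocking(AsyncBatcher<T> batcher, List<String> actions)
      throws IOException {
    try {
      return batcher.batchAll(actions).join();
    } catch (CompletionException e) {
      // join() reports a failed future as CompletionException; recover the real cause.
      Throwable cause = e.getCause() != null ? e.getCause() : e;
      if (cause instanceof IOException) {
        throw (IOException) cause;
      }
      throw new IOException(cause);
    }
  }

  public static void main(String[] args) throws IOException {
    // Assumed toy batcher that simply echoes the actions back asynchronously.
    AsyncBatcher<String> echo = actions -> CompletableFuture.supplyAsync(() -> actions);
    System.out.println(batchBlocking(echo, List.of("put-1", "put-2")));
  }
}
```

With this shape the calling code stays synchronous, and only the call site changes from a blocking batch call to an async call followed by join().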

> Use AsyncConnection in ReplicationSink
> --------------------------------------
>
>                 Key: HBASE-26950
>                 URL: https://issues.apache.org/jira/browse/HBASE-26950
>             Project: HBase
>          Issue Type: Improvement
>    Affects Versions: 2.4.11
>            Reporter: Bryan Beaudreault
>            Priority: Major
>
> We don't necessarily need to rewrite ReplicationSink to work fully async. I think it would benefit simply from using ConnectionFactory.createAsyncConnection instead of ConnectionFactory.createConnection.
> The reasons for this are:
>  * AsyncConnection is the more modern implementation, the only implementation in master, and where most development effort will go from now on.
>  * ReplicationSink only does batch calls, and batch calls are done with AsyncProcess. It's likely that the native AsyncTable is better than AsyncProcess for this.
>  ** One specific example: AsyncProcess calls findAllLocationsOrFail sequentially for all actions in a batch. This can take quite a while with the default replication batch size of 5k if the actions are spread across many regions. In AsyncTable, these calls are done in parallel.
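
The sequential-versus-parallel difference described above can be sketched in plain Java. This is not AsyncProcess or AsyncTable code: the locator function, the row names, and the 20 ms simulated lookup delay are all assumptions made for illustration. The sketch only shows why starting every lookup before waiting on any of them helps when a batch touches many regions.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.TimeUnit;
import java.util.function.Function;
import java.util.stream.Collectors;

class LocateSketch {
  /** Sequential: wait for each lookup to finish before starting the next one. */
  static List<String> locateSequentially(
      List<String> rows, Function<String, CompletableFuture<String>> locator) {
    return rows.stream()
        .map(row -> locator.apply(row).join())
        .collect(Collectors.toList());
  }

  /** Parallel: start every lookup first, then wait for all of them. */
  static List<String> locateInParallel(
      List<String> rows, Function<String, CompletableFuture<String>> locator) {
    List<CompletableFuture<String>> pending =
        rows.stream().map(locator).collect(Collectors.toList());
    return pending.stream().map(CompletableFuture::join).collect(Collectors.toList());
  }

  public static void main(String[] args) {
    // Assumed locator: each lookup completes after a simulated 20 ms delay.
    Function<String, CompletableFuture<String>> slowLocator = row ->
        CompletableFuture.supplyAsync(
            () -> "region-of-" + row,
            CompletableFuture.delayedExecutor(20, TimeUnit.MILLISECONDS));
    List<String> rows = List.of("row-a", "row-b", "row-c", "row-d");

    long start = System.nanoTime();
    locateSequentially(rows, slowLocator);
    long sequentialMs = TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);

    start = System.nanoTime();
    locateInParallel(rows, slowLocator);
    long parallelMs = TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);

    System.out.println("sequential ms: " + sequentialMs + ", parallel ms: " + parallelMs);
  }
}
```

In the sequential version the total wait grows with the number of lookups, while in the parallel version it is closer to the slowest single lookup. Both return results in the same order as the input rows.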


