Posted to issues@hbase.apache.org by "Vincent Poon (JIRA)" <ji...@apache.org> on 2017/01/05 19:59:58 UTC
[jira] [Updated] (HBASE-15995) Separate replication WAL reading from shipping

     [ https://issues.apache.org/jira/browse/HBASE-15995?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Vincent Poon updated HBASE-15995:
---------------------------------
    Attachment: HBASE-15995.master.v1.patch

This patch does two things:

1)  Puts replication reading into a separate thread.  This is done by ReplicationWALEntryBatcher, which reads entries and puts them onto a queue.  ReplicationSourceWorkerThread is reduced to simply reading batches off the queue and shipping them.

2)  Puts the actual WAL entry reading logic in a WALEntryStream class, which implements Iterator.  Eventually, once we have a way to stream over the network, we can get rid of the batcher above and simplify to something like:
{code}
while(entryStream.hasNext()) {
  shipEntry(entryStream.next());
}
{code}
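
To illustrate the split in (1), here is a minimal, self-contained sketch of a reader thread feeding a shipper through a bounded queue.  It is not the code in the patch: ReaderShipperSketch, startReader, shipAll and run are hypothetical names, plain Strings stand in for WAL entries, and an in-memory list stands in for shipping to the peer cluster.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Hypothetical sketch only; not the classes from the patch.
public class ReaderShipperSketch {
    // Sentinel batch marking end of input; compared by identity.
    private static final List<String> END = new ArrayList<>();

    // Reader side: pulls entries from the stream, groups them into batches,
    // and blocks on put() whenever the queue is full.
    static Thread startReader(Iterator<String> entryStream, int batchSize,
                              BlockingQueue<List<String>> queue) {
        Thread reader = new Thread(() -> {
            try {
                List<String> batch = new ArrayList<>();
                while (entryStream.hasNext()) {
                    batch.add(entryStream.next());
                    if (batch.size() == batchSize) {
                        queue.put(batch);
                        batch = new ArrayList<>();
                    }
                }
                if (!batch.isEmpty()) {
                    queue.put(batch); // final partial batch
                }
                queue.put(END);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }, "reader");
        reader.start();
        return reader;
    }

    // Shipper side: takes batches off the queue until the sentinel arrives.
    // "Shipping" here just appends to a list.
    static List<String> shipAll(BlockingQueue<List<String>> queue)
            throws InterruptedException {
        List<String> shipped = new ArrayList<>();
        while (true) {
            List<String> batch = queue.take();
            if (batch == END) {
                return shipped;
            }
            shipped.addAll(batch);
        }
    }

    // Wires reader and shipper together and returns what was shipped.
    static List<String> run(List<String> entries, int batchSize, int queueCapacity) {
        BlockingQueue<List<String>> queue = new ArrayBlockingQueue<>(queueCapacity);
        Thread reader = startReader(entries.iterator(), batchSize, queue);
        try {
            List<String> shipped = shipAll(queue);
            reader.join();
            return shipped;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while shipping", e);
        }
    }

    public static void main(String[] args) {
        List<String> entries = Arrays.asList("e1", "e2", "e3", "e4", "e5");
        System.out.println(run(entries, 2, 1)); // prints [e1, e2, e3, e4, e5]
    }
}
```

Because the queue is bounded, the reader stalls once it gets ahead of the shipper, so the number of batches held at any one time stays bounded too.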

I tried to keep the rest of the logic the same as what currently exists.  If desired, the new implementation could go into a separate class, ReplicationSourceV2, rather than changing ReplicationSource in place.

I believe all replication tests pass except TestGlobalThrottler.  It fails because one thread is reading a batch while the other thread is still shipping the previous batch, so even if the queue holds only 1 batch, you're using double the memory.  (If I modify the test to double the threshold, it passes.)

I've done performance testing by setting up a single standalone region server shipping to a remote cluster, and then running PerformanceEvaluation to generate 3 GB of data.  The amount of time for replication to catch up:
ReplicationSourceV1    -    190s   (source.size.capacity of 64 MB)
ReplicationSourceV2    -    160s   (source.size.capacity of 32 MB, with queue size of 1 so that the max memory used should be 64 MB)

There's better performance in situations where reading or filtering entries is more expensive (e.g. contention for disk/cpu).  For example, I tried introducing a 100ms delay in a custom entry filter.  
ReplicationSourceV1  -  366s
ReplicationSourceV2  -  236s
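
For a relative comparison, the catch-up times above can be turned into percentage reductions.  This is a small sketch with hypothetical names (CatchUpSpeedup, reductionPercent); the only inputs are the four timings reported in this comment.

```java
import java.util.Locale;

// Hypothetical helper; it only does arithmetic on the timings reported above.
public class CatchUpSpeedup {
    // Percentage reduction in catch-up time when going from V1 to V2.
    static double reductionPercent(double v1Seconds, double v2Seconds) {
        return 100.0 * (v1Seconds - v2Seconds) / v1Seconds;
    }

    public static void main(String[] args) {
        System.out.printf(Locale.ROOT, "baseline: %.1f%%%n",
                reductionPercent(190, 160));
        System.out.printf(Locale.ROOT, "with 100ms filter delay: %.1f%%%n",
                reductionPercent(366, 236));
    }
}
```

That works out to roughly a 16% reduction in the baseline run and roughly 36% with the artificial filter delay.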


> Separate replication WAL reading from shipping
> ----------------------------------------------
>
>                 Key: HBASE-15995
>                 URL: https://issues.apache.org/jira/browse/HBASE-15995
>             Project: HBase
>          Issue Type: Sub-task
>          Components: Replication
>    Affects Versions: 2.0.0
>            Reporter: Vincent Poon
>            Assignee: Vincent Poon
>             Fix For: 2.0.0
>
>         Attachments: HBASE-15995.master.v1.patch
>
>
> Currently ReplicationSource reads edits from the WAL and ships them in the same thread.
> By breaking out the reading from the shipping, we can introduce greater parallelism and lay the foundation for further refactoring to a pipelined, streaming model.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)