
Posted to dev@hbase.apache.org by Jean-Daniel Cryans <jd...@apache.org> on 2014/12/16 17:33:34 UTC

Re: Questions about HBase replication

To add to what Jieshan said:

On Fri, Nov 21, 2014 at 8:32 PM, Cliff <cl...@gmail.com> wrote:

> 1.
> Why does "HBase replication" need replicationSink?
> I think replicationSource could do replicationSink's work as well.
> And if we didn't use replicationSink, we would need only one round of I/O.
>

If you were to use HTable from the source:

- All your meta lookups would be a lot slower than if you were local, since
they would cross the WAN. We rely on meta lookups being extremely fast.

- You would be sending at least as many RPCs, and probably more, since
you'd be sending them directly to each region server on the slave side,
chunked up by table. More, tinier RPCs probably aren't what you want over
a WAN.

 - BTW, sending one big batch can also make RPC compression more efficient.

- Retries would be done over the WAN. For example, say you're regularly
sending 2MB batches to a region and then it moves. The first batch sent
after the move will go to where you think the region is, only to get a
NotServingRegionException (NSRE). You'll then do a meta lookup to find the
new location, again over the WAN, and send those 2MB again to the new
location. That's a lot of back and forth you'd rather do on a LAN.
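A rough way to see the RPC-count and retry points is a back-of-the-envelope model like the Java sketch below. It is not HBase code: the Edit record, the method names, and the example tables and region servers are all hypothetical. It only tallies what crosses the WAN for one shipped batch, comparing a source that writes directly to the slave region servers with a source that hands the batch to a sink. It ignores the meta lookups themselves, which would add further WAN round trips in the direct case.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/**
 * Hypothetical back-of-the-envelope model (not HBase code) of what crosses the
 * WAN when a replication source ships one batch of edits to the slave cluster.
 */
public class WanCostSketch {
    // One replicated edit: its table and the slave region server hosting its row.
    record Edit(String table, String regionServer) {}

    // Direct approach: the batch is split per table and per slave region server,
    // and every chunk becomes its own RPC over the WAN.
    static int wanRpcsDirect(List<Edit> batch) {
        Set<String> chunks = new HashSet<>();
        for (Edit e : batch) {
            chunks.add(e.table() + "@" + e.regionServer());
        }
        return chunks.size();
    }

    // Sink approach: the whole batch is one RPC to the sink, which does the
    // per-region-server fan-out on the slave cluster's LAN.
    static int wanRpcsViaSink(List<Edit> batch) {
        return batch.isEmpty() ? 0 : 1;
    }

    // Direct approach after a region move: the stale send (rejected with an NSRE)
    // and the resend to the new location both carry the full payload over the WAN.
    static long wanBytesDirectAfterMove(long batchBytes) {
        return batchBytes /* stale send */ + batchBytes /* resend */;
    }

    // Sink approach after a region move: the payload crosses the WAN once; the
    // NSRE, meta lookup and resend stay between the sink and the slave servers.
    static long wanBytesViaSinkAfterMove(long batchBytes) {
        return batchBytes;
    }

    public static void main(String[] args) {
        List<Edit> batch = List.of(
                new Edit("users", "rs1"), new Edit("users", "rs2"),
                new Edit("orders", "rs1"), new Edit("orders", "rs3"),
                new Edit("users", "rs1"));
        System.out.println("WAN RPCs, direct:   " + wanRpcsDirect(batch));   // 4
        System.out.println("WAN RPCs, via sink: " + wanRpcsViaSink(batch));  // 1

        long twoMb = 2L * 1024 * 1024;
        System.out.println("WAN bytes after move, direct:   " + wanBytesDirectAfterMove(twoMb));  // 4194304
        System.out.println("WAN bytes after move, via sink: " + wanBytesViaSinkAfterMove(twoMb)); // 2097152
    }
}
```

The model deliberately leaves out latency: in the direct case each of those extra RPCs and the retry's meta lookup also pays a full WAN round trip, which is the "back and forth" the sink keeps local.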

Hope this helps,

J-D