Posted to dev@hbase.apache.org by Cliff <cl...@gmail.com> on 2014/11/22 05:32:11 UTC

Questions about HBase replication

1.
Why does HBase replication need ReplicationSink?
I think ReplicationSource could do ReplicationSink's work as well.
Without ReplicationSink, the edits would go through I/O only once.

2.
The queue that ReplicationSource adds HLog paths to is a PriorityBlockingQueue.
If the queue is full, an HLog path cannot be added to it.
How is that situation handled?




--
View this message in context: http://apache-hbase.679495.n3.nabble.com/Questions-about-HBase-replication-tp4066266.html
Sent from the HBase Developer mailing list archive at Nabble.com.

Re: Questions about HBase replication

Posted by Jean-Daniel Cryans <jd...@apache.org>.
To add to what Jieshan said:

On Fri, Nov 21, 2014 at 8:32 PM, Cliff <cl...@gmail.com> wrote:

> 1.
> Why does HBase replication need ReplicationSink?
> I think ReplicationSource could do ReplicationSink's work as well.
> Without ReplicationSink, the edits would go through I/O only once.
>

If you were to use HTable from the source:

- All your meta lookups would be a lot slower than if you were local. We
rely on these lookups being extremely fast.

- You would be sending at least as many RPCs, but probably more, since
you'd be sending them directly to each region server on the slave side,
chunked up by table. More, tinier RPCs probably aren't what you want over
a WAN.

 - BTW, sending one big batch can also make RPC compression more efficient.

- Retries would be done over the WAN. For example, you're regularly sending
2MB batches to a region and then it moves. The first batch that gets sent
after the move will go to where you think the region is, only to get an
NSRE (NotServingRegionException). You'll then do a meta lookup to find the
new location, again over the WAN, and send those 2MB again to the new
location. It's a lot of back and forth you'd rather do in a LAN.
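
To illustrate the compression point above, here is a minimal sketch using only the JDK's java.util.zip.Deflater. It is not HBase's RPC code: the class name, the helper methods, and the fake "edit" format are all assumptions made up for this example. It compares the total compressed size of many small messages, each compressed on its own, with the compressed size of the same bytes sent as one batch.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.Deflater;

public class BatchCompressionDemo {

    /** Returns the number of bytes produced by compressing the input with DEFLATE. */
    public static int compressedSize(byte[] input) {
        Deflater deflater = new Deflater();
        deflater.setInput(input);
        deflater.finish();
        byte[] buf = new byte[4096];
        int total = 0;
        while (!deflater.finished()) {
            total += deflater.deflate(buf);
        }
        deflater.end();
        return total;
    }

    /** Hypothetical stand-in for one replicated edit; not a real HBase format. */
    public static byte[] edit(int i) {
        return ("row-" + i + "/cf:qualifier/value-" + i + ";")
                .getBytes(StandardCharsets.UTF_8);
    }

    /** Each edit is compressed separately, as if sent in its own small RPC. */
    public static int sizeAsSeparateMessages(int n) {
        int total = 0;
        for (int i = 0; i < n; i++) {
            total += compressedSize(edit(i));
        }
        return total;
    }

    /** All edits are concatenated and compressed once, as if sent in one batch. */
    public static int sizeAsOneBatch(int n) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (int i = 0; i < n; i++) {
            byte[] e = edit(i);
            out.write(e, 0, e.length);
        }
        return compressedSize(out.toByteArray());
    }

    public static void main(String[] args) {
        int n = 1000;
        System.out.println("separate: " + sizeAsSeparateMessages(n) + " bytes");
        System.out.println("batched:  " + sizeAsOneBatch(n) + " bytes");
    }
}
```

With repetitive payloads like these, the batched form should come out noticeably smaller, because the compressor can reuse patterns across edits and pays the per-stream overhead only once.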

Hope this helps,

J-D

RE: Questions about HBase replication

Posted by Cliff <cl...@gmail.com>.
Thank you for your answers! 


RE: Questions about HBase replication

Posted by Bijieshan <bi...@huawei.com>.
>But even with ReplicationSink, ReplicationSource still needs to use RPC to
>ship entries.
>So the potential problem you mentioned may still occur.

Replication has its own handlers, which are separate from the normal handlers.
You can check this in the code :)
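
As a rough illustration of why separate handlers matter, here is a minimal sketch built on plain java.util.concurrent thread pools. It is not HBase's RPC server: the class name, the pool sizes, and the method name are assumptions invented for this example. It blocks every "replication" task on a latch and then checks whether a "user" request can still be served, once with a shared pool and once with a dedicated replication pool.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class HandlerPoolDemo {

    /**
     * Returns true if a user request completes while two slow replication
     * requests are still running. Pool size 2 is an arbitrary choice.
     */
    public static boolean userRequestServed(boolean separatePools) {
        ExecutorService userHandlers = Executors.newFixedThreadPool(2);
        ExecutorService replicationHandlers =
                separatePools ? Executors.newFixedThreadPool(2) : userHandlers;
        CountDownLatch release = new CountDownLatch(1);
        try {
            // Two slow replication requests that hold their handler threads.
            for (int i = 0; i < 2; i++) {
                replicationHandlers.submit(() -> {
                    release.await();
                    return null;
                });
            }
            // A user request arriving while replication is busy.
            Future<String> user = userHandlers.submit(() -> "ok");
            try {
                user.get(1, TimeUnit.SECONDS);
                return true;
            } catch (TimeoutException e) {
                return false; // no free handler picked up the user request
            } catch (InterruptedException | ExecutionException e) {
                throw new IllegalStateException(e);
            }
        } finally {
            release.countDown(); // let the blocked replication tasks finish
            userHandlers.shutdown();
            if (separatePools) {
                replicationHandlers.shutdown();
            }
        }
    }

    public static void main(String[] args) {
        System.out.println("shared pool:   " + userRequestServed(false));
        System.out.println("separate pool: " + userRequestServed(true));
    }
}
```

In the shared case the user request sits in the queue behind the blocked replication tasks. In the separate case it is picked up by a free user handler right away.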

Jieshan.
________________________________________
From: Cliff [cliffcheng411@gmail.com]
Sent: Monday, November 24, 2014 8:08 PM
To: hbase-dev@hadoop.apache.org
Subject: RE: Questions about HBase replication

1.
But even with ReplicationSink, ReplicationSource still needs to use RPC to
ship entries.
So the potential problem you mentioned may still occur.
What is the most important reason for using ReplicationSink, even though it
means doing the I/O twice?
Thank you.


RE: Questions about HBase replication

Posted by Bijieshan <bi...@huawei.com>.
>1.
>Why does HBase replication need ReplicationSink?
>I think ReplicationSource could do ReplicationSink's work as well.
>Without ReplicationSink, the edits would go through I/O only once.

   ReplicationSink is used to apply all HLog edits to the peer cluster. If we removed ReplicationSink from the current architecture, how would we send data to the peer cluster? By using the API "HTableInterface#put"?
   In that case there would be no difference between replication write requests and normal user write requests. One potential problem is that replication write requests could occupy all the handler threads on a RegionServer and thereby affect normal user write requests.

>2.
>The queue that ReplicationSource adds HLog paths to is a PriorityBlockingQueue.
>If the queue is full, an HLog path cannot be added to it.
>How is that situation handled?

   We didn't limit the maximum size of that queue, right?
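
For reference, java.util.concurrent.PriorityBlockingQueue is documented as an unbounded queue, so it has no "full" state: offer() always returns true and put() never blocks. The only practical limit is memory. The sketch below shows this with plain JDK code. The class name, method names, and the fake log path strings are assumptions for illustration and are not taken from ReplicationSource.

```java
import java.util.concurrent.PriorityBlockingQueue;

public class UnboundedQueueDemo {

    /** Offers n made-up log paths and returns how many the queue accepted. */
    public static int offerPaths(int n) {
        // The initial capacity of 1 is only a sizing hint; the queue grows as needed.
        PriorityBlockingQueue<String> queue = new PriorityBlockingQueue<>(1);
        int accepted = 0;
        for (int i = 0; i < n; i++) {
            if (queue.offer(String.format("hlog.%010d", i))) {
                accepted++;
            }
        }
        return accepted;
    }

    /** Adds the paths with put() and returns the head, i.e. the smallest element. */
    public static String firstOf(String... paths) {
        PriorityBlockingQueue<String> queue = new PriorityBlockingQueue<>();
        for (String p : paths) {
            queue.put(p); // never blocks, because the queue is unbounded
        }
        return queue.peek();
    }

    public static void main(String[] args) {
        System.out.println(offerPaths(100_000));                    // prints 100000
        System.out.println(firstOf("hlog.3", "hlog.1", "hlog.2"));  // prints hlog.1
    }
}
```

So the risk with such a queue is not a rejected insert but unbounded growth if the consumer falls far behind.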

Jieshan.