Posted to user@hbase.apache.org by Otis Gospodnetic <ot...@yahoo.com> on 2011/03/24 19:05:34 UTC
HBase Replication questions
Hello,
We are looking into HBase replication to separate our client-facing HBase
cluster from the one we need to run analytics against (likely heavy MR jobs +
potentially big scans).
1. How long does it take for edits to be propagated to a slave cluster?
As far as I understand from HBase Replication page
(http://hbase.apache.org/replication.html) there's a separate buffer held by
each region server which accumulates data (edits which should be replicated from
the edit log) before sending to Slave cluster's RSs. So basically data are sent
to slave cluster when:
* buffer is full. Is there an option to configure its size (as a way to affect
flushing frequency)?
* the end of edit log is reached by this "working thread". Does thread process
the edit log periodically or is it watching for edit log to change and acts
"immediately"? If the former, what is the default interval between runs? Can it
be configured?
2. How reliable is replication?
It looks like when there are some networking issues and slave cluster can't be
reached, this is handled gracefully by replication mechanism. It sounds like
this should also cover slave cluster going down for some reason. Are there any
possible scenarios when replication can be broken?
3. Replication of existing (and possibly big) cluster after the fact.
What are the options to replicate all existing data to a new (& empty) slave
cluster if replication wasn't configured from the start and keep replicating
from that point? It seems that because edit logs on the master cluster get
cleaned this might not be possible?
Thanks,
Otis
----
Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-lucene.com/
Re: HBase Replication questions
Posted by Jean-Daniel Cryans <jd...@apache.org>.
> So does this mean that if it's unable to replicate after some number of sleeps,
> as the ones you've listed above, it gives up trying to replicate?
No, it continues to sleep for 10 seconds forever (until it can replicate).
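A minimal sketch of that retry schedule, assuming the defaults quoted in this thread (replication.source.sleepforretries = 1 second, replication.source.maxretriesmultiplier = 10). The function name and its parameters are made up for illustration; this is not the actual HBase code, only the arithmetic as described here:

```python
import itertools


def retry_sleeps(sleep_for_retries=1.0, max_retries_multiplier=10):
    """Yield the pause (in seconds) before each successive retry, forever.

    The pause grows with the attempt number until the multiplier cap is
    reached, then stays at the cap. The generator never stops, mirroring
    "it continues to sleep for 10 seconds forever".
    """
    for attempt in itertools.count(1):
        yield sleep_for_retries * min(attempt, max_retries_multiplier)


# Look at the first twelve pauses only; the schedule itself is unbounded.
first_twelve = list(itertools.islice(retry_sleeps(), 12))
print(first_twelve)
```

So the multiplier only caps the length of the pause; it does not limit the number of attempts.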
> OK. So sequentially restarting each RS on the master cluster should be OK and
> the replication will/should continue where it left off?
I prefer bouncing the whole cluster at the same time because of HBASE-3441.
> Right, right... http://blog.sematext.com/2011/03/11/hbase-backup-options/
> OK, so if we have a *live* cluster and then one day we decide we want to start
> replicating this cluster, we need to stop the cluster first, call CopyTable for
> each table, start the slave cluster, restart the master cluster, and replication
> should kick in and keep the 2 clusters in sync.
No:
1 - Have replication enabled on the cluster.
2 - Start the replication to a slave.
3 - Note down the current timestamp.
4 - Start the CopyTable job with its end timestamp set to the one you
just noted (meaning that all the data up to that moment will be copied
while newer data is already being replicated).
5 - Repeat for every table with the same end timestamp.
My prod cluster has only 1 map slot so that jobs, including CopyTable,
don't kill the performance.
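Steps 3 to 5 could be scripted roughly as below. This is only a sketch: the table names and the slave ZooKeeper address are hypothetical, and the --endtime and --peer.adr options are taken from CopyTable's usage text as I remember it, so check them against the version you run. It only builds the command lines, it does not launch anything:

```python
import time


def copytable_command(table, end_ms, peer_address):
    """Build the CopyTable command line for one table.

    end_ms is the shared upper timestamp (milliseconds since the epoch);
    peer_address identifies the slave cluster as
    quorum:clientPort:znodeParent.
    """
    return [
        "bin/hbase",
        "org.apache.hadoop.hbase.mapreduce.CopyTable",
        f"--endtime={end_ms}",
        f"--peer.adr={peer_address}",
        table,
    ]


# Step 3: note the timestamp once, after replication has been started.
end_ms = int(time.time() * 1000)

# Hypothetical table names and slave cluster address.
tables = ["table_a", "table_b"]
peer = "slave-zk1,slave-zk2,slave-zk3:2181:/hbase"

# Steps 4 and 5: one job per table, all with the same upper timestamp.
commands = [copytable_command(t, end_ms, peer) for t in tables]
for cmd in commands:
    print(" ".join(cmd))
```

Reusing a single end_ms for every table is the point of step 5: anything written after that moment is left to replication.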
> And then if the slave cluster goes down for a while one day, replication won't
> be sufficient - one will need to repeat the above procedure again, right?
No, the master won't delete hlogs that are still due to be replicated.
>
> Aha, thanks for pointing it out.
> This also means that one should really be using the latest and greatest about to
> be released HBase in order to get this fix, which is good to know.
Yeah... our setup here always lags when it comes to upgrading so some
stuff that's well tested in a previous version may be broken in a
newer one until we start deploying it and figure it out. As the
project moves on, I hope that more users and more developers will help
solve this issue as we only have so many cycles.
J-D
Re: HBase Replication questions
Posted by Otis Gospodnetic <ot...@yahoo.com>.
Hi,
> > 1. How long does it take for edits to be propagated to a slave cluster?
> >
> > As far as I understand from HBase Replication page
> > (http://hbase.apache.org/replication.html) there's a separate buffer held by
> > each region server which accumulates data (edits which should be replicated from
> > the edit log) before sending to Slave cluster's RSs. So basically data are sent
> > to slave cluster when:
> > * buffer is full. Is there an option to configure its size (as a way to affect
> > flushing frequency)?
> > * the end of edit log is reached by this "working thread". Does thread process
> > the edit log periodically or is it watching for edit log to change and acts
> > "immediately"? If the former, what is the default interval between runs? Can it
> > be configured?
>
> It acts as soon as the buffer is full or it reaches an EOF. The end of
> the file is determined by when the file was reopened *because there's
> no way to tail a file in HDFS without closing the reader, reopening
> the file and seeking to a certain position*. The end result is that
> the buffer cannot just fill for minutes before sending because it
> gets the EOF pretty quickly. Our replication stream almost always has
> sub-second lag. It waits only if it reaches the end without having
> read anything new.
>
> Configurations:
> * replication.source.size.capacity, default is 64MB, but I recently saw
> some OOME issues and I'm starting to think that this is too big.
> * replication.source.nb.capacity, default is 25k. The buffer is flushed
> when either the size or the entry count limit is reached. I'm thinking of
> deleting this second config because what's really important is the size.
> * replication.source.maxretriesmultiplier, default is 10, so it retries
> up to 10 times with pauses that are currentIteration times
> replication.source.sleepforretries. By default it sleeps 1 sec, 2, 3,
> 4... 9, 10, 10, 10, 10 until it's able to replicate.
> * replication.source.sleepforretries, default is 1 second, see above.
So does this mean that if it's unable to replicate after some number of sleeps,
as the ones you've listed above, it gives up trying to replicate?
> > 2. How reliable is replication?
> >
> > It looks like when there are some networking issues and slave cluster can't be
> > reached, this is handled gracefully by replication mechanism. It sounds like
> > this should also cover slave cluster going down for some reason. Are there any
> > possible scenarios when replication can be broken?
>
> The biggest issue at the moment is (from the replication
> documentation): HBASE-3130, the master cluster needs to be restarted
> if its region servers lose their session with a slave cluster
OK. So sequentially restarting each RS on the master cluster should be OK and
the replication will/should continue where it left off?
> Also reliability in general in 0.90 has gone down a bit because we
> were using 0.89 for a long time and just recently started using
> 0.90.1... there are still a few bugs I'm hunting.
>
> >
> > 3. Replication of existing (and possibly big) cluster after the fact.
> >
> > What are the options to replicate all existing data to a new (& empty) slave
> > cluster if replication wasn't configured from the start and keep replicating
> > from that point? It seems that because edit logs on the master cluster get
> > cleaned this might not be possible?
>
> From the FAQ at the end of the replication documentation:
>
> Q. You need a bulk edit shipper? Something that allows you to transfer
> 64MB of edits in one go?
>
> A. You can use the HBase-provided utility called CopyTable from the
> package org.apache.hadoop.hbase.mapreduce in order to have a
> distcp-like tool to bulk copy data.
Right, right... http://blog.sematext.com/2011/03/11/hbase-backup-options/
OK, so if we have a *live* cluster and then one day we decide we want to start
replicating this cluster, we need to stop the cluster first, call CopyTable for
each table, start the slave cluster, restart the master cluster, and replication
should kick in and keep the 2 clusters in sync.
And then if the slave cluster goes down for a while one day, replication won't
be sufficient - one will need to repeat the above procedure again, right?
> But in 0.90 there's a bug with TableOutputFormat that prevents
> using CopyTable across clusters, HBASE-3497, which I'm fixing and
> testing at this very moment.
Aha, thanks for pointing it out.
This also means that one should really be using the latest and greatest about to
be released HBase in order to get this fix, which is good to know.
Thanks!
Otis
----
Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-lucene.com/
Re: HBase Replication questions
Posted by Jean-Daniel Cryans <jd...@apache.org>.
Inline.
Also if you think any of my answers should be part of the
documentation, feel free to open a jira with a patch :)
J-D
On Thu, Mar 24, 2011 at 11:05 AM, Otis Gospodnetic
<ot...@yahoo.com> wrote:
> Hello,
>
> We are looking into HBase replication to separate our clients'-facing HBase
> cluster and the one we need to run analytics against (likely heavy MR jobs +
> potentially big scans).
>
> 1. How long does it take for edits to be propagated to a slave cluster?
>
> As far as I understand from HBase Replication page
> (http://hbase.apache.org/replication.html) there's a separate buffer held by
> each region server which accumulates data (edits which should be replicated from
> the edit log) before sending to Slave cluster's RSs. So basically data are sent
> to slave cluster when:
> * buffer is full. Is there an option to configure its size (as a way to affect
> flushing frequency)?
> * the end of edit log is reached by this "working thread". Does thread process
> the edit log periodically or is it watching for edit log to change and acts
> "immediately"? If the former, what is the default interval between runs? Can it
> be configured?
It acts as soon as the buffer is full or it reaches an EOF. The end of
the file is determined by when the file was reopened *because there's
no way to tail a file in HDFS without closing the reader, reopening
the file and seeking to a certain position*. The end result is that
the buffer cannot just fill for minutes before sending because it
gets the EOF pretty quickly. Our replication stream almost always has
sub-second lag. It waits only if it reaches the end without having
read anything new.
Configurations:
* replication.source.size.capacity, default is 64MB, but I recently saw
some OOME issues and I'm starting to think that this is too big.
* replication.source.nb.capacity, default is 25k. The buffer is flushed
when either the size or the entry count limit is reached. I'm thinking of
deleting this second config because what's really important is the size.
* replication.source.maxretriesmultiplier, default is 10, so it retries
up to 10 times with pauses that are currentIteration times
replication.source.sleepforretries. By default it sleeps 1 sec, 2, 3,
4... 9, 10, 10, 10, 10 until it's able to replicate.
* replication.source.sleepforretries, default is 1 second, see above.
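To make the flush conditions concrete, here is a rough sketch of how a batch gets cut: it ships when the byte size limit or the entry count limit is hit, and whatever is left ships when the reader reaches the current end of the log. The function and its parameters are hypothetical stand-ins for replication.source.size.capacity and replication.source.nb.capacity, not the real implementation:

```python
def batches(edits, size_capacity=64 * 1024 * 1024, nb_capacity=25000):
    """Group edits (byte strings read up to the current EOF) into batches.

    A batch is shipped as soon as its total size reaches size_capacity
    or its entry count reaches nb_capacity. Hitting the end of the input
    (EOF) ships whatever has accumulated, however small.
    """
    batch, batch_bytes = [], 0
    for edit in edits:
        batch.append(edit)
        batch_bytes += len(edit)
        if batch_bytes >= size_capacity or len(batch) >= nb_capacity:
            yield batch
            batch, batch_bytes = [], 0
    if batch:  # EOF reached: ship the partial batch instead of waiting
        yield batch


# Seven 10-byte edits with a tiny 30-byte size limit: the size limit
# cuts two batches of three, and EOF ships the last one.
by_size = [len(b) for b in batches([b"x" * 10] * 7, size_capacity=30, nb_capacity=5)]

# Seven 1-byte edits with a large size limit: the count limit cuts first.
by_count = [len(b) for b in batches([b"x"] * 7, size_capacity=1000, nb_capacity=5)]
print(by_size, by_count)
```

With the real defaults, a lightly loaded source would almost always be cut by the EOF case, which is consistent with the sub-second lag mentioned above.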
>
> 2. How reliable is replication?
>
> It looks like when there are some networking issues and slave cluster can't be
> reached, this is handled gracefully by replication mechanism. It sounds like
> this should also cover slave cluster going down for some reason. Are there any
> possible scenarios when replication can be broken?
The biggest issue at the moment is (from the replication
documentation): HBASE-3130, the master cluster needs to be restarted
if its region servers lose their session with a slave cluster
Also reliability in general in 0.90 has gone down a bit because we
were using 0.89 for a long time and just recently started using
0.90.1... there are still a few bugs I'm hunting.
>
> 3. Replication of existing (and possibly big) cluster after the fact.
>
> What are the options to replicate all existing data to a new (& empty) slave
> cluster if replication wasn't configured from the start and keep replicating
> from that point? It seems that because edit logs on the master cluster get
> cleaned this might not be possible?
From the FAQ at the end of the replication documentation:
Q. You need a bulk edit shipper? Something that allows you to transfer
64MB of edits in one go?
A. You can use the HBase-provided utility called CopyTable from the
package org.apache.hadoop.hbase.mapreduce in order to have a
distcp-like tool to bulk copy data.
But in 0.90 there's a bug with TableOutputFormat that prevents
using CopyTable across clusters, HBASE-3497, which I'm fixing and
testing at this very moment.