Posted to user@hbase.apache.org by Kevin Su <ks...@brightroll.com> on 2013/11/12 01:03:18 UTC

HBase Replication Progress

Hi,

I am having trouble searching for answers regarding HBase replication, so I
thought I would email the mailing list.

Does HBase provide an API/way to see what has/hasn't been replicated yet?

My use case is the following:

I run a map reduce job in Cluster A and stick the output in HBase. I would
like to transport this output to Cluster B as (part of) the input to
another map reduce job. I hope to achieve this transport via HBase
replication. The second map reduce job cannot start until all the data from
Cluster A has been replicated to Cluster B. So what is the best way to
check if everything has been replicated? Do I query ZooKeeper and check if
the RS queues are empty? Or is HBase replication not the right fit for my
use case?

I am using HBase 0.94.2.

Thanks in advance for any advice!

--
Kevin

Re: HBase Replication Progress

Posted by Ted Yu <yu...@gmail.com>.
bq.  consider to move to 0.94.6+

Moving to 0.94.7 or a newer release is recommended.
See HBASE-8352

Cheers


On Mon, Nov 11, 2013 at 4:18 PM, Demai Ni <ni...@gmail.com> wrote:

> Kevin,
>
> Given your requirement, I think the snapshot feature with export will work
> better. Here is some info:
> http://hbase.apache.org/book/ops.snapshots.html
> To fully benefit from this feature, you may want to move to 0.94.6+.
>
> I am still curious about this hard requirement: "... The second map reduce
> job cannot start until all the data from Cluster A has been replicated to
> Cluster B ...". Consider that the output of the first MapReduce job will be
> put into an HBase table on Cluster A. There is no need to wait until
> replication completes, as long as you use different row IDs so that the
> second output won't overwrite the first one. HBase replication will handle
> that situation very well.
>
> Demai
>
>

Re: HBase Replication Progress

Posted by Demai Ni <ni...@gmail.com>.
Kevin,

Given your requirement, I think the snapshot feature with export will work
better. Here is some info:
http://hbase.apache.org/book/ops.snapshots.html
To fully benefit from this feature, you may want to move to 0.94.6+.
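
For illustration only, here is a rough sketch of what the export step might
look like when driven from a script. It only builds the command line for the
ExportSnapshot tool described on the page linked above; the snapshot name,
the destination URL, and the helper name export_snapshot_command are all
made-up placeholders, and the snapshot is assumed to have been taken
beforehand (for example with the "snapshot" command in the HBase shell).

```python
import shlex


def export_snapshot_command(snapshot, copy_to, mappers=None):
    """Build the argv for HBase's ExportSnapshot tool.

    snapshot: name of an existing snapshot on the source cluster
    copy_to:  HBase root directory on the destination cluster's HDFS
    mappers:  optional number of map tasks used for the copy
    """
    argv = [
        "hbase",
        "org.apache.hadoop.hbase.snapshot.ExportSnapshot",
        "-snapshot", snapshot,
        "-copy-to", copy_to,
    ]
    if mappers is not None:
        argv += ["-mappers", str(mappers)]
    return argv


if __name__ == "__main__":
    # Hypothetical names: replace with your own snapshot and NameNode address.
    cmd = export_snapshot_command(
        "job_output_snapshot", "hdfs://cluster-b-namenode:8020/hbase", mappers=16
    )
    # Print the command instead of running it, so it can be reviewed first.
    print(" ".join(shlex.quote(arg) for arg in cmd))
```

Because the export runs as a job that either finishes or fails, the second
map reduce job can simply be started once the export has finished and the
snapshot has been restored or cloned on Cluster B, with no need to inspect
replication queues.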

I am still curious about this hard requirement: "... The second map reduce
job cannot start until all the data from Cluster A has been replicated to
Cluster B ...". Consider that the output of the first MapReduce job will be
put into an HBase table on Cluster A. There is no need to wait until
replication completes, as long as you use different row IDs so that the
second output won't overwrite the first one. HBase replication will handle
that situation very well.
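
As a minimal sketch of the "different row IDs" idea, assuming each job run
has some unique identifier available: prefixing every row key with that run
identifier keeps the outputs of separate runs from colliding. The key layout,
the "|" separator, and the function name make_row_key are illustrative
assumptions, not anything prescribed by HBase.

```python
def make_row_key(run_id, record_key):
    """Compose a row key that is unique per job run.

    run_id:     identifier of the map reduce run (hypothetical, e.g. a date)
    record_key: the natural key of the record within that run
    """
    # HBase row keys are byte arrays, so encode the composed string.
    return "{}|{}".format(run_id, record_key).encode("utf-8")


if __name__ == "__main__":
    first = make_row_key("run-2013-11-11", "user42")
    second = make_row_key("run-2013-11-12", "user42")
    # Same record key, different runs: the two rows do not overwrite each other.
    print(first != second)
```

A side effect of this layout is that all rows of one run sort together, so a
downstream job can scan a single run by its key prefix.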

Demai

