Posted to user@hbase.apache.org by hdev ml <hd...@gmail.com> on 2013/10/17 21:52:49 UTC

HBase backup, recovery, replication et al

Hello all,

We are looking for a solution for HBase backup, recovery and replication for
disaster recovery (DR).

We did take a look at HBase replication, but we are not sure whether it
is widely used in production.

Our data size in HBase is around 4TB.

We were thinking of a database-style approach: exporting a full dump weekly
and then doing incremental exports at regular intervals, say 2-3 times a day.
But we soon realized that transferring 4 TB to our DR site, with our
current bandwidth, would take 100+ hours.
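
As a rough check on that estimate, the arithmetic can be sketched in Python.
The 4 TB size and the 100-hour figure come from the message; decimal units
(1 TB = 10^12 bytes), the function names and the 1 Gbit/s comparison link are
assumptions for illustration, and protocol overhead and compression are
ignored.

```python
def implied_bandwidth_mbps(size_tb: float, hours: float) -> float:
    """Sustained throughput (Mbit/s) needed to move size_tb in the given hours."""
    bits = size_tb * 1e12 * 8  # assumes decimal terabytes
    return bits / (hours * 3600) / 1e6


def transfer_hours(size_tb: float, link_mbps: float) -> float:
    """Hours needed to move size_tb over a link sustaining link_mbps."""
    bits = size_tb * 1e12 * 8  # assumes decimal terabytes
    return bits / (link_mbps * 1e6) / 3600


if __name__ == "__main__":
    # Throughput implied by "4 TB in about 100 hours".
    print(f"{implied_bandwidth_mbps(4, 100):.1f} Mbit/s")
    # Same data over a hypothetical 1 Gbit/s link, for comparison.
    print(f"{transfer_hours(4, 1000):.1f} h")
```

Under these assumptions, 100 hours for 4 TB corresponds to roughly 89 Mbit/s
of sustained throughput, so a weekly full dump would occupy the link for more
than four days out of every seven.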

Are there better solutions out there? What do large installations do?

Any documentation?

Please let me know

Thanks
Harshad

Re: HBase backup, recovery, replication et al

Posted by hdev ml <hd...@gmail.com>.
Thanks, Dave, for your insights. We will wait for the patch from CDH.

Meanwhile, we will try replication on our clusters.



Re: HBase backup, recovery, replication et al

Posted by Dave Latham <la...@davelink.net>.
Primarily the link in my earlier reply, but I also got comfortable with the
source code after hitting some bumps.  You may want to consider moving to a
more recent version, depending on how much volume you expect to deal with.
Here are a few issues we bumped into and fixed since your release:
HBASE-8096, HBASE-8806, HBASE-9377

Dave


Re: HBase backup, recovery, replication et al

Posted by hdev ml <hd...@gmail.com>.
Thanks, Dave. Yes, we are planning to take periodic exports to safeguard the
data, and also snapshots to guard against local corruption, data loss,
etc.

Did you refer to any documentation other than the "hbase book"?

We are on CDH4.4 - HBase 0.94.6, so I think we are good there.

Thanks for your time Dave.

Harshad


Re: HBase backup, recovery, replication et al

Posted by Dave Latham <la...@davelink.net>.
We're running HBase replication successfully on a 500 TB (compressed - raw
is about 2PB) cluster over a 60ms link across the country.  I'd give it a
thumbs up for dealing with the loss of a cluster and for running
applications in two places, provided they can tolerate the inconsistency
that comes from its asynchronous nature.
( http://hbase.apache.org/replication.html )

You'll still want some sort of snapshot / export to be able to recover from
bugs / corruption which gets replicated.  We're intending to try out hbase
snapshots ( http://hbase.apache.org/book/ops.snapshots.html ) once we've
deployed a release with support.
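
For the export side, incremental exports like the 2-3 times a day mentioned
in the original question are usually expressed as time windows. Here is a
minimal Python sketch, assuming the stock org.apache.hadoop.hbase.mapreduce.Export
job and its optional versions / starttime / endtime arguments (check the usage
output on your release before relying on this). The table name, output path
and helper names are hypothetical, and the command is only built, not run.

```python
from datetime import datetime, timedelta, timezone


def export_windows(day: datetime, runs_per_day: int):
    """Split one day into equal [start, end) windows in epoch milliseconds."""
    start = day.replace(hour=0, minute=0, second=0, microsecond=0)
    step = timedelta(days=1) / runs_per_day
    for i in range(runs_per_day):
        lo = start + i * step
        hi = start + (i + 1) * step
        yield int(lo.timestamp() * 1000), int(hi.timestamp() * 1000)


def export_command(table, out_dir, start_ms, end_ms, versions=1):
    """Build the argument list for one incremental Export run (not executed)."""
    return [
        "hbase", "org.apache.hadoop.hbase.mapreduce.Export",
        table, f"{out_dir}/{start_ms}",          # one output dir per window
        str(versions), str(start_ms), str(end_ms),
    ]


if __name__ == "__main__":
    day = datetime(2013, 10, 17, tzinfo=timezone.utc)
    for start_ms, end_ms in export_windows(day, runs_per_day=3):
        # "my_table" and "/backups/my_table" are placeholder names.
        print(" ".join(export_command("my_table", "/backups/my_table",
                                      start_ms, end_ms)))
```

Keeping each window in its own output directory makes it possible to ship
only the newest window to the DR site, instead of the full data set.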

I'd also recommend using a recent 0.94 release if possible.

Dave

