Posted to user@hbase.apache.org by Timo Schaepe <ti...@timoschaepe.de> on 2013/12/23 19:53:37 UTC

Consistent Backup strategy

Hey guys,

We are looking for a consistent backup strategy using the Export tool. Is this article still up to date, and can I use it?

http://hadoop-hbase.blogspot.com/2012/04/timestamp-consistent-backups-in-hbase.html

Thanks for any answers.

cheers,

	Timo

Re: Consistent Backup strategy

Posted by lars hofhansl <la...@apache.org>.
We're doing a version of that at Salesforce (we have our own M/R jobs, but the principle is the same).
Soon we'll run the backup M/R job over a snapshot for performance reasons, but even then the principle is the same.

Specifically, we keep 48h worth of live data in HBase itself (TTL=48h, MIN_VERSIONS=1, KEEP_DELETED_CELLS=true) and run the jobs every night as of 2h in the past (rounded to an exact hour boundary).
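
The "as of 2h in the past, rounded to an exact hour boundary" rule can be sketched in a few lines of Python. This is only an illustration, not the Salesforce tooling: the function name `backup_cutoff_ms`, the `lag_hours` parameter, and the choice to round in UTC are assumptions. The result is in epoch milliseconds, the unit HBase uses for cell timestamps by default.

```python
from datetime import datetime, timedelta, timezone


def backup_cutoff_ms(now: datetime, lag_hours: int = 2) -> int:
    """Return (now - lag_hours), rounded down to the full hour, in epoch ms.

    Hypothetical helper: the name and the UTC rounding are assumptions.
    `now` must be a timezone-aware datetime.
    """
    lagged = now.astimezone(timezone.utc) - timedelta(hours=lag_hours)
    # Drop minutes, seconds and microseconds to land on an hour boundary.
    on_the_hour = lagged.replace(minute=0, second=0, microsecond=0)
    return int(on_the_hour.timestamp() * 1000)


if __name__ == "__main__":
    # Cutoff for a job that starts right now.
    print(backup_cutoff_ms(datetime.now(timezone.utc)))
```

Because the cutoff lies in the past, the job only reads cells whose timestamps are at or before a point that no new write with a current timestamp can fall into, which is what makes the time-bounded view stable.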

I think it's time I write an updated blog post. We plan to eventually open source the tools we've written.


-- Lars




Re: Consistent Backup strategy

Posted by Matteo Bertozzi <th...@gmail.com>.
If you can rely on timestamps, you can use the Export tool as shown in the
blog post without problems. The Export command-line interface has not changed.
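
As a rough sketch, the Export invocation with a time range can be assembled like this. The positional form `Export <tablename> <outputdir> [<versions> [<starttime> [<endtime>]]]` is the tool's documented usage; the helper name `export_command`, the table name, and the output path are made up for illustration. Timestamps are epoch milliseconds, and to my understanding the range is start-inclusive and end-exclusive.

```python
from typing import List


def export_command(table: str, output_dir: str,
                   start_ms: int, end_ms: int,
                   versions: int = 1) -> List[str]:
    """Build the argument list for a time-bounded HBase Export run.

    Hypothetical helper; it only assembles the command, it does not run it.
    """
    if not 0 <= start_ms < end_ms:
        raise ValueError("need 0 <= start_ms < end_ms")
    return [
        "hbase", "org.apache.hadoop.hbase.mapreduce.Export",
        table, output_dir,
        str(versions), str(start_ms), str(end_ms),
    ]


if __name__ == "__main__":
    # Example values only: 2013-12-23 00:00 UTC up to 2013-12-24 00:00 UTC.
    cmd = export_command("mytable", "/backups/mytable/20131224",
                         1387756800000, 1387843200000)
    print(" ".join(cmd))
```

The list form can be passed to `subprocess.run` unchanged, which avoids shell quoting issues in table names or paths.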

Matteo




Re: Consistent Backup strategy

Posted by Timo Schaepe <ti...@timoschaepe.de>.
By "consistent" I mean what Lars describes in the blog article mentioned above. We want to back up our data for one specific time range, such that the data is consistent within that range.

My thoughts so far:
I want to take a full backup of our data every Saturday and an incremental backup every 24 hours during the week. That seems possible with the Export tool. For example, to take an (incremental) backup every midnight, I would start the Export tool five minutes after midnight with explicit timestamps: the start time is 0 (or, for an incremental backup, the timestamp of the last backup, i.e. midnight of the day before), and the end time is now minus 5 minutes, i.e. midnight. We can then store this data on a different machine or backup server.
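
The schedule described above could be sketched as follows. This is a minimal illustration under some assumptions: the job starts a few minutes after midnight, "Saturday" means the run whose end time is Saturday 00:00, times are handled in UTC, and the function name `backup_window_ms` is invented for the example.

```python
from datetime import datetime, timedelta, timezone
from typing import Tuple


def backup_window_ms(run_time: datetime) -> Tuple[str, int, int]:
    """Return (kind, start_ms, end_ms) for a job started just after midnight.

    Hypothetical helper. `run_time` must be timezone-aware; the end time is
    the most recent midnight, the start time is 0 for a full backup or the
    previous midnight for an incremental one.
    """
    midnight = run_time.astimezone(timezone.utc).replace(
        hour=0, minute=0, second=0, microsecond=0)
    end_ms = int(midnight.timestamp() * 1000)
    if midnight.weekday() == 5:  # Monday is 0, so 5 is Saturday
        return ("full", 0, end_ms)
    previous_midnight = midnight - timedelta(days=1)
    return ("incremental", int(previous_midnight.timestamp() * 1000), end_ms)


if __name__ == "__main__":
    print(backup_window_ms(datetime(2013, 12, 24, 0, 5, tzinfo=timezone.utc)))
    print(backup_window_ms(datetime(2013, 12, 28, 0, 5, tzinfo=timezone.utc)))
```

Deriving the end time from the calendar rather than from "now minus 5 minutes" keeps consecutive windows contiguous even if a job starts late.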

Lars's article seems to meet my requirements exactly, which is why I asked whether it is still up to date.

Taking snapshots with the snapshot tool gives me no data that I can put on a backup server, I guess.

bye,

	Timo



Re: Consistent Backup strategy

Posted by Matteo Bertozzi <th...@gmail.com>.
Can you define what "consistent" means to you?
For example, online snapshots are row-consistent, but the snapshot of
"Region 1" may be taken at time T0 and the snapshot of "Region N" at time T0
+ X seconds.
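
The difference can be illustrated with a toy model in plain Python. It does not use any HBase API: `Cell`, `snapshot_view`, and `timestamp_view` are hypothetical names, and the model simply assumes that each region's snapshot captures the cells written up to that region's own capture time.

```python
from typing import Dict, List, NamedTuple


class Cell(NamedTuple):
    region: str
    row: str
    ts: int      # write timestamp (arbitrary units for this toy model)
    value: str


def snapshot_view(cells: List[Cell], capture_time: Dict[str, int]) -> List[Cell]:
    """Each region contributes whatever it held at its own capture time."""
    return [c for c in cells if c.ts <= capture_time[c.region]]


def timestamp_view(cells: List[Cell], cutoff: int) -> List[Cell]:
    """One global cutoff applies to every region (end time exclusive)."""
    return [c for c in cells if c.ts < cutoff]


if __name__ == "__main__":
    cells = [
        Cell("region1", "a", 100, "v1"),
        Cell("region1", "a", 105, "v2"),
        Cell("regionN", "z", 105, "v3"),
    ]
    # Region 1 is captured at T0 = 100, region N at T0 + X = 110.
    print(snapshot_view(cells, {"region1": 100, "regionN": 110}))
    # A single cutoff of 101 treats both regions alike.
    print(timestamp_view(cells, 101))
```

In the snapshot view the write at time 105 is present for region N but missing for region 1, whereas the timestamp view cuts both regions at the same point.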

Matteo




Re: Consistent Backup strategy

Posted by Timo Schaepe <ti...@timoschaepe.de>.
Sorry, I forgot to mention: taking the cluster offline is not an option. We need a consistent backup of an online cluster. Our plan B is to build a second cluster for replication and take offline snapshots from that cluster.

bye,

	Timo




RE: Consistent Backup strategy

Posted by Vladimir Rodionov <vr...@carrieriq.com>.
Offline snapshots?

Best regards,
Vladimir Rodionov
Principal Platform Engineer
Carrier IQ, www.carrieriq.com
e-mail: vrodionov@carrieriq.com
