Posted to user@hbase.apache.org by Alexey Kovyrin <al...@kovyrin.net> on 2010/09/07 18:27:19 UTC

Hbase Backups

Hi guys,

More and more of our company's data is moving from MySQL tables to HBase,
and I'm getting more and more worried about the "no backups" situation with
that data. I've started looking for possible solutions to back up the
data and found two major options:
1) distcp of /hbase directory somewhere
2) HBASE-1684

So, I have a few questions for hbase "users":
1) How do you back up your small (up to a hundred GB) tables?
2) How do you back up your huge (terabytes in size) tables?

And a question for hbase developers: what kind of problems could a
distcp of a non-locked hbase table cause (AFAIU there is no way to lock
a table against writes while backing it up)? I understand I could lose
writes made after I begin the backup, but if my distcp takes an hour to
complete, I imagine lots of things will happen on the filesystem
during that period of time. Will hbase be able to recover from this
kind of mess?

Thanks a lot for your comments.

-- 
Alexey Kovyrin
http://kovyrin.net/

RE: Hbase Backups

Posted by Michael Segel <mi...@hotmail.com>.
Backups?

LOL... Funny you should ask...

Depending on your existing infrastructure, the size of your cloud, and the amount of data you want to store...
YMMV.

What we determined at a client site is that the standard approach of backing the system up to a SAN, the SAN to tape, and shipping the tapes offsite doesn't work.

We have too much data, so you have to consider backing up tables out of HBase to HDFS for a local 'hot' copy, then moving that backup to a secondary cloud for off-site storage.

Until HBase 0.90 and Hadoop 0.21, I don't think you can take advantage of snapshots just yet.

So your best bet is to take advantage of the built-in Export/Import feature and write a shell script around it,
so you can run either a full or an incremental backup strategy.

The drawback is that it's a MapReduce job, so it will take resources away from your cloud, and you need to look at your SLA.
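To make the full/incremental idea concrete, here is a minimal sketch of such a wrapper around the Export MR job. The table name, backup root, and state file below are hypothetical examples, and the command is printed rather than executed so the logic can be sanity-checked without a cluster; on a real cluster you would run the printed command.

```shell
#!/bin/sh
# Hedged sketch of a full/incremental backup wrapper around HBase's Export job.
# TABLE, BACKUP_ROOT, and STATE_FILE are hypothetical examples.
TABLE="${1:-mytable}"
BACKUP_ROOT="${2:-/backups}"
STATE_FILE="${TMPDIR:-/tmp}/last_export_ts.$TABLE"

NOW_MS=$(( $(date +%s) * 1000 ))   # Export takes times in epoch milliseconds
if [ -f "$STATE_FILE" ]; then
  START_MS=$(cat "$STATE_FILE")    # incremental: only edits since the last run
else
  START_MS=0                       # first run: full export
fi

# Export usage in this era: <table> <outputdir> [<versions> [<starttime> [<endtime>]]]
CMD="hbase org.apache.hadoop.hbase.mapreduce.Export $TABLE $BACKUP_ROOT/$TABLE/$NOW_MS 1 $START_MS $NOW_MS"
echo "$CMD"                        # printed here; run it on a real cluster
echo "$NOW_MS" > "$STATE_FILE"     # remember where the next incremental starts
```

Each run exports only the window since the previous run, so chaining the output directories gives you an incremental history on HDFS.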

HTH


Re: Hbase Backups

Posted by Jean-Daniel Cryans <jd...@apache.org>.
If you are asking about "current" solutions, then yes, you can distcp,
but I would consider that a last-resort solution for the reasons you
described (yes, you could end up with an inconsistent state that
requires manual fixing). It also completely bypasses row locks.
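For reference, the distcp route amounts to a single command. The namenode hosts below are hypothetical; the command is built and printed here rather than run, since it only makes sense against a live cluster (and, as noted above, yields a potentially inconsistent copy).

```shell
# Hedged sketch of option (1): distcp HBase's root dir to another cluster.
# prod-nn and backup-nn are hypothetical namenode hosts.
SRC="hdfs://prod-nn:8020/hbase"
DEST="hdfs://backup-nn:8020/hbase-backup-$(date +%Y%m%d)"
CMD="hadoop distcp $SRC $DEST"
echo "$CMD"   # run the printed command on a real cluster
```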

Another choice is the Export MR job, using the start-time option
to do incremental backups. But then you have to distcp the result of
that MR job. And it's not a "point in time" that you are snapshotting,
since it doesn't lock all rows (and you don't really want that hehe).

Since you are on 0.89, you can use cluster replication. This will keep
an almost up-to-date replica on another cluster. Cons are that it
requires another cluster (may be a good thing to have in any case),
and it's still experimental so you could run into issues. See
http://hbase.apache.org/docs/r0.89.20100726/replication.html
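Per the replication doc linked above, enabling it is roughly a matter of turning on one flag on both clusters and then registering the slave as a peer from the HBase shell; the fragment below is a sketch, and the exact steps and peer syntax should be checked against the doc for your version.

```xml
<!-- hbase-site.xml, on BOTH the master and slave clusters -->
<property>
  <name>hbase.replication</name>
  <value>true</value>
</property>
```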

In the future there's HBASE-50 that should also be useful.

J-D

On Tue, Sep 7, 2010 at 9:27 AM, Alexey Kovyrin <al...@kovyrin.net> wrote: