Posted to user@hbase.apache.org by Marc Harris <mh...@jumptap.com> on 2008/02/25 19:00:53 UTC

Backup / restore

There has been discussion before about backup / restore but the
discussion has tended to fizzle out. I would like to see backup /
restore functionality for HBase for the following two purposes:

1) Protection against software bugs deleting data. This is not just the
proverbial namenode gone haywire, but user code running in a map-reduce
task that deletes the wrong thing could be just as disastrous.
2) Ability to copy one HBase instance's data to another instance. It's
pretty common in sql-land to run a backup tool that produces a large
file (either a compact export file, or just a sequence of sql
statements). This can then be imported to another instance of the db.

The particular use case I have is that of a production HBase instance
and a development or QA instance. It would be useful to be able to dump
the production instance periodically, and then load it into a
development instance so that new code could be run against it.

I think this would be HBase specific, not a general Hadoop dump /
restore, because only the logical data should be transferred, not the
precise structure of how tables are split into regions. Does such a
thing exist?

Re: Backup / restore

Posted by Bryan Duxbury <br...@rapleaf.com>.
Yes, the actual regionservers are not a part of the schema. The  
assignments are stored in the meta table, but they'll be cleaned up  
when the data is reloaded on another cluster.

-Bryan

On Feb 25, 2008, at 11:24 AM, Marc Harris wrote:

> I think this does answer my question, yes.
>
> So does this mean that the contents of a particular HBase instance are
> independent of the configured region servers? And that the way an
> instance is split up into regions is independent of the available region
> servers? If so, then, yes, it seems there is nothing specific to be done
> for HBase for off-line backup.
>
> Now I have to figure out how to (recursively) copy a directory from one
> HDFS instance to another.
>
> Thanks.
> - Marc
>
> On Mon, 2008-02-25 at 10:09 -0800, Bryan Duxbury wrote:
>
>> If an offline backup/restore is acceptable, then we already have it.
>> All you have to do is copy your hbase rootdir to a new location in
>> hdfs, and you've made a backup. You can also use this technique to
>> copy one instance to another - just boot up a master pointed at the
>> new directory and voila.
>>
>> As far as dumping to a single file or a group of sql statements, that
>> seems like it would be a suboptimal way to manage the amount of data
>> you could potentially be working with. At the very least you want
>> many files. It also makes sense to keep them in their region
>> divisions, otherwise it will be an inordinate amount of work to
>> restore into HBase at a later date.
>>
>> Does this answer your question?
>>
>> -Bryan
>>
>> On Feb 25, 2008, at 10:00 AM, Marc Harris wrote:
>>
>>> There has been discussion before about backup / restore but the
>>> discussion has tended to fizzle out. I would like to see backup /
>>> restore functionality for HBase for the following two purposes:
>>>
>>> 1) Protection against software bugs deleting data. This is not just the
>>> proverbial namenode gone haywire, but user code running in a map-reduce
>>> task that deletes the wrong thing could be just as disastrous.
>>> 2) Ability to copy one HBase instance's data to another instance. It's
>>> pretty common in sql-land to run a backup tool that produces a large
>>> file (either a compact export file, or just a sequence of sql
>>> statements). This can then be imported to another instance of the db.
>>>
>>> The particular use case I have is that of a production HBase instance
>>> and a development or QA instance. It would be useful to be able to dump
>>> the production instance periodically, and then load it into a
>>> development instance so that new code could be run against it.
>>>
>>> I think this would be HBase specific, not a general Hadoop dump /
>>> restore, because only the logical data should be transferred, not the
>>> precise structure of how tables are split into regions. Does such a
>>> thing exist?
>>


Re: Backup / restore

Posted by Marc Harris <mh...@jumptap.com>.
I think this does answer my question, yes.

So does this mean that the contents of a particular HBase instance are
independent of the configured region servers? And that the way an
instance is split up into regions is independent of the available region
servers? If so, then, yes, it seems there is nothing specific to be done
for HBase for off-line backup.

Now I have to figure out how to (recursively) copy a directory from one
HDFS instance to another.

Thanks.
- Marc 

On Mon, 2008-02-25 at 10:09 -0800, Bryan Duxbury wrote:

> If an offline backup/restore is acceptable, then we already have it.  
> All you have to do is copy your hbase rootdir to a new location in  
> hdfs, and you've made a backup. You can also use this technique to  
> copy one instance to another - just boot up a master pointed at the  
> new directory and voila.
> 
> As far as dumping to a single file or a group of sql statements, that  
> seems like it would be a suboptimal way to manage the amount of data  
> you could potentially be working with. At the very least you want  
> many files. It also makes sense to keep them in their region  
> divisions, otherwise it will be an inordinate amount of work to  
> restore into HBase at a later date.
> 
> Does this answer your question?
> 
> -Bryan
> 
> On Feb 25, 2008, at 10:00 AM, Marc Harris wrote:
> 
> > There has been discussion before about backup / restore but the
> > discussion has tended to fizzle out. I would like to see backup /
> > restore functionality for HBase for the following two purposes:
> >
> > 1) Protection against software bugs deleting data. This is not just the
> > proverbial namenode gone haywire, but user code running in a map-reduce
> > task that deletes the wrong thing could be just as disastrous.
> > 2) Ability to copy one HBase instance's data to another instance. It's
> > pretty common in sql-land to run a backup tool that produces a large
> > file (either a compact export file, or just a sequence of sql
> > statements). This can then be imported to another instance of the db.
> >
> > The particular use case I have is that of a production HBase instance
> > and a development or QA instance. It would be useful to be able to dump
> > the production instance periodically, and then load it into a
> > development instance so that new code could be run against it.
> >
> > I think this would be HBase specific, not a general Hadoop dump /
> > restore, because only the logical data should be transferred, not the
> > precise structure of how tables are split into regions. Does such a
> > thing exist?
> 

Re: Backup / restore

Posted by stack <st...@duboce.net>.
Bryan Duxbury wrote:
> If an offline backup/restore is acceptable, then we already have it. 
> All you have to do is copy your hbase rootdir to a new location in 
> hdfs, and you've made a backup. You can also use this technique to 
> copy one instance to another - just boot up a master pointed at the 
> new directory and voila.

You can even run a MR job to do the copy quickly: see ./bin/hadoop 
distcp (This tool is also able to go between filesystems, IIRC, so you 
could copy from hdfs to s3, etc.).
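A distcp invocation for this might look like the following sketch. The
namenode hostnames, ports, and paths here are made up; substitute your
own cluster addresses and rootdir locations:

```shell
# Copy an HBase rootdir from one HDFS instance to another with distcp.
# distcp runs as a MapReduce job, so the copy is parallelized across
# the cluster. Hostnames, ports, and paths are illustrative only.
bin/hadoop distcp \
    hdfs://prod-namenode:9000/hbase \
    hdfs://dev-namenode:9000/hbase

# distcp can also target other filesystems, e.g. an S3 bucket:
# bin/hadoop distcp hdfs://prod-namenode:9000/hbase s3://my-bucket/hbase-backup
```

The copy should be made while HBase is shut down, since distcp will not
tolerate files being compacted away underneath it.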

We should, though, do the work to make it possible to copy an HBase
instance while it is online. The plan for HBase 0.2,
http://wiki.apache.org/hadoop/Hbase/Plan-0.2, calls out the need for such
a facility. It would be sweet if you could force HBase to dump its
in-memory stores of edits and then run a copy of all online files behind
a particular timestamp while HBase was up and running (we would need to do
something to stop compactions removing old files while the copy ran, or
do something like bdbje, where unused files earn a '.del' suffix).
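As a rough sketch, the sequence could look like the following. Note that
none of these steps exist today: the flush command is hypothetical, and
the timestamp-filtered copy is hand-waved, so everything is shown as
commented-out pseudocode:

```shell
# Hypothetical online-backup sequence for the scheme described above.
# None of these commands exist yet; this only illustrates the idea.

# 1. Force HBase to dump its in-memory edits to files in HDFS (hypothetical):
# bin/hbase flush --all

# 2. Record a cut-off timestamp; only files older than this get copied:
# SNAPSHOT_TS=$(date +%s)

# 3. Copy the on-disk files behind the timestamp while HBase stays up.
#    Compactions would have to be paused, or deletions deferred (as bdbje
#    does with its '.del' suffix), so files are not removed mid-copy:
# bin/hadoop distcp hdfs://namenode:9000/hbase hdfs://namenode:9000/backup/hbase
```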

St.Ack

Re: Backup / restore

Posted by Bryan Duxbury <br...@rapleaf.com>.
If an offline backup/restore is acceptable, then we already have it.  
All you have to do is copy your hbase rootdir to a new location in  
hdfs, and you've made a backup. You can also use this technique to  
copy one instance to another - just boot up a master pointed at the  
new directory and voila.
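For example, an offline copy of the rootdir might look like this sketch.
It assumes HBase has been shut down first, and the source and destination
paths are placeholders for your own hbase.rootdir and backup location:

```shell
#!/bin/sh
# Offline HBase backup sketch: copy the rootdir to a backup location.
# Assumes the master and regionservers are stopped so no files change
# underneath the copy. Paths are illustrative.
SRC=/hbase                    # the configured hbase.rootdir in HDFS
DST=/backup/hbase-20080225    # backup destination, also in HDFS

# Recursive copy within a single HDFS instance.
bin/hadoop fs -cp "$SRC" "$DST"

# To restore or clone, point a master's hbase.rootdir at $DST and start it.
```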

As far as dumping to a single file or a group of sql statements, that  
seems like it would be a suboptimal way to manage the amount of data  
you could potentially be working with. At the very least you want  
many files. It also makes sense to keep them in their region  
divisions, otherwise it will be an inordinate amount of work to  
restore into HBase at a later date.

Does this answer your question?

-Bryan

On Feb 25, 2008, at 10:00 AM, Marc Harris wrote:

> There has been discussion before about backup / restore but the
> discussion has tended to fizzle out. I would like to see backup /
> restore functionality for HBase for the following two purposes:
>
> 1) Protection against software bugs deleting data. This is not just the
> proverbial namenode gone haywire, but user code running in a map-reduce
> task that deletes the wrong thing could be just as disastrous.
> 2) Ability to copy one HBase instance's data to another instance. It's
> pretty common in sql-land to run a backup tool that produces a large
> file (either a compact export file, or just a sequence of sql
> statements). This can then be imported to another instance of the db.
>
> The particular use case I have is that of a production HBase instance
> and a development or QA instance. It would be useful to be able to dump
> the production instance periodically, and then load it into a
> development instance so that new code could be run against it.
>
> I think this would be HBase specific, not a general Hadoop dump /
> restore, because only the logical data should be transferred, not the
> precise structure of how tables are split into regions. Does such a
> thing exist?