Posted to user@hbase.apache.org by lars hofhansl <lh...@yahoo.com> on 2011/09/16 01:11:01 UTC

Hooks for WAL archiving

I have been thinking about backup and point-in-time recovery (PITR) in HBase. This is mainly needed in case of software errors,
or when a customer asks us to restore some data they accidentally deleted.

A possible answer is to keep all versions with no TTL, and do replication. At a certain size this ceases to be practical, though.


A typical scenario for relational databases is to take periodic base backups and also archive the log files.
Would that even work in HBase currently? Say I have a distcp copy of all HBase files that was done while HBase was running and I
also have an archive of all WALs since the time when the distcp started.

Could I theoretically restore HBase to a consistent state (at any time after the distcp finished)? Or are there changes that are not
WAL logged that I would miss (like admin actions)?


If that works, a backup would involve these steps:
1. Flush all stores.
2. Copy the files.
3. Roll all logs.


#1 and #3 are really optional; #3 is good because it would make all logs eligible for archiving right after the backup is done.
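
For illustration, a rough sketch of driving these steps from a client with the HBaseAdmin API; the cluster-wide log roll in step 3 is a hypothetical helper, since there is no such admin call today:

import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.client.HBaseAdmin;

public class BaseBackup {
  public static void main(String[] args) throws Exception {
    HBaseAdmin admin = new HBaseAdmin(HBaseConfiguration.create());
    // 1. Flush all stores (issued per table, carried out per region).
    for (HTableDescriptor table : admin.listTables()) {
      admin.flush(table.getNameAsString());
    }
    // 2. Copy the files, e.g. run out of band while the cluster serves:
    //    hadoop distcp /hbase hdfs://backupcluster/hbase-base-backup
    // 3. Roll all logs so every edit so far becomes archivable.
    rollAllLogs(admin); // hypothetical; would need a per-regionserver RPC
  }

  private static void rollAllLogs(HBaseAdmin admin) {
    // Not available in the admin API today; placeholder for the idea.
  }
}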


In any case, some hooks to act upon HLog actions would be a good thing. For example we could add four new methods to WALObserver (or a new observer type):

boolean preLogRoll(Path newFile)
void postLogRoll(Path newFile)

boolean preLogArchive(Path oldFile)
void postLogArchive(Path oldFile)

Returning true from the pre versions would bypass the default actions (although in this case I am not sure how useful that would be).

That way it would be possible to act upon HLog file activity and (for example) archive these files somewhere.
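
To make that concrete, a sketch of what an implementation of those hooks might look like. The four methods are the proposed ones above, not an existing API, and the backup directory is made up:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.FileUtil;
import org.apache.hadoop.fs.Path;

public class WalArchivingObserver {
  private final FileSystem fs;
  private final Path backupDir; // e.g. hdfs://backupcluster/wal-archive (made up)

  public WalArchivingObserver(Configuration conf, Path backupDir) throws Exception {
    this.fs = FileSystem.get(conf);
    this.backupDir = backupDir;
  }

  // Returning true would bypass the default action, per the proposal.
  public boolean preLogRoll(Path newFile) { return false; }

  public void postLogRoll(Path newFile) {
    // The just-closed WAL is now immutable; it could be queued for copying here.
  }

  public boolean preLogArchive(Path oldFile) { return false; }

  // Copy the WAL to external storage before HBase moves or deletes it.
  public void postLogArchive(Path oldFile) throws Exception {
    Path target = new Path(backupDir, oldFile.getName());
    FileUtil.copy(fs, oldFile, fs, target, false /* keep source */, fs.getConf());
  }
}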

Of course one could also just watch the directories in HDFS periodically, but that seems awkward.

Comments?


-- Lars

Re: Hooks for WAL archiving

Posted by lars hofhansl <lh...@yahoo.com>.
Yes, that's the link. None of the options in there is entirely satisfactory...


I think StumbleUpon is using replication extensively; I'll let them provide more details if they like.

At Salesforce we will also use replication for HA and DR, and have been working on replication features (HBASE-2195, 2196, 3130) and also HBASE-4071 (which allows backups with a TTL set, while at least keeping some versions around), and partially HBASE-4363.

One sore spot is HBASE-2611, but I am sure that can be fixed, too.

For HA and DR this should be a far better option than backup/restore.

That all said, the reason I posted the initial message was exactly that we need safety that cannot be guaranteed by replication alone:
software errors, operator errors, data corruption, etc., would just be replicated to the remote site.

Long story for saying: I think replication is production ready and can be used for HA and DR, but backup/restore is still needed.

-- Lars


----- Original Message -----
From: Steinmaurer Thomas <Th...@scch.at>
To: user@hbase.apache.org; lars hofhansl <lh...@yahoo.com>
Cc: 
Sent: Thursday, September 15, 2011 11:24 PM
Subject: RE: Hooks for WAL archiving

Hello Lars,

if you're talking about HBase replication: it's a rather new feature, and based on our evaluation we aren't sure it's ready for production.

While I agree that regular backups aren't an option in the PB range, IMHO they are in the low TB range. Our system operations team is demanding a reliable backup solution, provided it is affordable disk-space-wise.

I think you meant the backup options discussed here:
http://blog.sematext.com/2011/03/11/hbase-backup-options/

I can imagine that in the distributed data management area, a consistent (incremental) snapshot backup while the system is in use isn't that easy to implement.


Thomas


-----Original Message-----
From: lars hofhansl [mailto:lhofhansl@yahoo.com] 
Sent: Freitag, 16. September 2011 08:13
To: user@hbase.apache.org
Subject: Re: Hooks for WAL archiving

Hello Thomas,

I guess the general sentiment is/was that a store that scales to petabytes by adding more and more machines does not lend itself to conventional backup.

I think you can get fairly far with replication. That gives you HA and DR (disaster recovery), but does not guard against software or operator error.
It's also asynchronous, so there's a window of losing data.


There was a link posted here recently to a blog listing the various HBase backup strategies. Can't find it just now, but if you look through this list for the past 10 days or so, you'll find it.

-- Lars


----- Original Message -----
From: Steinmaurer Thomas <Th...@scch.at>
To: user@hbase.apache.org; lars hofhansl <lh...@yahoo.com>
Cc: 
Sent: Thursday, September 15, 2011 10:38 PM
Subject: RE: Hooks for WAL archiving

Hello,

as our major point in giving HBase a go for production is (incremental) online/snapshot backups, I find a public discussion rather interesting. ;-)

I really wonder how others use HBase in production with a backup/restore scenario in place. Or are they all so big, data-wise, that regular backups like in the RDBMS world aren't really possible?

Thanks,
Thomas


-----Original Message-----
From: lars hofhansl [mailto:lhofhansl@yahoo.com]
Sent: Freitag, 16. September 2011 07:30
To: user@hbase.apache.org
Subject: Re: Hooks for WAL archiving

Ah yes, HBASE-4132 essentially covers the same idea I had for the WALObserver (and I meant the ...coprocessor.WALObserver).

HBASE-50 seems only tangentially related to what I had in mind: it allows snapshots, but does not help with PITR (I think).


So you'd create a manifest to avoid copying newer files? That would lock the earliest recovery time to when we start the copy (not when we end it), which is nice. What about compactions? They'd remove some of the old files, with the data now living in new files.


Copying the WALs should really be a separate process, I think. The base backup would maybe not even copy them.
The scenario I have in mind is where one would take a base backup whenever it makes sense (once a day, a week, a month) and always archive all WALs. The last base backup with WALs could be on disk and previous ones might be spooled to tape.

Maybe I am just dreaming, but then one could restore a base backup, and replay the necessary WALs to bring the state to *any* given point in time.


Then one of the harder issues - as you say - would be to associate the correct WALs with the snapshot.
I wonder how precise we really need to be, though. As every edit is timestamped - in theory - WAL replay would be idempotent.
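
A minimal illustration of that idempotence with the client API, assuming the explicit timestamp is what gets replayed:

import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;

public class IdempotentReplay {
  public static void main(String[] args) throws Exception {
    HTable table = new HTable(HBaseConfiguration.create(), "t");
    Put p = new Put(Bytes.toBytes("row1"));
    // Explicit timestamp: applying this edit twice still yields exactly one
    // cell version at (row1, f, q, 1316131200000).
    p.add(Bytes.toBytes("f"), Bytes.toBytes("q"), 1316131200000L, Bytes.toBytes("v"));
    table.put(p);
    table.put(p); // a replayed duplicate just overwrites the identical cell
  }
}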


And what about new tables? .META. changes are probably logged, but when replaying those we'd somehow need to create the tables.
And then the .META. changes have to be in strict order w.r.t. the other WAL replays.

Are the sequenceIds globally ordered? I wonder if they even need to be.


Maybe have an offline discussion?


-- Lars


________________________________
From: Stack <st...@duboce.net>
To: user@hbase.apache.org; lars hofhansl <lh...@yahoo.com>
Sent: Thursday, September 15, 2011 9:29 PM
Subject: Re: Hooks for WAL archiving

On Thu, Sep 15, 2011 at 4:11 PM, lars hofhansl <lh...@yahoo.com> wrote:
> A typical scenario for relational databases is to take periodic base backups and also archive the log files.
> Would that even work in HBase currently? Say I have distcp copy of all 
> HBase files that was done while HBase was running and I also have an archive of all WALs since the time when the distcp started.
>
> Could I theoretically restore HBase to a consistent state (at any time 
> after the distcp finished)? Or are there changes that are not WAL logged that I would miss (like admin actions)?
>

I'm interested in this topic too.

Related work was done up in hbase-50, snapshotting.  Have you seen that, Lars?  It'd roll WALs and make a manifest of all hfiles.  A background process could then copy off the hfiles in the manifests, and the WALs.
IIRC, there was a restore from a snapshot mechanism too.

Need to figure stuff like the most recent sequenceid for a region and then discard all WAL edits that were done before this sequenceid.  Reading the head and tail of a WAL we could figure what sequenceids it had (we should probably get the sequenceid out into the name of the file, at least the start sequenceid), and perhaps even have an accompanying metadata file or an entry at the end of the WAL that had the list of regions for which the WAL had edits (maybe this is more trouble than it's worth since there will be times when we don't close the WAL properly).

Sequenceids are kept by the regionserver; hfiles are per region, and regions can move among regionservers.


> If that works, a backup would involve these steps:
> 1. Flush all stores.

Flush would be nice but could take a good while to complete... could jeopardize your snapshot.

> 2. copy the files.

I'd dump a manifest and background the copy.  Copying is going to be heavy-duty too, I'd say, if you let it run full pelt.

> 3. roll all logs.
>
>
> #1 and #3 are really optional, #3 is good because it would make all logs eligible for archiving right after the backup is done.
>
>
> In any case some hooks to act upon HLog actions would be a good thing anyway. For example we could add four new methods to WALObserver (or a new observer type):
>
> boolean preLogRoll(Path newFile)
> void postLogRoll(Path newFile)
>
> boolean preLogArchive(Path oldFile)
> void postLogArchive(Path oldFile)
>

Is HBASE-4132 related?

St.Ack

Re: Hooks for WAL archiving

Posted by lars hofhansl <lh...@yahoo.com>.
> Tape?  What's that?


We are old-fashioned at Salesforce. We will also likely have SLAs for this data :)


> WALs are not compressed.  Archiving could compress them.  They should
> compress well.


I wonder if archived WALs (I mean the ones in .oldlogs) shouldn't always be compressed?
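
For illustration, one way an external archiver might gzip WALs it pulls out of .oldlogs, using plain Hadoop codec APIs (the target directory is made up):

import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;
import org.apache.hadoop.io.compress.GzipCodec;
import java.io.OutputStream;

public class WalCompressor {
  // Gzip an archived WAL into archiveDir; WALs are mostly KeyValues and,
  // as noted above, should compress well.
  public static void compress(FileSystem fs, Path wal, Path archiveDir) throws Exception {
    GzipCodec codec = new GzipCodec();
    codec.setConf(fs.getConf());
    Path out = new Path(archiveDir, wal.getName() + codec.getDefaultExtension());
    FSDataInputStream in = fs.open(wal);
    OutputStream os = codec.createOutputStream(fs.create(out));
    IOUtils.copyBytes(in, os, 4096, true); // closes both streams
  }
}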


> Roughly, yes.  Counters would be a pain if we double-counted.


Are the counters logged with their deltas or resulting value?
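
If it's deltas that are logged, replay is clearly not idempotent. A tiny illustration with the HTable API:

import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.util.Bytes;

public class CounterReplayHazard {
  // The client sends a delta; if the WAL recorded deltas rather than
  // resulting values, replaying the same edit twice would apply +1 twice,
  // leaving the counter at +2 instead of +1.
  static void bump(HTable table) throws Exception {
    table.incrementColumnValue(Bytes.toBytes("row1"),
        Bytes.toBytes("f"), Bytes.toBytes("hits"), 1L);
  }
}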


> > Are the sequenceIds globally ordered? I wonder if they even need to be.
> >
>
> They are not globally ordered.  They are in order within the WAL.

In that case we cannot decide the ordering of edits as they could be temporally interleaved between the WALs.
But except for .META. edits it might not matter.

Hmm... Lots of issues to work out.


Re: Hooks for WAL archiving

Posted by Stack <st...@duboce.net>.
On Thu, Sep 15, 2011 at 10:29 PM, lars hofhansl <lh...@yahoo.com> wrote:
> So you'd create a manifest to avoid copying newer files? That would lock the earliest recovery time to when we start the copy (not when we end it), which is nice. What about compactions? They'd remove some of the old files and the data is now in new files.
>

Either move them to an archive or rename them with '.del' and let the
copy process clean them up when done?
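
A sketch of that '.del' idea; the method names and the sweep are made up, and this is not how cleanup works today:

import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class DeferredDelete {
  // Where a compaction would normally delete the replaced hfile, rename it
  // instead so an in-flight backup copy can still read it.
  static void markForDeletion(FileSystem fs, Path hfile) throws Exception {
    fs.rename(hfile, hfile.suffix(".del"));
  }

  // Run by the copy process once everything in its manifest is copied.
  static void sweep(FileSystem fs, Path storeDir) throws Exception {
    for (FileStatus s : fs.listStatus(storeDir)) {
      if (s.getPath().getName().endsWith(".del")) {
        fs.delete(s.getPath(), false);
      }
    }
  }
}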


> Copying the WALs should really be a separate process, I think. The base backup would maybe not even copy them.
> The scenario I have in mind is where one would take a base backup whenever it makes sense (once a day, a week, a month)
> and always archive all WALs. The last base backup with WALs could be on disks and previous ones might be spooled to tape.
>

Tape?  What's that?

WALs are not compressed.  Archiving could compress them.  They should
compress well.


> Maybe I am just dreaming, but then one could restore a base backup, and replay the necessary WALs to bring the state to *any* given point in time.
>

Yeah. I'd like this.


>
> Then one of the harder issues - as you say - would be to associate the correct WALs with the snapshot.
> I wonder how precise we really need to be, though. As every edit is timestamped - in theory - WAL replay would be idempotent.
>

Roughly, yes.  Counters would be a pain if we double-counted.

>
> And what about new tables? .META. changes are probably logged, but replaying those we'd somehow need to create the tables.
> And then the .META. changes have to be in strict order w.r.t. to the other WAL replays.
>

Hmm.

> Are the sequenceIds globally ordered? I wonder if they even need to be.
>

They are not globally ordered.  They are in order within the WAL.

St.Ack

Re: Hooks for WAL archiving

Posted by lars hofhansl <lh...@yahoo.com>.
> > A possible answer is to keep all versions with no TTL, and do replication. At a certain size this ceases to be practical, though.

> >
> 
> Discussing point-in-time-recovery here at our shop, and trying to
> avoid having to keep all versions is what prompted the below issue:
> 
> HBASE-4071  Data GC: Remove all versions > TTL EXCEPT the last
>                written version (Lars Hofhansl)
> 


That's why I fixed it :) I had been thinking about that more as a first line of defense. :)

You are probably right, and that is the way to think about it generally.


> You want to support being able to restore any version?

That's probably what our ops folks want. Since this is fairly new territory for us, that might change, though.
And a backup of the past month on disk in a different datacenter might be good enough.


They might just be extremely delighted if we can give them any data from the past - say - month *instantly*.


> Would be nice if you could filter out complete WALs by looking at
> "metadata", metadata that does not currently exist: e.g. metadata
> could include what regions a WAL has edits for, the range of
> timestamps.


Yep. That would be nice. I also wonder if log entries should be tagged with the current timestamp (i.e. not the one set in Put or Delete, but the actual current time at the time the log was written) in addition to the sequence id. That would provide a global ordering (within the resolution of the timers at least).
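
As a sketch of what such a tag would buy: a restore could merge entries from many archived WALs into one globally ordered replay stream, within timer resolution, using the sequence id as a per-server tie-breaker. The entry type below is hypothetical:

import java.util.Comparator;
import java.util.PriorityQueue;

public class WalMergeOrder {
  // Hypothetical WAL entry carrying the write-time tag discussed above.
  static class WalEntry {
    final long writeTime; // wall clock when the entry was appended
    final long seqId;     // per-regionserver sequence id
    WalEntry(long writeTime, long seqId) {
      this.writeTime = writeTime;
      this.seqId = seqId;
    }
  }

  // Min-heap over (writeTime, seqId): polling it yields edits from all WALs
  // in (approximately) global order.
  static PriorityQueue<WalEntry> replayOrder() {
    return new PriorityQueue<WalEntry>(11, new Comparator<WalEntry>() {
      public int compare(WalEntry a, WalEntry b) {
        if (a.writeTime != b.writeTime) {
          return a.writeTime < b.writeTime ? -1 : 1;
        }
        return a.seqId < b.seqId ? -1 : (a.seqId == b.seqId ? 0 : 1);
      }
    });
  }
}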


> Or, as in hbase-50, could roll logs first before staring the copy.
> That'd narrow the number of WALs to replay for sure.


I see. Yep. HBASE-50 never really got finished, it seems.


> Would need a WAL to hfile mapreduce job.

That's if the table did not exist before, right? I.e. you mean there was no base backup and everything is restored from the WAL?

> I think the PITR would be easier if table-scoped.

Yes, it is actually something we would prefer. We had issues before, where a customer deleted some data (after confirming multiple times that this was really what they wanted to do :) ), then asked us to restore the data. We had to restore the entire database to get those few rows back.


It would certainly simplify the problem. We'd just need a process in place that adds new tables to the replica first and removes them there only after our backup time range has passed (or maybe never deletes them and just keeps them mostly empty).


> Doing it cluster-wide would require our having the meta table in sync
> as you say elsewhere.  Or, we just dump the state of meta when doing a
> cluster backup at the end of PITR and restoring a cluster, the first
> thing we'd do is replace .META. (Could be issue if tables deleted
> between start of PITR and end).


If tables were deleted, the WAL replay could just ignore them.


I think you have me convinced now, though, that just using HBASE-4071 is the better route to pursue, and to investigate more low-level backup options only if we need more than that.
So we need replication to be rock-solid...


> HBASE-4401 Record log region splits and region moves in the HLog


That's cool, watching that one now :)


-- Lars


Re: Hooks for WAL archiving

Posted by Stack <st...@duboce.net>.
On Thu, Sep 15, 2011 at 4:11 PM, lars hofhansl <lh...@yahoo.com> wrote:
> A possible answer is to keep all versions with no TTL, and do replication. At a certain size this ceases to be practical, though.
>

Discussing point-in-time-recovery here at our shop, and trying to
avoid having to keep all versions is what prompted the below issue:

 HBASE-4071  Data GC: Remove all versions > TTL EXCEPT the last
               written version (Lars Hofhansl)

You want to support being able to restore any version?

Our thought was that the TTL would be the window during which you
could get any version, a month say, and that thereafter, only the last
written would be kept.
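
Roughly, in schema terms, assuming the MIN_VERSIONS knob that HBASE-4071 adds to HColumnDescriptor:

import org.apache.hadoop.hbase.HColumnDescriptor;

public class PitrWindowSchema {
  static HColumnDescriptor pitrFamily() {
    HColumnDescriptor fam = new HColumnDescriptor("f1");
    fam.setMaxVersions(Integer.MAX_VALUE); // keep every version written...
    fam.setTimeToLive(30 * 24 * 3600);     // ...but only for a month
    fam.setMinVersions(1);                 // HBASE-4071: never GC the last one
    return fam;
  }
}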


> A typical scenario for relational databases is to take periodic base backups and also archive the log files.
> Would that even work in HBase currently? Say I have a distcp copy of all HBase files that was done while HBase was running and I
> also have an archive of all WALs since the time when the distcp started.
>

So, you are thinking that you would replay all WALs from the cluster
from the point in time at which the hfile copy started?

That should work.

Would be nice if you could filter out complete WALs by looking at
"metadata", metadata that does not currently exist: e.g. metadata
could include what regions a WAL has edits for, the range of
timestamps.

Or, as in hbase-50, could roll logs first before starting the copy.
That'd narrow the number of WALs to replay for sure.

Would need a WAL-to-hfile mapreduce job.
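
A skeleton of such a job; HFileOutputFormat exists for bulk loads, but the WAL input format named below is hypothetical and would be the piece to write:

import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.KeyValue;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.HFileOutputFormat;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;

public class WalToHFileJob {
  // Identity mapper: each WAL edit becomes a (row, KeyValue) pair for the
  // hfile writer. Assumes a WalInputFormat emitting exactly these types.
  static class WalMapper
      extends Mapper<ImmutableBytesWritable, KeyValue, ImmutableBytesWritable, KeyValue> {
    protected void map(ImmutableBytesWritable row, KeyValue kv, Context ctx)
        throws java.io.IOException, InterruptedException {
      ctx.write(row, kv);
    }
  }

  public static void main(String[] args) throws Exception {
    Job job = new Job(HBaseConfiguration.create(), "wal-to-hfile");
    job.setJarByClass(WalToHFileJob.class);
    job.setMapperClass(WalMapper.class);
    job.setMapOutputKeyClass(ImmutableBytesWritable.class);
    job.setMapOutputValueClass(KeyValue.class);
    job.setOutputFormatClass(HFileOutputFormat.class);
    // job.setInputFormatClass(WalInputFormat.class); // the hypothetical piece
    job.waitForCompletion(true);
  }
}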

I think the PITR would be easier if table-scoped.

Doing it cluster-wide would require our having the meta table in sync,
as you say elsewhere.  Or, we just dump the state of meta when doing a
cluster backup; at the end of PITR, when restoring a cluster, the first
thing we'd do is replace .META. (could be an issue if tables were
deleted between the start of PITR and the end).

> Could I theoretically restore HBase to a consistent state (at any time after the distcp finished)? Or are there changes that are not
> WAL logged that I would miss (like admin actions)?
>

These are not logged currently but Dhruba just opened this:

 HBASE-4401 Record log region splits and region moves in the HLog

St.Ack