Posted to dev@hbase.apache.org by Stack <st...@duboce.net> on 2012/05/15 15:11:09 UTC

Server Crash Recovery (WAS --> Re: ANN: The third hbase 0.94.0 release candidate is available for download)

(Moved this conversation off the vote thread)

On Sat, May 12, 2012 at 3:14 PM, Mikael Sitruk <mi...@gmail.com> wrote:
> So in case a RS goes down, the master will split the log and reassign the
> regions to other RSs; then each RS will replay the log. During this step the
> regions are unavailable, and clients will get exceptions.

To be clear, the log splitting will result in each region having under
it only its own edits for replay.
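
For example, after the split each region directory ends up with a file
of just its own edits, something like the below (paths illustrative):

  /hbase/<tablename>/<encoded-region-name>/recovered.edits/00000000000000012345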

> 1. how will the master choose an RS to assign a region to?

Random currently.  It picks from the list of currently live RSs.
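
In sketch form, something like the below (illustrative only, not the
actual AssignmentManager code):

  import java.util.List;
  import java.util.Random;

  // Illustrative sketch only, not the actual AssignmentManager code: the
  // master picks a destination at random from the list of live RSs.
  class RandomAssignSketch {
    private final Random rand = new Random();

    String pickServer(List<String> liveServers) {
      return liveServers.get(rand.nextInt(liveServers.size()));
    }
  }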

> 2. how many RSs will be involved in this reassignment?

All remaining live regionservers.

> 3. should clients that got an exception renew their connections, or can
> they reuse the same one?

When a client gets a NotServingRegionException, it goes back to
.META. to find the location of the region.  It'll then retry this
location.  The location may be the still-down server.  The client will
keep at this until it either times out or the address is updated in
.META. with the new location.
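
A minimal sketch of that retry loop, with hypothetical helpers standing
in for the real HConnection internals:

  import java.io.IOException;

  // Sketch only; locateRegionInMeta and callServer are made-up stand-ins
  // for what the client connection does internally.
  abstract class ClientRetrySketch {
    abstract String locateRegionInMeta(byte[] row);  // re-reads .META.
    abstract byte[] callServer(String location, byte[] row) throws IOException;

    byte[] getWithRetries(byte[] row, int maxRetries)
        throws IOException, InterruptedException {
      for (int tries = 0; tries < maxRetries; tries++) {
        // May still point at the dead server until .META. is updated.
        String location = locateRegionInMeta(row);
        try {
          return callServer(location, row);
        } catch (IOException e) {  // e.g. NotServingRegionException
          Thread.sleep(1000L * (tries + 1));  // back off before retrying
        }
      }
      throw new IOException("Retries exhausted after " + maxRetries + " tries");
    }
  }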

> 4. is there a way to figure out how long this split+replay will take
> (either by formula at design time of a deployment, or at runtime via an
> API asking the master, for example)?
>

Usually it's a factor of how many WAL files the regionserver was
carrying when it went down.  (You'll see in the logs where we sometimes
force flushes to clear the memstores carrying the oldest edits, just so
we can clear out old WAL files.  The log roller figures out what needs
flushing.  See http://hbase.apache.org/xref/org/apache/hadoop/hbase/regionserver/LogRoller.html#95.)
You can set the max number of WALs a regionserver carries; grep
'hbase.regionserver.maxlogs' (We don't seem to doc this one -- we
should fix that).
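
To cap it, you'd add something like this to hbase-site.xml (a sketch;
the value here is only an example, check the code for the current
default):

  <property>
    <name>hbase.regionserver.maxlogs</name>
    <value>32</value>
  </property>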

St.Ack

Re: Server Crash Recovery (WAS --> Re: ANN: The third hbase 0.94.0 release candidate is available for download)

Posted by Stack <st...@duboce.net>.
On Wed, May 16, 2012 at 12:25 AM, Mikael Sitruk <mi...@gmail.com> wrote:
> Hey St.Ack
>
> Thanks for clarifications,
>
> For 4. replay of log: (Please correct me if I'm wrong)
> So the RS will:
> a. split the log via HLogSplitter, concurrently writing the log content
> into per-region log files,

Yes.

The next bit is not right.  All of a server's logs have to finish
splitting before any of its regions will be assigned.  So, interjecting
into your narrative....

a'. When all regionservers have finished splitting the crashed
server's logs, the master will assign out the regions.
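
In outline (a sketch of the ordering only, not the actual master code):

  // Sketch of the ordering, not the actual master internals: all of the
  // dead server's logs are split before any of its regions is assigned.
  abstract class CrashHandlingSketch {
    abstract void splitLogsDistributed(String deadServer);
    abstract Iterable<String> regionsOf(String deadServer);
    abstract void assignToRandomLiveServer(String region);

    void handleServerCrash(String deadServer) {
      splitLogsDistributed(deadServer);  // fanned out across live RSs
      for (String region : regionsOf(deadServer)) {
        assignToRandomLiveServer(region);  // only after splitting completes
      }
    }
  }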

Then each regionserver that receives one of these regions will notice
on open, before onlining it, that the region has edits to replay from
a log split.  It will then....


> b. replay those smaller logs into its own memstore and own logs (is it
> done when the region comes online?)

We'll replay the edits into the memstore.  We'll then force a flush on
the region to create a new hfile of the recovered edits. Only then do
we clean away the edits file so that on next open, the replay does not
happen again (If we crash before the hfile is successfully flushed,
the edits will be in place for the next open elsewhere).
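
Roughly, the open path looks like this (a hedged sketch; the method
names are made up, not the actual HRegion internals):

  // Hedged sketch of region open with recovered edits; method names are
  // illustrative, not the actual HRegion code.
  abstract class RegionOpenSketch {
    abstract boolean hasRecoveredEdits();
    abstract void replayEditsIntoMemstore();  // re-apply edits from the split
    abstract void flushMemstoreToHfile();     // persist them as a new hfile
    abstract void deleteRecoveredEdits();
    abstract void markOnline();

    void open() {
      if (hasRecoveredEdits()) {
        replayEditsIntoMemstore();
        flushMemstoreToHfile();
        // Delete only after a successful flush; a crash before this point
        // leaves the edits in place for the next open elsewhere.
        deleteRecoveredEdits();
      }
      markOnline();
    }
  }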

> During this replay the RS may be subject to log flushing and to compaction
> (flushes will create more store files, which will hit the min/max compaction
> thresholds), and other regular background tasks.
>

That's right.  The distributed splitting is a background task that should
not adversely affect the foreground RS tasks.

> During this time (log replay), does the RS also accept other client
> requests (on its own regions, not the ones it got assigned), or are they
> blocked?
> In case the RS is handling other requests too, is there any priority for
> WAL replay?
>

RS should be working 'as usual' while the splitting is going on.

No priority as yet for WAL splitting.

St.Ack
