Posted to user@hbase.apache.org by Friso van Vollenhoven <fv...@xebia.com> on 2011/01/21 11:38:58 UTC

Data loss on clean RS shutdown without WAL?

Hi all,

Question: when a regionserver shuts down cleanly after a YouAreDeadException and the master nicely reassigns the regions, will you lose any data that was written to the memstore of the dead RS when not using the WAL?

There was no hard crash and not a single error in any of the logs (except for the FATAL: YouAreDeadException). The RS lost its ZooKeeper session after a timeout, probably caused by GC combined with some other starvation under heavy load. I think the memstore flushes on shutdown, but I am not entirely sure what happens when the regions have already been opened by other regionservers by the time the dying RS executes its shutdown code. Can I assume that the RS that gets reassigned a region creates a new HFile and that this will be compacted together with the one left by the dead RS at the next compaction run?


Thanks,
Friso


Re: Data loss on clean RS shutdown without WAL?

Posted by Jean-Daniel Cryans <jd...@apache.org>.
The master splits the logs per region before reassigning them. The log
splits are put directly in the region's folder so that when a region
server opens a region that comes from a dead server, it looks for
those files and processes them first before opening the region (to
ensure consistency).

Splitting logs can be slow when you have tons of them, since only one
machine does it, so work is being done to parallelize it just like in
Bigtable: https://issues.apache.org/jira/browse/hbase-1364
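
To illustrate the splitting described above, here is a minimal conceptual sketch in Python. It is not HBase code: the entry layout, the region names, and the functions split_log and open_region are all hypothetical, and real log splitting works on files in HDFS rather than in-memory lists. It only shows the idea that one per-server log is regrouped per region, and that a region's recovered edits are replayed before the region is served.

```python
from collections import defaultdict

# Hypothetical WAL of one dead region server: (sequence_id, region, edit).
# A single per-server log interleaves edits for every region it hosted.
dead_rs_wal = [
    (1, "region-a", "put row1"),
    (2, "region-b", "put row7"),
    (3, "region-a", "put row2"),
    (4, "region-c", "delete row9"),
    (5, "region-b", "put row8"),
]


def split_log(wal_entries):
    """Regroup a per-server log into per-region lists, keeping log order."""
    per_region = defaultdict(list)
    for seq_id, region, edit in wal_entries:
        per_region[region].append((seq_id, edit))
    return dict(per_region)


def open_region(region, split_files):
    """Replay the region's split edits (if any) before serving the region."""
    replayed = []
    for _seq_id, edit in split_files.get(region, []):
        replayed.append(edit)
    return replayed


split_files = split_log(dead_rs_wal)
recovered_a = open_region("region-a", split_files)
print(recovered_a)
```

In this sketch a server that opens region-a only ever sees the edits for region-a, so it has no need to scan or skip records belonging to other regions.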

J-D

On Sun, Jan 23, 2011 at 11:08 AM, M. C. Srivas <mc...@gmail.com> wrote:
> Hey JD,
>
>   when the RS dies, the regions that it was serving are spread out amongst
> the rest of the RS's, correct?  But isn't the WAL a per-RS thingy rather
> than a per-region thingy? How do the other RS's then recover the regions
> allotted to them? Do they skip over log records in the dead RS's WAL that do
> not belong to the regions allocated to them?
>
>    Also, how is the dead RS's WAL garbage-collected?
>
> thanks,
> Srivas.
>
>

Re: Data loss on clean RS shutdown without WAL?

Posted by "M. C. Srivas" <mc...@gmail.com>.
Hey JD,

   when the RS dies, the regions that it was serving are spread out amongst
the rest of the RS's, correct?  But isn't the WAL a per-RS thingy rather
than a per-region thingy? How do the other RS's then recover the regions
allotted to them? Do they skip over log records in the dead RS's WAL that do
not belong to the regions allocated to them?

    Also, how is the dead RS's WAL garbage-collected?

thanks,
Srivas.


On Fri, Jan 21, 2011 at 9:32 AM, Jean-Daniel Cryans <jd...@apache.org> wrote:

> If the region server gets a YouAreDeadException, it does an "abort" and
> won't flush the data since another region server could already be
> serving the region. If you're not writing to the WAL, then yes it's
> data loss.
>
> Not sure what you mean by "shuts down cleanly" in your case; if you
> see a log that starts with "Aborting region server" then it's not
> really "clean".
>
> J-D

RE: Data loss on clean RS shutdown without WAL?

Posted by Friso van Vollenhoven <fv...@xebia.com>.
Thanks J-D.

By "cleanly" I meant that it did not produce any errors other than the YouAreDeadException. When a very long pause causes this, I often also see HDFS client errors caused by expiring leases, but this time that was not the case. It was not "clean" in the sense that I issued the shutdown myself.


Friso


________________________________________
From: jdcryans@gmail.com [jdcryans@gmail.com] on behalf of Jean-Daniel Cryans [jdcryans@apache.org]
Sent: Friday, January 21, 2011 18:32
To: user@hbase.apache.org
Subject: Re: Data loss on clean RS shutdown without WAL?

If the region server gets a YouAreDeadException, it does an "abort" and
won't flush the data since another region server could already be
serving the region. If you're not writing to the WAL, then yes it's
data loss.

Not sure what you mean by "shuts down cleanly" in your case; if you
see a log that starts with "Aborting region server" then it's not
really "clean".

J-D


Re: Data loss on clean RS shutdown without WAL?

Posted by Jean-Daniel Cryans <jd...@apache.org>.
If the region server gets a YouAreDeadException, it does an "abort" and
won't flush the data since another region server could already be
serving the region. If you're not writing to the WAL, then yes it's
data loss.

Not sure what you mean by "shuts down cleanly" in your case; if you
see a log that starts with "Aborting region server" then it's not
really "clean".
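
The difference between an abort and a shutdown that flushes can be sketched with a toy model in Python. This is not HBase code: the class RegionServerSketch, its methods, and the write_to_wal flag are hypothetical names used only for illustration, and the model assumes that a non-aborting shutdown flushes the memstore while an abort does not.

```python
class RegionServerSketch:
    """Toy model (hypothetical) of where an edit lives on a region server."""

    def __init__(self):
        self.memstore = []  # in-memory edits, gone once the process stops
        self.wal = []       # stands in for the durable write-ahead log
        self.hfiles = []    # stands in for flushed store files

    def put(self, edit, write_to_wal=True):
        # Edits always reach the memstore; the WAL copy is optional.
        if write_to_wal:
            self.wal.append(edit)
        self.memstore.append(edit)

    def shutdown_with_flush(self):
        # Assumed behaviour of a non-aborting shutdown: flush, then stop.
        if self.memstore:
            self.hfiles.append(list(self.memstore))
        self.memstore.clear()

    def abort(self):
        # No flush: another server may already be serving the region.
        self.memstore.clear()

    def recoverable_edits(self):
        # Whatever survives in flushed files or in the log can be recovered.
        flushed = {edit for hfile in self.hfiles for edit in hfile}
        return sorted(flushed | set(self.wal))


rs = RegionServerSketch()
rs.put("edit-1")                      # logged to the WAL
rs.put("edit-2", write_to_wal=False)  # memstore only
rs.abort()
print(rs.recoverable_edits())
```

In this model only the edit that went through the WAL survives the abort; the memstore-only edit is gone.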

J-D
