Posted to user@hbase.apache.org by Al Lias <al...@gmx.de> on 2010/03/29 12:58:12 UTC

Short DNS outage leads to "No .META. found"

Our DNS installation has high-availability failover logic, during which
name resolution may fail for about 10 seconds.

In such a case we experience the following:

* DNS goes down
* The Master logs this: "Received report from unknown server -- telling
it to MSG_CALL_SERVER_STARTUP" (probably it is the IP that is "unknown")
* The regionservers do as directed; the ZooKeeper logs show that the
/hbase/rs/ nodes are updated
* DNS goes up

Now either no master is selected or the wrong one is, and no region can
be served anymore. Also, no further MSG_CALL_SERVER_STARTUP messages
appear, which could have revived the cluster...
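
As an illustration of the general idea of riding out a resolver outage of a few seconds, here is a minimal sketch that remembers the last address each host name resolved to and falls back on it when a lookup fails. This is not HBase code and not necessarily what any HBase fix does; `CachingResolver` and its `Lookup` interface are hypothetical names introduced only for this example.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical helper (not part of HBase): remembers the last good address per host. */
public class CachingResolver {

    /** Lookup strategy, pluggable so that a DNS outage can be simulated. */
    public interface Lookup {
        InetAddress resolve(String host) throws UnknownHostException;
    }

    private final Lookup lookup;
    private final Map<String, InetAddress> lastGood = new ConcurrentHashMap<>();

    public CachingResolver(Lookup lookup) {
        this.lookup = lookup;
    }

    /** Resolves the host; if the lookup fails, returns the last known good address. */
    public InetAddress resolve(String host) throws UnknownHostException {
        try {
            InetAddress addr = lookup.resolve(host);
            lastGood.put(host, addr); // refresh the fallback on every success
            return addr;
        } catch (UnknownHostException e) {
            InetAddress cached = lastGood.get(host);
            if (cached == null) {
                throw e; // never resolved before, nothing to fall back on
            }
            return cached;
        }
    }

    public static void main(String[] args) throws Exception {
        // Uses the JVM's normal resolver as the lookup strategy.
        CachingResolver resolver = new CachingResolver(InetAddress::getByName);
        System.out.println(resolver.resolve("localhost"));
    }
}
```

The trade-off is that a host whose address really changed during the outage would be reported under its stale address until the next successful lookup.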

We use host names in the regionservers file.

What could we change to be more robust against such a problem?

Thx,

   Al

Re: Short DNS outage leads to "No .META. found"

Posted by Jean-Daniel Cryans <jd...@apache.org>.
This was fixed in https://issues.apache.org/jira/browse/HBASE-2174,
which will be available in 0.20.4 (or you can patch your 0.20.3; the
patch should apply easily).

J-D
