Posted to user@hbase.apache.org by Zili Chen <wa...@gmail.com> on 2019/06/06 13:37:28 UTC

How does HBase deal with master switch?

Hi,

Recently, from the book ZooKeeper: Distributed Process Coordination, I found
a paragraph mentioning that HBase once suffered from the following:

1) A RegionServer started a full GC and timed out on ZooKeeper, so ZooKeeper
regarded it as failed.
2) ZooKeeper launched a new RegionServer, and the new one started to serve.
3) The old RegionServer finished GC and thought it was still active and
serving.

This is in Chapter 5, Section 5.3.

I'm interested in this and would like to know how the HBase community
overcame the issue.

Best,
tison.

Re: How does HBase deal with master switch?

Posted by "张铎 (Duo Zhang)" <pa...@gmail.com>.
Yes. In production it usually happens when there is a very long GC: the RS
is declared dead and all of its regions are reassigned to other RSes before
the RS comes back and kills itself.

Natalie Chen <na...@gmail.com> wrote on Fri, Jun 7, 2019 at 3:03 PM:

> The case about zookeeper is well known since data is actually saved
> locally.
>
> But, I thought RS writes/reads data to /from HDFS so there’s no such
> problem as replication latency.
>
> Can we say that the only chance for getting stale data from RS is what you
> have described here and I only have to monitor RS heartbeat and control gc
> pause?
>
> Thank you.
>
>
>
> 张铎(Duo Zhang) <pa...@gmail.com>於 2019年6月7日 週五,下午1:50寫道:
>
> > Lots of distributed databases can not guarantee external consistency.
> Even
> > for zookeeper, when you update A and then tell others to get A, the
> others
> > may get a stale value since it may read from another replica which has
> not
> > received the value yet.
> >
> > There are several ways to solve the problem in HBase, for example, record
> > the time when we successfully received the last heartbeat from zk, and if
> > it has been too long then we just throw exception to client. But this is
> > not a big deal for most use cases, as in the same session, if you
> > successfully update a value then you can see the new value when reading.
> > For the external consistency, there are also several ways to solve it.
> >
> > So take your own risk, if you think external consistency is super
> important
> > to you, then you’d better choose another db. But please consider it
> > carefully, as said above, lots of databases do not guarantee this
> either...
> >
> > Natalie Chen <na...@gmail.com>于2019年6月7日 周五11:59写道:
> >
> > > Hi,
> > >
> > > I am quite concerned about the possibility of getting stale data. I was
> > > expecting consistency in HBase while choosing HBase as our nonsql db
> > > solution.
> > >
> > > So, if consistency is not guaranteed, meaning clients expecting to see
> > > latest data but, because of long gc or whatever, got wrong data instead
> > > from a “dead” RS, even the chance is slight, I have to be able to
> detect
> > > and repair the situation or just consider looking for other more
> suitable
> > > solution.
> > >
> > > So, would you kindly confirm that HBase has this “consistency” issue?
> > >
> > > Thank you.
> > >
> > >
> > >
> > > 张铎(Duo Zhang) <pa...@gmail.com>於 2019年6月6日 週四,下午9:58寫道:
> > >
> > > > Once a RS is started, it will create its wal directory and start to
> > write
> > > > wal into it. And if master thinks a RS is dead, it will rename the
> wal
> > > > directory of the RS and call recover lease on all the wal files under
> > the
> > > > directory to make sure that they are all closed. So even after the RS
> > is
> > > > back after a long GC, before it kills itself because of the
> > > > SessionExpiredException, it can not accept any write requests any
> more
> > > > since its old wal file is closed and the wal directory is also gone
> so
> > it
> > > > can not create new wal files either.
> > > >
> > > > Of course, you may still read from the dead RS at this moment
> > > > so theoretically you could read a stale data, which means HBase can
> not
> > > > guarantee ‘external consistency’.
> > > >
> > > > Hope this solves your problem.
> > > >
> > > > Thanks.
> > > >
> > > > Zili Chen <wa...@gmail.com> 于2019年6月6日周四 下午9:38写道:
> > > >
> > > > > Hi,
> > > > >
> > > > > Recently from the book, ZooKeeper: Distributed Process
> Coordination,
> > I
> > > > find
> > > > > a paragraph mentions that, HBase once suffered by
> > > > >
> > > > > 1) RegionServer started full gc and timeout on ZooKeeper. Thus
> > > ZooKeeper
> > > > > regarded it as failed.
> > > > > 2) ZooKeeper launched a new RegionServer, and the new one started
> to
> > > > serve.
> > > > > 3) The old RegionServer finished gc and thought itself was still
> > active
> > > > and
> > > > > serving.
> > > > >
> > > > > in Chapter 5 section 5.3.
> > > > >
> > > > > I'm interested on it and would like to know how HBase community
> > > overcame
> > > > > this issue.
> > > > >
> > > > > Best,
> > > > > tison.
> > > > >
> > > >
> > >
> >
>

Re: How does HBase deal with master switch?

Posted by Natalie Chen <na...@gmail.com>.
The ZooKeeper case is well known, since data is actually saved locally on
each replica.

But I thought an RS writes data to and reads it from HDFS, so there is no
such problem as replication latency.

Can we say that the only way to get stale data from an RS is the situation
you have described here, and that I only need to monitor RS heartbeats and
control GC pauses?

Thank you.



张铎 (Duo Zhang) <pa...@gmail.com> wrote on Fri, Jun 7, 2019 at 1:50 PM:

> Lots of distributed databases can not guarantee external consistency. Even
> for zookeeper, when you update A and then tell others to get A, the others
> may get a stale value since it may read from another replica which has not
> received the value yet.
>
> There are several ways to solve the problem in HBase, for example, record
> the time when we successfully received the last heartbeat from zk, and if
> it has been too long then we just throw exception to client. But this is
> not a big deal for most use cases, as in the same session, if you
> successfully update a value then you can see the new value when reading.
> For the external consistency, there are also several ways to solve it.
>
> So take your own risk, if you think external consistency is super important
> to you, then you’d better choose another db. But please consider it
> carefully, as said above, lots of databases do not guarantee this either...
>
> Natalie Chen <na...@gmail.com>于2019年6月7日 周五11:59写道:
>
> > Hi,
> >
> > I am quite concerned about the possibility of getting stale data. I was
> > expecting consistency in HBase while choosing HBase as our nonsql db
> > solution.
> >
> > So, if consistency is not guaranteed, meaning clients expecting to see
> > latest data but, because of long gc or whatever, got wrong data instead
> > from a “dead” RS, even the chance is slight, I have to be able to detect
> > and repair the situation or just consider looking for other more suitable
> > solution.
> >
> > So, would you kindly confirm that HBase has this “consistency” issue?
> >
> > Thank you.
> >
> >
> >
> > 张铎(Duo Zhang) <pa...@gmail.com>於 2019年6月6日 週四,下午9:58寫道:
> >
> > > Once a RS is started, it will create its wal directory and start to
> write
> > > wal into it. And if master thinks a RS is dead, it will rename the wal
> > > directory of the RS and call recover lease on all the wal files under
> the
> > > directory to make sure that they are all closed. So even after the RS
> is
> > > back after a long GC, before it kills itself because of the
> > > SessionExpiredException, it can not accept any write requests any more
> > > since its old wal file is closed and the wal directory is also gone so
> it
> > > can not create new wal files either.
> > >
> > > Of course, you may still read from the dead RS at this moment
> > > so theoretically you could read a stale data, which means HBase can not
> > > guarantee ‘external consistency’.
> > >
> > > Hope this solves your problem.
> > >
> > > Thanks.
> > >
> > > Zili Chen <wa...@gmail.com> 于2019年6月6日周四 下午9:38写道:
> > >
> > > > Hi,
> > > >
> > > > Recently from the book, ZooKeeper: Distributed Process Coordination,
> I
> > > find
> > > > a paragraph mentions that, HBase once suffered by
> > > >
> > > > 1) RegionServer started full gc and timeout on ZooKeeper. Thus
> > ZooKeeper
> > > > regarded it as failed.
> > > > 2) ZooKeeper launched a new RegionServer, and the new one started to
> > > serve.
> > > > 3) The old RegionServer finished gc and thought itself was still
> active
> > > and
> > > > serving.
> > > >
> > > > in Chapter 5 section 5.3.
> > > >
> > > > I'm interested on it and would like to know how HBase community
> > overcame
> > > > this issue.
> > > >
> > > > Best,
> > > > tison.
> > > >
> > >
> >
>

Re: How does HBase deal with master switch?

Posted by "张铎 (Duo Zhang)" <pa...@gmail.com>.
Lots of distributed databases cannot guarantee external consistency. Even
for ZooKeeper, when you update A and then tell others to read A, the others
may get a stale value, since they may read from another replica which has
not received the update yet.

There are several ways to mitigate the problem in HBase. For example, record
the time when we last successfully received a heartbeat from ZK, and if it
has been too long, throw an exception to the client instead of serving the
read. But this is not a big deal for most use cases: within the same session,
if you successfully update a value, you will see the new value when reading.
For external consistency there are also several ways to address it.

So weigh the risk yourself: if external consistency is super important to
you, then you'd better choose another database. But please consider it
carefully; as said above, lots of databases do not guarantee this either...

Natalie Chen <na...@gmail.com> wrote on Fri, Jun 7, 2019 at 11:59 AM:

> Hi,
>
> I am quite concerned about the possibility of getting stale data. I was
> expecting consistency in HBase while choosing HBase as our nonsql db
> solution.
>
> So, if consistency is not guaranteed, meaning clients expecting to see
> latest data but, because of long gc or whatever, got wrong data instead
> from a “dead” RS, even the chance is slight, I have to be able to detect
> and repair the situation or just consider looking for other more suitable
> solution.
>
> So, would you kindly confirm that HBase has this “consistency” issue?
>
> Thank you.
>
>
>
> 张铎(Duo Zhang) <pa...@gmail.com>於 2019年6月6日 週四,下午9:58寫道:
>
> > Once a RS is started, it will create its wal directory and start to write
> > wal into it. And if master thinks a RS is dead, it will rename the wal
> > directory of the RS and call recover lease on all the wal files under the
> > directory to make sure that they are all closed. So even after the RS is
> > back after a long GC, before it kills itself because of the
> > SessionExpiredException, it can not accept any write requests any more
> > since its old wal file is closed and the wal directory is also gone so it
> > can not create new wal files either.
> >
> > Of course, you may still read from the dead RS at this moment
> > so theoretically you could read a stale data, which means HBase can not
> > guarantee ‘external consistency’.
> >
> > Hope this solves your problem.
> >
> > Thanks.
> >
> > Zili Chen <wa...@gmail.com> 于2019年6月6日周四 下午9:38写道:
> >
> > > Hi,
> > >
> > > Recently from the book, ZooKeeper: Distributed Process Coordination, I
> > find
> > > a paragraph mentions that, HBase once suffered by
> > >
> > > 1) RegionServer started full gc and timeout on ZooKeeper. Thus
> ZooKeeper
> > > regarded it as failed.
> > > 2) ZooKeeper launched a new RegionServer, and the new one started to
> > serve.
> > > 3) The old RegionServer finished gc and thought itself was still active
> > and
> > > serving.
> > >
> > > in Chapter 5 section 5.3.
> > >
> > > I'm interested on it and would like to know how HBase community
> overcame
> > > this issue.
> > >
> > > Best,
> > > tison.
> > >
> >
>

Re: How does HBase deal with master switch?

Posted by Natalie Chen <na...@gmail.com>.
Hi,

I am quite concerned about the possibility of getting stale data. I was
expecting consistency from HBase when choosing it as our NoSQL database
solution.

So, if consistency is not guaranteed, meaning clients that expect to see the
latest data might instead, because of a long GC or similar, get stale data
from a “dead” RS, then even if the chance is slight, I have to be able to
detect and repair the situation, or else consider looking for a more suitable
solution.

So, would you kindly confirm that HBase has this “consistency” issue?

Thank you.



张铎 (Duo Zhang) <pa...@gmail.com> wrote on Thu, Jun 6, 2019 at 9:58 PM:

> Once a RS is started, it will create its wal directory and start to write
> wal into it. And if master thinks a RS is dead, it will rename the wal
> directory of the RS and call recover lease on all the wal files under the
> directory to make sure that they are all closed. So even after the RS is
> back after a long GC, before it kills itself because of the
> SessionExpiredException, it can not accept any write requests any more
> since its old wal file is closed and the wal directory is also gone so it
> can not create new wal files either.
>
> Of course, you may still read from the dead RS at this moment
> so theoretically you could read a stale data, which means HBase can not
> guarantee ‘external consistency’.
>
> Hope this solves your problem.
>
> Thanks.
>
> Zili Chen <wa...@gmail.com> 于2019年6月6日周四 下午9:38写道:
>
> > Hi,
> >
> > Recently from the book, ZooKeeper: Distributed Process Coordination, I
> find
> > a paragraph mentions that, HBase once suffered by
> >
> > 1) RegionServer started full gc and timeout on ZooKeeper. Thus ZooKeeper
> > regarded it as failed.
> > 2) ZooKeeper launched a new RegionServer, and the new one started to
> serve.
> > 3) The old RegionServer finished gc and thought itself was still active
> and
> > serving.
> >
> > in Chapter 5 section 5.3.
> >
> > I'm interested on it and would like to know how HBase community overcame
> > this issue.
> >
> > Best,
> > tison.
> >
>

Re: How does HBase deal with master switch?

Posted by Zili Chen <wa...@gmail.com>.
Thanks for your reply and clarification!

It sounds like a fencing mechanism? (A generic sketch of that idea is below.)

I'd also like to look for JIRAs about this issue, that is, coordination
during a master switch. Maybe something like this one[1]?

Best,
tison.

[1] https://issues.apache.org/jira/browse/HBASE-5549
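
For what it's worth, here is a generic, purely hypothetical illustration of
fencing with a monotonically increasing epoch; HBase's actual mechanism is the
WAL rename plus lease recovery that Duo described, not this class.

    // Illustrative only: the store remembers the highest writer epoch it has
    // seen and rejects writes from any writer carrying an older epoch.
    public class FencedStore {
        private long highestEpochSeen;
        private String value;

        public synchronized void write(long writerEpoch, String newValue) {
            if (writerEpoch < highestEpochSeen) {
                throw new IllegalStateException(
                    "Writer with epoch " + writerEpoch + " has been fenced off");
            }
            highestEpochSeen = writerEpoch;
            value = newValue;
        }

        public synchronized String read() {
            return value;
        }
    }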


Wellington Chevreuil <we...@gmail.com> wrote on Thu, Jun 6, 2019 at 10:15 PM:

> Hey Zili,
>
> Besides what Duo explained previously, just clarifying on some concepts to
> your previous description:
>
> 1) RegionServer started full gc and timeout on ZooKeeper. Thus ZooKeeper
> > regarded it as failed.
> >
> ZK just knows about sessions and clients, not the type of client connecting
> to it. Clients open a session in ZK, then keep pinging back ZK
> periodically, to keep the session alive. In the case of long full GC
> pauses, the client (RS, in this case), will fail to ping back within the
> required period. At this point, ZK will *expire *the session.
>
> 2) ZooKeeper launched a new RegionServer, and the new one started to serve.
> >
> ZK doesn't launch new RS, it doesn't know about RSes, only client sessions.
> With the session expiration, Master will be notified that an RS is
> potentially gone, and will start the process explained by Duo.
>
> 3) The old RegionServer finished gc and thought itself was still active and
> > serving.
> >
> What really happens here is that once RS is back from GC, it will try ping
> ZK again for that session, ZK will back it off because the session is
> already expired, then RS will kill itself.
>
>
>
>
>
> Em qui, 6 de jun de 2019 às 14:58, 张铎(Duo Zhang) <pa...@gmail.com>
> escreveu:
>
> > Once a RS is started, it will create its wal directory and start to write
> > wal into it. And if master thinks a RS is dead, it will rename the wal
> > directory of the RS and call recover lease on all the wal files under the
> > directory to make sure that they are all closed. So even after the RS is
> > back after a long GC, before it kills itself because of the
> > SessionExpiredException, it can not accept any write requests any more
> > since its old wal file is closed and the wal directory is also gone so it
> > can not create new wal files either.
> >
> > Of course, you may still read from the dead RS at this moment
> > so theoretically you could read a stale data, which means HBase can not
> > guarantee ‘external consistency’.
> >
> > Hope this solves your problem.
> >
> > Thanks.
> >
> > Zili Chen <wa...@gmail.com> 于2019年6月6日周四 下午9:38写道:
> >
> > > Hi,
> > >
> > > Recently from the book, ZooKeeper: Distributed Process Coordination, I
> > find
> > > a paragraph mentions that, HBase once suffered by
> > >
> > > 1) RegionServer started full gc and timeout on ZooKeeper. Thus
> ZooKeeper
> > > regarded it as failed.
> > > 2) ZooKeeper launched a new RegionServer, and the new one started to
> > serve.
> > > 3) The old RegionServer finished gc and thought itself was still active
> > and
> > > serving.
> > >
> > > in Chapter 5 section 5.3.
> > >
> > > I'm interested on it and would like to know how HBase community
> overcame
> > > this issue.
> > >
> > > Best,
> > > tison.
> > >
> >
>

Re: How does HBase deal with master switch?

Posted by Wellington Chevreuil <we...@gmail.com>.
Hey Zili,

Besides what Duo explained previously, just clarifying some concepts from
your previous description:

1) RegionServer started full gc and timeout on ZooKeeper. Thus ZooKeeper
> regarded it as failed.
>
ZK only knows about sessions and clients, not the type of client connecting
to it. Clients open a session in ZK and then keep pinging ZK periodically to
keep the session alive. In the case of a long full GC pause, the client (the
RS, in this case) will fail to ping back within the required period. At that
point, ZK will *expire* the session.

2) ZooKeeper launched a new RegionServer, and the new one started to serve.
>
ZK doesn't launch a new RS; it doesn't know about RSes, only client sessions.
Upon session expiration, the Master is notified that an RS is potentially
gone, and starts the process explained by Duo.

3) The old RegionServer finished gc and thought itself was still active and
> serving.
>
What really happens here is that once the RS is back from GC, it will try to
ping ZK again for that session; ZK will reject it because the session has
already expired, and then the RS will kill itself.
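
To make the session mechanics above concrete, here is a minimal, hypothetical
ZooKeeper client sketch (the connect string and znode path are made up, and a
real RS does far more): the client opens a session, registers an ephemeral
znode that the Master can watch, and aborts once it observes that its session
has expired.

    import java.io.IOException;

    import org.apache.zookeeper.CreateMode;
    import org.apache.zookeeper.KeeperException;
    import org.apache.zookeeper.WatchedEvent;
    import org.apache.zookeeper.Watcher;
    import org.apache.zookeeper.ZooDefs;
    import org.apache.zookeeper.ZooKeeper;

    public class RsSessionSketch {
        public static void main(String[] args)
                throws IOException, KeeperException, InterruptedException {
            // Open a session; the ZK client library pings the ensemble in the
            // background to keep it alive. A GC pause longer than the session
            // timeout means those pings stop and ZK expires the session.
            ZooKeeper zk = new ZooKeeper("zk-host:2181", 30_000, new Watcher() {
                @Override
                public void process(WatchedEvent event) {
                    if (event.getState() == Event.KeeperState.Expired) {
                        // A real RS aborts here rather than keep serving.
                        System.err.println("Session expired, shutting down");
                        System.exit(1);
                    }
                }
            });

            // The ephemeral znode vanishes when the session expires, which is
            // how the Master learns this server is (potentially) gone.
            zk.create("/hypothetical/rs/host1", new byte[0],
                ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.EPHEMERAL);

            Thread.sleep(Long.MAX_VALUE); // keep the session open for the demo
        }
    }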





On Thu, Jun 6, 2019 at 14:58, 张铎 (Duo Zhang) <pa...@gmail.com> wrote:

> Once a RS is started, it will create its wal directory and start to write
> wal into it. And if master thinks a RS is dead, it will rename the wal
> directory of the RS and call recover lease on all the wal files under the
> directory to make sure that they are all closed. So even after the RS is
> back after a long GC, before it kills itself because of the
> SessionExpiredException, it can not accept any write requests any more
> since its old wal file is closed and the wal directory is also gone so it
> can not create new wal files either.
>
> Of course, you may still read from the dead RS at this moment
> so theoretically you could read a stale data, which means HBase can not
> guarantee ‘external consistency’.
>
> Hope this solves your problem.
>
> Thanks.
>
> Zili Chen <wa...@gmail.com> 于2019年6月6日周四 下午9:38写道:
>
> > Hi,
> >
> > Recently from the book, ZooKeeper: Distributed Process Coordination, I
> find
> > a paragraph mentions that, HBase once suffered by
> >
> > 1) RegionServer started full gc and timeout on ZooKeeper. Thus ZooKeeper
> > regarded it as failed.
> > 2) ZooKeeper launched a new RegionServer, and the new one started to
> serve.
> > 3) The old RegionServer finished gc and thought itself was still active
> and
> > serving.
> >
> > in Chapter 5 section 5.3.
> >
> > I'm interested on it and would like to know how HBase community overcame
> > this issue.
> >
> > Best,
> > tison.
> >
>

Re: How does HBase deal with master switch?

Posted by "张铎 (Duo Zhang)" <pa...@gmail.com>.
Once an RS is started, it creates its WAL directory and starts writing WALs
into it. If the master thinks an RS is dead, it renames the WAL directory of
that RS and calls recover lease on all the WAL files under the directory to
make sure they are all closed. So even when the RS comes back after a long
GC, before it kills itself because of the SessionExpiredException, it can no
longer accept any write requests: its old WAL file is closed, and the WAL
directory is also gone, so it cannot create new WAL files either.
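
A minimal sketch of that fencing step, assuming HDFS's DistributedFileSystem
API (the method and the directory suffix here are illustrative, not HBase's
actual code): rename the dead server's WAL directory, then recover the lease
on each WAL file so the old writer can no longer append to it.

    import java.io.IOException;

    import org.apache.hadoop.fs.FileStatus;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.hdfs.DistributedFileSystem;

    public class WalFencingSketch {
        static void fenceDeadServer(DistributedFileSystem fs, Path walDir)
                throws IOException {
            // Renaming the directory prevents the old RS from creating new WAL
            // files at the path it remembers.
            Path splittingDir =
                new Path(walDir.getParent(), walDir.getName() + "-splitting");
            if (!fs.rename(walDir, splittingDir)) {
                throw new IOException("Failed to rename " + walDir);
            }
            for (FileStatus wal : fs.listStatus(splittingDir)) {
                // recoverLease asks the NameNode to revoke the previous
                // writer's lease and close the file, so further appends from
                // the dead RS will fail.
                fs.recoverLease(wal.getPath());
            }
        }
    }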

Of course, you may still read from the dead RS at this moment, so
theoretically you could read stale data, which means HBase cannot guarantee
‘external consistency’.

Hope this solves your problem.

Thanks.

Zili Chen <wa...@gmail.com> wrote on Thu, Jun 6, 2019 at 9:38 PM:

> Hi,
>
> Recently from the book, ZooKeeper: Distributed Process Coordination, I find
> a paragraph mentions that, HBase once suffered by
>
> 1) RegionServer started full gc and timeout on ZooKeeper. Thus ZooKeeper
> regarded it as failed.
> 2) ZooKeeper launched a new RegionServer, and the new one started to serve.
> 3) The old RegionServer finished gc and thought itself was still active and
> serving.
>
> in Chapter 5 section 5.3.
>
> I'm interested on it and would like to know how HBase community overcame
> this issue.
>
> Best,
> tison.
>