Posted to user@zookeeper.apache.org by Jean-Daniel Cryans <jd...@apache.org> on 2009/06/23 21:32:58 UTC

Confused about KeeperState.Disconnected and KeeperState.Expired

Hey all,

Working on integrating HBase with ZK, we ran into an issue that we
are unable to resolve. I was trying to see how our handling of
network partitions and session expirations behaves, so I started
a single ZK instance with a very simple HBase setup, then I
killed the ZK server. The only thing I got from ZooKeeper was a
KeeperState.Disconnected, then... nothing (for 20+ minutes).
Normally if I had a quorum I would still get that message, but then I
would get another one telling me it's connected to another ZK quorum
server. So how do I know if I'm really partitioned from the ZK quorum?
Shouldn't we get a session expiration at some point? From what I
understand you can only get a KeeperState.Expired when you connect
back to the quorum after x time, but what if you can "never" connect
back to it?

BTW this is r785019.

Thx a lot!

J-D

Re: Confused about KeeperState.Disconnected and KeeperState.Expired

Posted by Benjamin Reed <br...@yahoo-inc.com>.
sorry to jump in late.

if i understand the scenario correctly, you are partitioned from ZK, but 
you still have access to the NN on which you are holding leases to 
files. the problem is that even though your ephemeral nodes may time out, 
you are still holding a lease on the NN, and recovery would go faster if 
you actually closed the file. right? or is it deeper than that? can you 
open a file in such a way that you stomp the lease? or make sure that 
the lease timeout is smaller than the session timeout and only renew if 
you are still connected to ZK?

thanx
ben
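[ed: ben's last suggestion — keep the lease timeout below the session timeout and renew only while still connected to ZK — can be sketched roughly as below. This is a minimal illustration, not the real HBase/HDFS API: `LeaseRenewer`, `maybeRenewLease`, and the connected flag are hypothetical names.]

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Sketch: renew the NN lease only while the ZK session is believed live.
// If ZK connectivity is lost, stop renewing so the lease (whose timeout is
// configured below the ZK session timeout) lapses and recovery can proceed.
public class LeaseRenewer {
    private final AtomicBoolean zkConnected = new AtomicBoolean(true);
    private int renewals = 0;

    // Called from the ZooKeeper watcher on SyncConnected / Disconnected.
    public void setZkConnected(boolean up) {
        zkConnected.set(up);
    }

    // Invoked on a timer whose period is shorter than the lease timeout.
    public boolean maybeRenewLease() {
        if (!zkConnected.get()) {
            return false; // let the lease lapse; don't keep the NN waiting
        }
        renewals++;       // real code would call out to the NameNode here
        return true;
    }

    public int renewalCount() {
        return renewals;
    }

    public static void main(String[] args) {
        LeaseRenewer r = new LeaseRenewer();
        System.out.println(r.maybeRenewLease()); // true: still connected
        r.setZkConnected(false);                 // watcher saw Disconnected
        System.out.println(r.maybeRenewLease()); // false: stop renewing
    }
}
```

The key invariant is the ordering of timeouts: renewal period < lease timeout < ZK session timeout, so a partitioned node's lease is guaranteed to expire no later than its ephemeral nodes.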

Jean-Daniel Cryans wrote:
> If the machine was completely partitioned, as far as I know, it would lose
> its lease, so the only thing we have to make sure of is clearing the
> state of the region server by doing a "restart" so that it's ready to come
> back into the cluster. If ZK is down but the rest is up, closing the files in
> HDFS should ensure that we lose a minimum of data, if any at all.
>
> I think that in a multi-rack setup it is possible to not be able to talk to
> ZK but to be able to talk to the Namenode as machines can be anywhere.
> Especially in HBase 0.20, the master can failover on any node that has a
> backup Master ready. So in that case, the region server should consider
> itself gone from the cluster and close any connection it has and restart.
>
> Those are very legitimate questions, Gustavo, thanks for asking.
>
> J-D
>
> On Wed, Jun 24, 2009 at 3:38 PM, Gustavo Niemeyer <gu...@niemeyer.net> wrote:
>
>>> Ben's opinion is that it should not belong in the default API but in the
>>> common client that another recent thread was about. My opinion is just that
>>> I need such functionality, wherever it is.
>>
>> Understood, sorry.  I just meant that it feels like something that
>> would likely be useful to other people too, so might have a role in
>> the default API to ensure it gets done properly considering the
>> details that Ben brought up.
>>
>>> If the node gets the exception (or has its own timer), as I wrote, it will
>>> shut itself down to release HDFS leases as fast as possible. If ZK is really
>>> down and it's not a network partition, then HBase is down and this is fine
>>> because it won't be able to work anyway.
>>
>> Right, that's mostly what I was wondering.  I was pondering
>> under which circumstances the node would be unable to talk to the
>> ZooKeeper server but would still be holding the HDFS lease in a way
>> that prevented the rest of the system from going on.  If I understand
>> what you mean, if ZooKeeper is down entirely, HBase would be down for
>> good. If the machine was partitioned off entirely, the HDFS side of
>> things will also be disconnected, so shutting the node down won't help
>> the rest of the system recover.
>>
>> --
>> Gustavo Niemeyer
>> http://niemeyer.net


Re: Confused about KeeperState.Disconnected and KeeperState.Expired

Posted by Jean-Daniel Cryans <jd...@apache.org>.
If the machine was completely partitioned, as far as I know, it would lose
its lease, so the only thing we have to make sure of is clearing the
state of the region server by doing a "restart" so that it's ready to come
back into the cluster. If ZK is down but the rest is up, closing the files in
HDFS should ensure that we lose a minimum of data, if any at all.

I think that in a multi-rack setup it is possible to not be able to talk to
ZK but to be able to talk to the Namenode as machines can be anywhere.
Especially in HBase 0.20, the master can failover on any node that has a
backup Master ready. So in that case, the region server should consider
itself gone from the cluster and close any connection it has and restart.

Those are very legitimate questions, Gustavo, thanks for asking.

J-D

On Wed, Jun 24, 2009 at 3:38 PM, Gustavo Niemeyer <gu...@niemeyer.net> wrote:

> > Ben's opinion is that it should not belong in the default API but in the
> > common client that another recent thread was about. My opinion is just that
> > I need such functionality, wherever it is.
>
> Understood, sorry.  I just meant that it feels like something that
> would likely be useful to other people too, so might have a role in
> the default API to ensure it gets done properly considering the
> details that Ben brought up.
>
> > If the node gets the exception (or has its own timer), as I wrote, it will
> > shut itself down to release HDFS leases as fast as possible. If ZK is really
> > down and it's not a network partition, then HBase is down and this is fine
> > because it won't be able to work anyway.
>
> Right, that's mostly what I was wondering.  I was pondering
> under which circumstances the node would be unable to talk to the
> ZooKeeper server but would still be holding the HDFS lease in a way
> that prevented the rest of the system from going on.  If I understand
> what you mean, if ZooKeeper is down entirely, HBase would be down for
> good. If the machine was partitioned off entirely, the HDFS side of
> things will also be disconnected, so shutting the node down won't help
> the rest of the system recover.
>
> --
> Gustavo Niemeyer
> http://niemeyer.net
>

Re: Confused about KeeperState.Disconnected and KeeperState.Expired

Posted by Gustavo Niemeyer <gu...@niemeyer.net>.
> Ben's opinion is that it should not belong in the default API but in the
> common client that another recent thread was about. My opinion is just that
> I need such functionality, wherever it is.

Understood, sorry.  I just meant that it feels like something that
would likely be useful to other people too, so might have a role in
the default API to ensure it gets done properly considering the
details that Ben brought up.

> If the node gets the exception (or has its own timer), as I wrote, it will
> shut itself down to release HDFS leases as fast as possible. If ZK is really
> down and it's not a network partition, then HBase is down and this is fine
> because it won't be able to work anyway.

Right, that's mostly what I was wondering.  I was pondering
under which circumstances the node would be unable to talk to the
ZooKeeper server but would still be holding the HDFS lease in a way
that prevented the rest of the system from going on.  If I understand
what you mean, if ZooKeeper is down entirely, HBase would be down for
good. If the machine was partitioned off entirely, the HDFS side of
things will also be disconnected, so shutting the node down won't help
the rest of the system recover.

-- 
Gustavo Niemeyer
http://niemeyer.net

Re: Confused about KeeperState.Disconnected and KeeperState.Expired

Posted by Jean-Daniel Cryans <jd...@apache.org>.
Gustavo,

Ben's opinion is that it should not belong in the default API but in the
common client that another recent thread was about. My opinion is just that
I need such functionality, wherever it is.

If the node gets the exception (or has its own timer), as I wrote, it will
shut itself down to release HDFS leases as fast as possible. If ZK is really
down and it's not a network partition, then HBase is down and this is fine
because it won't be able to work anyway.

J-D

On Wed, Jun 24, 2009 at 3:15 PM, Gustavo Niemeyer <gu...@niemeyer.net> wrote:

> Hi Jean-Daniel,
>
> > I understand, maybe the common client is the best place.
>
> It sounds like something useful to have in the default API, FWIW.
>
> > In our situation, if an HBase region server stays
> > disconnected for too long, the regions it's holding cannot be reached, so
> > this is a major problem. Also, if the HMaster node gets the event that an
>
> Out of curiosity, what do you intend to do when you get the exception?
>  I mean, if you didn't get the expiration exception it means that the
> reconnection isn't working in any case, so how do you plan to recover?
>
> --
> Gustavo Niemeyer
> http://niemeyer.net
>

Re: Confused about KeeperState.Disconnected and KeeperState.Expired

Posted by Gustavo Niemeyer <gu...@niemeyer.net>.
Hi Jean-Daniel,

> I understand, maybe the common client is the best place.

It sounds like something useful to have in the default API, FWIW.

> In our situation, if an HBase region server stays
> disconnected for too long, the regions it's holding cannot be reached, so
> this is a major problem. Also, if the HMaster node gets the event that an

Out of curiosity, what do you intend to do when you get the exception?
 I mean, if you didn't get the expiration exception it means that the
reconnection isn't working in any case, so how do you plan to recover?

-- 
Gustavo Niemeyer
http://niemeyer.net

Re: Confused about KeeperState.Disconnected and KeeperState.Expired

Posted by Jean-Daniel Cryans <jd...@apache.org>.
I understand, maybe the common client is the best place.

In our situation, if an HBase region server stays disconnected for too
long, the regions it's holding cannot be reached, so this is a major
problem. Also, if the HMaster node gets the event that an ephemeral node
is gone, it will begin processing that region server's WAL, and if the
region server is still able to talk to HDFS, we have a lease handling
problem. So, in that state of disconnection, the region server should
just kill itself and completely restart.

J-D
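[ed: the policy J-D describes — carry on if the session reconnects in time, but kill the process if the session expires or a local grace timer fires first — can be sketched as a small state machine. The event names mirror org.apache.zookeeper.Watcher.Event.KeeperState; everything else here (`RestartPolicy`, `Action`) is an illustrative name, not HBase code.]

```java
// Sketch of the region server's disconnect policy: on Disconnected, an
// external grace timer (e.g. 2x the session timeout) is started; if
// SyncConnected arrives first, we carry on; if the timer or an Expired
// event wins, the server should abort and restart cleanly.
public class RestartPolicy {
    public enum Action { CONTINUE, RESTART }

    private boolean disconnected = false;

    // Fed from the ZooKeeper watcher's KeeperState transitions.
    public Action onEvent(String keeperState) {
        switch (keeperState) {
            case "SyncConnected":
                disconnected = false;  // reconnected in time, session intact
                return Action.CONTINUE;
            case "Disconnected":
                disconnected = true;   // caller starts the grace timer now
                return Action.CONTINUE;
            case "Expired":
                return Action.RESTART; // the quorum confirmed the session died
            default:
                return Action.CONTINUE;
        }
    }

    // Called when the grace timer fires.
    public Action onGraceTimer() {
        return disconnected ? Action.RESTART : Action.CONTINUE;
    }

    public static void main(String[] args) {
        RestartPolicy p = new RestartPolicy();
        p.onEvent("Disconnected");
        System.out.println(p.onGraceTimer()); // prints RESTART
    }
}
```

Note that RESTART on the grace timer is a local decision made without the quorum's confirmation, which is exactly the trade-off discussed in this thread.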

On Wed, Jun 24, 2009 at 2:00 PM, Benjamin Reed <br...@yahoo-inc.com> wrote:

> perhaps it would fit into the common client that stefan is proposing. we
> don't currently have such a timer in the client code that we could just
> expose, so it will be something we need to add. one thing to be careful of
> is trying to be too tricky. you don't want to trigger right after the
> session timeout, because things can be in flight and a session renewal
> response might actually be on the way, or the service bounced due to a leader
> failure, which is why i was recommending something like twice the session
> timeout.
>
> to be honest i think most of our applications just sit there trying to
> reconnect forever. after all if you do close the session and try to move on,
> the ZooKeeper service is still down, so trying with a new ZooKeeper handle
> isn't going to help anything.
>
>
> ben
>
> Jean-Daniel Cryans wrote:
>
>> Ben,
>>
>> Thank you, I now see the rationale in not telling the client its session
>> is over because you can't be sure it actually is. But would it make sense
>> to add a new state in KeeperState representing that corner case? Something
>> like AfterSessionTimeout. I'm pretty sure others would find that useful
>> for the same reason as us.
>>
>> If anyone +1s that, I'll open a jira and give it a try.
>>
>> J-D
>>
>> On Tue, Jun 23, 2009 at 6:04 PM, Benjamin Reed <br...@yahoo-inc.com>
>> wrote:
>>
>>
>>
>>> ZooKeeper only tells you about states that it is sure about, so you will
>>> not get the Expired event until you reconnect to ZooKeeper. if you never
>>> connect again to ZooKeeper, you will not get the Expired event. if you
>>> want to time out using some sanity value, 2 times the session timeout
>>> for example, you can implement that yourself by setting a timer when you
>>> get the disconnected event and then close the session explicitly when
>>> the timer goes off.
>>>
>>> there is a caveat in doing this: if your whole cluster goes down for 20
>>> mins and then comes back up, your session timeout will get reset and the
>>> session will still be alive even though you have closed it. it will then
>>> have to time out before it actually goes away. closing the session when
>>> the client is disconnected just stops the client from trying to reconnect.
>>>
>>> does this make sense?
>>>
>>> ben
>>>
>>>
>>> Jean-Daniel Cryans wrote:
>>>
>>>
>>>
>>>> Hey all,
>>>>
>>>> Working on integrating HBase with ZK, we ran into an issue that we
>>>> are unable to resolve. I was trying to see how our handling of
>>>> network partitions and session expirations behaves, so I started
>>>> a single ZK instance with a very simple HBase setup, then I
>>>> killed the ZK server. The only thing I got from ZooKeeper was a
>>>> KeeperState.Disconnected, then... nothing (for 20+ minutes).
>>>> Normally if I had a quorum I would still get that message, but then I
>>>> would get another one telling me it's connected to another ZK quorum
>>>> server. So how do I know if I'm really partitioned from the ZK quorum?
>>>> Shouldn't we get a session expiration at some point? From what I
>>>> understand you can only get a KeeperState.Expired when you connect
>>>> back to the quorum after x time, but what if you can "never" connect
>>>> back to it?
>>>>
>>>> BTW this is r785019.
>>>>
>>>> Thx a lot!
>>>>
>>>> J-D
>>>>
>>>>
>>>>
>>>>
>>>
>>>
>>
>
>

Re: Confused about KeeperState.Disconnected and KeeperState.Expired

Posted by Benjamin Reed <br...@yahoo-inc.com>.
perhaps it would fit into the common client that stefan is proposing. we 
don't currently have such a timer in the client code that we could just 
expose, so it will be something we need to add. one thing to be 
careful of is trying to be too tricky. you don't want to trigger right 
after the session timeout, because things can be in flight and a session 
renewal response might actually be on the way, or the service bounced due 
to a leader failure, which is why i was recommending something like 
twice the session timeout.

to be honest i think most of our applications just sit there trying to 
reconnect forever. after all if you do close the session and try to move 
on, the ZooKeeper service is still down, so trying with a new ZooKeeper 
handle isn't going to help anything.

ben

Jean-Daniel Cryans wrote:
> Ben,
>
> Thank you, I now see the rationale in not telling the client its session is
> over because you can't be sure it actually is. But would it make sense to
> add a new state in KeeperState representing that corner case? Something like
> AfterSessionTimeout. I'm pretty sure others would find that useful for the
> same reason as us.
>
> If anyone +1s that, I'll open a jira and give it a try.
>
> J-D
>
> On Tue, Jun 23, 2009 at 6:04 PM, Benjamin Reed <br...@yahoo-inc.com> wrote:
>
>   
>> ZooKeeper only tells you about states that it is sure about, so you will
>> not get the Expired event until you reconnect to ZooKeeper. if you never
>> connect again to ZooKeeper, you will not get the Expired event. if you want
>> to time out using some sanity value, 2 times the session timeout for example,
>> you can implement that yourself by setting a timer when you get the
>> disconnected event and then close the session explicitly when the timer goes
>> off.
>>
>> there is a caveat in doing this: if your whole cluster goes down for 20
>> mins and then comes back up, your session timeout will get reset and the
>> session will still be alive even though you have closed it. it will then
>> have to time out before it actually goes away. closing the session when the
>> client is disconnected just stops the client from trying to reconnect.
>>
>> does this make sense?
>>
>> ben
>>
>>
>> Jean-Daniel Cryans wrote:
>>
>>     
>>> Hey all,
>>>
>>> Working on integrating HBase with ZK, we ran into an issue that we
>>> are unable to resolve. I was trying to see how our handling of
>>> network partitions and session expirations behaves, so I started
>>> a single ZK instance with a very simple HBase setup, then I
>>> killed the ZK server. The only thing I got from ZooKeeper was a
>>> KeeperState.Disconnected, then... nothing (for 20+ minutes).
>>> Normally if I had a quorum I would still get that message, but then I
>>> would get another one telling me it's connected to another ZK quorum
>>> server. So how do I know if I'm really partitioned from the ZK quorum?
>>> Shouldn't we get a session expiration at some point? From what I
>>> understand you can only get a KeeperState.Expired when you connect
>>> back to the quorum after x time, but what if you can "never" connect
>>> back to it?
>>>
>>> BTW this is r785019.
>>>
>>> Thx a lot!
>>>
>>> J-D
>>>
>>>
>>>       
>>     



Re: Confused about KeeperState.Disconnected and KeeperState.Expired

Posted by Jean-Daniel Cryans <jd...@apache.org>.
Ben,

Thank you, I now see the rationale in not telling the client its session is
over because you can't be sure it actually is. But would it make sense to
add a new state in KeeperState representing that corner case? Something like
AfterSessionTimeout. I'm pretty sure others would find that useful for the
same reason as us.

If anyone +1s that, I'll open a jira and give it a try.

J-D

On Tue, Jun 23, 2009 at 6:04 PM, Benjamin Reed <br...@yahoo-inc.com> wrote:

> ZooKeeper only tells you about states that it is sure about, so you will
> not get the Expired event until you reconnect to ZooKeeper. if you never
> connect again to ZooKeeper, you will not get the Expired event. if you want
> to time out using some sanity value, 2 times the session timeout for example,
> you can implement that yourself by setting a timer when you get the
> disconnected event and then close the session explicitly when the timer goes
> off.
>
> there is a caveat in doing this: if your whole cluster goes down for 20
> mins and then comes back up, your session timeout will get reset and the
> session will still be alive even though you have closed it. it will then
> have to time out before it actually goes away. closing the session when the
> client is disconnected just stops the client from trying to reconnect.
>
> does this make sense?
>
> ben
>
>
> Jean-Daniel Cryans wrote:
>
>> Hey all,
>>
>> Working on integrating HBase with ZK, we ran into an issue that we
>> are unable to resolve. I was trying to see how our handling of
>> network partitions and session expirations behaves, so I started
>> a single ZK instance with a very simple HBase setup, then I
>> killed the ZK server. The only thing I got from ZooKeeper was a
>> KeeperState.Disconnected, then... nothing (for 20+ minutes).
>> Normally if I had a quorum I would still get that message, but then I
>> would get another one telling me it's connected to another ZK quorum
>> server. So how do I know if I'm really partitioned from the ZK quorum?
>> Shouldn't we get a session expiration at some point? From what I
>> understand you can only get a KeeperState.Expired when you connect
>> back to the quorum after x time, but what if you can "never" connect
>> back to it?
>>
>> BTW this is r785019.
>>
>> Thx a lot!
>>
>> J-D
>>
>>
>
>

Re: Confused about KeeperState.Disconnected and KeeperState.Expired

Posted by Benjamin Reed <br...@yahoo-inc.com>.
ZooKeeper only tells you about states that it is sure about, so you will 
not get the Expired event until you reconnect to ZooKeeper. if you never 
connect again to ZooKeeper, you will not get the Expired event. if you 
want to time out using some sanity value, 2 times the session timeout for 
example, you can implement that yourself by setting a timer when you get 
the disconnected event and then close the session explicitly when the 
timer goes off.

there is a caveat in doing this: if your whole cluster goes down for 20 
mins and then comes back up, your session timeout will get reset and the 
session will still be alive even though you have closed it. it will then 
have to time out before it actually goes away. closing the session when 
the client is disconnected just stops the client from trying to reconnect.

does this make sense?

ben
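[ed: the timer ben describes can be sketched with plain java.util.concurrent: schedule a close for twice the session timeout when the watcher sees Disconnected, and cancel it if SyncConnected arrives first. `DisconnectTimer` is an illustrative name, and the Runnable stands in for a real `zk.close()` on an org.apache.zookeeper.ZooKeeper handle.]

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;

// Sketch: self-imposed session timeout for a client that may never
// reconnect. On Disconnected, schedule an explicit close at 2x the
// session timeout; on SyncConnected, cancel the pending close.
public class DisconnectTimer {
    private final ScheduledExecutorService timer =
            Executors.newSingleThreadScheduledExecutor();
    private final long sessionTimeoutMs;
    private final Runnable closeHandle; // e.g. () -> zk.close()
    private ScheduledFuture<?> pending;

    public DisconnectTimer(long sessionTimeoutMs, Runnable closeHandle) {
        this.sessionTimeoutMs = sessionTimeoutMs;
        this.closeHandle = closeHandle;
    }

    // Call from the watcher on KeeperState.Disconnected.
    public synchronized void onDisconnected() {
        if (pending == null) {
            pending = timer.schedule(closeHandle,
                    2 * sessionTimeoutMs, TimeUnit.MILLISECONDS);
        }
    }

    // Call from the watcher on KeeperState.SyncConnected.
    public synchronized void onReconnected() {
        if (pending != null) {
            pending.cancel(false);
            pending = null;
        }
    }

    public void shutdown() {
        timer.shutdownNow();
    }
}
```

As ben cautions above, the 2x factor exists so that an in-flight session renewal or a leader-failure bounce does not trigger a spurious close right at the session timeout, and closing locally does not make the server-side session disappear any faster.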

Jean-Daniel Cryans wrote:
> Hey all,
>
> Working on integrating HBase with ZK, we ran into an issue that we
> are unable to resolve. I was trying to see how our handling of
> network partitions and session expirations behaves, so I started
> a single ZK instance with a very simple HBase setup, then I
> killed the ZK server. The only thing I got from ZooKeeper was a
> KeeperState.Disconnected, then... nothing (for 20+ minutes).
> Normally if I had a quorum I would still get that message, but then I
> would get another one telling me it's connected to another ZK quorum
> server. So how do I know if I'm really partitioned from the ZK quorum?
> Shouldn't we get a session expiration at some point? From what I
> understand you can only get a KeeperState.Expired when you connect
> back to the quorum after x time, but what if you can "never" connect
> back to it?
>
> BTW this is r785019.
>
> Thx a lot!
>
> J-D
>