Posted to user@zookeeper.apache.org by "Miller, Austin" <Au...@morganstanley.com> on 2014/07/11 19:31:15 UTC

exposing lastSend

Hello,

I'm looking for a way to get access to ClientCnxnSocket.lastSend without trying to break ZooKeeper encapsulation.  Broadly, my goal is to use this in a Java process in order to increase confidence in transactions.
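For concreteness, the hack I'm trying to avoid looks something like the sketch below.  The field names (cnxn, sendThread, clientCnxnSocket, lastSend) come from my reading of the client source; they are private internals that could change in any release, which is exactly why I'd rather not do this.

    import java.lang.reflect.Field;

    import org.apache.zookeeper.ZooKeeper;

    // Reflection hack: dig lastSend out of the client internals.
    final class LastSendReflectionHack {

        static long lastSend(ZooKeeper zk) throws ReflectiveOperationException {
            Object cnxn = get(zk, "cnxn");                       // ClientCnxn
            Object sendThread = get(cnxn, "sendThread");         // SendThread
            Object socket = get(sendThread, "clientCnxnSocket"); // ClientCnxnSocket
            return (Long) get(socket, "lastSend");
        }

        // Walk up the class hierarchy so fields declared on superclasses are found.
        private static Object get(Object target, String name)
                throws ReflectiveOperationException {
            for (Class<?> c = target.getClass(); c != null; c = c.getSuperclass()) {
                try {
                    Field f = c.getDeclaredField(name);
                    f.setAccessible(true);
                    return f.get(target);
                } catch (NoSuchFieldException ignored) {
                    // keep walking up
                }
            }
            throw new NoSuchFieldException(name);
        }
    }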

There are resource-starved situations where the ClientCnxn.SendThread may not be scheduled for longer than negotiatedSessionTimeout.  My understanding is that this will lead to session loss because, even though the connection might still be considered alive (the OS could still be ACKing packets that ZK sends), the ZK server requires the client itself to be sending packets.

So, assuming
...a JVM process is connected to a ZK ensemble
...the JVM process is performing transactions
...ZK is being used for distributed locking with coarse granularity
...a reliable low-latency network connection to a healthy low-latency ensemble
...a rare event causes the machine hosting the JVM to be resource starved
...none of the JVM threads are scheduled for a window twice the length of the negotiatedSessionTimeout
...during this window, the process has lost the coarse lock on the ensemble (it was an ephemeral node)

Then the ensemble should have agreed that the session is dead, correct?  Even though the connection may be considered alive at the TCP/IP transport level.  What is more, just as the threads come out of the freeze and are scheduled again, there is a race condition between the ZK threads firing the session-death event and the transaction threads committing transactions.  As I write this, I realize I'm not entirely sure what events ZK would send and in what order, as it depends on what was done before the freeze and where it was frozen.

Back to the broad goal: I want to increase confidence, in this situation, that the process still owns the ZK lock, without firing off network events before committing every transaction.  Obviously, fine-grained locks would solve this problem, but that comes with an unacceptable performance trade-off.
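For reference, the coarse lock here is nothing exotic, just an ephemeral znode owned by the session (the path below is made up for illustration):

    import org.apache.zookeeper.CreateMode;
    import org.apache.zookeeper.KeeperException;
    import org.apache.zookeeper.ZooDefs;
    import org.apache.zookeeper.ZooKeeper;

    // Whoever creates the ephemeral node owns the lock, and the node (and
    // hence the lock) vanishes when the owning session dies.
    final class CoarseLock {
        private static final String LOCK_PATH = "/locks/coarse"; // illustrative

        static boolean tryAcquire(ZooKeeper zk)
                throws KeeperException, InterruptedException {
            try {
                zk.create(LOCK_PATH, new byte[0],
                          ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.EPHEMERAL);
                return true;  // ours for as long as the session lives
            } catch (KeeperException.NodeExistsException e) {
                return false; // another session holds it
            }
        }
    }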

Now, let's say I could do something like "long org.apache.zookeeper.ZooKeeper.getLastSent()".  Well, I don't know if the ZK server actually received the packet; assuming it did, I don't know when it received it, and I don't know when the OS received the ack.  However, the value does assert that the SendThread was scheduled and able to call System.nanoTime() in ClientCnxnSocket.  This increases the likelihood that the process was sending heartbeats.  In addition, if I haven't received a push notification from the ZK event thread implying I've lost the lock, I have higher confidence that the session hasn't been lost and that I still hold the coarse lock, which satisfies my broad goal somewhat better than the current state does.
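To sketch the intended use (getLastSent() does not exist today, so the supplier below stands in for it; I'm assuming it would return the System.nanoTime() reading taken at the last send):

    import java.util.concurrent.TimeUnit;
    import java.util.function.LongSupplier;

    // Gate a commit on how recently the send thread demonstrably ran.
    final class SendFreshnessGate {
        private final LongSupplier lastSentNanos; // stand-in for the proposed getLastSent()
        private final long maxSilenceNanos;

        SendFreshnessGate(LongSupplier lastSentNanos, long maxSilenceMs) {
            this.lastSentNanos = lastSentNanos;
            // maxSilenceMs would be chosen well below the negotiated session
            // timeout; beyond it, assume the session may be expiring.
            this.maxSilenceNanos = TimeUnit.MILLISECONDS.toNanos(maxSilenceMs);
        }

        boolean mayCommit() {
            return System.nanoTime() - lastSentNanos.getAsLong() < maxSilenceNanos;
        }
    }

This is only evidence that the send thread was recently scheduled, not proof that the server heard from us, but combined with the absence of a session event it is strictly more information than I have today.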

Any thoughts?

Austin



RE: RE: exposing lastSend

Posted by "Miller, Austin" <Au...@morganstanley.com>.
Flavio,

My desired outcome is that ZooKeeper exposes the lastSend value.

Regards,
Austin

-----Original Message-----
From: Flavio Junqueira [mailto:fpjunqueira@yahoo.com.INVALID] 
Sent: Wednesday, July 16, 2014 4:50 PM
To: user@zookeeper.apache.org
Subject: Re: RE: exposing lastSend

Hi Austin,

I feel your pain, but what is it, concretely, that you'd like to happen? That we expose lastSend, that we also make the JNI wrapper around the C client happen, or both?


-Flavio




Re: RE: exposing lastSend

Posted by Flavio Junqueira <fp...@yahoo.com.INVALID>.
Hi Austin,

I feel your pain, but what is it, concretely, that you'd like to happen? That we expose lastSend, that we also make the JNI wrapper around the C client happen, or both?


-Flavio



On Wednesday, July 16, 2014 7:36 PM, "Miller, Austin" <Au...@morganstanley.com> wrote:
 

>
>
>Hello,
>
>I'm still hoping to spark some discussion about the last send value.
>
>Given the links below...
>
>https://issues.apache.org/jira/browse/HBASE-1316
>http://www.mail-archive.com/user%40zookeeper.apache.org/msg01214.html
>https://cwiki.apache.org/confluence/display/ZOOKEEPER/Troubleshooting#Troubleshooting-GCpressure
>
>It seems the issue of thread starvation has come up before and was considered important enough that a JNI solution might be pursued.
>
>While a JNI wrapper around a C client might increase the likelihood of ZooKeeper client threads being scheduled by the OS, it does not completely ameliorate the issue.  Resource starvation (memory, threads, and I/O) may still prevent the client from being scheduled, even in a C program.
>
>It also adds new sources of failure and complexity, and it makes solutions harder to deploy.
>
>There still seems to be a race condition, in the JNI solution, between the session-loss notification and the Java side if a partition and a pause happen simultaneously.
>
>If it is assumed that long pauses happen for some users and the last send value is exposed, then I maintain it is possible to use that value in ways that increase the chances that a transaction is not committed when a session has been lost.  Some users may actually want failover in the case of a really long pause in the JVM, as well.  From the cluster's point of view, the app rebooted.
>
>GC pauses are not the only thing that causes resource starvation, either, and a cause-agnostic measure, like using the last send value to gate writes, is going to provide more confidence than a JNI wrapper around C.
>
>
>Austin

RE: RE: exposing lastSend

Posted by "Miller, Austin" <Au...@morganstanley.com>.
Hello,

I'm still hoping to spark some discussion about the last send value.

Given the links below...

https://issues.apache.org/jira/browse/HBASE-1316
http://www.mail-archive.com/user%40zookeeper.apache.org/msg01214.html
https://cwiki.apache.org/confluence/display/ZOOKEEPER/Troubleshooting#Troubleshooting-GCpressure

It seems the issue of thread starvation has come up before and was considered important enough that a JNI solution might be pursued.

While a JNI wrapper around a C client might increase the likelihood of ZooKeeper client threads being scheduled by the OS, it does not completely ameliorate the issue.  Resource starvation (memory, threads, and I/O) may still prevent the client from being scheduled, even in a C program.

It also adds new sources of failure and complexity, and it makes solutions harder to deploy.

There still seems to be a race condition, in the JNI solution, between the session-loss notification and the Java side if a partition and a pause happen simultaneously.

If it is assumed that long pauses happen for some users and the last send value is exposed, then I maintain it is possible to use that value in ways that increase the chances that a transaction is not committed when a session has been lost.  Some users may actually want failover in the case of a really long pause in the JVM, as well.  From the cluster's point of view, the app rebooted.

GC pauses are not the only thing that causes resource starvation, either, and a cause-agnostic measure, like using the last send value to gate writes, is going to provide more confidence than a JNI wrapper around C.

Austin



RE: exposing lastSend

Posted by "Miller, Austin" <Au...@morganstanley.com>.
> Pings, from an idle client, actually need to go out every 1/3 of
> negotiatedSessionTimeout.

If the client send thread isn't being scheduled, then this wouldn't happen.

> 1/3 of negotiatedSessionTimeout will already cause a ConnectionLoss...

> No, the ZK server *will* RST the client if it hasn't pinged in 1/3 of
> negotiatedSessionTimeout.

OK, but this still doesn't prevent a race between the transaction being committed and events firing from the client event thread after the freeze window.  In fact, you have convinced me the situation is more likely than I thought: all threads going unscheduled for just a third of negotiatedSessionTimeout (down from greater than the full timeout) is now sufficient to encounter the problem.
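To spell the race out (the names here are invented for illustration, not ZooKeeper API):

    // Check-then-act race between the event thread and a committing thread.
    class LockGuard {
        // Flipped to false by the ZooKeeper event thread on session death.
        private volatile boolean sessionAlive = true;

        void onSessionDeath() { sessionAlive = false; }

        void commit(Runnable txn) {
            // Between this read and txn.run(), the just-unfrozen event thread
            // may still be about to deliver the session-death event, so the
            // commit can slip through after the lock was lost on the ensemble.
            if (sessionAlive) {
                txn.run();
            }
        }
    }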

> You could just release the lock as soon as you receive ConnectionLoss
> (i.e., without waiting for SessionExpired, which you'll only get upon
> reconnecting to a ZK server, which could take longer given a partition or
> a loaded network). But the case you are describing is conflated with the
> pathological scenario of a JVM instance starving its threads... if that's
> a risk, you might as well have an external health-check process that kills
> your JVM entirely once it's likely that the ZK thread is starving (and
> hence losing your lock is more likely).

The lock is represented by the existence of an ephemeral node.  I can't release it; in this scenario it has already been released by session death and will have been grabbed by another JVM process somewhere else.

I don't know what "pathological" means.   If it means nobody should care about this situation, then I must politely disagree.  I accept that it is possible that very few people should care about it, but I'm not even sure about that.

Your suggestion, in response to an effort to increase consistency, is to have yet another process that completely kills the current one instead of dealing with the issue programmatically?  What if the health-check process dies?  How does it perform this health check consistently, and by what mechanism?  Does it keep track of the scheduling of every thread, and does that require deep understanding of the kernel the JVM is running on, and root access as a result?  Does it create false positives, where it can't be sure the process is keeping the session alive and so aggressively kills it even though it was?  Does it not increase the chances of failover from the process currently holding the lock to another process acquiring it (undesirable)?

If the RSS of the JVM is increasing because of classloader leaks and the guard process can't allocate sufficient memory to kill the JVM for being unhealthy, then what happens?  How do you deploy and test it so that it works?  If the code is being used by a wide variety of users, how do you instruct them to manage, deploy, and configure this guard process?  And what if the process is doing other things that don't need to be killed, things I would really, really like to complete, just not the transaction that depends on the ZK lock?  It does not seem or smell like a proper solution.

I recognize that "it would be relatively trivial to expose the lastSend value" is not by itself a good argument, because adding to a popular contract should be a deliberate action.  Even so, the code change is trivial (it could be done in ten minutes), so this is not an attempt to change the way ZK works; rather, I wish to argue that exposing the value is useful to someone, even if he is pathological. :)  Possibly it would be useful to other people as well; for instance, I suspect it would be useful to a library like Curator.
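Schematically, the change is just plumbing one getter through the client layers.  The mock below mirrors the shape of the real classes (ZooKeeper, ClientCnxn, ClientCnxnSocket) but is emphatically not ZooKeeper source:

    // Stripped-down mock of the plumbing; not the real classes.
    class ClientCnxnSocketMock {
        volatile long lastSend;                  // updated on every send
        long getLastSend() { return lastSend; }  // the one-line accessor to add
    }

    class ClientCnxnMock {
        private final ClientCnxnSocketMock socket = new ClientCnxnSocketMock();
        long getLastSend() { return socket.getLastSend(); }
    }

    class ZooKeeperMock {
        private final ClientCnxnMock cnxn = new ClientCnxnMock();
        public long getLastSent() { return cnxn.getLastSend(); } // proposed API
    }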

Austin



Re: exposing lastSend

Posted by Raúl Gutiérrez Segalés <rg...@itevenworks.net>.
On 11 July 2014 10:31, Miller, Austin <Au...@morganstanley.com>
wrote:

> Hello,
>
> I'm looking for a way to get access to ClientCnxnSocket.lastSend without
> trying to break ZooKeeper encapsulation.  Broadly, my goal is to use this in
> a Java process in order to increase confidence in transactions.
>
> There are resource-starved situations where the ClientCnxn.SendThread may
> not be scheduled for greater than negotiatedSessionTimeout.  My
> understanding is that this will lead to session loss because even though the
> connection might still be considered alive (the OS could be sending ACKs to
> packets ZK sends), the ZK server requires the client to be sending the
> packets.
>

Pings, from an idle client, actually need to go out every 1/3 of
negotiatedSessionTimeout.

>
> So, assuming
> ...a JVM process was connected to a ZK ensemble
> ...the JVM process is performing transactions
> ...ZK is being used for distributed locking with coarse granularity
> ...a reliable low-latency network connection to a healthy low-latency
> ensemble
> ...a rare event causes the machine hosting the JVM to be resource starved
> ...none of the JVM threads are scheduled for a window twice the length of
> the negotiatedSessionTimeout
>

1/3 of negotiatedSessionTimeout will already cause a ConnectionLoss...


> ...during this window, the process has lost the coarse lock on the
> ensemble (it was an ephemeral node)
>
> Then the ensemble should have agreed that the session is dead, correct?
>  Even though the connection may be considered alive at a TCP/IP transport
> level.


No, the ZK server *will* RST the client if it hasn't pinged in 1/3 of
negotiatedSessionTimeout.


>  What is more, just coming out of the state where the threads are
> scheduled, there is a race condition between the ZK threads firing session
> death event and the transaction threads committing transactions.  As I
> write this, I realize I'm not entirely sure what events ZK would send and
> in what order, as it depends on what was done before the freeze and where it
> was frozen.
>
> Back to the broad goal, I want to increase confidence in this situation
> that the process still owns the ZK lock without firing off network events
> before committing every transaction.  Obviously, fine-grained locks would
> solve this problem, but that comes with an unacceptable performance trade
> off.
>
> Now, let's say I could do something like "long
> org.apache.zookeeper.ZooKeeper.getLastSent()".  Well, I don't know if the
> ZK server actually received the packet, assuming it did receive the packet
> I don't know when it received the packet, and I don't know when the OS
> received the ack.  However, it does assert that the SendThread was
> scheduled and able to call System.nanoTime() in ClientCnxnSocket.  This
> increases the likelihood that the process was sending heartbeats.  In
> addition to this, if I haven't received a push notification from the ZK
> event thread implying I've lost the lock, I have higher confidence that the
> session hasn't been lost and that I still have the coarse lock, which
> satisfies my broad goal somewhat better than the current state.
>

You could just release the lock as soon as you receive ConnectionLoss
(i.e., without waiting for SessionExpired, which you'll only get upon
reconnecting to a ZK server, which could take longer given a partition or
a loaded network). But the case you are describing is conflated with the
pathological scenario of a JVM instance starving its threads... if that's
a risk, you might as well have an external health-check process that kills
your JVM entirely once it's likely that the ZK thread is starving (and
hence losing your lock is more likely).
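Something along these lines, as a minimal sketch (whether to trust the lock
again on SyncConnected is an application decision):

    import org.apache.zookeeper.WatchedEvent;
    import org.apache.zookeeper.Watcher;

    // Pessimistic policy: stop trusting the lock on Disconnected rather than
    // waiting for Expired.
    class PessimisticLockWatcher implements Watcher {
        private volatile boolean lockUsable = true;

        @Override
        public void process(WatchedEvent event) {
            switch (event.getState()) {
                case Disconnected: // ConnectionLoss: assume the lock is gone
                case Expired:      // session definitely gone
                    lockUsable = false;
                    break;
                case SyncConnected:
                    // reconnected within the session timeout; re-trusting the
                    // lock here is an application decision
                    break;
                default:
                    break;
            }
        }

        boolean lockUsable() { return lockUsable; }
    }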


-rgs