Posted to dev@zookeeper.apache.org by Camille Fournier <ca...@apache.org> on 2016/10/14 19:44:04 UTC
data race in watch removal
All,
After looking into this bug report:
https://issues.apache.org/jira/browse/ZOOKEEPER-2615
I believe we have a system-wide race with watches on the server. AFAICT, a
request with a watch can be in flight at the same time a connection is
being closed. If the in-flight request is executed after this line of
NIOServerCnxn.close:
    if (zkServer != null) {
        zkServer.removeCnxn(this);
    }
the watches will be added after the connection's cleanup has already run,
so they are never cleaned up.
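
To make the ordering concrete, here is a minimal single-threaded sketch of
the problem. It is a toy model, not the ZooKeeper code: WatchLeakSketch,
addWatch, watchCount and the paths are all hypothetical names, and only
removeCnxn echoes the call quoted above. It just replays the bad ordering
(cleanup first, in-flight request second) and counts what is left behind.

```java
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-in for the server's watch table; not ZooKeeper's actual classes.
public class WatchLeakSketch {
    // path -> ids of the connections watching that path
    private final Map<String, Set<Long>> watches = new ConcurrentHashMap<>();

    // Models a request that registers a watch on behalf of a connection.
    void addWatch(String path, long cnxnId) {
        watches.computeIfAbsent(path, p -> ConcurrentHashMap.newKeySet()).add(cnxnId);
    }

    // Models the cleanup done on close: drop every watch owned by the connection.
    void removeCnxn(long cnxnId) {
        for (Set<Long> owners : watches.values()) {
            owners.remove(cnxnId);
        }
    }

    // Counts the watches still registered for the connection.
    int watchCount(long cnxnId) {
        int n = 0;
        for (Set<Long> owners : watches.values()) {
            if (owners.contains(cnxnId)) {
                n++;
            }
        }
        return n;
    }

    // Replays the problematic ordering: cleanup runs, then the in-flight request lands.
    static int leakedAfterLateRequest() {
        WatchLeakSketch table = new WatchLeakSketch();
        long cnxnId = 1L;
        table.addWatch("/a", cnxnId);  // watch set while the connection was healthy
        table.removeCnxn(cnxnId);      // close() runs its cleanup
        table.addWatch("/b", cnxnId);  // in-flight request executes afterwards
        return table.watchCount(cnxnId);
    }

    public static void main(String[] args) {
        System.out.println("leaked watches: " + leakedAfterLateRequest());
    }
}
```

In this model the watch on "/b" stays registered for a connection that no
longer exists, since nothing will call removeCnxn for it again.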
This is particularly bad when a client reconnects to a server after being
disconnected and re-creates its watches with the SetWatches command. That
command can create a large number of new watches at once, causing a bigger
leak such as the one mentioned in the ticket above.
I have not yet gotten all the way through writing a test that reproduces
this, but I believe I can reproduce it locally with various sleep
statements. If you have thoughts on the right approach for a fix, LMK in
the ticket or here.
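
One possible alternative to sleep statements, sketched here as an
assumption rather than something tried against the server, is to force the
interleaving with a java.util.concurrent.CountDownLatch so the "request"
thread only proceeds once the "close" thread has finished its cleanup.
ForcedOrderingSketch and the plain set of watch paths are hypothetical
stand-ins for the real server classes.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;

// Hypothetical test harness; the set below stands in for the server's watch state.
public class ForcedOrderingSketch {
    static int run() throws InterruptedException {
        Set<String> watches = ConcurrentHashMap.newKeySet();
        CountDownLatch cleanupDone = new CountDownLatch(1);

        // Stands in for the in-flight request: waits until cleanup has run, then adds a watch.
        Thread request = new Thread(() -> {
            try {
                cleanupDone.await();
                watches.add("/late");
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });

        // Stands in for the connection close: clears the watches, then releases the request.
        Thread closer = new Thread(() -> {
            watches.clear();
            cleanupDone.countDown();
        });

        request.start();
        closer.start();
        request.join();
        closer.join();
        return watches.size(); // watches left behind after the connection is gone
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("watches left after close: " + run());
    }
}
```

The latch makes the ordering deterministic, so a test built this way would
not depend on timing the way sleeps do. Hooking a latch into the real
request path would need some test-only seam in the server, which this
sketch does not attempt to show.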
C