You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@zookeeper.apache.org by "Kevin Jamieson (JIRA)" <ji...@apache.org> on 2013/07/06 05:35:48 UTC
[jira] [Created] (ZOOKEEPER-1720) Race in zookeeper_close() leads
to hang
Kevin Jamieson created ZOOKEEPER-1720:
-----------------------------------------
Summary: Race in zookeeper_close() leads to hang
Key: ZOOKEEPER-1720
URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1720
Project: ZooKeeper
Issue Type: Bug
Components: c client
Affects Versions: 3.5.0
Environment: Ubuntu 12.04.1
Reporter: Kevin Jamieson
Using ZK 3.5.4, zookeeper_close() occasionally hangs with a backtrace of the form:
{noformat}
#0 0x00002b255fab489c in __lll_lock_wait () from /lib/x86_64-linux-gnu/libpthread.so.0
#1 0x00002b255fab26b0 in pthread_cond_broadcast@@GLIBC_2.3.2 () from /lib/x86_64-linux-gnu/libpthread.so.0
#2 0x00002b2560568ced in unlock_completion_list (l=0x13f5430) at src/mt_adaptor.c:69
#3 0x00002b256055b9ec in free_completions (zh=0x13f5270, callCompletion=1, reason=-116) at src/zookeeper.c:1521
#4 0x00002b256055d3bc in zookeeper_close (zh=0x13f5270) at src/zookeeper.c:2954
{noformat}
At which point the zhandle_t struct appears to have already been freed, as it contains garbage:
{noformat}
(gdb) p zh->sent_requests.cond
$19 = {
__data = {
__lock = 2,
__futex = 0,
__total_seq = 18446744073709551615,
__wakeup_seq = 0,
__woken_seq = 0,
__mutex = 0x0,
__nwaiters = 0,
__broadcast_seq = 0
},
__size = "\002\000\000\000\000\000\000\000\377\377\377\377\377\377\377\377", '\000' <repeats 31 times>,
__align = 2
}
{noformat}
There appears to be a race condition in the following code:
{noformat}
int api_epilog(zhandle_t *zh,int rc)
{
if(inc_ref_counter(zh,-1)==0 && zh->close_requested!=0)
zookeeper_close(zh);
return rc;
}
int zookeeper_close(zhandle_t *zh)
{
int rc=ZOK;
if (zh==0)
return ZBADARGUMENTS;
zh->close_requested=1;
if (inc_ref_counter(zh,1)>1) {
{noformat}
As api_epilog() may free zh in between zookeeper_close() setting zh->close_requested=1 and incrementing the reference count.
The following patch should fix the problem:
{noformat}
diff --git a/src/c/src/zookeeper.c b/src/c/src/zookeeper.c
index 6943243..61a263a 100644
--- a/src/c/src/zookeeper.c
+++ b/src/c/src/zookeeper.c
@@ -1051,6 +1051,7 @@ zhandle_t *zookeeper_init(const char *host, watcher_fn watcher,
goto abort;
}
+ api_prolog(zh);
return zh;
abort:
errnosave=errno;
@@ -2889,7 +2890,7 @@ int zookeeper_close(zhandle_t *zh)
return ZBADARGUMENTS;
zh->close_requested=1;
- if (inc_ref_counter(zh,1)>1) {
+ if (inc_ref_counter(zh,0)>1) {
/* We have incremented the ref counter to prevent the
* completions from calling zookeeper_close before we have
* completed the adaptor_finish call below. */
{noformat}
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira