You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@zookeeper.apache.org by "Patrick Hunt (JIRA)" <ji...@apache.org> on 2009/02/27 22:57:13 UTC

[jira] Created: (ZOOKEEPER-333) helgrind thread issues identified in mt c client code

helgrind thread issues identified in mt c client code
-----------------------------------------------------

                 Key: ZOOKEEPER-333
                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-333
             Project: Zookeeper
          Issue Type: Bug
          Components: c client
            Reporter: Patrick Hunt
            Priority: Critical
             Fix For: 3.1.1, 3.2.0


helgrind generated a number of issues, I pulled a bunch of them. Most are related to the test, some are really issues with the mt zk client code though:

valgrind --tool=helgrind --log-file=helgrind_mt.out ./zktest-mt


==31294== Thread #2: pthread_cond_{timed}wait called with un-held mutex
==31294==    at 0x4027F8F: pthread_cond_wait@* (hg_intercepts.c:560)
==31294==    by 0x404D881: pthread_cond_wait@GLIBC_2.0 (in /lib/tls/i686/cmov/libpthread-2.8.90.so)
==31294==    by 0x4028037: pthread_cond_wait@* (hg_intercepts.c:574)
==31294==    by 0x809EBB7: pthread_cond_wait (PthreadMocks.cc:54)
==31294==    by 0x80ABCF6: notify_thread_ready (mt_adaptor.c:136)
==31294==    by 0x80ABE90: do_io (mt_adaptor.c:277)


==31294== Possible data race during write of size 4 at 0x42E9A58
==31294==    at 0x8050D83: terminateZookeeperThreads(_zhandle*) (ZKMocks.cc:518)
==31294==    by 0x805543B: DeliverWatchersWrapper::call(_zhandle*, int, int, char const*, watcher_object_list**) (ZKMocks.cc:261)
==31294==    by 0x80520F7: __wrap_deliverWatchers (ZKMocks.cc:220)
==31294==    by 0x80A287B: process_completions (zookeeper.c:1393)
==31294==    by 0x80ABDAA: do_completion (mt_adaptor.c:332)

==31294== Possible data race during write of size 4 at 0xBEFF5F30
==31294==    at 0x80589AF: Zookeeper_watchers::ConnectionWatcher::~ConnectionWatcher() (TestWatchers.cc:54)
==31294==    by 0x805D062: Zookeeper_watchers::testDefaultSessionWatcher1() (TestWatchers.cc:438)
==31294==    by 0x805608C: CppUnit::TestCaller<Zookeeper_watchers>::runTest() (TestCaller.h:166)


==31294== Possible data race during write of size 4 at 0x42EB104
==31294==    at 0x80A03EE: queue_completion (zookeeper.c:1776)
==31294==    by 0x80A3A44: zookeeper_process (zookeeper.c:1598)
==31294==    by 0x80AC00B: do_io (mt_adaptor.c:309)


==31294== Thread #29: pthread_cond_{timed}wait called with un-held mutex
==31294==    at 0x4027F8F: pthread_cond_wait@* (hg_intercepts.c:560)
==31294==    by 0x404D881: pthread_cond_wait@GLIBC_2.0 (in /lib/tls/i686/cmov/libpthread-2.8.90.so)
==31294==    by 0x4028037: pthread_cond_wait@* (hg_intercepts.c:574)
==31294==    by 0x809EBB7: pthread_cond_wait (PthreadMocks.cc:54)
==31294==    by 0x80AB9B3: wait_sync_completion (mt_adaptor.c:82)
==31294==    by 0x80A1E82: zoo_wget (zookeeper.c:2517)
==31294==    by 0x80A1F13: zoo_get (zookeeper.c:2497)







-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (ZOOKEEPER-333) helgrind thread issues identified in mt c client code

Posted by "Mahadev konar (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/ZOOKEEPER-333?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12679667#action_12679667 ] 

Mahadev konar commented on ZOOKEEPER-333:
-----------------------------------------

chris,
 the run was from the trunk. I dont have a revision number but i think their hasnt been any code change to the files mentioned after I uploaded the file. So the line numbers should match.

> helgrind thread issues identified in mt c client code
> -----------------------------------------------------
>
>                 Key: ZOOKEEPER-333
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-333
>             Project: Zookeeper
>          Issue Type: Bug
>          Components: c client
>            Reporter: Patrick Hunt
>            Assignee: Mahadev konar
>            Priority: Critical
>             Fix For: 3.1.1, 3.2.0
>
>         Attachments: helgrind_mt.out
>
>
> helgrind generated a number of issues, I pulled a bunch of them. Most are related to the test, some are really issues with the mt zk client code though:
> valgrind --tool=helgrind --log-file=helgrind_mt.out ./zktest-mt
> ==31294== Thread #2: pthread_cond_{timed}wait called with un-held mutex
> ==31294==    at 0x4027F8F: pthread_cond_wait@* (hg_intercepts.c:560)
> ==31294==    by 0x404D881: pthread_cond_wait@GLIBC_2.0 (in /lib/tls/i686/cmov/libpthread-2.8.90.so)
> ==31294==    by 0x4028037: pthread_cond_wait@* (hg_intercepts.c:574)
> ==31294==    by 0x809EBB7: pthread_cond_wait (PthreadMocks.cc:54)
> ==31294==    by 0x80ABCF6: notify_thread_ready (mt_adaptor.c:136)
> ==31294==    by 0x80ABE90: do_io (mt_adaptor.c:277)
> ==31294== Possible data race during write of size 4 at 0x42E9A58
> ==31294==    at 0x8050D83: terminateZookeeperThreads(_zhandle*) (ZKMocks.cc:518)
> ==31294==    by 0x805543B: DeliverWatchersWrapper::call(_zhandle*, int, int, char const*, watcher_object_list**) (ZKMocks.cc:261)
> ==31294==    by 0x80520F7: __wrap_deliverWatchers (ZKMocks.cc:220)
> ==31294==    by 0x80A287B: process_completions (zookeeper.c:1393)
> ==31294==    by 0x80ABDAA: do_completion (mt_adaptor.c:332)
> ==31294== Possible data race during write of size 4 at 0xBEFF5F30
> ==31294==    at 0x80589AF: Zookeeper_watchers::ConnectionWatcher::~ConnectionWatcher() (TestWatchers.cc:54)
> ==31294==    by 0x805D062: Zookeeper_watchers::testDefaultSessionWatcher1() (TestWatchers.cc:438)
> ==31294==    by 0x805608C: CppUnit::TestCaller<Zookeeper_watchers>::runTest() (TestCaller.h:166)
> ==31294== Possible data race during write of size 4 at 0x42EB104
> ==31294==    at 0x80A03EE: queue_completion (zookeeper.c:1776)
> ==31294==    by 0x80A3A44: zookeeper_process (zookeeper.c:1598)
> ==31294==    by 0x80AC00B: do_io (mt_adaptor.c:309)
> ==31294== Thread #29: pthread_cond_{timed}wait called with un-held mutex
> ==31294==    at 0x4027F8F: pthread_cond_wait@* (hg_intercepts.c:560)
> ==31294==    by 0x404D881: pthread_cond_wait@GLIBC_2.0 (in /lib/tls/i686/cmov/libpthread-2.8.90.so)
> ==31294==    by 0x4028037: pthread_cond_wait@* (hg_intercepts.c:574)
> ==31294==    by 0x809EBB7: pthread_cond_wait (PthreadMocks.cc:54)
> ==31294==    by 0x80AB9B3: wait_sync_completion (mt_adaptor.c:82)
> ==31294==    by 0x80A1E82: zoo_wget (zookeeper.c:2517)
> ==31294==    by 0x80A1F13: zoo_get (zookeeper.c:2497)

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (ZOOKEEPER-333) helgrind thread issues identified in mt c client code

Posted by "Mahadev konar (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/ZOOKEEPER-333?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mahadev konar updated ZOOKEEPER-333:
------------------------------------

    Status: Patch Available  (was: Open)

> helgrind thread issues identified in mt c client code
> -----------------------------------------------------
>
>                 Key: ZOOKEEPER-333
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-333
>             Project: Zookeeper
>          Issue Type: Bug
>          Components: c client
>            Reporter: Patrick Hunt
>            Assignee: Mahadev konar
>            Priority: Critical
>             Fix For: 3.1.1, 3.2.0
>
>         Attachments: helgrind_mt.out, helgrind_mt.out.gz, ZOOKEEPER-333.patch
>
>
> helgrind generated a number of issues, I pulled a bunch of them. Most are related to the test, some are really issues with the mt zk client code though:
> valgrind --tool=helgrind --log-file=helgrind_mt.out ./zktest-mt
> ==31294== Thread #2: pthread_cond_{timed}wait called with un-held mutex
> ==31294==    at 0x4027F8F: pthread_cond_wait@* (hg_intercepts.c:560)
> ==31294==    by 0x404D881: pthread_cond_wait@GLIBC_2.0 (in /lib/tls/i686/cmov/libpthread-2.8.90.so)
> ==31294==    by 0x4028037: pthread_cond_wait@* (hg_intercepts.c:574)
> ==31294==    by 0x809EBB7: pthread_cond_wait (PthreadMocks.cc:54)
> ==31294==    by 0x80ABCF6: notify_thread_ready (mt_adaptor.c:136)
> ==31294==    by 0x80ABE90: do_io (mt_adaptor.c:277)
> ==31294== Possible data race during write of size 4 at 0x42E9A58
> ==31294==    at 0x8050D83: terminateZookeeperThreads(_zhandle*) (ZKMocks.cc:518)
> ==31294==    by 0x805543B: DeliverWatchersWrapper::call(_zhandle*, int, int, char const*, watcher_object_list**) (ZKMocks.cc:261)
> ==31294==    by 0x80520F7: __wrap_deliverWatchers (ZKMocks.cc:220)
> ==31294==    by 0x80A287B: process_completions (zookeeper.c:1393)
> ==31294==    by 0x80ABDAA: do_completion (mt_adaptor.c:332)
> ==31294== Possible data race during write of size 4 at 0xBEFF5F30
> ==31294==    at 0x80589AF: Zookeeper_watchers::ConnectionWatcher::~ConnectionWatcher() (TestWatchers.cc:54)
> ==31294==    by 0x805D062: Zookeeper_watchers::testDefaultSessionWatcher1() (TestWatchers.cc:438)
> ==31294==    by 0x805608C: CppUnit::TestCaller<Zookeeper_watchers>::runTest() (TestCaller.h:166)
> ==31294== Possible data race during write of size 4 at 0x42EB104
> ==31294==    at 0x80A03EE: queue_completion (zookeeper.c:1776)
> ==31294==    by 0x80A3A44: zookeeper_process (zookeeper.c:1598)
> ==31294==    by 0x80AC00B: do_io (mt_adaptor.c:309)
> ==31294== Thread #29: pthread_cond_{timed}wait called with un-held mutex
> ==31294==    at 0x4027F8F: pthread_cond_wait@* (hg_intercepts.c:560)
> ==31294==    by 0x404D881: pthread_cond_wait@GLIBC_2.0 (in /lib/tls/i686/cmov/libpthread-2.8.90.so)
> ==31294==    by 0x4028037: pthread_cond_wait@* (hg_intercepts.c:574)
> ==31294==    by 0x809EBB7: pthread_cond_wait (PthreadMocks.cc:54)
> ==31294==    by 0x80AB9B3: wait_sync_completion (mt_adaptor.c:82)
> ==31294==    by 0x80A1E82: zoo_wget (zookeeper.c:2517)
> ==31294==    by 0x80A1F13: zoo_get (zookeeper.c:2497)

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (ZOOKEEPER-333) helgrind thread issues identified in mt c client code

Posted by "Chris Darroch (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/ZOOKEEPER-333?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12679629#action_12679629 ] 

Chris Darroch commented on ZOOKEEPER-333:
-----------------------------------------

Some of these look to be innocuous on a first glance, such as "wait called with un-held mutex" in wait_sync_completion().  I suppose the issue here might be that wait_sync_completion() doesn't check the return value from pthread_mutex_lock() before proceeding, but in practice I believe one can reasonably assume it always succeeds.

One thing that might help a bit would be to have a reference to the C code which produced this output because it refers to line numbers in the code.  Was the run against 3.1.0 or against a snapshot from trunk?  If the latter, do you know the SVN revision number?

> helgrind thread issues identified in mt c client code
> -----------------------------------------------------
>
>                 Key: ZOOKEEPER-333
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-333
>             Project: Zookeeper
>          Issue Type: Bug
>          Components: c client
>            Reporter: Patrick Hunt
>            Assignee: Mahadev konar
>            Priority: Critical
>             Fix For: 3.1.1, 3.2.0
>
>         Attachments: helgrind_mt.out
>
>
> helgrind generated a number of issues, I pulled a bunch of them. Most are related to the test, some are really issues with the mt zk client code though:
> valgrind --tool=helgrind --log-file=helgrind_mt.out ./zktest-mt
> ==31294== Thread #2: pthread_cond_{timed}wait called with un-held mutex
> ==31294==    at 0x4027F8F: pthread_cond_wait@* (hg_intercepts.c:560)
> ==31294==    by 0x404D881: pthread_cond_wait@GLIBC_2.0 (in /lib/tls/i686/cmov/libpthread-2.8.90.so)
> ==31294==    by 0x4028037: pthread_cond_wait@* (hg_intercepts.c:574)
> ==31294==    by 0x809EBB7: pthread_cond_wait (PthreadMocks.cc:54)
> ==31294==    by 0x80ABCF6: notify_thread_ready (mt_adaptor.c:136)
> ==31294==    by 0x80ABE90: do_io (mt_adaptor.c:277)
> ==31294== Possible data race during write of size 4 at 0x42E9A58
> ==31294==    at 0x8050D83: terminateZookeeperThreads(_zhandle*) (ZKMocks.cc:518)
> ==31294==    by 0x805543B: DeliverWatchersWrapper::call(_zhandle*, int, int, char const*, watcher_object_list**) (ZKMocks.cc:261)
> ==31294==    by 0x80520F7: __wrap_deliverWatchers (ZKMocks.cc:220)
> ==31294==    by 0x80A287B: process_completions (zookeeper.c:1393)
> ==31294==    by 0x80ABDAA: do_completion (mt_adaptor.c:332)
> ==31294== Possible data race during write of size 4 at 0xBEFF5F30
> ==31294==    at 0x80589AF: Zookeeper_watchers::ConnectionWatcher::~ConnectionWatcher() (TestWatchers.cc:54)
> ==31294==    by 0x805D062: Zookeeper_watchers::testDefaultSessionWatcher1() (TestWatchers.cc:438)
> ==31294==    by 0x805608C: CppUnit::TestCaller<Zookeeper_watchers>::runTest() (TestCaller.h:166)
> ==31294== Possible data race during write of size 4 at 0x42EB104
> ==31294==    at 0x80A03EE: queue_completion (zookeeper.c:1776)
> ==31294==    by 0x80A3A44: zookeeper_process (zookeeper.c:1598)
> ==31294==    by 0x80AC00B: do_io (mt_adaptor.c:309)
> ==31294== Thread #29: pthread_cond_{timed}wait called with un-held mutex
> ==31294==    at 0x4027F8F: pthread_cond_wait@* (hg_intercepts.c:560)
> ==31294==    by 0x404D881: pthread_cond_wait@GLIBC_2.0 (in /lib/tls/i686/cmov/libpthread-2.8.90.so)
> ==31294==    by 0x4028037: pthread_cond_wait@* (hg_intercepts.c:574)
> ==31294==    by 0x809EBB7: pthread_cond_wait (PthreadMocks.cc:54)
> ==31294==    by 0x80AB9B3: wait_sync_completion (mt_adaptor.c:82)
> ==31294==    by 0x80A1E82: zoo_wget (zookeeper.c:2517)
> ==31294==    by 0x80A1F13: zoo_get (zookeeper.c:2497)

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Assigned: (ZOOKEEPER-333) helgrind thread issues identified in mt c client code

Posted by "Mahadev konar (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/ZOOKEEPER-333?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mahadev konar reassigned ZOOKEEPER-333:
---------------------------------------

    Assignee: Mahadev konar

> helgrind thread issues identified in mt c client code
> -----------------------------------------------------
>
>                 Key: ZOOKEEPER-333
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-333
>             Project: Zookeeper
>          Issue Type: Bug
>          Components: c client
>            Reporter: Patrick Hunt
>            Assignee: Mahadev konar
>            Priority: Critical
>             Fix For: 3.1.1, 3.2.0
>
>         Attachments: helgrind_mt.out
>
>
> helgrind generated a number of issues, I pulled a bunch of them. Most are related to the test, some are really issues with the mt zk client code though:
> valgrind --tool=helgrind --log-file=helgrind_mt.out ./zktest-mt
> ==31294== Thread #2: pthread_cond_{timed}wait called with un-held mutex
> ==31294==    at 0x4027F8F: pthread_cond_wait@* (hg_intercepts.c:560)
> ==31294==    by 0x404D881: pthread_cond_wait@GLIBC_2.0 (in /lib/tls/i686/cmov/libpthread-2.8.90.so)
> ==31294==    by 0x4028037: pthread_cond_wait@* (hg_intercepts.c:574)
> ==31294==    by 0x809EBB7: pthread_cond_wait (PthreadMocks.cc:54)
> ==31294==    by 0x80ABCF6: notify_thread_ready (mt_adaptor.c:136)
> ==31294==    by 0x80ABE90: do_io (mt_adaptor.c:277)
> ==31294== Possible data race during write of size 4 at 0x42E9A58
> ==31294==    at 0x8050D83: terminateZookeeperThreads(_zhandle*) (ZKMocks.cc:518)
> ==31294==    by 0x805543B: DeliverWatchersWrapper::call(_zhandle*, int, int, char const*, watcher_object_list**) (ZKMocks.cc:261)
> ==31294==    by 0x80520F7: __wrap_deliverWatchers (ZKMocks.cc:220)
> ==31294==    by 0x80A287B: process_completions (zookeeper.c:1393)
> ==31294==    by 0x80ABDAA: do_completion (mt_adaptor.c:332)
> ==31294== Possible data race during write of size 4 at 0xBEFF5F30
> ==31294==    at 0x80589AF: Zookeeper_watchers::ConnectionWatcher::~ConnectionWatcher() (TestWatchers.cc:54)
> ==31294==    by 0x805D062: Zookeeper_watchers::testDefaultSessionWatcher1() (TestWatchers.cc:438)
> ==31294==    by 0x805608C: CppUnit::TestCaller<Zookeeper_watchers>::runTest() (TestCaller.h:166)
> ==31294== Possible data race during write of size 4 at 0x42EB104
> ==31294==    at 0x80A03EE: queue_completion (zookeeper.c:1776)
> ==31294==    by 0x80A3A44: zookeeper_process (zookeeper.c:1598)
> ==31294==    by 0x80AC00B: do_io (mt_adaptor.c:309)
> ==31294== Thread #29: pthread_cond_{timed}wait called with un-held mutex
> ==31294==    at 0x4027F8F: pthread_cond_wait@* (hg_intercepts.c:560)
> ==31294==    by 0x404D881: pthread_cond_wait@GLIBC_2.0 (in /lib/tls/i686/cmov/libpthread-2.8.90.so)
> ==31294==    by 0x4028037: pthread_cond_wait@* (hg_intercepts.c:574)
> ==31294==    by 0x809EBB7: pthread_cond_wait (PthreadMocks.cc:54)
> ==31294==    by 0x80AB9B3: wait_sync_completion (mt_adaptor.c:82)
> ==31294==    by 0x80A1E82: zoo_wget (zookeeper.c:2517)
> ==31294==    by 0x80A1F13: zoo_get (zookeeper.c:2497)

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (ZOOKEEPER-333) helgrind thread issues identified in mt c client code

Posted by "Mahadev konar (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/ZOOKEEPER-333?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12679682#action_12679682 ] 

Mahadev konar commented on ZOOKEEPER-333:
-----------------------------------------

Helgrind: Fatal internal error -- cannot continue.
Helgrind: mk_SHVAL_ShR(tset=8192,lset=1): FAILED
Helgrind: max allowed tset=8191, lset=131071
Helgrind: program has too many thread sets or lock sets to track.


this is the error generated by helgrind when it hangs with the tests and is not able to run all of the tests.

> helgrind thread issues identified in mt c client code
> -----------------------------------------------------
>
>                 Key: ZOOKEEPER-333
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-333
>             Project: Zookeeper
>          Issue Type: Bug
>          Components: c client
>            Reporter: Patrick Hunt
>            Assignee: Mahadev konar
>            Priority: Critical
>             Fix For: 3.1.1, 3.2.0
>
>         Attachments: helgrind_mt.out, helgrind_mt.out.gz
>
>
> helgrind generated a number of issues, I pulled a bunch of them. Most are related to the test, some are really issues with the mt zk client code though:
> valgrind --tool=helgrind --log-file=helgrind_mt.out ./zktest-mt
> ==31294== Thread #2: pthread_cond_{timed}wait called with un-held mutex
> ==31294==    at 0x4027F8F: pthread_cond_wait@* (hg_intercepts.c:560)
> ==31294==    by 0x404D881: pthread_cond_wait@GLIBC_2.0 (in /lib/tls/i686/cmov/libpthread-2.8.90.so)
> ==31294==    by 0x4028037: pthread_cond_wait@* (hg_intercepts.c:574)
> ==31294==    by 0x809EBB7: pthread_cond_wait (PthreadMocks.cc:54)
> ==31294==    by 0x80ABCF6: notify_thread_ready (mt_adaptor.c:136)
> ==31294==    by 0x80ABE90: do_io (mt_adaptor.c:277)
> ==31294== Possible data race during write of size 4 at 0x42E9A58
> ==31294==    at 0x8050D83: terminateZookeeperThreads(_zhandle*) (ZKMocks.cc:518)
> ==31294==    by 0x805543B: DeliverWatchersWrapper::call(_zhandle*, int, int, char const*, watcher_object_list**) (ZKMocks.cc:261)
> ==31294==    by 0x80520F7: __wrap_deliverWatchers (ZKMocks.cc:220)
> ==31294==    by 0x80A287B: process_completions (zookeeper.c:1393)
> ==31294==    by 0x80ABDAA: do_completion (mt_adaptor.c:332)
> ==31294== Possible data race during write of size 4 at 0xBEFF5F30
> ==31294==    at 0x80589AF: Zookeeper_watchers::ConnectionWatcher::~ConnectionWatcher() (TestWatchers.cc:54)
> ==31294==    by 0x805D062: Zookeeper_watchers::testDefaultSessionWatcher1() (TestWatchers.cc:438)
> ==31294==    by 0x805608C: CppUnit::TestCaller<Zookeeper_watchers>::runTest() (TestCaller.h:166)
> ==31294== Possible data race during write of size 4 at 0x42EB104
> ==31294==    at 0x80A03EE: queue_completion (zookeeper.c:1776)
> ==31294==    by 0x80A3A44: zookeeper_process (zookeeper.c:1598)
> ==31294==    by 0x80AC00B: do_io (mt_adaptor.c:309)
> ==31294== Thread #29: pthread_cond_{timed}wait called with un-held mutex
> ==31294==    at 0x4027F8F: pthread_cond_wait@* (hg_intercepts.c:560)
> ==31294==    by 0x404D881: pthread_cond_wait@GLIBC_2.0 (in /lib/tls/i686/cmov/libpthread-2.8.90.so)
> ==31294==    by 0x4028037: pthread_cond_wait@* (hg_intercepts.c:574)
> ==31294==    by 0x809EBB7: pthread_cond_wait (PthreadMocks.cc:54)
> ==31294==    by 0x80AB9B3: wait_sync_completion (mt_adaptor.c:82)
> ==31294==    by 0x80A1E82: zoo_wget (zookeeper.c:2517)
> ==31294==    by 0x80A1F13: zoo_get (zookeeper.c:2497)

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (ZOOKEEPER-333) helgrind thread issues identified in mt c client code

Posted by "Patrick Hunt (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/ZOOKEEPER-333?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Patrick Hunt updated ZOOKEEPER-333:
-----------------------------------

      Resolution: Fixed
    Hadoop Flags: [Reviewed]
          Status: Resolved  (was: Patch Available)

Looks good, +1.

Committed revision 752711.
Committed revision 752726.

> helgrind thread issues identified in mt c client code
> -----------------------------------------------------
>
>                 Key: ZOOKEEPER-333
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-333
>             Project: Zookeeper
>          Issue Type: Bug
>          Components: c client
>            Reporter: Patrick Hunt
>            Assignee: Mahadev konar
>            Priority: Critical
>             Fix For: 3.1.1, 3.2.0
>
>         Attachments: helgrind_mt.out, helgrind_mt.out.gz, ZOOKEEPER-333.patch
>
>
> helgrind generated a number of issues, I pulled a bunch of them. Most are related to the test, some are really issues with the mt zk client code though:
> valgrind --tool=helgrind --log-file=helgrind_mt.out ./zktest-mt
> ==31294== Thread #2: pthread_cond_{timed}wait called with un-held mutex
> ==31294==    at 0x4027F8F: pthread_cond_wait@* (hg_intercepts.c:560)
> ==31294==    by 0x404D881: pthread_cond_wait@GLIBC_2.0 (in /lib/tls/i686/cmov/libpthread-2.8.90.so)
> ==31294==    by 0x4028037: pthread_cond_wait@* (hg_intercepts.c:574)
> ==31294==    by 0x809EBB7: pthread_cond_wait (PthreadMocks.cc:54)
> ==31294==    by 0x80ABCF6: notify_thread_ready (mt_adaptor.c:136)
> ==31294==    by 0x80ABE90: do_io (mt_adaptor.c:277)
> ==31294== Possible data race during write of size 4 at 0x42E9A58
> ==31294==    at 0x8050D83: terminateZookeeperThreads(_zhandle*) (ZKMocks.cc:518)
> ==31294==    by 0x805543B: DeliverWatchersWrapper::call(_zhandle*, int, int, char const*, watcher_object_list**) (ZKMocks.cc:261)
> ==31294==    by 0x80520F7: __wrap_deliverWatchers (ZKMocks.cc:220)
> ==31294==    by 0x80A287B: process_completions (zookeeper.c:1393)
> ==31294==    by 0x80ABDAA: do_completion (mt_adaptor.c:332)
> ==31294== Possible data race during write of size 4 at 0xBEFF5F30
> ==31294==    at 0x80589AF: Zookeeper_watchers::ConnectionWatcher::~ConnectionWatcher() (TestWatchers.cc:54)
> ==31294==    by 0x805D062: Zookeeper_watchers::testDefaultSessionWatcher1() (TestWatchers.cc:438)
> ==31294==    by 0x805608C: CppUnit::TestCaller<Zookeeper_watchers>::runTest() (TestCaller.h:166)
> ==31294== Possible data race during write of size 4 at 0x42EB104
> ==31294==    at 0x80A03EE: queue_completion (zookeeper.c:1776)
> ==31294==    by 0x80A3A44: zookeeper_process (zookeeper.c:1598)
> ==31294==    by 0x80AC00B: do_io (mt_adaptor.c:309)
> ==31294== Thread #29: pthread_cond_{timed}wait called with un-held mutex
> ==31294==    at 0x4027F8F: pthread_cond_wait@* (hg_intercepts.c:560)
> ==31294==    by 0x404D881: pthread_cond_wait@GLIBC_2.0 (in /lib/tls/i686/cmov/libpthread-2.8.90.so)
> ==31294==    by 0x4028037: pthread_cond_wait@* (hg_intercepts.c:574)
> ==31294==    by 0x809EBB7: pthread_cond_wait (PthreadMocks.cc:54)
> ==31294==    by 0x80AB9B3: wait_sync_completion (mt_adaptor.c:82)
> ==31294==    by 0x80A1E82: zoo_wget (zookeeper.c:2517)
> ==31294==    by 0x80A1F13: zoo_get (zookeeper.c:2497)

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (ZOOKEEPER-333) helgrind thread issues identified in mt c client code

Posted by "Patrick Hunt (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/ZOOKEEPER-333?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Patrick Hunt updated ZOOKEEPER-333:
-----------------------------------

    Attachment: helgrind_mt.out.gz

This latest gzip'd helgrind output file is from svn trunk:

At revision 751008.

Note: the tests hang at the following step:

Zookeeper_operations::testOperationsAndDisconnectConcurrently1^CKilled

I had to ctrl-c the test to stop. Perhaps this means helgrind is tickling a bad problem(s)? We should re run helgrind after resolving the basic issues currently in the log, and see if this hang still occurrs. Perhaps you guys could also look at the test in question to see if anything catches your eye? The test should not be failing.


> helgrind thread issues identified in mt c client code
> -----------------------------------------------------
>
>                 Key: ZOOKEEPER-333
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-333
>             Project: Zookeeper
>          Issue Type: Bug
>          Components: c client
>            Reporter: Patrick Hunt
>            Assignee: Mahadev konar
>            Priority: Critical
>             Fix For: 3.1.1, 3.2.0
>
>         Attachments: helgrind_mt.out, helgrind_mt.out.gz
>
>
> helgrind generated a number of issues, I pulled a bunch of them. Most are related to the test, some are really issues with the mt zk client code though:
> valgrind --tool=helgrind --log-file=helgrind_mt.out ./zktest-mt
> ==31294== Thread #2: pthread_cond_{timed}wait called with un-held mutex
> ==31294==    at 0x4027F8F: pthread_cond_wait@* (hg_intercepts.c:560)
> ==31294==    by 0x404D881: pthread_cond_wait@GLIBC_2.0 (in /lib/tls/i686/cmov/libpthread-2.8.90.so)
> ==31294==    by 0x4028037: pthread_cond_wait@* (hg_intercepts.c:574)
> ==31294==    by 0x809EBB7: pthread_cond_wait (PthreadMocks.cc:54)
> ==31294==    by 0x80ABCF6: notify_thread_ready (mt_adaptor.c:136)
> ==31294==    by 0x80ABE90: do_io (mt_adaptor.c:277)
> ==31294== Possible data race during write of size 4 at 0x42E9A58
> ==31294==    at 0x8050D83: terminateZookeeperThreads(_zhandle*) (ZKMocks.cc:518)
> ==31294==    by 0x805543B: DeliverWatchersWrapper::call(_zhandle*, int, int, char const*, watcher_object_list**) (ZKMocks.cc:261)
> ==31294==    by 0x80520F7: __wrap_deliverWatchers (ZKMocks.cc:220)
> ==31294==    by 0x80A287B: process_completions (zookeeper.c:1393)
> ==31294==    by 0x80ABDAA: do_completion (mt_adaptor.c:332)
> ==31294== Possible data race during write of size 4 at 0xBEFF5F30
> ==31294==    at 0x80589AF: Zookeeper_watchers::ConnectionWatcher::~ConnectionWatcher() (TestWatchers.cc:54)
> ==31294==    by 0x805D062: Zookeeper_watchers::testDefaultSessionWatcher1() (TestWatchers.cc:438)
> ==31294==    by 0x805608C: CppUnit::TestCaller<Zookeeper_watchers>::runTest() (TestCaller.h:166)
> ==31294== Possible data race during write of size 4 at 0x42EB104
> ==31294==    at 0x80A03EE: queue_completion (zookeeper.c:1776)
> ==31294==    by 0x80A3A44: zookeeper_process (zookeeper.c:1598)
> ==31294==    by 0x80AC00B: do_io (mt_adaptor.c:309)
> ==31294== Thread #29: pthread_cond_{timed}wait called with un-held mutex
> ==31294==    at 0x4027F8F: pthread_cond_wait@* (hg_intercepts.c:560)
> ==31294==    by 0x404D881: pthread_cond_wait@GLIBC_2.0 (in /lib/tls/i686/cmov/libpthread-2.8.90.so)
> ==31294==    by 0x4028037: pthread_cond_wait@* (hg_intercepts.c:574)
> ==31294==    by 0x809EBB7: pthread_cond_wait (PthreadMocks.cc:54)
> ==31294==    by 0x80AB9B3: wait_sync_completion (mt_adaptor.c:82)
> ==31294==    by 0x80A1E82: zoo_wget (zookeeper.c:2517)
> ==31294==    by 0x80A1F13: zoo_get (zookeeper.c:2497)

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Issue Comment Edited: (ZOOKEEPER-333) helgrind thread issues identified in mt c client code

Posted by "Mahadev konar (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/ZOOKEEPER-333?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12679667#action_12679667 ] 

mahadev edited comment on ZOOKEEPER-333 at 3/6/09 9:50 AM:
-----------------------------------------------------------------

chris,
 the run was from the trunk. I dont have a revision number but i think their hasnt been any code change to the source files after I uploaded the traces. So the line numbers should match.

      was (Author: mahadev):
    chris,
 the run was from the trunk. I dont have a revision number but i think their hasnt been any code change to the files mentioned after I uploaded the file. So the line numbers should match.
  
> helgrind thread issues identified in mt c client code
> -----------------------------------------------------
>
>                 Key: ZOOKEEPER-333
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-333
>             Project: Zookeeper
>          Issue Type: Bug
>          Components: c client
>            Reporter: Patrick Hunt
>            Assignee: Mahadev konar
>            Priority: Critical
>             Fix For: 3.1.1, 3.2.0
>
>         Attachments: helgrind_mt.out
>
>
> helgrind generated a number of issues, I pulled a bunch of them. Most are related to the test, some are really issues with the mt zk client code though:
> valgrind --tool=helgrind --log-file=helgrind_mt.out ./zktest-mt
> ==31294== Thread #2: pthread_cond_{timed}wait called with un-held mutex
> ==31294==    at 0x4027F8F: pthread_cond_wait@* (hg_intercepts.c:560)
> ==31294==    by 0x404D881: pthread_cond_wait@GLIBC_2.0 (in /lib/tls/i686/cmov/libpthread-2.8.90.so)
> ==31294==    by 0x4028037: pthread_cond_wait@* (hg_intercepts.c:574)
> ==31294==    by 0x809EBB7: pthread_cond_wait (PthreadMocks.cc:54)
> ==31294==    by 0x80ABCF6: notify_thread_ready (mt_adaptor.c:136)
> ==31294==    by 0x80ABE90: do_io (mt_adaptor.c:277)
> ==31294== Possible data race during write of size 4 at 0x42E9A58
> ==31294==    at 0x8050D83: terminateZookeeperThreads(_zhandle*) (ZKMocks.cc:518)
> ==31294==    by 0x805543B: DeliverWatchersWrapper::call(_zhandle*, int, int, char const*, watcher_object_list**) (ZKMocks.cc:261)
> ==31294==    by 0x80520F7: __wrap_deliverWatchers (ZKMocks.cc:220)
> ==31294==    by 0x80A287B: process_completions (zookeeper.c:1393)
> ==31294==    by 0x80ABDAA: do_completion (mt_adaptor.c:332)
> ==31294== Possible data race during write of size 4 at 0xBEFF5F30
> ==31294==    at 0x80589AF: Zookeeper_watchers::ConnectionWatcher::~ConnectionWatcher() (TestWatchers.cc:54)
> ==31294==    by 0x805D062: Zookeeper_watchers::testDefaultSessionWatcher1() (TestWatchers.cc:438)
> ==31294==    by 0x805608C: CppUnit::TestCaller<Zookeeper_watchers>::runTest() (TestCaller.h:166)
> ==31294== Possible data race during write of size 4 at 0x42EB104
> ==31294==    at 0x80A03EE: queue_completion (zookeeper.c:1776)
> ==31294==    by 0x80A3A44: zookeeper_process (zookeeper.c:1598)
> ==31294==    by 0x80AC00B: do_io (mt_adaptor.c:309)
> ==31294== Thread #29: pthread_cond_{timed}wait called with un-held mutex
> ==31294==    at 0x4027F8F: pthread_cond_wait@* (hg_intercepts.c:560)
> ==31294==    by 0x404D881: pthread_cond_wait@GLIBC_2.0 (in /lib/tls/i686/cmov/libpthread-2.8.90.so)
> ==31294==    by 0x4028037: pthread_cond_wait@* (hg_intercepts.c:574)
> ==31294==    by 0x809EBB7: pthread_cond_wait (PthreadMocks.cc:54)
> ==31294==    by 0x80AB9B3: wait_sync_completion (mt_adaptor.c:82)
> ==31294==    by 0x80A1E82: zoo_wget (zookeeper.c:2517)
> ==31294==    by 0x80A1F13: zoo_get (zookeeper.c:2497)

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (ZOOKEEPER-333) helgrind thread issues identified in mt c client code

Posted by "Hudson (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/ZOOKEEPER-333?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12681686#action_12681686 ] 

Hudson commented on ZOOKEEPER-333:
----------------------------------

Integrated in ZooKeeper-trunk #253 (See [http://hudson.zones.apache.org/hudson/job/ZooKeeper-trunk/253/])
    

> helgrind thread issues identified in mt c client code
> -----------------------------------------------------
>
>                 Key: ZOOKEEPER-333
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-333
>             Project: Zookeeper
>          Issue Type: Bug
>          Components: c client
>            Reporter: Patrick Hunt
>            Assignee: Mahadev konar
>            Priority: Critical
>             Fix For: 3.1.1, 3.2.0
>
>         Attachments: helgrind_mt.out, helgrind_mt.out.gz, ZOOKEEPER-333.patch
>
>
> helgrind generated a number of issues, I pulled a bunch of them. Most are related to the test, some are really issues with the mt zk client code though:
> valgrind --tool=helgrind --log-file=helgrind_mt.out ./zktest-mt
> ==31294== Thread #2: pthread_cond_{timed}wait called with un-held mutex
> ==31294==    at 0x4027F8F: pthread_cond_wait@* (hg_intercepts.c:560)
> ==31294==    by 0x404D881: pthread_cond_wait@GLIBC_2.0 (in /lib/tls/i686/cmov/libpthread-2.8.90.so)
> ==31294==    by 0x4028037: pthread_cond_wait@* (hg_intercepts.c:574)
> ==31294==    by 0x809EBB7: pthread_cond_wait (PthreadMocks.cc:54)
> ==31294==    by 0x80ABCF6: notify_thread_ready (mt_adaptor.c:136)
> ==31294==    by 0x80ABE90: do_io (mt_adaptor.c:277)
> ==31294== Possible data race during write of size 4 at 0x42E9A58
> ==31294==    at 0x8050D83: terminateZookeeperThreads(_zhandle*) (ZKMocks.cc:518)
> ==31294==    by 0x805543B: DeliverWatchersWrapper::call(_zhandle*, int, int, char const*, watcher_object_list**) (ZKMocks.cc:261)
> ==31294==    by 0x80520F7: __wrap_deliverWatchers (ZKMocks.cc:220)
> ==31294==    by 0x80A287B: process_completions (zookeeper.c:1393)
> ==31294==    by 0x80ABDAA: do_completion (mt_adaptor.c:332)
> ==31294== Possible data race during write of size 4 at 0xBEFF5F30
> ==31294==    at 0x80589AF: Zookeeper_watchers::ConnectionWatcher::~ConnectionWatcher() (TestWatchers.cc:54)
> ==31294==    by 0x805D062: Zookeeper_watchers::testDefaultSessionWatcher1() (TestWatchers.cc:438)
> ==31294==    by 0x805608C: CppUnit::TestCaller<Zookeeper_watchers>::runTest() (TestCaller.h:166)
> ==31294== Possible data race during write of size 4 at 0x42EB104
> ==31294==    at 0x80A03EE: queue_completion (zookeeper.c:1776)
> ==31294==    by 0x80A3A44: zookeeper_process (zookeeper.c:1598)
> ==31294==    by 0x80AC00B: do_io (mt_adaptor.c:309)
> ==31294== Thread #29: pthread_cond_{timed}wait called with un-held mutex
> ==31294==    at 0x4027F8F: pthread_cond_wait@* (hg_intercepts.c:560)
> ==31294==    by 0x404D881: pthread_cond_wait@GLIBC_2.0 (in /lib/tls/i686/cmov/libpthread-2.8.90.so)
> ==31294==    by 0x4028037: pthread_cond_wait@* (hg_intercepts.c:574)
> ==31294==    by 0x809EBB7: pthread_cond_wait (PthreadMocks.cc:54)
> ==31294==    by 0x80AB9B3: wait_sync_completion (mt_adaptor.c:82)
> ==31294==    by 0x80A1E82: zoo_wget (zookeeper.c:2517)
> ==31294==    by 0x80A1F13: zoo_get (zookeeper.c:2497)

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (ZOOKEEPER-333) helgrind thread issues identified in mt c client code

Posted by "Mahadev konar (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/ZOOKEEPER-333?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mahadev konar updated ZOOKEEPER-333:
------------------------------------

    Attachment: helgrind_mt.out

i found some more while running helgrind... attacching the filtered output.

> helgrind thread issues identified in mt c client code
> -----------------------------------------------------
>
>                 Key: ZOOKEEPER-333
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-333
>             Project: Zookeeper
>          Issue Type: Bug
>          Components: c client
>            Reporter: Patrick Hunt
>            Priority: Critical
>             Fix For: 3.1.1, 3.2.0
>
>         Attachments: helgrind_mt.out
>
>
> helgrind generated a number of issues, I pulled a bunch of them. Most are related to the test, some are really issues with the mt zk client code though:
> valgrind --tool=helgrind --log-file=helgrind_mt.out ./zktest-mt
> ==31294== Thread #2: pthread_cond_{timed}wait called with un-held mutex
> ==31294==    at 0x4027F8F: pthread_cond_wait@* (hg_intercepts.c:560)
> ==31294==    by 0x404D881: pthread_cond_wait@GLIBC_2.0 (in /lib/tls/i686/cmov/libpthread-2.8.90.so)
> ==31294==    by 0x4028037: pthread_cond_wait@* (hg_intercepts.c:574)
> ==31294==    by 0x809EBB7: pthread_cond_wait (PthreadMocks.cc:54)
> ==31294==    by 0x80ABCF6: notify_thread_ready (mt_adaptor.c:136)
> ==31294==    by 0x80ABE90: do_io (mt_adaptor.c:277)
> ==31294== Possible data race during write of size 4 at 0x42E9A58
> ==31294==    at 0x8050D83: terminateZookeeperThreads(_zhandle*) (ZKMocks.cc:518)
> ==31294==    by 0x805543B: DeliverWatchersWrapper::call(_zhandle*, int, int, char const*, watcher_object_list**) (ZKMocks.cc:261)
> ==31294==    by 0x80520F7: __wrap_deliverWatchers (ZKMocks.cc:220)
> ==31294==    by 0x80A287B: process_completions (zookeeper.c:1393)
> ==31294==    by 0x80ABDAA: do_completion (mt_adaptor.c:332)
> ==31294== Possible data race during write of size 4 at 0xBEFF5F30
> ==31294==    at 0x80589AF: Zookeeper_watchers::ConnectionWatcher::~ConnectionWatcher() (TestWatchers.cc:54)
> ==31294==    by 0x805D062: Zookeeper_watchers::testDefaultSessionWatcher1() (TestWatchers.cc:438)
> ==31294==    by 0x805608C: CppUnit::TestCaller<Zookeeper_watchers>::runTest() (TestCaller.h:166)
> ==31294== Possible data race during write of size 4 at 0x42EB104
> ==31294==    at 0x80A03EE: queue_completion (zookeeper.c:1776)
> ==31294==    by 0x80A3A44: zookeeper_process (zookeeper.c:1598)
> ==31294==    by 0x80AC00B: do_io (mt_adaptor.c:309)
> ==31294== Thread #29: pthread_cond_{timed}wait called with un-held mutex
> ==31294==    at 0x4027F8F: pthread_cond_wait@* (hg_intercepts.c:560)
> ==31294==    by 0x404D881: pthread_cond_wait@GLIBC_2.0 (in /lib/tls/i686/cmov/libpthread-2.8.90.so)
> ==31294==    by 0x4028037: pthread_cond_wait@* (hg_intercepts.c:574)
> ==31294==    by 0x809EBB7: pthread_cond_wait (PthreadMocks.cc:54)
> ==31294==    by 0x80AB9B3: wait_sync_completion (mt_adaptor.c:82)
> ==31294==    by 0x80A1E82: zoo_wget (zookeeper.c:2517)
> ==31294==    by 0x80A1F13: zoo_get (zookeeper.c:2497)

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (ZOOKEEPER-333) helgrind thread issues identified in mt c client code

Posted by "Patrick Hunt (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/ZOOKEEPER-333?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12679677#action_12679677 ] 

Patrick Hunt commented on ZOOKEEPER-333:
----------------------------------------

ps. I'm on ubuntu intrepid ibex latest, single core cpu.


> helgrind thread issues identified in mt c client code
> -----------------------------------------------------
>
>                 Key: ZOOKEEPER-333
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-333
>             Project: Zookeeper
>          Issue Type: Bug
>          Components: c client
>            Reporter: Patrick Hunt
>            Assignee: Mahadev konar
>            Priority: Critical
>             Fix For: 3.1.1, 3.2.0
>
>         Attachments: helgrind_mt.out, helgrind_mt.out.gz
>
>
> helgrind generated a number of issues, I pulled a bunch of them. Most are related to the test, some are really issues with the mt zk client code though:
> valgrind --tool=helgrind --log-file=helgrind_mt.out ./zktest-mt
> ==31294== Thread #2: pthread_cond_{timed}wait called with un-held mutex
> ==31294==    at 0x4027F8F: pthread_cond_wait@* (hg_intercepts.c:560)
> ==31294==    by 0x404D881: pthread_cond_wait@GLIBC_2.0 (in /lib/tls/i686/cmov/libpthread-2.8.90.so)
> ==31294==    by 0x4028037: pthread_cond_wait@* (hg_intercepts.c:574)
> ==31294==    by 0x809EBB7: pthread_cond_wait (PthreadMocks.cc:54)
> ==31294==    by 0x80ABCF6: notify_thread_ready (mt_adaptor.c:136)
> ==31294==    by 0x80ABE90: do_io (mt_adaptor.c:277)
> ==31294== Possible data race during write of size 4 at 0x42E9A58
> ==31294==    at 0x8050D83: terminateZookeeperThreads(_zhandle*) (ZKMocks.cc:518)
> ==31294==    by 0x805543B: DeliverWatchersWrapper::call(_zhandle*, int, int, char const*, watcher_object_list**) (ZKMocks.cc:261)
> ==31294==    by 0x80520F7: __wrap_deliverWatchers (ZKMocks.cc:220)
> ==31294==    by 0x80A287B: process_completions (zookeeper.c:1393)
> ==31294==    by 0x80ABDAA: do_completion (mt_adaptor.c:332)
> ==31294== Possible data race during write of size 4 at 0xBEFF5F30
> ==31294==    at 0x80589AF: Zookeeper_watchers::ConnectionWatcher::~ConnectionWatcher() (TestWatchers.cc:54)
> ==31294==    by 0x805D062: Zookeeper_watchers::testDefaultSessionWatcher1() (TestWatchers.cc:438)
> ==31294==    by 0x805608C: CppUnit::TestCaller<Zookeeper_watchers>::runTest() (TestCaller.h:166)
> ==31294== Possible data race during write of size 4 at 0x42EB104
> ==31294==    at 0x80A03EE: queue_completion (zookeeper.c:1776)
> ==31294==    by 0x80A3A44: zookeeper_process (zookeeper.c:1598)
> ==31294==    by 0x80AC00B: do_io (mt_adaptor.c:309)
> ==31294== Thread #29: pthread_cond_{timed}wait called with un-held mutex
> ==31294==    at 0x4027F8F: pthread_cond_wait@* (hg_intercepts.c:560)
> ==31294==    by 0x404D881: pthread_cond_wait@GLIBC_2.0 (in /lib/tls/i686/cmov/libpthread-2.8.90.so)
> ==31294==    by 0x4028037: pthread_cond_wait@* (hg_intercepts.c:574)
> ==31294==    by 0x809EBB7: pthread_cond_wait (PthreadMocks.cc:54)
> ==31294==    by 0x80AB9B3: wait_sync_completion (mt_adaptor.c:82)
> ==31294==    by 0x80A1E82: zoo_wget (zookeeper.c:2517)
> ==31294==    by 0x80A1F13: zoo_get (zookeeper.c:2497)

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (ZOOKEEPER-333) helgrind thread issues identified in mt c client code

Posted by "Mahadev konar (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/ZOOKEEPER-333?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mahadev konar updated ZOOKEEPER-333:
------------------------------------

    Attachment: ZOOKEEPER-333.patch

most of the traces in the helgrind output seem harmless to me. This patch has some minor changes (declaring the head pointers of lists as volatile, which we keep checking in different threads without locking), which might cause some problems. the others I think is fine.

> helgrind thread issues identified in mt c client code
> -----------------------------------------------------
>
>                 Key: ZOOKEEPER-333
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-333
>             Project: Zookeeper
>          Issue Type: Bug
>          Components: c client
>            Reporter: Patrick Hunt
>            Assignee: Mahadev konar
>            Priority: Critical
>             Fix For: 3.1.1, 3.2.0
>
>         Attachments: helgrind_mt.out, helgrind_mt.out.gz, ZOOKEEPER-333.patch
>
>
> helgrind generated a number of issues, I pulled a bunch of them. Most are related to the test, some are really issues with the mt zk client code though:
> valgrind --tool=helgrind --log-file=helgrind_mt.out ./zktest-mt
> ==31294== Thread #2: pthread_cond_{timed}wait called with un-held mutex
> ==31294==    at 0x4027F8F: pthread_cond_wait@* (hg_intercepts.c:560)
> ==31294==    by 0x404D881: pthread_cond_wait@GLIBC_2.0 (in /lib/tls/i686/cmov/libpthread-2.8.90.so)
> ==31294==    by 0x4028037: pthread_cond_wait@* (hg_intercepts.c:574)
> ==31294==    by 0x809EBB7: pthread_cond_wait (PthreadMocks.cc:54)
> ==31294==    by 0x80ABCF6: notify_thread_ready (mt_adaptor.c:136)
> ==31294==    by 0x80ABE90: do_io (mt_adaptor.c:277)
> ==31294== Possible data race during write of size 4 at 0x42E9A58
> ==31294==    at 0x8050D83: terminateZookeeperThreads(_zhandle*) (ZKMocks.cc:518)
> ==31294==    by 0x805543B: DeliverWatchersWrapper::call(_zhandle*, int, int, char const*, watcher_object_list**) (ZKMocks.cc:261)
> ==31294==    by 0x80520F7: __wrap_deliverWatchers (ZKMocks.cc:220)
> ==31294==    by 0x80A287B: process_completions (zookeeper.c:1393)
> ==31294==    by 0x80ABDAA: do_completion (mt_adaptor.c:332)
> ==31294== Possible data race during write of size 4 at 0xBEFF5F30
> ==31294==    at 0x80589AF: Zookeeper_watchers::ConnectionWatcher::~ConnectionWatcher() (TestWatchers.cc:54)
> ==31294==    by 0x805D062: Zookeeper_watchers::testDefaultSessionWatcher1() (TestWatchers.cc:438)
> ==31294==    by 0x805608C: CppUnit::TestCaller<Zookeeper_watchers>::runTest() (TestCaller.h:166)
> ==31294== Possible data race during write of size 4 at 0x42EB104
> ==31294==    at 0x80A03EE: queue_completion (zookeeper.c:1776)
> ==31294==    by 0x80A3A44: zookeeper_process (zookeeper.c:1598)
> ==31294==    by 0x80AC00B: do_io (mt_adaptor.c:309)
> ==31294== Thread #29: pthread_cond_{timed}wait called with un-held mutex
> ==31294==    at 0x4027F8F: pthread_cond_wait@* (hg_intercepts.c:560)
> ==31294==    by 0x404D881: pthread_cond_wait@GLIBC_2.0 (in /lib/tls/i686/cmov/libpthread-2.8.90.so)
> ==31294==    by 0x4028037: pthread_cond_wait@* (hg_intercepts.c:574)
> ==31294==    by 0x809EBB7: pthread_cond_wait (PthreadMocks.cc:54)
> ==31294==    by 0x80AB9B3: wait_sync_completion (mt_adaptor.c:82)
> ==31294==    by 0x80A1E82: zoo_wget (zookeeper.c:2517)
> ==31294==    by 0x80A1F13: zoo_get (zookeeper.c:2497)

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.